How strong is Claude 2?

  • Published 2 Jan 2025

COMMENTS •

  • @henryholloway5656 · 1 year ago +9

    I'm digging your videos man, you're doing really great work

  • @irismaxj · 1 year ago +2

    Thanks for the review and commentary

  • @southcoastinventors6583 · 1 year ago +1

That ending about the Adobe measuring tool was priceless; it's like the guy who picked a car lock with a tennis ball.

  • @tomenglish9340 · 1 year ago +5

    I've had only one session with Claude 2. But I've already found deficiencies that I never see in the chat agent based on GPT-4. Claude 2 contradicts itself within single responses. Furthermore, when I tried to nudge it in the right direction, it instead generated worse responses.
    I'll mention also that I've come to doubt that multiple-choice questions are appropriate in testing AIs. For humans, performance on multiple-choice tests is predictive of performance on more open-ended tasks. We don't have a good basis for believing the same about agents based on large language models.

    • @SamuelAlbanie1 · 1 year ago

      Thanks for sharing your perspective.
      I agree with your point that it's unclear the degree to which we can consider exams (particularly those designed with the constraints of human memory in mind) as predictive of broader performance for LLMs.

  • @ram49967 · 1 year ago +3

    Really appreciate your excellent selection of excerpts from technical papers that I never even knew existed. Your interaction with the AI is also very instructive. I believe that it is now possible to upload up to 5 PDF files at once, as well as documents in several other formats. I thought that the 100k token limit covered the combination of input and response, but it may apply only to the input, with the token limit for the response being 4k.
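
The input-vs-output split described above can be sketched as a simple budget check. This is only an illustration: the 100k/4k figures are the commenter's understanding, and `count_tokens` is a crude stand-in, not the tokenizer the API actually uses.

```python
# Hypothetical sketch: budgeting uploaded documents against a context window,
# assuming a 100k-token limit with ~4k tokens reserved for the response.

CONTEXT_LIMIT = 100_000   # total tokens the model can attend to (assumption)
RESPONSE_BUDGET = 4_000   # tokens reserved for the model's reply (assumption)

def count_tokens(text: str) -> int:
    """Crude stand-in tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(documents: list[str], question: str) -> bool:
    """Check whether the combined input leaves room for the response."""
    used = count_tokens(question) + sum(count_tokens(d) for d in documents)
    return used + RESPONSE_BUDGET <= CONTEXT_LIMIT

# Example: five files' worth of extracted text plus a question.
docs = ["lorem ipsum " * 250 for _ in range(5)]
print(fits_in_context(docs, "Summarise the key claims across these files."))
```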

  • @AIWRLDOFFICIAL · 1 year ago +2

    Thanks

  • @foreignconta · 1 year ago +1

    Your videos are awesome. Can't wait for more.

  • @JL-zl6ot · 1 year ago +1

    really enjoy your walkthrough!

  • @juliangawronsky9339 · 1 year ago +3

    8:22 Thanks for this insight

  • @TheDRAGONFLITE · 1 year ago +2

    Keep going!

  • @alpaca_llama · 1 year ago +1

    awesome

  • @zakgoldwasser7224 · 1 year ago +2

    Keep it up man!

  • @TheManinBlack9054 · 1 year ago +1

    Love your humor lol! gj!

  • @reinerheiner1148 · 1 year ago +1

    Claude 2 is an awesome model, but it has serious problems with hallucinations. That could probably be mitigated by giving it data access via the web, and for uploaded data files they need to do some finetuning to make sure Claude does not invent data that is not found in the files provided. Providing references in Claude 2's responses would probably help too. If I were Anthropic, I would treat this as the most pressing concern for Claude at the moment. On a positive note, I feel like for an estimated 174B parameters, Claude 2 comes very close to GPT-4 - which is said to be a mixture of experts, making it even more impressive.
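
The grounding idea in this comment can be illustrated with a naive post-hoc check: flag numeric claims in a model's answer that never appear in the uploaded files. This is purely a hypothetical sketch of the suggestion, not Anthropic's actual mitigation, and the function names are invented for illustration.

```python
# Naive grounding check: report numbers in a model response that are
# absent from every provided source file (a toy hallucination detector).
import re

def unsupported_numbers(response: str, source_files: list[str]) -> list[str]:
    """Return numeric claims in `response` not found in any source file."""
    corpus = " ".join(source_files)
    numbers = re.findall(r"\d+(?:\.\d+)?", response)
    return [n for n in numbers if n not in corpus]

files = ["Revenue grew 12% to 3.4 million in 2022."]
answer = "Revenue grew 12% to 5.1 million in 2022."
print(unsupported_numbers(answer, files))  # flags the invented "5.1"
```

A real system would need semantic matching rather than substring search, but even this crude version shows why citing sources makes fabricated figures easier to catch.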

    • @SamuelAlbanie1 · 1 year ago

      Thanks for sharing your experience with the model!

  • @quantumjun · 1 year ago +3

    Hope we never reach the end

    • @SamuelAlbanie1 · 1 year ago +1

      I'm not entirely sure I understand your meaning, but I hope so too.

    • @quantumjun · 1 year ago

      @@SamuelAlbanie1 12:46

  • @cholst1 · 1 year ago +1

    For me so far, it's very unwilling to return full code in its answers (longer code), which is quite annoying. And it won't even try refactoring larger codebases; it just returns summaries and pseudocode.

    • @SamuelAlbanie1 · 1 year ago

      Thanks for sharing your experience. I've mainly used it for summarisation so far - less so for coding. It's interesting to hear that Claude 2 may be less well-suited for that use-case.

    • @cholst1 · 1 year ago

      @@SamuelAlbanie1 For summarizing large texts it's been great, but a bit of a battle code-wise, and code quality is not as good as GPT-4 in most cases, at least in my experience. Though it sometimes has more interesting ideas about what to do (i.e. reviewing/creating a ticket for given code).
      I have yet to try Sourcegraph though; it looks like it may be a good competitor to Copilot Chat (which is pretty useless running on 3.5).

  • @denisblack9897 · 1 year ago

    As much as I love my country and its people, AI unavailability makes me want to leave Russia.
    It seems like it was planned to force Russia out of the AI race.

    • @SamuelAlbanie1 · 1 year ago

      Have you experimented with SberBank GigaChat? (I saw it announced, but haven't tried it myself.)

  • @-blackcat-4749 · 1 year ago +1

    That was an expected condition. A 👌 standard day

    • @SamuelAlbanie1 · 1 year ago

      I think that's a reasonable assessment at this time.

  • @-blackcat-4749 · 1 year ago +1

    That was a habitual 📍 moment. A commonplace event

    • @SamuelAlbanie1 · 1 year ago

      Thanks for the question. I don't think we can read too much into the homogeneity of the context from the figure - it's primarily aimed at demonstrating that the loss continues to trend downwards.
      That being said, intuitively it seems highly plausible that extending to a significantly larger context window may diminish the model's ability to pick up on details within the window (relative to an alternative that uses a similar compute budget but a more compact window). I think it's an open question, though.