Sonnet 3.5 vs Llama 3.1 405B vs GPT-4omni: My TEST

  • Published 22 Oct 2024

COMMENTS • 5

  • @jondo7680 2 months ago +2

    Always great to see quality questions asked and rated on the arena.

  • @autingo6583 2 months ago +1

    Cognitive errors in all kinds of substrates (be it carbon or silicon) are always interesting. I wonder: if we gave them maps (real ones? ones with fictional geography? as PNG/JPG) as ground truth for reasoning along with the prompt, would they exhibit the same errors?

    • @code4AI 2 months ago +1

      If we assume that those models had access to the complete internet, the interesting question is: why these mistakes? One possible hint is that they may rely on individual time-zone indicators, backed by a lot of contradictory training data based on human time references (imagine which nations switch to summer daylight saving time), without understanding that the sun rising above the horizon is an astronomical event that has nothing to do with human conventions regarding arbitrary time zones.
      So the LLMs and VLMs apply the wrong understanding to solve the problem. Now, how often will this happen on problems that are not as simple as my sunrise example, where the underlying "non-optimal" understanding by an AI is clearly exposed?
      And this extraneous solution happened with Claude 3.5 Sonnet, Llama 3.1 405B and GPT-4omni, which serve as benchmarks and trainers for smaller models.
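      A minimal sketch of the point above (my illustration, not from the video; it assumes the third-party Python package "astral", and Berlin plus the 2024 DST-switch dates are arbitrary examples): the sunrise instant is fixed in UTC by astronomy, and only its wall-clock label jumps when a region switches to daylight saving time.

          # Illustration only: sunrise as an astronomical event vs. local clock labels.
          # Assumes: pip install astral  (location and dates are arbitrary examples)
          from datetime import date

          from astral import LocationInfo
          from astral.sun import sun

          berlin = LocationInfo("Berlin", "Germany", "Europe/Berlin", 52.52, 13.405)

          for d in (date(2024, 3, 30), date(2024, 3, 31)):  # day before / after the DST switch
              utc_rise = sun(berlin.observer, date=d)["sunrise"]                           # UTC instant
              local_rise = sun(berlin.observer, date=d, tzinfo=berlin.timezone)["sunrise"]  # wall clock
              print(d, "UTC sunrise:", utc_rise.time(), "local sunrise:", local_rise.time())

          # The UTC sunrise shifts only by a couple of minutes between the two days,
          # while the local reading jumps by roughly an hour: the event did not move,
          # only the human time-zone convention did.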

  • @achille5509 2 months ago

    It is not clear in the video which model you are testing; could you put the names of the models on the screen if you do this again?