I was inspired by "What the heck are embeddings?" by David Shapiro

  • Published Nov 7, 2024

COMMENTS • 4

  • @Inglewood123 6 months ago +1

    damn this channel is way better than my prof

  • @marr750 1 year ago +1

    The 1,536 dimensions don't mean anything individually. They are just one potential abstract representation of the word, sentence, or document "embedded" into the "latent space" of the model. They just are what they are. There will certainly be trends across the 1,536 dimensions, but they will still be coupled. You can certainly train an embedding model so that real concepts are represented by specific outputs, but that's a different process, and it's not the typical approach (because there will be trade-offs in performance and cost).
    Search for "MTEB Leaderboard" and the top result will be the Massive Text Embedding Benchmark Leaderboard on Hugging Face. It lists the state-of-the-art open models and the commonly used proprietary embeddings on the market today. They have different output dimensions, sizes, and performance on different tasks. Generally, there's no rhyme or reason to any specific dimension of their output, and the 1,536-dimensional output of text-embedding-ada-002 is relatively large.
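    The point above — that embedding vectors are only meaningful as wholes, compared by direction rather than dimension-by-dimension — can be sketched with cosine similarity. This is a toy illustration with made-up 4-dimensional vectors, not output from any real embedding model:

    ```python
    import math

    def cosine_similarity(a, b):
        # Embeddings are compared as whole vectors: the angle between
        # them carries the signal, not any single coordinate.
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Hypothetical 4-dimensional "embeddings" (a real model such as
    # text-embedding-ada-002 emits 1,536 dimensions).
    cat    = [0.9, 0.1, 0.3, 0.0]
    kitten = [0.8, 0.2, 0.3, 0.1]
    car    = [0.1, 0.9, 0.0, 0.7]

    # Semantically close items point in similar directions,
    # even though no individual dimension "means" anything.
    print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
    ```

    Swapping in vectors from any real embedding API works the same way; only the dimensionality changes.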

  • @mostlazydisciplinedperson 1 year ago +1

    I came here for David Shapiro

    • @cmd_labs 1 year ago

      Props to David Shapiro
      Pretty much ripped off his video
      Attempted to stretch beyond his concept in this video tho...
      ua-cam.com/video/EaNNRVY_pgU/v-deo.html