damn this channel is way better than my prof
The 1,536 dimensions don't mean anything individually. They are just one possible abstract representation of the word, sentence, or document "embedded" into the "latent space" of the model. They just are what they are. There will certainly be trends across the 1,536 dimensions, but they will still be coupled. You can certainly train an embedding model so that real concepts are represented by specific outputs, but that's a different process, and it's not the typical approach (because there are trade-offs in performance and cost).
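To make that concrete, here's a minimal Python sketch (assuming the OpenAI v1 client and an API key in the environment). The point is that the vector has 1,536 entries whose individual values aren't interpretable; what carries meaning is the geometry of the whole vector, e.g. its cosine similarity to other vectors:

```python
# Minimal sketch -- assumes the openai>=1.0 Python client and OPENAI_API_KEY set.
# Individual dimensions of the embedding aren't interpretable on their own;
# what matters is the whole vector's geometry (e.g. cosine similarity).
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

a = embed("The cat sat on the mat.")
b = embed("A kitten is resting on a rug.")

print(len(a))  # 1536 dimensions, not 1,536 separate "concepts"
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine)  # similar sentences land close together in the latent space
```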
Google "MTEB Leaderboard" and the top result will be the Massive Text Embedding Benchmark Leaderboard on Hugging Face. It lists the state-of-the-art open models and the commonly used proprietary embeddings on the market today. They have different output dimensionalities, sizes, and performance on different tasks. Generally, there's no rhyme or reason to any specific dimension of their output, and the 1,536-dimensional output of text-embedding-ada-002 is relatively large.
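For a feel of how much dimensionality varies across models on that leaderboard, here's a small sketch using the sentence-transformers package; the model name is just an example of a commonly used open model, and the exact rankings change over time:

```python
# Sketch comparing output dimensionality across embedding models
# (assumes the sentence-transformers package is installed).
from sentence_transformers import SentenceTransformer

# A widely used open model; its output is 384-dimensional.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vec = model.encode("What is an embedding?")
print(vec.shape)  # (384,) -- much smaller than ada-002's 1,536 dimensions
```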
I came here for David Shapiro
Props to David Shapiro
Pretty much ripped off his video
Attempted to stretch beyond his concept in this video tho...
ua-cam.com/video/EaNNRVY_pgU/v-deo.html