Don’t Embed Wrong!

  • Published Nov 3, 2024

COMMENTS • 60

  • @madytyoo · 3 days ago +15

    This is the best channel about Ollama!

  • @rundeks · 3 days ago +1

    I never heard of this before. Thank you so much for sharing it!

  • @conneyk · 1 day ago

    Thanks for the video!
    I've been working on my own RAGs for some time now. Maybe prefixing would help.
    What I've learned so far is that RAG is very individual for each use case: whether you are dealing with code docs, large texts, or multi-line PDFs. Also, if your docs aren't in English, embedding models like nomic or other open-source ones are really weak; you first have to translate the docs before embedding them. Then we haven't even talked about reranking queries, corrective RAG to enhance your query with web search results or other docs, hybrid query search based on metadata and doc content, and so on. The vector store you're using also makes a difference.
    All this makes it very complex to implement all the combinations and benchmark and test them.
    I would really love to find some RAG KISS principles and best practices.

  • @raymond_luxury_yacht · 12 hours ago

    When I created embeddings, before sending anything to the embedder I had an LLM analyse the text and generate 10 questions about it, appended those to the chunk, and sent that. Search accuracy was very good.
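
    A minimal sketch of that augmentation in Python; `generate_questions` is a hypothetical stand-in for the LLM call that would actually write the questions:

    ```python
    def generate_questions(chunk: str, n: int = 10) -> list[str]:
        # Hypothetical stand-in for an LLM call that writes n questions
        # the chunk could answer; a real version would prompt a chat model.
        return [f"Question {i + 1} about: {chunk[:40]}..." for i in range(n)]

    def augment_chunk(chunk: str, n_questions: int = 10) -> str:
        """Append LLM-generated questions to a chunk before embedding it,
        so question-style queries land closer to the chunk in vector space."""
        questions = generate_questions(chunk, n_questions)
        return chunk + "\n\nRelated questions:\n" + "\n".join(questions)

    text = augment_chunk("Embedding models map text to vectors for similarity search.")
    ```

    The augmented `text` (chunk plus questions) is what gets embedded and stored; the original chunk alone is what you hand to the chat model later.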

  • @ahasani2008 · 3 days ago

    Can't help but notice your Batik shirt, nice one. And the content is excellent as always, Matt, thanks

  • @proterotype · 3 days ago

    So awesome man. I really appreciate this kind of information

  • @deucebigs9860 · 2 days ago

    Liked and subscribed to tell you you're definitely on the right path of what I want to learn!

  • @wnicora · 11 hours ago

    This video opens new perspectives on RAG, thanks.
    Could you share links to articles explaining the design and use of prefixes?
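
    For anyone wondering what the prefixes look like in practice, here is a rough sketch assuming a nomic-style model, which expects `search_document:` on stored text and `search_query:` on queries (other models use different prefixes, so check the model card):

    ```python
    # Prefixes used by nomic-embed-text; other prefixed models
    # (e.g. snowflake-arctic-embed) define their own strings.
    DOC_PREFIX = "search_document: "
    QUERY_PREFIX = "search_query: "

    def for_storage(text: str) -> str:
        """Prefix text that will be embedded and stored in the vector DB."""
        return DOC_PREFIX + text

    def for_query(text: str) -> str:
        """Prefix a user query before embedding it for a similarity search."""
        return QUERY_PREFIX + text

    stored = for_storage("Ollama runs language models locally.")
    query = for_query("how do I run a model locally?")
    ```

    The prefixed strings are what you pass to the embedding model; the prefix itself is never shown to the user or the chat model.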

  • @serikazero128 · 3 days ago

    @10:49 could've been the perfect time for "Stop, Get some help!" meme :)

  • @karlfranz2pl · 3 days ago +1

    I haven't used any embedding models, but a while ago I tried giving a PDF to llama 3.1 7b and the results were between nothing and horrible. Then I tried the same document with llama 3.1 70b and the results were actually pretty good. I couldn't really test it in depth because my PC runs the 70b model at almost negative speed :) (please keep in mind I actually don't know what I'm doing with these LLMs :) )

  • @sebastianpodesta · 3 days ago +1

    Thanks a lot!!! Great stuff! Quick question: what would you recommend for multilingual data? What happens if the RAG data and the user prompts are in Spanish? Should I do all system prompts and instructions in Spanish, or just tell it to translate the answer?

    • @sebastianpodesta · 3 days ago

      I'm trying to do RAG on n8n using Ollama with the llama 3.1 chat model and the nomic embedding model, with mixed results: I get answers sometimes in English, others in Spanish, and sometimes the model tells me that it didn't understand the question.

  • @agsvk-com · 2 days ago

    Thank you for sharing. I'm just wondering how we would be able to select one of the prefixed nomic or prefixed snowflake-arctic models using one of the vector databases. Is this possible, or do we need to do this via TypeScript or Python? None of the videos I see seem to have embeddings using any prefixed models. I'm still learning. It would be really great to have more step-by-step tutorials on this. 😊 God bless

  • @basterman13 · 3 days ago +2

    Thank you for the video, I learned a lot! Could you please advise on the best RAG implementation and document splitter for Python? I've tried several methods, but I often get mixed results, around 50/50 accuracy. The main issue is with chunking: sometimes chunks split in a way that separates the beginning of a class or method from its continuation. Is there a way to ensure that chunks belonging to the same file can be grouped or kept together more effectively?
    Thank you in advance.

    • @technovangelist · 3 days ago +1

      That’s what the metadata in most vector databases is for. Describe the source. Then use that in your code to keep similar things together.
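
      The metadata idea above can be sketched with a toy in-memory store; real vector databases (Chroma, Qdrant, etc.) expose the same filter-by-metadata pattern, and the plain cosine similarity here stands in for whatever distance metric the database uses:

      ```python
      import math

      def cosine(a, b):
          dot = sum(x * y for x, y in zip(a, b))
          na = math.sqrt(sum(x * x for x in a))
          nb = math.sqrt(sum(x * x for x in b))
          return dot / (na * nb)

      class TinyStore:
          """Toy vector store: each record keeps its embedding plus metadata,
          so a query can be restricted to chunks from the same source file."""
          def __init__(self):
              self.records = []

          def add(self, embedding, text, source):
              self.records.append({"embedding": embedding, "text": text, "source": source})

          def query(self, embedding, source=None, k=3):
              # Filter by metadata first, then rank the survivors by similarity.
              pool = [r for r in self.records if source is None or r["source"] == source]
              pool.sort(key=lambda r: cosine(embedding, r["embedding"]), reverse=True)
              return [r["text"] for r in pool[:k]]

      store = TinyStore()
      store.add([1.0, 0.0], "class Foo: ...", source="foo.py")
      store.add([0.9, 0.1], "def foo_helper(): ...", source="foo.py")
      store.add([0.0, 1.0], "unrelated notes", source="notes.md")
      hits = store.query([1.0, 0.0], source="foo.py")
      ```

      Restricting the pool to `source="foo.py"` keeps chunks from the same file together even when the similarity scores alone would interleave them with other documents.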

    • @basterman13 · 3 days ago

      @@technovangelist Thank you. Yesterday, after I left my comment, I came to the same conclusion. I just need to get distracted sometimes. The answer was always on the surface. I thought maybe there were some more specific approaches, but in this case, the simplest way is the best.
      Have you heard anything about LightRAG (from HKUDS)? I'd be interested to hear your thoughts on it.

  • @paulomtts · 3 days ago

    Right on time, I'm just implementing a RAG pipeline!

  • @fabriai · 3 days ago +1

    Excellent stuff, Matt. Thanks for this! Why do you prefer TypeScript over Python for coding the test? Do you run it in Node? Have you tried dejó for these tasks?

    • @technovangelist · 2 days ago +1

      It doesn't have all the installation baggage that comes with Python. Python is so brittle, and it's easy to screw up your setup. I usually use Deno to run it. I don't know what dejo is.

  • @smhanov · 2 days ago

    I have 200000 images of things described by llava. But if the user is searching for a single word, like "pants" then the search is too broad. It comes up with people wearing pants, shoes, etc. I'm hoping this prefix method helps a little.

  • @FrankenLab · 3 days ago

    The wave of the future doesn't include MORE work to get models to digest our content, it involves models that perform better on their own without coaxing them to give us a marginal improvement in the results. Also, only having 2 models with prefixing doesn't give many options. Great content though, appreciate the effort it takes to research, edit, and produce videos!

    • @technovangelist · 3 days ago +2

      Eventually, maybe, but not for a long while. It's still early days for this tech. And there are more than two: three were in this video, and there are others that can be imported. Also, 2x in some cases is hardly marginal.

    • @rv7591 · 3 days ago

      Well yeah but the future is discovered through experiments.

    • @technovangelist · 3 days ago

      Yeah but wishing for things doesn’t make them happen

  • @AlekseyRubtsov · 3 days ago

    Thanks!

  • @muchainganga9563 · 3 days ago +1

    Love this!

  • @hitmusicworldwide · 3 days ago

    I see the Thanka on your wall, on the viewer's left-hand side.

    • @technovangelist · 3 days ago

      Good eye. From one of my two visits to Nepal. My sister used to run a health care clinic in a town called Jiri for about 20 years.

  • @davidtapang6917 · 3 days ago

    Hey bro! Subscribed!

  • @YuryGurevich · 3 days ago

    Thanks, Matt!

  • @ToddWBucy-lf8yz · 3 days ago

    Great, I'm refactoring for prefixes now. I'm sure I now need to update training data for prefixes as well. Are there any pre-trained models already capable of using prefixes?

    • @technovangelist · 3 days ago +1

      Perhaps you should watch the video. It shows 3 models that use the prefixes.

    • @ToddWBucy-lf8yz · 3 days ago

      @@technovangelist nomic isn't useful when you're trying to integrate cypher queries and vector store queries in the same model. I'm trying to avoid multiple models for my particular RAG setup.

    • @technovangelist · 3 days ago

      Avoiding multiple models is asking for lower quality results

    • @ToddWBucy-lf8yz · 3 days ago

      @@technovangelist Yeah, you are probably right... At least nomic is small and fast. Someone really needs to create an MoE just for RAG and databases.

    • @technovangelist · 3 days ago +1

      Embedding models aren't something you ask questions to. They're just for generating the embeddings to stick into the vector DB and find similar results. You still have to use a regular model to get insights into your data.
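
      That split between the two models can be sketched end to end; the `embed` and `chat` functions below are deliberately crude stubs standing in for real embedding and chat model calls (e.g. nomic-embed-text and llama3.1 via Ollama):

      ```python
      def embed(text: str) -> list[float]:
          # Stub for the embedding model: bucket words into a tiny
          # fixed-size vector. A real model returns hundreds of floats.
          vec = [0.0] * 8
          for word in text.lower().split():
              vec[sum(map(ord, word)) % 8] += 1.0
          return vec

      def chat(prompt: str) -> str:
          # Stub for the chat model that actually answers questions.
          return "Answer based on: " + prompt[:60]

      def rag_answer(question: str, documents: list[str]) -> str:
          # 1) embedding model: index the documents and the query
          index = [(embed(d), d) for d in documents]
          q = embed(question)
          # 2) retrieve the closest document (dot product as similarity)
          best = max(index, key=lambda item: sum(a * b for a, b in zip(item[0], q)))
          # 3) chat model: generate the insight from the retrieved context
          return chat(f"Context: {best[1]}\nQuestion: {question}")

      reply = rag_answer("what does the embedder do?",
                         ["the embedder turns text into vectors",
                          "bananas are yellow"])
      ```

      The embedding model appears only in steps 1 and 2; step 3 is the only place a regular chat model ever sees the text.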

  • @k1chko · 2 days ago

    Seems similar to contextual embedding.

    • @technovangelist · 2 days ago

      Different topics. This was about how to get the embedding model to function correctly.

  • @remmask · 3 days ago

    Hi Matt. Thank you for these videos. Can we get the source in python?

  • @tomwawer5714 · 3 days ago

    Prefix yay

  • @ISK_VAGR · 3 days ago

    Ok, Matt. I knew everything you just said. However, the million-dollar question is: why do bigger models perform badly at embedding?

    • @technovangelist · 3 days ago +1

      They aren’t embedding models. Embedding models do embeddings. Regular LLMs don’t do it.

    • @jparkerweb · 2 days ago

      @@technovangelist in other words, just because something "can" do it doesn't mean it "should" 🤣

    • @technovangelist · 2 days ago +1

      But I don't think that language is strong enough. An embedding model might take 30 seconds where an LLM can take 45 minutes and is 10% as effective. It's bad enough when folks insist on using a 70b model for an answer that is maybe 10% better than an 8b model's and wait 3 minutes instead of 30 seconds; that's not worth it in most cases, but there is a debatable benefit. Embedding with an LLM makes zero sense.

    • @jparkerweb · 2 days ago

      @@technovangelist oh, I 100% agree! Choose the right tool for the right job

    • @ISK_VAGR · 2 days ago

      @@technovangelist Matt, you may have misunderstood my question. I was interested in why, mathematically, a good LLM is not a good embedder. When I started to use RAG, I believed that embedding models were perhaps LLMs delivering the output of hidden layers as embeddings. I still wonder why, if LLMs can find patterns, they are not good at providing embeddings for RAG. Cheers.

  • @ShaunyTravels. · 2 days ago

    Wish there were more videos about running Ollama in a mobile app. I made a chat app using Ollama running on a server, on my phone with Flutter/Dart, but we need more videos on how to do that 😂

  • @grahaml6072 · 3 days ago +1

    I had to stop watching, unfortunately, with that fuzzy text flashing across the screen. Maybe I will just try to read the transcript.

    • @technovangelist · 3 days ago

      I don't have any fuzzy text in this one. If it's fuzzy, don't watch at a low rez.

    • @grahaml6072 · 3 days ago

      @@technovangelist I am not watching at low resolution. I watched on a 65” OLED, an iPad 12.9”, a Samsung 49” widescreen, and a 4K UST projector on a 120” screen, just to check it wasn't me. It starts at 6:10 when you scroll through your outputs.

    • @technovangelist · 3 days ago +1

      Oh, you were making a joke... got it... You aren't supposed to read that, which is why I said I was speeding forward.