Ollama 0.1.26 Makes Embedding 100x Better

  • Published Feb 21, 2024
  • Embedding has always been part of Ollama, but before 0.1.26, it kinda sucked. Now it's amazing, and it could be the best tool for the job.
    Yes, I know I flubbed the line about Bun. It's not an alternative to JS; it's a whole new runtime for JS/TS. It makes TypeScript, which is a better JS, even better than it was.
    Be sure to sign up to my monthly newsletter at technovangelist.com/newsletter
    And if interested in supporting me, sign up for my patreon at / technovangelist
  • Science & Technology

COMMENTS • 208

  • @Slimpickens45 3 months ago +52

    I am here for it. Let's goooo! And yes, videos on vector DBs would be amazing.

    • @dinoscheidt 3 months ago +1

      Postgres pgvector or Redis. Done. Vectors in DBs are incredibly easy, despite all the very adversarial hype and marketing; what is hard is iterating on things like the chunking size.
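      For readers who want to try the pgvector route, here is a minimal sketch in TypeScript. It assumes the "pg" client, a hypothetical local database named rag prepared with `CREATE EXTENSION vector;` and `CREATE TABLE chunks (id bigserial PRIMARY KEY, text text, embedding vector(768));`, plus Ollama's /api/embeddings endpoint from the video:

      ```typescript
      import { Client } from "pg";

      // Get a 768-dimensional vector from a local Ollama instance (0.1.26+).
      async function embed(text: string): Promise<number[]> {
        const res = await fetch("http://localhost:11434/api/embeddings", {
          method: "POST",
          body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
        });
        return (await res.json()).embedding;
      }

      const db = new Client({ connectionString: "postgres://localhost/rag" });
      await db.connect();

      // Store a chunk; pgvector accepts '[0.1,0.2,...]'-style string literals.
      const chunk = "some chunk of a source document";
      await db.query("INSERT INTO chunks (text, embedding) VALUES ($1, $2)", [
        chunk,
        JSON.stringify(await embed(chunk)),
      ]);

      // Retrieve the chunks closest to a question via the <-> (L2 distance) operator.
      const { rows } = await db.query(
        "SELECT text FROM chunks ORDER BY embedding <-> $1 LIMIT 5",
        [JSON.stringify(await embed("my question"))],
      );
      await db.end();
      ```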

    • @Hypersniper05 3 months ago

      Easy: just use plain JSON to store the embeddings and text locally 😊. Granted, it's not for scale, but for local projects it's fast enough.
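      A minimal sketch of that plain-JSON approach: entries live in one local file, and search is a brute-force cosine-similarity scan, which is plenty fast for small local projects. The file name and Entry shape are made up for illustration:

      ```typescript
      import { readFileSync, writeFileSync } from "fs";

      type Entry = { text: string; embedding: number[] };

      // Persist every chunk and its embedding to a single local JSON file.
      function save(entries: Entry[]): void {
        writeFileSync("embeddings.json", JSON.stringify(entries));
      }

      // Cosine similarity between two equal-length vectors.
      function cosine(a: number[], b: number[]): number {
        let dot = 0, na = 0, nb = 0;
        for (let i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          na += a[i] * a[i];
          nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
      }

      // Brute-force scan: fine for thousands of chunks, not millions.
      function topK(query: number[], k = 5): Entry[] {
        const entries: Entry[] = JSON.parse(readFileSync("embeddings.json", "utf8"));
        return entries
          .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
          .slice(0, k);
      }
      ```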

    • @jonyfrany1319 3 months ago

      Does anyone know where Ollama RAG code examples exist?

    • @technovangelist 3 months ago +1

      Ollama itself doesn't do anything with RAG; RAG would be part of the solution you build with Ollama.

    • @MEDEBER-ENGINEERS 3 months ago

      Definitely looking forward to the vector DBs video.

  • @ChetanVashistth 3 months ago +10

    You are a great teacher!! I want to see more videos of yours. Thanks for your service 🙇

  • @guidoschmutz 3 months ago +2

    Thanks a lot for all your videos; this one really helped me a lot. I just started with Ollama and local LLMs a week ago and was using llama2 for embeddings, which was painfully slow, and I didn't even know it could be faster until I watched this video yesterday evening. I just changed to "nomic-embed-text" and I love it :-) Thanks and keep up the good work! I also really like your humor!!!
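    For anyone making the same switch, the change is just the model name in the embeddings call. A minimal sketch against Ollama's /api/embeddings endpoint (default host and port assumed):

    ```typescript
    // Request an embedding from a local Ollama instance (0.1.26+).
    const response = await fetch("http://localhost:11434/api/embeddings", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "nomic-embed-text", // was "llama2", which is far slower at this job
        prompt: "the text to embed goes here",
      }),
    });
    const { embedding } = await response.json();
    console.log(embedding.length); // 768 dimensions for nomic-embed-text
    ```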

  • @NLPprompter 3 months ago +1

    Thank you, I really appreciate your work and support. Can't wait for the next video.

  • @joan_arc 3 months ago +6

    Hi Matt, thanks for making these videos. They are very informative and helpful.

  • @rccmhalfar 3 months ago

    Thanks for your superb videos; your content is so rich and well paced. I would like to see more about model training using Ollama, and about embedding.

  • @disturb16 3 months ago +36

    Could you share the source code of the examples you use in your videos?

    • @efficiencygeek 3 months ago

      Yes, please, especially the Python script.

    • @potatodog7910 3 months ago

      That would be helpful

    • @jrfcs18 3 months ago

      Please share the code you show in your example.

  • @sun33t 3 months ago

    Thanks for posting these videos, mate. I'm finding them so helpful in orienting myself in the world of AI tooling 🎉

  • @archamondearchenwold8084 3 months ago +9

    Your voice is amazing. I could listen to you present on anything, man. Amazing video.

  •  3 months ago

    Thank you Matt for making these videos!

  • @janduplessis1357 3 months ago +1

    Hi Matt, love your content, super stuff, thank you. This is exactly what I was looking for, and you explain it so well. I am working on an open-source RAG search project for a big genomics effort, providing really detailed, specific information to users of the service, e.g. about which test to request. This video came just at the right time 👍

    • @technovangelist 3 months ago

      Great. Maybe I should suggest it to my sister who does that kind of thing.

  • @joeburkeson8946 3 months ago

    Looking forward to when tools to embed documents into models become available. Thanks for all you do.

  • @lucioussmoothy 3 months ago

    Very informative and on point. Keep up the good work, Matt.

  • @brian2590 3 months ago

    I jumped when I saw this. This is very exciting for me. Thank you!

  • @nicholasdudfield8610 3 months ago

    Vids keep getting better. And thanks: I overlooked the embeddings due to Gemma!

  • @HistoryIsAbsurd 3 months ago

    Definitely still learning about this topic, so thank you for the vid! It would be interesting to dive into.

  • @SyntharaPrime 2 months ago +1

    Thank you for your great effort.

  • @Turbozilla 3 months ago +3

    I'm loving your videos! I really like that they're to the point. Out of all the YouTubers doing videos in this AI/LLM space, I enjoy yours the most. Keep them coming! Tell your family this is more important! Lol 😮. I'm kidding. 😂

  • @JoshuaMcQueen 3 months ago +2

    Really nice video, Matt. We're thinking about doing a similar video testing the top 5-10 vector DBs.

  • @trsd8640 3 months ago

    Great video! Embeddings take Ollama to the next level! And I love that you don't lose a word about Gemma ;)

  • @JulianHarris 3 months ago

    This is absolutely brilliant. Also, to answer your question about vector databases, I think a useful distinction is whether they support ColBERT-style embeddings, because ColBERT is clearly the way forward when you want high-quality embeddings.

  • @karanv293 3 months ago +1

    This is such good content. Can you do a full video tutorial on a production use case with the best RAG strategy? There are so many out there.

  • @c0t1 3 months ago

    I really loved this video! Great and super timely topic. Yes to a vector DB comparison video.

  • @LordOfRuin 3 months ago +1

    Thank you! Swapping my LangChain embedding model for nomic-embed-text really sped it up. This really is bigger news than Gemma.

  • @vikrantkhedkar6451 2 months ago

    Great video. I was really trying to find an open-source embedding model ❤❤

  • @martinisj 3 months ago

    A video on vector databases would be great. As always, please do not forget to include a brief how-to; those well-thought-out snippets in your videos really do make a difference. Thanks!

  • @user-ne8kj2hx3j 3 months ago

    Great video! Would love to see the vector DB video as well.

  • @marcosissler 3 months ago

    Thank you Matt! 🎉

  • @artur50 3 months ago

    Had a ball of laughter at the end. Cheers!

  •  3 months ago

    Thank you for the video. I was looking into calling embeddings from Go, since all the embedding services were very slow.
    PS: I thought there was a surprise at the end, since there was a silent part after you finished talking.

    • @technovangelist 3 months ago +2

      There is a crowd of fans that love that at the end.

  • @riftsassassin8954 3 months ago

    I personally struggle to understand and use embeddings effectively, so this video is highly appreciated! Please do a deep dive on the differences between vector DB providers. I'll definitely like and share if you do!

  • @hossainmahi3559 3 months ago

    Thanks a lot for your great videos! Please make a video on the "how to" and "which" of vector databases.

  • @miikalewandowski7765 3 months ago

    Haha 😂 I love the ending! Reminds me of Roy Andersson's brilliant movie "Songs from the Second Floor". Also, great content. Keep it up 👌

  • @JimLloyd1 3 months ago

    Hey Matt, I'm excited that Ollama supports nomic-embed-text due to its large maximum sequence length of 8192 tokens. You mentioned "summaries and summaries of summaries". Summaries are really necessary when the max sequence length is 512 tokens, which is typical of most embedding models. I'm very curious to see whether the 8K sequence length can significantly reduce the need for summarization. Thanks for your high-quality videos.

  • @joxxen 3 months ago

    You are great, your content is great. Thanks

  • @aisimp 3 months ago

    Love the delivery. Got me laughing with "Hello World of RAG" 😂 ... totally agree 👍

  • @elanrider 3 months ago +1

    All in for vector DBs!

  • @yourspanishstories 29 days ago +1

    What prompt did you use for the thumbnail of this video, man?
    "colorful llama in a library" 😂

  • @andrewowens5653 3 months ago +1

    @Matt Williams, it would be nice if you could do a video clarifying exactly which extended instruction sets are needed on the CPU to support Ollama. My old i7 only supports first-generation AVX.

  • @artur50 3 months ago

    If you could provide a full tutorial on that, that would be awesome.

  • @gambiarran419 3 months ago +1

    Fantastic video. Do you offer your time as a consultant/programmer? Your explanation of the subject matter is so clear.

    • @technovangelist 3 months ago +2

      No, I'm focused on YouTube for a while. But thanks.

  • @user-xj5gz7ln3q 3 months ago

    Great video as always.
    Question: how is using embeddings from the Mistral 7B model different from BERT? I have been using the Mistral 7B model with 4096-dimension vectors, hoping to capture more contextual information compared to BERT's 1536 dimensions. However, I didn't notice any speed difference between the two. Just curious if anyone else has tried it and noticed any pros or cons.

  • @Pablo-Ramirez 5 days ago

    Hello, all your videos are very interesting. I have been working for some time with Ollama, models like Phi3 and Llama3, and some models dedicated to embedding. What I have not been able to solve: when there are several similar documents, for example procedures, how can I retrieve the correct data when they are so similar? It brings me the information; however, it always mixes things up. Cheers and thanks for your time.

  • @sam.sleepwell 3 months ago

    Great content! Super useful embedding. Does this mean we need to use the Nomic API from now on for embeddings?

  • @brandonheaton6197 3 months ago +1

    Definitely do the side-by-side of the DB options in the context of Ollama on something like an M2. Our work machines for the public school system are M2s with only 8 GB of RAM, as a reference point. The potential for a local teaching assistant is definitely close.

  • @unclecode 3 months ago

    Amazing. I just switched from OpenAI to this a few days ago. Everything was doable locally except for embedding, which required OpenAI for quick development. Now we've got all the pieces in place. By the way, please make a video on vector databases. Do we really need a cloud service, or can we find more efficient ways to run one on a server at scale?

  • @ralphv.l8066 3 months ago +4

    Thanks!

    • @technovangelist 3 months ago +4

      OMG, this is way too kind. You need to let me know how I can help you in any way. Thanks so much.

  • @colliander242 1 month ago

    A great addition to Ollama. Hopefully batching will be supported soon. As of now it is one API call per string, which makes it less suitable for larger data sets.

    • @technovangelist 1 month ago

      I'm not sure I see the issue. Any competent developer can work with this.
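      Until batching lands, the usual workaround is one /api/embeddings request per chunk, issued client-side. A minimal sketch; Promise.all overlaps the HTTP round-trips, though the Ollama server may still process the requests one at a time:

      ```typescript
      // Emulate batching client-side: one request per chunk, fired concurrently.
      async function embedAll(chunks: string[]): Promise<number[][]> {
        return Promise.all(
          chunks.map(async (chunk) => {
            const res = await fetch("http://localhost:11434/api/embeddings", {
              method: "POST",
              body: JSON.stringify({ model: "nomic-embed-text", prompt: chunk }),
            });
            return (await res.json()).embedding as number[];
          }),
        );
      }
      ```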

  • @sanjayojha1 3 months ago

    Thanks for the update. I know a bit about vector DBs, but I would like to know the difference between a vector store and a vector DB; for example, the difference between FAISS and a proper vector DB like Qdrant.

  • @piercenorton1544 3 months ago

    Would love a video on DB options.

  • @Vedmalex 3 months ago

    Cool! Good news!
    Let's discuss vector DBs and algorithms for vector search.

  • @SODKGB 2 months ago

    Maybe you can answer this question for me. I know that we need to ingest content so it is searchable. In this video, where do your newly created embeddings go in order for Ollama to access the content? I'm wondering if it is possible to just add newly created embeddings into an existing GGUF. I just want to make it easy to ingest and later ask for and retrieve information using Ollama for Windows. Thanks.

    • @technovangelist 2 months ago +1

      You wouldn't add the embeddings to a model directly, though you can create a dataset from your content and then fine-tune the model on it if you like. You add the embeddings to a vector DB for RAG.

    • @SODKGB 2 months ago

      @technovangelist Thank you.

  • @mosth8ed 3 months ago

    When OpenAI first came out with plugins, I became interested in learning more about all this kind of stuff, but was quite dissatisfied with Python's speed at handling what I was trying to do. So I learned enough Rust to make a vectorizer: when I loaded a project, it created embeddings of all the appropriate files for the project type using all-MiniLM-L12 (or L6 if I changed a setting), and when I saved a specific file, it would re-embed that one as well. It uploaded them to a locally hosted Qdrant DB, which I gave a GPT plugin access to, so I could ask anything about my current project and it would have all the current context.
    Once I finished it, I never used it again, but it was crazy fast, and a good learning experience.

  • @mtprovasti 3 months ago

    A DB comparison for local instances? That would be interesting.

  • @BR-lx7py 3 months ago

    It's nice that these embeddings are generated much faster, but have you run any tests to see if they're any good?

  • @preben01 3 months ago

    Great video as always, BUT (maybe I'm just not getting everything): does this mean I don't have to use LangChain and a local ChromaDB? Can I just send text chunks through the API? If so, can you have document collections? Can you remove embeddings if you need to update? Will embedding affect one model or all?

    • @technovangelist 3 months ago

      RAG will always need a vector store, whether that's Chroma or a JSON file or a kludgy Postgres or whatever. But for RAG there was never a need for LangChain. As things get much more complicated than RAG, then LC has a place.

  • @Persikys 3 months ago

    It would be great to figure out what the difference is between all those vector DBs.

  • @daryladhityahenry 3 months ago

    Hi! Nice explanation. Now I know why people still use BERT for this. But I want to know something; I hope you can enlighten me.
    In the example, the data is either text or PDF. What if it comes from the web? I mean, the data is really contaminated by lots of other text: navigation text, title text, footers, ads, etc.
    We don't want that included in our vector DB, right?
    What kind of technique can we use to clean up the data? Or maybe split every sentence and then embed it, check whether it matches our needs, and put the fitting ones into the vector DB?
    But I'm afraid that ruins the data, because sometimes the information context spans more than one sentence, right? I'm really confused about this.
    Thank you :).

  • @HoneyCombAI 3 months ago

    Please make the video on different vector databases. I wouldn't mind spending an hour watching the nuanced differences, with a rubric defined early on!

  • @makesnosense6304 3 months ago

    OK, so the big question now is whether you can use embeddings generated with one of these smaller models together with a big model. Are they compatible, and how does this work?

  • @mshonle 3 months ago +1

    Really curious to know about chunking techniques where the chunk size varies based on its content, with the goal of producing more precise or relevant results for RAG queries. (I also totally thought you were going to do a Ferris Bueller at the very end.)

    • @technovangelist 3 months ago +2

      There will be no naked showers in my videos, even with the camera on my face. Or you meant "oh, you're still here? Go home."

    • @ilianos 3 months ago

      That's a really interesting topic for me as well! I can recommend looking at advanced chunking strategies such as "semantic chunking" using NLTK or spaCy. You should read the article titled "How to Chunk Text Data - A Comparative Analysis" by Solano Todeschini.

  • @aminzarei1557 3 months ago

    I usually use all-MiniLM-L6-v2 with its 384 dimensions, and it just works for most cases. Tiny but accurate and fast. But I'm definitely gonna give Nomic a shot. Thanks 🙏

  • @satyamgupta2182 3 months ago

    Thank you for the video. But which model does the embedding? For example, I want to interact with a specific model, llama2, but I want to embed my text file using nomic in order to interact with it. That's how it works, right? But here you're not really specifying the model you want to chat with, only the model you want to embed with.

    • @technovangelist 3 months ago

      I am specifying the model I want to use to embed the content that I want to ask llama2 about.

  • @stephenthumb2912 3 months ago

    RAG is just the database for models. It'll exist in some form until we don't have any use for databases in general. There will always be a cost to keeping everything in memory, and that includes LLMs and other DL models.

    • @technovangelist 3 months ago

      There is a bit more to it. RAG is the technique; the database, specifically a vector DB, is a part of RAG, but not everything. And there are a lot of choices among vector DBs. You also have to decide how you want to manage embeddings, how you want to break down the source docs, and more. And there is always going to be a need for RAG as long as we have internal company info, and until we have a massive revolution in computing with much faster bus speeds. Gemini, with its massive context size, is showing that the need for RAG will not go away anytime soon.

  • @prispeshnik-istini2 2 months ago

    Hi, I have a lot of questions. I changed your code and now it works with CSV files, but now I have a question: where does the information that was broken into pieces go? How do I work with it? I will be grateful for your reply! Thanks!

  • @roopad8742 3 months ago

    Is it just me, or does anyone else like the realistic pause scenes at the end of the videos 😂

  • @kvrmd25 3 months ago

    Can you use NLU, or tokenize the text, to split it into chunks for better embeddings?

  • @kabaduck 3 months ago

    Super impressive if you're updating your previous videos with corrected content. I would love to see your workflow on this as a video; maybe you already did this?

    • @technovangelist 3 months ago +1

      There isn't really a process to correct it: I mark the old one as having a correction and post a new one. Luckily nothing I have said has been wrong yet. A few people have said something was wrong, but no one has been able to point to any code or examples that prove their opinions.

  • @TimothyGraupmann 3 months ago +1

    Look at that speed boost! It's like watching the Silicon Valley series and discovering the compression algorithm!

    • @technovangelist 3 months ago +1

      I lived in a house just like that in Sunnyvale back in '96-'99, just before moving to Seattle to join MSFT. The house had exactly the same layout as the one on the show, and the roommates were just as odd.

  • @user-jo3kt2hv9f 3 months ago

    Yes please. Videos on vector DBs and knowledge graphs (Nebula, Neo4j) would also be helpful.

  • @rezkiy95 3 months ago

    Your bunny wrote.
    On a serious note: great vids, mate.

  • @jimlynch9390 3 months ago

    I'm not sure I understand what you are saying. To use the new methods, do we have to run a program to break a document we want to query into chunks, or does Ollama do that for us? It seems to me that some models let you point to a book, PDF, or other text representation and ask questions. Oh, and I'd really like a comparison of the vector DBs.

    • @technovangelist 3 months ago

      There are very few models that can point to a book or even a PDF and just answer questions about it. First, the context size isn't big enough, and then they tend to forget stuff in the middle. Google is promising that is not the case with their new models, but they promise a lot that doesn't ever come true. And usually there is irrelevant info in the doc anyway. RAG helps get the model the relevant content for the particular query.

  • @DaveBriggs 3 months ago

    Would you have to use an overlap when chunking?
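    Overlap is optional, but it is commonly used so that a sentence cut at a chunk boundary still appears intact in at least one chunk. A minimal word-based sketch; the sizes are arbitrary, and the video's own splitIntoChunks may differ:

    ```typescript
    // Fixed-size chunking with overlap, measured in words.
    // Requires size > overlap, or the loop would never advance.
    function splitIntoChunks(text: string, size = 200, overlap = 20): string[] {
      const words = text.split(/\s+/);
      const chunks: string[] = [];
      for (let start = 0; start < words.length; start += size - overlap) {
        chunks.push(words.slice(start, start + size).join(" "));
      }
      return chunks;
    }
    ```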

  • @aituoisang 3 months ago

    On Windows you need to upgrade your Ollama to 0.1.26 to use the Gemma model; I only figured that out after trying to delete and re-download the model all over again. So read the docs first, or you're just wasting your time. By the way, I had missed the new embedding model from nomic. Thanks for reminding us of this important feature.
    Great video as always. Thanks!

  • @sultansaeed7136 3 months ago

    What about the most accurate embedding, the one that captures the semantic meaning of a text very well?

  • @nuvotion-live 1 month ago

    I keep hitting token count limitations when using embedding models. What am I doing wrong? What are the strategies to prevent that?

    • @technovangelist 1 month ago

      How? You are splitting up your text into smaller chunks, right?

  • @dawidw.6016 3 months ago

    Very ❤ Professional

  • @vpd825 3 months ago +1

    Like @Slimpickens45 says, please do a video on Vector DBs, but from the perspective of an Ollama user 🙏🏼

  • @khangvutien2538 3 months ago

    Thanks for sharing. If I understand correctly, Ollama is not Google Gemma but is working with them, and Ollama 0.1.26 uses the Gemma model for its nomic embedding.
    But I'm struggling to understand `splitIntoChunks()` in the video:
    - In line 8, `chunks` is declared as `const`.
    - In line 14, you push something into `chunks`.
    How can it work?
    Please help.

    • @technovangelist 3 months ago +1

      Support for Gemma was added, but that is unrelated to embedding. Embedding is possible because of support for BERT models such as nomic-embed-text; that's a different model. As for the code: `chunks` is a `const`, so I can't reassign `chunks`, but I can add to the array that `chunks` refers to. You can look more into TypeScript to see why this works.
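      In other words, `const` freezes the binding, not the value it points to:

      ```typescript
      const chunks: string[] = [];
      chunks.push("this works"); // OK: mutates the array the binding points to
      // chunks = ["nope"];      // Error: cannot reassign a const binding
      ```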

  • @laserboy23 2 months ago

    I'm using LangChain (JavaScript) 0.1.28 and Ollama 0.1.29. I create my embeddings for a PDF file using the nomic-embed-text model. Everything works fine! But when I start my query (using model llama or mistral), the following exception is thrown:
    "Error parsing vector similarity query: query vector blob size (6144) does not match index's expected size (3072)"
    Can you help? Many thanks in advance!

    • @technovangelist 2 months ago

      I'm guessing you used llama2 or another model to do embeddings before. You need to redo all the embeddings.

  • @markbarton 2 months ago

    So once we have the embeddings saved as vectors (in my case I'm considering Weaviate), do we have to use the same model in Ollama for the inference?

    • @technovangelist 2 months ago

      No. Embeddings are just to find similar text. Then you provide the source text to the model, not the embedding.

    • @markbarton 2 months ago

      @technovangelist Ah, makes sense. So Weaviate will return the results, which in turn are passed to the model. Weaviate requires the query to be encoded using the same embedding model, which I assume all vector DBs would. A video on vector DBs would be very useful, especially on setting up a local instance; after all, Ollama is very much geared around local LLMs, and a lot of vector DBs seem to be cloud-hosted only.
      In a way, what's more interesting is the best methods/prompts for feeding example search results to the local LLM, to demonstrate why it's a more powerful approach.
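      A minimal sketch of that query-time flow, reusing the hypothetical embed() and topK() helpers from the earlier sketches plus Ollama's /api/generate endpoint: the question is embedded with the same model used at ingestion, and the chat model receives the matched source text, never the vectors:

      ```typescript
      async function answer(question: string): Promise<string> {
        // Embed the question with the SAME model used to embed the documents.
        const queryEmbedding = await embed(question);

        // Retrieve the nearest source chunks and join their raw text.
        const context = topK(queryEmbedding, 3).map((e) => e.text).join("\n");

        // Hand the source text (not the embeddings) to the chat model.
        const res = await fetch("http://localhost:11434/api/generate", {
          method: "POST",
          body: JSON.stringify({
            model: "llama2",
            prompt: `Using this context:\n${context}\n\nAnswer this question: ${question}`,
            stream: false,
          }),
        });
        return (await res.json()).response;
      }
      ```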

  • @potter207 2 months ago

    Bunnies can fly.

  • @somasuraj 3 months ago

    Can you make a video on how vector databases work? Their internal workings.

  • @theh1ve 3 months ago +1

    So are these embeddings 'better' than some of the Hugging Face embeddings? Having said that, the more important question is what is in that flask. I think that's what we all want to know! 😊

  • @ClaudioBottari 3 months ago

    A video about how to navigate all the possibilities in the vector DB field would be very useful.

  • @henkhbit5748 3 months ago

    I am not familiar with Ollama yet; I have been waiting for the Windows version... Does it only support specific embeddings? I use, for example, BGE embeddings for RAG. Is this possible? I also see in the comments that Ollama does not support concurrent multi-user inference. If true, then it's OK for testing but not for production.
    BTW: I prefer two-legged bunnies to flying bunnies 😉

    • @technovangelist 3 months ago

      Ollama for now is focused on being the primary production-ready single-user AI application. There are plenty of folks who have shown how to achieve concurrent use of multiple models, but of course to enable max output that would have to involve multiple systems; Ollama can't magically produce cycles out of thin air. Or are you just asking for queueing? That's been there since day one.

    • @henkhbit5748 3 months ago

      @technovangelist If I have a chatbot application based on Ollama, is it possible for multiple users to access the application without waiting or getting into a deadlock?

    • @technovangelist 3 months ago

      I guess it depends on how you build it.

  • @ischmitty 3 months ago +2

    Your TypeScript embedding sample wasn't written to fire off the embeddings calls in parallel. I'm not sure that would make a huge difference locally, depending on Ollama's utilization of system resources, but it certainly makes a massive difference when using an API like OpenAI's embedding model, where you can process each chunk in parallel.

    • @technovangelist 3 months ago +2

      But Ollama runs on your local hardware and is meant for a single user, rather than having the $750k-per-day compute costs. Plus there are all the security and privacy risks with that.

    • @ischmitty 3 months ago

      @technovangelist I wasn't meaning to compare local vs. OpenAI et al.; I agree with you on that. I was referring to writing asynchronous code to run the requests in parallel.

    • @technovangelist 3 months ago

      But Ollama won't process things in parallel. Allowing for that would mean every request would be slower: if a process takes 75% of the system, running 2 or 3 of them with finite resources means everything runs slower.

  • @andrebremer7772 3 months ago

    I am not sure that feature is that big of a deal, honestly.
    I recently set up LlamaIndex using HF embeddings on top of Ollama. Very straightforward: just a handful of lines of code, and given all the available integrations, document loading and indexing are handled for you.

    • @technovangelist 3 months ago

      Why require someone to use something extra if it is now built in?

  • @MrMitdac01 3 months ago

    Can you make an example of how Ollama can host an LLM on a local LAN so others can use the LLM, please?

  • @gilbertb99 3 months ago

    Do people actually use llama2 for embeddings, though?

  • @user-wr4yl7tx3w 2 months ago

    How about looking at CrewAI and Ollama together?

  • @fkxfkx 3 months ago

    Maybe you could share the update procedure with us. If we're running the Ollama web UI for Windows out of local Docker, what's the best way to update it without screwing it up?

    • @technovangelist 3 months ago

      Usually with Docker it's just a matter of pulling the image again. Why did you choose to use Docker on Windows?

    • @fkxfkx 3 months ago

      @technovangelist OK, that's not updating, but it will work 👍
      I do so much with Windows, and so do my clients; it makes sense to keep Docker on Windows in the loop. And so much online is about Mac; this is an outlier.

    • @technovangelist 3 months ago

      That's the standard way to update Docker containers. They are supposed to be immutable.

    • @fkxfkx 3 months ago

      @technovangelist I don't mean to be argumentative, but while images are immutable (the following is from Microsoft Copilot):
      Docker Containers:
      - Dynamic and Mutable: Containers are dynamic and mutable instances created from images.
      - Writable Layer: Containers have a writable layer where runtime changes can be temporarily stored.
      - Statefulness: Containers can hold runtime data, but their core image remains unchanged.
      I assume a new upgrade image used to rebuild the container would have accommodations to preserve existing downloads of models, etc., but I could be wrong. Demolishing all previous work just to install an upgrade would be unfortunate.
      The folks on their Discord are being a little hazy about this, and it would be helpful to get a deterministic, clear statement of the situation.
      I'm just looking for a clear Docker command to upgrade without losing my model downloads.
      🤷‍♂
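      For what it's worth: if the container was started with a volume mounted at /root/.ollama (as in the run command from the Ollama Docker docs), the models live in that volume, not in the container, so replacing the container preserves them. A sketch, assuming a container and volume both named ollama:

      ```sh
      docker pull ollama/ollama                # fetch the new image
      docker stop ollama && docker rm ollama   # discard the old container; models stay in the volume
      docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
      ```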

  • @geraldofrancisco5206 3 months ago

    Keep it up.

  • @userou-ig1ze 3 months ago

    Can't you read the file fully into RAM before processing? It sounds unbelievable that read/write speed is the limiting factor.

    • @technovangelist 3 months ago

      I don’t think I understand the question. Can you clarify?

    • @userou-ig1ze 3 months ago

      Mea culpa. I inferred at 5:50 that loading/processing the file would take most of the processing time, but I guess I was mistaken. Thanks for the reply, though, and for your continuous commitment to and interaction with the user base. Respect and thumbs up.

  • @carterjames199 3 months ago

    Please do a vector db comparison video

  • @pablocosta7181 3 months ago

    Hi Matt. You are really impressive. Could you share the source code of the video example with me? I'd be very happy.

  • @knoopx 3 months ago

    Did they finally add batching support?

    • @technovangelist 3 months ago

      Can you tell me more about what you mean by this?

    • @knoopx 3 months ago

      @technovangelist Batching: generating multiple embeddings at once in a single request.

    • @technovangelist 3 months ago

      Have you added an issue to the repo? It's pretty easy just to send multiple requests vs. queueing things up in Ollama.

    • @technovangelist 3 months ago

      This is the first version where they have shown any love for embedding. Can't expect everything in one release.

    • @knoopx 3 months ago

      @technovangelist Yeah, I complained about it two months ago and they actually fixed some of the points. Issue #962.

  • @mvdiogo 3 months ago

    I think bunnies can fly; I just saw it in your video.

  • @jeanchindeko5477 3 months ago +1

    4:42 OK, I'll not say bunnies can fly or should fly! But Bun is definitely not an alternative to JavaScript; instead, it's an alternative to Node.js, and the code you're showing is written in TypeScript, which is a superset of JavaScript that Bun natively supports.
    Other than that, thanks for this great, informative, and entertaining video.

    • @technovangelist 3 months ago

      OMG, I flub one line in my script and it gets pointed out immediately. It used to be that hardly anyone saw these.

    • @technovangelist 3 months ago

      But thanks for noticing. And watching. And being here.

  • @Soniboy84 3 months ago

    You sound like Shawn Woods from YouTube. Maybe you guys are from the same area.