Ollama 0.1.26 Makes Embedding 100x Better

  • Published Feb 21, 2024
  • Embedding has always been part of Ollama, but before 0.1.26, it kinda sucked. Now it's amazing, and it could be the best tool for the job.
    Yes, I know I flubbed the line about Bun. It's not an alternative to JS; it's a whole new runtime for JS/TS. It makes TypeScript, which is a better JS, even better than it was.
    Be sure to sign up to my monthly newsletter at technovangelist.com/newsletter
    And if interested in supporting me, sign up for my patreon at / technovangelist
  • Science & Technology

COMMENTS • 208

  • @Slimpickens45 3 months ago +52

    I am here for it. Let's goooo! And yes, videos on vector DBs would be amazing.

    • @dinoscheidt 3 months ago +1

      Postgres pgvector or Redis. Done. Vectors in DBs are incredibly easy, despite all the very adversarial hype and marketing; what is hard is iterating on things like the chunking size.
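      For readers who want to try the pgvector route, here is a minimal sketch in TypeScript. It assumes the "pg" client, a hypothetical local database named rag prepared with `CREATE EXTENSION vector;` and `CREATE TABLE chunks (id bigserial PRIMARY KEY, text text, embedding vector(768));`, plus Ollama's /api/embeddings endpoint from the video:

      ```typescript
      import { Client } from "pg";

      // Get a 768-dimensional vector from a local Ollama instance (0.1.26+).
      async function embed(text: string): Promise<number[]> {
        const res = await fetch("http://localhost:11434/api/embeddings", {
          method: "POST",
          body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
        });
        return (await res.json()).embedding;
      }

      const db = new Client({ connectionString: "postgres://localhost/rag" });
      await db.connect();

      // Store a chunk; pgvector accepts '[0.1,0.2,...]'-style string literals.
      const chunk = "some chunk of a source document";
      await db.query("INSERT INTO chunks (text, embedding) VALUES ($1, $2)", [
        chunk,
        JSON.stringify(await embed(chunk)),
      ]);

      // Retrieve the chunks closest to a question via the <-> (L2 distance) operator.
      const { rows } = await db.query(
        "SELECT text FROM chunks ORDER BY embedding <-> $1 LIMIT 5",
        [JSON.stringify(await embed("my question"))],
      );
      await db.end();
      ```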

    • @Hypersniper05 3 months ago

      Easy: just use plain JSON to store the embeddings and text locally 😊. Granted, it's not for scale, but for local projects it's fast enough.
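      A minimal sketch of that plain-JSON approach: entries live in one local file, and search is a brute-force cosine-similarity scan, which is plenty fast for small local projects. The file name and Entry shape are made up for illustration:

      ```typescript
      import { readFileSync, writeFileSync } from "fs";

      type Entry = { text: string; embedding: number[] };

      // Persist every chunk and its embedding to a single local JSON file.
      function save(entries: Entry[]): void {
        writeFileSync("embeddings.json", JSON.stringify(entries));
      }

      // Cosine similarity between two equal-length vectors.
      function cosine(a: number[], b: number[]): number {
        let dot = 0, na = 0, nb = 0;
        for (let i = 0; i < a.length; i++) {
          dot += a[i] * b[i];
          na += a[i] * a[i];
          nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
      }

      // Brute-force scan: fine for thousands of chunks, not millions.
      function topK(query: number[], k = 5): Entry[] {
        const entries: Entry[] = JSON.parse(readFileSync("embeddings.json", "utf8"));
        return entries
          .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
          .slice(0, k);
      }
      ```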

    • @jonyfrany1319 3 months ago

      Does anyone know where Ollama RAG code examples exist?

    • @technovangelist 3 months ago +1

      Ollama itself doesn't do anything with RAG; RAG would be part of the solution you build with Ollama.

    • @MEDEBER-ENGINEERS 3 months ago

      Definitely looking forward to the vector DBs video.

  • @ChetanVashistth 3 months ago +10

    You are a great teacher!! I want to see more videos of yours. Thanks for your service 🙇

  • @guidoschmutz 3 months ago +2

    Thanks a lot for all your videos; this one really helped me a lot. I just started with Ollama and local LLMs a week ago and was using llama2 for embeddings, which was painfully slow, and I didn't even know it could be faster until I watched this video yesterday evening. I just changed to "nomic-embed-text" and I love it :-) Thanks and keep up the good work! I also really like your humor!!!
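    For anyone making the same switch, the change is just the model name in the embeddings call. A minimal sketch against Ollama's /api/embeddings endpoint (default host and port assumed):

    ```typescript
    // Request an embedding from a local Ollama instance (0.1.26+).
    const response = await fetch("http://localhost:11434/api/embeddings", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "nomic-embed-text", // was "llama2", which is far slower at this job
        prompt: "the text to embed goes here",
      }),
    });
    const { embedding } = await response.json();
    console.log(embedding.length); // 768 dimensions for nomic-embed-text
    ```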

  • @NLPprompter 3 months ago +1

    Thank you, I really appreciate your work and support. Can't wait for the next video.

  • @joan_arc 3 months ago +6

    Hi Matt, thanks for making these videos. They are very informative and helpful.

  • @rccmhalfar 3 months ago

    Thanks for your superb videos; your content is so rich and well paced. I would like to see more about model training using Ollama, and about embedding.

  • @disturb16 3 months ago +36

    Could you share the source code of the examples you use in your videos?

    • @efficiencygeek 3 months ago

      Yes, please, especially the Python script.

    • @potatodog7910 3 months ago

      That would be helpful

    • @jrfcs18 3 months ago

      Please share the code you show in your example.

  • @sun33t 3 months ago

    Thanks for posting these videos, mate. I'm finding them so helpful in orienting myself in the world of AI tooling 🎉

  • @archamondearchenwold8084 3 months ago +9

    Your voice is amazing. I could listen to you present on anything, man. Amazing video.

  •  3 months ago

    Thank you Matt for making these videos!

  • @janduplessis1357 3 months ago +1

    Hi Matt, love your content, super stuff, thank you. This is exactly what I was looking for, and you explain it so well. I am working on an open-source RAG search project for a big genomics effort, providing really detailed, specific information to users of the service, e.g. about which test to request. This video came just at the right time 👍

    • @technovangelist 3 months ago

      Great. Maybe I should suggest it to my sister who does that kind of thing.

  • @joeburkeson8946 3 months ago

    Looking forward to when tools to embed documents into models become available. Thanks for all you do.

  • @lucioussmoothy 3 months ago

    Very informative and on point. Keep up the good work, Matt.

  • @brian2590 3 months ago

    I jumped when I saw this. This is very exciting for me. Thank you!

  • @nicholasdudfield8610 3 months ago

    Vids keep getting better. And thanks: I overlooked the embeddings due to Gemma!

  • @HistoryIsAbsurd 3 months ago

    Definitely still learning about this topic, so thank you for the vid! It would be interesting to dive into.

  • @SyntharaPrime 2 months ago +1

    Thank you for your great effort.

  • @Turbozilla 3 months ago +3

    I'm loving your videos! I really like that they're to the point. Out of all the YouTubers doing videos in this AI/LLM space, I enjoy yours the most. Keep them coming! Tell your family this is more important! Lol 😮. I'm kidding. 😂

  • @JoshuaMcQueen 3 months ago +2

    Really nice video, Matt. We're thinking about doing a similar video testing the top 5-10 vector DBs.

  • @trsd8640 3 months ago

    Great video! Embeddings take Ollama to the next level! And I love that you don't lose a word about Gemma ;)

  • @JulianHarris 3 months ago

    This is absolutely brilliant. Also, to answer your question about vector databases, I think a useful distinction is whether they support ColBERT-style embeddings, because ColBERT is clearly the way forward when you want high-quality embeddings.

  • @karanv293 3 months ago +1

    This is such good content. Can you do a full video tutorial on a production use case with the best RAG strategy? There are so many out there.

  • @c0t1 3 months ago

    I really loved this video! Great and super timely topic. Yes to a vector DB comparison video.

  • @LordOfRuin 3 months ago +1

    Thank you! Swapping my LangChain embedding model for nomic-embed-text really sped it up. This really is bigger news than Gemma.

  • @vikrantkhedkar6451 2 months ago

    Great video. I was really trying to find an open-source embedding model ❤❤

  • @martinisj 3 months ago

    A video on vector databases would be great. As always, please do not forget to include a brief how-to; those well-thought-out snippets in your videos really do make a difference. Thanks!

  • @user-ne8kj2hx3j 3 months ago

    Great video! Would love to see the vector DB video as well.

  • @marcosissler 3 months ago

    Thank you Matt! 🎉

  • @artur50 3 months ago

    Had a ball of laughter at the end. Cheers!

  •  3 months ago

    Thank you for the video. I was looking into calling embeddings from Go, since all the embedding services were very slow.
    PS: I thought there was a surprise at the end, since there was a silent part after you finished talking.

    • @technovangelist 3 months ago +2

      There is a crowd of fans that love that at the end.

  • @riftsassassin8954 3 months ago

    I personally struggle to understand and use embeddings effectively, so this video is highly appreciated! Please do a deep dive on the differences between vector DB providers. I'll definitely like and share if you do!

  • @hossainmahi3559 3 months ago

    Thanks a lot for your great videos! Please make a video on the "how to" and "which" of vector databases.

  • @miikalewandowski7765 3 months ago

    Haha 😂 I love the ending! Reminds me of Roy Andersson's brilliant movie "Songs from the Second Floor". Also, great content. Keep it up 👌

  • @JimLloyd1 3 months ago

    Hey Matt, I'm excited that Ollama supports nomic-embed-text due to its large maximum sequence length of 8192 tokens. You mentioned "summaries and summaries of summaries". Summaries are really necessary when the max sequence length is 512 tokens, which is typical of most embedding models. I'm very curious to see whether the 8K sequence length can significantly reduce the need for summarization. Thanks for your high-quality videos.

  • @joxxen 3 months ago

    You are great, your content is great. Thanks

  • @aisimp 3 months ago

    Love the delivery. Got me laughing with "Hello World of RAG" 😂 ... totally agree 👍

  • @elanrider 3 months ago +1

    All in for vector DBs!

  • @yourspanishstories 29 days ago +1

    What prompt did you use for the thumbnail of this video, man?
    "colorful llama in a library" 😂

  • @andrewowens5653 3 months ago +1

    @Matt Williams, it would be nice if you could do a video clarifying exactly which extended instruction sets are needed on the CPU to support Ollama. My old i7 only supports first-generation AVX.

  • @artur50 3 months ago

    If you could provide a full tutorial on that, that would be awesome.

  • @gambiarran419 3 months ago +1

    Fantastic video. Do you offer your time as a consultant/programmer? Your explanation of the subject matter is so clear.

    • @technovangelist 3 months ago +2

      No, I'm focused on YouTube for a while. But thanks.

  • @user-xj5gz7ln3q 3 months ago

    Great video as always.
    Question: how is using embeddings from the Mistral 7B model different from BERT? I have been using the Mistral 7B model with 4096-dimension vectors, hoping to capture more contextual information compared to BERT's 1536 dimensions. However, I didn't notice any speed difference between the two. Just curious if anyone else has tried it and noticed any pros or cons.

  • @Pablo-Ramirez 5 days ago

    Hello, all your videos are very interesting. I have been working for some time with Ollama, models like Phi3 and Llama3, and some models dedicated to embedding. What I have not been able to solve: when there are several similar documents, for example procedures, how can I retrieve the correct data when they are so similar? It brings me the information; however, it always mixes things up. Cheers and thanks for your time.

  • @sam.sleepwell 3 months ago

    Great content! Super useful embedding. Does this mean we need to use the Nomic API from now on for embeddings?

  • @brandonheaton6197 3 months ago +1

    Definitely do the side-by-side of the DB options in the context of Ollama on something like an M2. Our work machines for the public school system are M2s with only 8 GB of RAM, as a reference point. The potential for a local teaching assistant is definitely close.

  • @unclecode 3 months ago

    Amazing. I just switched from OpenAI to this a few days ago. Everything was doable locally except for embedding, which required OpenAI for quick development. Now we've got all the pieces in place. By the way, please make a video on vector databases. Do we really need a cloud service, or can we find more efficient ways to run one on a server at scale?

  • @ralphv.l8066 3 months ago +4

    Thanks!

    • @technovangelist 3 months ago +4

      OMG, this is way too kind. You need to let me know how I can help you in any way. Thanks so much.

  • @colliander242 1 month ago

    A great addition to Ollama. Hopefully batching will be supported soon. As of now it is one API call per string, which makes it less suitable for larger data sets.

    • @technovangelist 1 month ago

      I'm not sure I see the issue. Any competent developer can work with this.
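      Until batching lands, the usual workaround is one /api/embeddings request per chunk, issued client-side. A minimal sketch; Promise.all overlaps the HTTP round-trips, though the Ollama server may still process the requests one at a time:

      ```typescript
      // Emulate batching client-side: one request per chunk, fired concurrently.
      async function embedAll(chunks: string[]): Promise<number[][]> {
        return Promise.all(
          chunks.map(async (chunk) => {
            const res = await fetch("http://localhost:11434/api/embeddings", {
              method: "POST",
              body: JSON.stringify({ model: "nomic-embed-text", prompt: chunk }),
            });
            return (await res.json()).embedding as number[];
          }),
        );
      }
      ```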

  • @sanjayojha1 3 months ago

    Thanks for the update. I know a bit about vector DBs, but I would like to know the difference between a vector store and a vector DB; for example, the difference between FAISS and a proper vector DB like Qdrant.

  • @piercenorton1544 3 months ago

    Would love a video on DB options.

  • @Vedmalex 3 months ago

    Cool! Good news!
    Let's discuss vector DBs and algorithms for vector search.

  • @SODKGB 2 months ago

    Maybe you can answer this question for me. I know that we need to ingest content so it is searchable. In this video, where do your newly created embeddings go in order for Ollama to access the content? I'm wondering if it is possible to just add newly created embeddings into an existing GGUF. I just want to make it easy to ingest and later ask for and retrieve information using Ollama for Windows. Thanks.

    • @technovangelist 2 months ago +1

      You wouldn't add the embeddings to a model directly, though you can create a dataset from your content and then fine-tune the model on it if you like. You add the embeddings to a vector DB for RAG.

    • @SODKGB 2 months ago

      @technovangelist Thank you.

  • @mosth8ed 3 months ago

    When OpenAI first came out with plugins, I became interested in learning more about all this kind of stuff, but was quite dissatisfied with Python's speed at handling what I was trying to do. So I learned enough Rust to make a vectorizer: when I loaded a project, it created embeddings of all the appropriate files for the project type using all-MiniLM-L12 (or L6 if I changed a setting), and when I saved a specific file, it would re-embed that one as well. It uploaded them to a locally hosted Qdrant DB, which I gave a GPT plugin access to, so I could ask anything about my current project and it would have all the current context.
    Once I finished it, I never used it again, but it was crazy fast, and a good learning experience.

  • @mtprovasti 3 months ago

    A DB comparison for local instances? That would be interesting.

  • @BR-lx7py 3 months ago

    It's nice that these embeddings are generated much faster, but have you run any tests to see if they're any good?

  • @preben01 3 months ago

    Great video as always, BUT (maybe I'm just not getting everything): does this mean I don't have to use LangChain and a local ChromaDB? Can I just send text chunks through the API? If so, can you have document collections? Can you remove embeddings if you need to update? Will embedding affect one model or all?

    • @technovangelist 3 months ago

      RAG will always need a vector store, whether that's Chroma or a JSON file or a kludgy Postgres or whatever. But for RAG there was never a need for LangChain. As things get much more complicated than RAG, then LC has a place.

  • @Persikys 3 months ago

    It would be great to figure out what the difference is between all those vector DBs.

  • @daryladhityahenry 3 months ago

    Hi! Nice explanation. Now I know why people still use BERT for this. But I want to know something; I hope you can enlighten me.
    In the example, the data is either text or PDF. What if it comes from the web? I mean, the data is really contaminated by lots of other text: navigation text, title text, footers, ads, etc.
    We don't want that included in our vector DB, right?
    What kind of technique can we use to clean up the data? Or maybe split every sentence and then embed it, check whether it matches our needs, and put the fitting ones into the vector DB?
    But I'm afraid that ruins the data, because sometimes the information context spans more than one sentence, right? I'm really confused about this.
    Thank you :).

  • @HoneyCombAI 3 months ago

    Please make the video on different vector databases. I wouldn't mind spending an hour watching the nuanced differences, with a rubric defined early on!

  • @makesnosense6304 3 months ago

    OK, so the big question now is whether you can use embeddings generated with one of these smaller models together with a big model. Are they compatible, and how does this work?

  • @mshonle 3 months ago +1

    Really curious to know about chunking techniques where the chunk size varies based on its content, with the goal of producing more precise or relevant results for RAG queries. (I also totally thought you were going to do a Ferris Bueller at the very end.)

    • @technovangelist 3 months ago +2

      There will be no naked showers in my videos, even with the camera on my face. Or you meant "oh, you're still here? Go home."

    • @ilianos 3 months ago

      That's a really interesting topic for me as well! I can recommend looking at advanced chunking strategies such as "semantic chunking" using NLTK or spaCy. You should read the article titled "How to Chunk Text Data - A Comparative Analysis" by Solano Todeschini.

  • @aminzarei1557 3 months ago

    I usually use all-MiniLM-L6-v2 with its 384 dimensions, and it just works for most cases. Tiny but accurate and fast. But I'm definitely gonna give Nomic a shot. Thanks 🙏

  • @satyamgupta2182 3 months ago

    Thank you for the video. But which model does the embedding? For example, I want to interact with a specific model, llama2, but I want to embed my text file using nomic in order to interact with it. That's how it works, right? But here you're not really specifying the model you want to chat with, only the model you want to embed with.

    • @technovangelist 3 months ago

      I am specifying the model I want to use to embed the content that I want to ask llama2 about.

  • @stephenthumb2912 3 months ago

    RAG is just the database for models. It'll exist in some form until we don't have any use for databases in general. There will always be a cost to keeping everything in memory, and that includes LLMs and other DL models.

    • @technovangelist 3 months ago

      There is a bit more to it. RAG is the technique; the database, specifically a vector DB, is a part of RAG, but not everything. And there are a lot of choices among vector DBs. You also have to decide how you want to manage embeddings, how you want to break down the source docs, and more. And there is always going to be a need for RAG as long as we have internal company info, and until we have a massive revolution in computing with much faster bus speeds. Gemini, with its massive context size, is showing that the need for RAG will not go away anytime soon.

  • @prispeshnik-istini2 2 months ago

    Hi, I have a lot of questions. I changed your code and now it works with CSV files, but now I have a question: where does the information that was broken into pieces go? How do I work with it? I will be grateful for your reply! Thanks!

  • @roopad8742 3 months ago

    Is it just me, or does anyone else like the realistic pause scenes at the end of the videos 😂

  • @kvrmd25 3 months ago

    Can you use NLU, or tokenize the text, to split it into chunks for better embeddings?

  • @kabaduck 3 months ago

    Super impressive if you're updating your previous videos with corrected content. I would love to see your workflow on this as a video; maybe you already did this?

    • @technovangelist 3 months ago +1

      There isn't really a process to correct it: I mark the old one as having a correction and post a new one. Luckily nothing I have said has been wrong yet. A few people have said something was wrong, but no one has been able to point to any code or examples that prove their opinions.

  • @TimothyGraupmann 3 months ago +1

    Look at that speed boost! It's like watching the Silicon Valley series and discovering the compression algorithm!

    • @technovangelist 3 months ago +1

      I lived in a house just like that in Sunnyvale back in '96-'99, just before moving to Seattle to join MSFT. The house had exactly the same layout as the one on the show, and the roommates were just as odd.

  • @user-jo3kt2hv9f 3 months ago

    Yes please. Videos on vector DBs and knowledge graphs (Nebula, Neo4j) would also be helpful.

  • @rezkiy95 3 months ago

    Your bunny wrote.
    On a serious note: great vids, mate.

  • @jimlynch9390 3 months ago

    I'm not sure I understand what you are saying. To use the new methods, do we have to run a program to break a document we want to query into chunks, or does Ollama do that for us? It seems to me that some models let you point to a book, PDF, or other text representation and ask questions. Oh, and I'd really like a comparison of the vector DBs.

    • @technovangelist 3 months ago

      There are very few models that can point to a book or even a PDF and just answer questions about it. First, the context size isn't big enough, and then they tend to forget stuff in the middle. Google is promising that is not the case with their new models, but they promise a lot that doesn't ever come true. And usually there is irrelevant info in the doc anyway. RAG helps get the model the relevant content for the particular query.

  • @DaveBriggs 3 months ago

    Would you have to use an overlap when chunking?
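    Overlap is optional, but it is commonly used so that a sentence cut at a chunk boundary still appears intact in at least one chunk. A minimal word-based sketch; the sizes are arbitrary, and the video's own splitIntoChunks may differ:

    ```typescript
    // Fixed-size chunking with overlap, measured in words.
    // Requires size > overlap, or the loop would never advance.
    function splitIntoChunks(text: string, size = 200, overlap = 20): string[] {
      const words = text.split(/\s+/);
      const chunks: string[] = [];
      for (let start = 0; start < words.length; start += size - overlap) {
        chunks.push(words.slice(start, start + size).join(" "));
      }
      return chunks;
    }
    ```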

  • @aituoisang 3 months ago

    On Windows you need to upgrade your Ollama to 0.1.26 to use the Gemma model; I only figured that out after trying to delete and re-download the model all over again. So read the docs first, or you're just wasting your time. By the way, I had missed the new embedding model from nomic. Thanks for reminding us of this important feature.
    Great video as always. Thanks!

  • @sultansaeed7136 3 months ago

    What about the most accurate embedding, the one that captures the semantic meaning of a text very well?

  • @nuvotion-live 1 month ago

    I keep hitting token count limitations when using embedding models. What am I doing wrong? What are the strategies to prevent that?

    • @technovangelist 1 month ago

      How? You are splitting up your text into smaller chunks, right?

  • @dawidw.6016 3 months ago

    Very ❤ Professional

  • @vpd825 3 months ago +1

    Like @Slimpickens45 says, please do a video on Vector DBs, but from the perspective of an Ollama user 🙏🏼

  • @khangvutien2538 3 months ago

    Thanks for sharing. If I understand correctly, Ollama is not Google Gemma but is working with them, and Ollama 0.1.26 uses the Gemma model for its nomic embedding.
    But I'm struggling to understand `splitIntoChunks()` in the video:
    - In line 8, `chunks` is declared as `const`.
    - In line 14, you push something into `chunks`.
    How can it work?
    Please help.

    • @technovangelist 3 months ago +1

      Support for Gemma was added, but that is unrelated to embedding. Embedding is possible because of support for BERT models such as nomic-embed-text; that's a different model. As for the code: `chunks` is a `const`, so I can't reassign `chunks`, but I can add to the array that `chunks` refers to. You can look more into TypeScript to see why this works.
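      In other words, `const` freezes the binding, not the value it points to:

      ```typescript
      const chunks: string[] = [];
      chunks.push("this works"); // OK: mutates the array the binding points to
      // chunks = ["nope"];      // Error: cannot reassign a const binding
      ```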

  • @laserboy23 2 months ago

    I'm using LangChain (JavaScript) 0.1.28 and Ollama 0.1.29. I create my embeddings for a PDF file using the nomic-embed-text model. Everything works fine! But when I start my query (using model llama or mistral), the following exception is thrown:
    "Error parsing vector similarity query: query vector blob size (6144) does not match index's expected size (3072)"
    Can you help? Many thanks in advance!

    • @technovangelist 2 months ago

      I'm guessing you used llama2 or another model to do embeddings before. You need to redo all the embeddings.

  • @markbarton 2 months ago

    So once we have the embeddings saved as vectors (in my case I'm considering Weaviate), do we have to use the same model in Ollama for the inference?

    • @technovangelist 2 months ago

      No. Embeddings are just to find similar text. Then you provide the source text to the model, not the embedding.

    • @markbarton 2 months ago

      @technovangelist Ah, makes sense. So Weaviate will return the results, which in turn are passed to the model. Weaviate requires the query to be encoded using the same embedding model, which I assume all vector DBs would. A video on vector DBs would be very useful, especially on setting up a local instance; after all, Ollama is very much geared around local LLMs, and a lot of vector DBs seem to be cloud-hosted only.
      In a way, what's more interesting is the best methods/prompts for feeding example search results to the local LLM, to demonstrate why it's a more powerful approach.
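      A minimal sketch of that query-time flow, reusing the hypothetical embed() and topK() helpers from the earlier sketches plus Ollama's /api/generate endpoint: the question is embedded with the same model used at ingestion, and the chat model receives the matched source text, never the vectors:

      ```typescript
      async function answer(question: string): Promise<string> {
        // Embed the question with the SAME model used to embed the documents.
        const queryEmbedding = await embed(question);

        // Retrieve the nearest source chunks and join their raw text.
        const context = topK(queryEmbedding, 3).map((e) => e.text).join("\n");

        // Hand the source text (not the embeddings) to the chat model.
        const res = await fetch("http://localhost:11434/api/generate", {
          method: "POST",
          body: JSON.stringify({
            model: "llama2",
            prompt: `Using this context:\n${context}\n\nAnswer this question: ${question}`,
            stream: false,
          }),
        });
        return (await res.json()).response;
      }
      ```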

  • @potter207 2 months ago

    Bunnies can fly.

  • @somasuraj 3 months ago

    Can you make a video on how vector databases work? Their internal workings.

  • @theh1ve 3 months ago +1

    So are these embeddings 'better' than some of the Hugging Face embeddings? Having said that, the more important question is what is in that flask. I think that's what we all want to know! 😊

  • @ClaudioBottari 3 months ago

    A video about how to navigate all the possibilities in the vector DB field would be very useful.

  • @henkhbit5748 3 months ago

    I am not familiar with Ollama yet; I have been waiting for the Windows version... Does it only support specific embeddings? I use, for example, BGE embeddings for RAG. Is this possible? I also see in the comments that Ollama does not support concurrent multi-user inference. If true, then it's OK for testing but not for production.
    BTW: I prefer two-legged bunnies to flying bunnies 😉

    • @technovangelist 3 months ago

      Ollama for now is focused on being the primary production-ready single-user AI application. There are plenty of folks who have shown how to achieve concurrent use of multiple models, but of course to enable max output that would have to involve multiple systems; Ollama can't magically produce cycles out of thin air. Or are you just asking for queueing? That's been there since day one.

    • @henkhbit5748 3 months ago

      @technovangelist If I have a chatbot application based on Ollama, is it possible for multiple users to access the application without waiting or getting into a deadlock?

    • @technovangelist 3 months ago

      I guess it depends on how you build it.

  • @ischmitty 3 months ago +2

    Your TypeScript embedding sample wasn't written to fire off the embeddings calls in parallel. I'm not sure that would make a huge difference locally, depending on Ollama's utilization of system resources, but it certainly makes a massive difference when using an API like OpenAI's embedding model, where you can process each chunk in parallel.

    • @technovangelist 3 months ago +2

      But Ollama runs on your local hardware and is meant for a single user, rather than having the $750k-per-day compute costs. Plus there are all the security and privacy risks with that.

    • @ischmitty 3 months ago

      @technovangelist I wasn't meaning to compare local vs. OpenAI et al.; I agree with you on that. I was referring to writing asynchronous code to run the requests in parallel.

    • @technovangelist 3 months ago

      But Ollama won't process things in parallel. Allowing for that would mean every request would be slower: if a process takes 75% of the system, running 2 or 3 of them with finite resources means everything runs slower.

  • @andrebremer7772 3 months ago

    I am not sure that feature is that big of a deal, honestly.
    I recently set up LlamaIndex using HF embeddings on top of Ollama. Very straightforward: just a handful of lines of code, and given all the available integrations, document loading and indexing are handled for you.

    • @technovangelist 3 months ago

      Why require someone to use something extra if it is now built in?

  • @MrMitdac01 3 months ago

    Can you make an example of how Ollama can host an LLM on a local LAN so others can use the LLM, please?

  • @gilbertb99 3 months ago

    Do people actually use llama2 for embeddings, though?

  • @user-wr4yl7tx3w 2 months ago

    How about looking at CrewAI and Ollama together?

  • @fkxfkx 3 months ago

    Maybe you could share the update procedure with us. If we're running the Ollama web UI for Windows out of local Docker, what's the best way to update it without screwing it up?

    • @technovangelist 3 months ago

      Usually with Docker it's just a matter of pulling the image again. Why did you choose to use Docker on Windows?

    • @fkxfkx 3 months ago

      @technovangelist OK, that's not updating, but it will work 👍
      I do so much with Windows, and so do my clients; it makes sense to keep Docker on Windows in the loop. And so much online is about Mac; this is an outlier.

    • @technovangelist 3 months ago

      That's the standard way to update Docker containers. They are supposed to be immutable.

    • @fkxfkx 3 months ago

      @technovangelist I don't mean to be argumentative, but while images are immutable (the following is from Microsoft Copilot):
      Docker Containers:
      - Dynamic and Mutable: Containers are dynamic and mutable instances created from images.
      - Writable Layer: Containers have a writable layer where runtime changes can be temporarily stored.
      - Statefulness: Containers can hold runtime data, but their core image remains unchanged.
      I assume a new upgrade image used to rebuild the container would have accommodations to preserve existing downloads of models, etc., but I could be wrong. Demolishing all previous work just to install an upgrade would be unfortunate.
      The folks on their Discord are being a little hazy about this, and it would be helpful to get a deterministic, clear statement of the situation.
      I'm just looking for a clear Docker command to upgrade without losing my model downloads.
      🤷‍♂
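      For what it's worth: if the container was started with a volume mounted at /root/.ollama (as in the run command from the Ollama Docker docs), the models live in that volume, not in the container, so replacing the container preserves them. A sketch, assuming a container and volume both named ollama:

      ```sh
      docker pull ollama/ollama                # fetch the new image
      docker stop ollama && docker rm ollama   # discard the old container; models stay in the volume
      docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
      ```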

  • @geraldofrancisco5206 3 months ago

    Keep it up.

  • @userou-ig1ze 3 months ago

    Can't you read the file fully into RAM before processing? It sounds unbelievable that read/write speed is the limiting factor.

    • @technovangelist 3 months ago

      I don’t think I understand the question. Can you clarify?

    • @userou-ig1ze 3 months ago

      Mea culpa. I inferred at 5:50 that loading/processing the file would take most of the processing time, but I guess I was mistaken. Thanks for the reply, though, and for your continuous commitment to and interaction with the user base. Respect and thumbs up.

  • @carterjames199 3 months ago

    Please do a vector db comparison video

  • @pablocosta7181 3 months ago

    Hi Matt. You are really impressive. Could you share the source code of the video example with me? I'd be very happy.

  • @knoopx 3 months ago

    Did they finally add batching support?

    • @technovangelist 3 months ago

      Can you tell me more about what you mean by this?

    • @knoopx 3 months ago

      @technovangelist Batching: generating multiple embeddings at once in a single request.

    • @technovangelist 3 months ago

      Have you added an issue to the repo? It's pretty easy just to send multiple requests vs. queueing things up in Ollama.

    • @technovangelist 3 months ago

      This is the first version where they have shown any love for embedding. Can't expect everything in one release.

    • @knoopx 3 months ago

      @technovangelist Yeah, I complained about it two months ago and they actually fixed some of the points. Issue #962.

  • @mvdiogo 3 months ago

    I think bunnies can fly; I just saw it in your video.

  • @jeanchindeko5477 3 months ago +1

    4:42 OK, I'll not say bunnies can fly or should fly! But Bun is definitely not an alternative to JavaScript; instead, it's an alternative to Node.js, and the code you're showing is written in TypeScript, which is a superset of JavaScript that Bun natively supports.
    Other than that, thanks for this great, informative, and entertaining video.

    • @technovangelist 3 months ago

      OMG, I flub one line in my script and it gets pointed out immediately. It used to be that hardly anyone saw these.

    • @technovangelist 3 months ago

      But thanks for noticing. And watching. And being here.

  • @Soniboy84 3 months ago

    You sound like Shawn Woods from YouTube. Maybe you guys are from the same area.