Matt has anyone ever told you that you're a tremendous teacher? A true natural!! Love your videos!
💯%
In my experience, the best improvement on this workflow is to give the model the search engine as a tool instead of sending the question directly to the search engine. That way the model can ask multiple questions and keep track of the conversation, and the responses get much better and more natural. For anyone building a new RAG system I would highly recommend this approach; you need a little more knowledge because of the tool use, but it is very simple, and there are tutorials on this channel too.
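As a sketch of the idea (the SearXNG endpoint, model name, and helper names here are my assumptions, not code from the video), the tool version looks roughly like this:

```typescript
// Sketch: exposing web search to the model as a tool, Ollama-style.
// The SearXNG URL, model name, and helper names are assumptions for illustration.
const searchTool = {
  type: "function",
  function: {
    name: "search_web",
    description: "Search the web and return the top result snippets",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "The search query" },
      },
      required: ["query"],
    },
  },
};

// Hypothetical search implementation: swap in your own SearXNG instance.
async function searchWeb(query: string): Promise<string> {
  const url = `http://localhost:4000/search?q=${encodeURIComponent(query)}&format=json`;
  const res = await fetch(url);
  const data = await res.json();
  return (data.results ?? [])
    .slice(0, 5)
    .map((r: { title: string; content: string }) => `${r.title}: ${r.content}`)
    .join("\n");
}

// One chat turn: the model sees the tool definition and can decide to call it.
async function chatWithSearch(question: string) {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    body: JSON.stringify({
      model: "llama3.1",
      messages: [{ role: "user", content: question }],
      tools: [searchTool],
      stream: false,
    }),
  });
  return res.json();
}
```

If the response comes back with `tool_calls`, you run `searchWeb`, append the result as a `tool` message, and call the model again; the model decides when it has enough.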
Great note. What are the tutorials and channels for the RAG system you mentioned? Please share, thanks.
@@SajjadHussain-d9d "How does function calling with tools really work?" and "Function Calling in Ollama vs OpenAI" in this channel. Also the OpenAI documentation on "Function calling" and the "Chat Completion Object". I can't paste links on YouTube.
Also, you can wrap this flow in a while loop that ends when the finish reason is "stop" (at least with the OpenAI API), so the model can keep searching until it generates a response, or until you force it to finish and generate one.
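Sketched out (the `callModel` and `runTool` callbacks are hypothetical stand-ins for your actual API and tool plumbing, not any specific SDK), that loop is roughly:

```typescript
// Sketch of an agent loop that runs until the model's finish reason is "stop".
// callModel and runTool are hypothetical stand-ins for your API and tool code.
type Message = { role: string; content: string; tool_calls?: unknown[] };
type Turn = { finish_reason: string; message: Message };

async function agentLoop(
  messages: Message[],
  callModel: (msgs: Message[]) => Promise<Turn>,
  runTool: (call: unknown) => Promise<string>,
  maxTurns = 5,
): Promise<Message> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const { finish_reason, message } = await callModel(messages);
    messages.push(message);
    if (finish_reason === "stop") return message; // final answer, no more searching
    for (const call of message.tool_calls ?? []) {
      // Feed each tool result back so the next turn can use it.
      messages.push({ role: "tool", content: await runTool(call) });
    }
  }
  // Turn budget exhausted: force the model to answer with what it has.
  messages.push({ role: "user", content: "Answer now with the information gathered so far." });
  return (await callModel(messages)).message;
}
```

The `maxTurns` cap is the "force it to finish" part: without it, a model that keeps requesting searches could loop indefinitely.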
@sorpia thanks I'll def have a look at this.
n8n might handle that requirement well.
A quick walkthrough on how to set this up would be awesome! I got as far as downloading Deno, making a venv in Python, cloning the GitHub repo, and cd'ing to the files, and that's about it. Really want to try this out. Cool video and idea!
Your prose and pace are perfect for teaching.
I have been playing around with this combo for a while now, and you are so much more helpful than any LLM I have queried for assistance 😂. Thank you
The Ollama course is awesome. I can't wait to build this! Ps. Don't be lazy and forget to add the new videos to the playlist 😂😂
Exponential subscriber growth is surely coming your way. I'm really enjoying the ollama course. Thanks and keep up the good work!
Thanks
Brilliant, now I just need to book some days off to really dive into this, cheers!
Thank you Matt. The information is very useful and understandable in your presentation. Good luck to you! I sincerely hope that you will soon have 1,000,000 subscribers!
After Ben Eater and Program With Gio you can find only a few god-tier YT learning channels... and Matt is one of them...
Agreed.
Agreed, Ben Eater is legendary. Programming with Gio I haven't tried, and Matt Williams is just a super awesome guy; I use his content to stay updated with AI tools that I can use locally.
First time here. Instant subscribe.
I had already wondered how Perplexity works under the hood.
Yep, this video is the answer.
Well done; curious what you will come up with next.
Bro, I fell for this guy. I just love you, dude. Pure content.
I love these videos... I use your calm and composed voice as ASMR! Keep going Matt, you are awesome and you matter.
Wow, consistent and without the fluff... very nice content
Amazing tutorials, these are gold!
I found this video very useful. Keep content like this coming and you'll get your one million for sure.
Thanks for sharing Matt :)
Wow, so clear and succinct, like it and subscribed!
How would the model know when to search and when not to?
I want that and I WILL build this! Thanks!
Yes. Great work
How does it work with loading pages that need JS to render?
Great content, very excited to follow your journey to the million mark!
man, you are the best!!!
I love this! going straight to the source of information... You have my SUB
Hi, great content and humor. I was already one of the millions... so it didn't count; I can't add myself again. Still hope you get there!
Hey Matt, thanks for creating amazing content for us all to learn from! I want to ask the reason why you chose Deno over Bun or Node.
Performance is one thing. Plus it compiles to other platforms far more easily. With Bun I could rarely write something on my Mac, compile it for Windows, and have it reliably just work. Deno allows this. And vs Node? TypeScript. It really sucks when I need to build something and have to use JS instead of TS.
thanks Matt. as always
Great info. I was wondering when Ollama would provide a way to integrate web search with the LLM for the most up-to-date answers, and there you have it.
Thanks for the content and your clear explanations style. Good job
This is awesome, thank you. I am not a programmer anymore, and this is useful and helpful to me. I also know I can ask an AI to explain it to me. Can I use this in Ollama and run it through the interface of AnythingLLM, and how does this compare to AnythingLLM's web use (which I can't get to work)?
Until last week, when I started to play with Ollama, I hadn't programmed in more than 15 years. It's amazing to see how Python hasn't changed much in some aspects.
Integrating functions into Ollama still has me a bit confused. I get how to pull pages, links, or searches, massage the data, and feed it to Ollama to get a summary. I just don't understand how you can tell Ollama to search and it knows about the function to call. I guess I need to spend a weekend and do a deep dive.
For now, I tend to use n8n to pull the data for the request and feed it to Ollama to formulate a proper response.
wonderfully simple and v powerful. here is my Quick Subs!!!
Could this be done in n8n, too? Also, what vector database would you use there for designing an AI agent that is open source, free, and self-hostable? From what I understand, Supabase is not easily self-hostable.
Thank you Matt. Very informative, and a new idea for searching to get precise information from the web. Could you do something with the output? It is in plain text; can we get it in a formatted output, such as a table or another required format, perhaps by modifying the code a little?
This search is very nice... but I use Ollama with an n8n workflow, and I also have SearXNG installed in Docker.
Do you think it is possible to replicate this in n8n, please?
Thanks for the video
Is web search an upgrade after a RAG process, or can the RAG part be skipped with the web search feature? Thanks for your hard work Matt, and sorry for this stupid question
Yes and/or no; it depends. If you need web search, use that. If you need stuff from local docs, use that. Some folks will do RAG on a web search. So it depends.
Hey Matt, great video as always. Would you suggest using ReaderLM to convert the HTML to markdown for the LLM, or is cleaning still the better option?
ReaderLM generates better-looking results in about 4 seconds each, and this returns each one in maybe 50 ms, if that. I don't need good-looking results. ReaderLM is solving a problem we don't have, and there are other, better tools for that anyway. It's like using LLMs for OCR: there are better, faster tools.
Great content, thanks a lot!!
Hey Matt! Love the content! I'm working on a sales enablement market research paper generation tool.
My approach is different. I take more of a "dredging" approach: I have several queries whose results I scrape, throw all the results into an in-memory vector store, then run a similarity search with several questions custom-made to match the query, and finally run the last query through llama3 with a larger context window.
This gives me well-informed, sourced results. But I really wish I could use an agent or something to ask follow-up questions based on the info I got. I can't write these follow-up questions myself because the tool needs to work for any company.
Anyone got any ideas? I'm thinking of incorporating an agent, but I'm concerned that it will bloat up an already slow process.
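For what it's worth, the pipeline described above can be sketched as one function; every callback here (`scrape`, the store, `generate`) is a hypothetical placeholder rather than any specific library's API:

```typescript
// Sketch of the "dredging" pipeline: scrape many queries into a vector store,
// similarity-search targeted questions, then answer with a large-context model.
// All callbacks are hypothetical placeholders, not a specific library's API.
type VectorStore = {
  add: (docs: string[]) => Promise<void>;
  search: (query: string, k: number) => Promise<string[]>;
};

async function dredge(
  queries: string[],
  followups: string[],
  finalQuery: string,
  scrape: (q: string) => Promise<string[]>,
  store: VectorStore,
  generate: (prompt: string, context: string[]) => Promise<string>,
): Promise<string> {
  // Dredge: pull every query's results into the store.
  for (const q of queries) await store.add(await scrape(q));
  // Targeted retrieval: similarity-search each custom question.
  const context: string[] = [];
  for (const f of followups) context.push(...(await store.search(f, 3)));
  // Final pass through the large-context model.
  return generate(finalQuery, context);
}
```

An agent step could slot in between retrieval and generation, producing the follow-up questions from the retrieved context instead of taking a fixed list, which would keep the added latency to one extra model call.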
So we are one step closer to Perplexity 😂
Jokes aside this piece of code is priceless 🎉
I believe we still haven't found a way to create value with AI.
There is a great project called Perplexica that I have a video about.
@@technovangelist nice one 😊
I just noticed that all the models I am trying date themselves to November 2021. Are they all really just the same thing, and is this the way around that restriction?
Are there more up-to-date models?
Does anyone have any references for a public SearXNG API that returns JSON? Thank you
Nice presentation with code snippets and drawings
Amazing! Thanks.
Great video
Thank you so much!
Thank you Mat ! .. We on our way to 1m subz 🎉 soon soon. ✊🏽
Hey Matt, can you please post on how to fine-tune an LLM on a CPU? I've been trying to do this, but I run into a problem of no CUDA being available from the bitsandbytes package.
I just came across your videos as I'm trying to learn and understand AI and Ollama more. I see the code, and that is great, but... where do I actually put it, what do I need to run it, and where does it all go? That is my biggest problem, and I believe you could help with it in this community: there is a huge knowledge gap when it comes to AI and how to get more out of it than just installing Ollama and Open WebUI and typing things in.
It’s a simple application that uses SearXNG.
Very useful video
Thanks Matt. I have been using llama with Open WebUI. Is this easily set up within that interface (as opposed to running requests through PowerShell, etc.)?
I don’t remember if Open WebUI has support for search.
Awesome 🎉
So I set up Docker and grabbed SearXNG; now where am I writing my query?
Thanks. How does it know what our personal/confidential data is in order to mask it? Can we define it?
Subscribed.
Please suggest an open source model for local embeddings
Nice, but nowadays many sites are difficult to scrape without a full browser
Thank you for the info. A bit off topic, but I recently upgraded my 1070 to a 3060 12GB to play around with local models using Ollama. I use OpenAI all the time to help me with coding tasks, for example reformatting files or creating XML from classes: basically things I can do myself, but that take a lot longer. I noticed that the only model that consistently gives me what I need is OpenAI's. In some instances the same request to llama 3.2 returned a math formula. Weird; I have no idea how the model went from XML to math. Which model can I run locally that matches the understanding that OpenAI's models have? I am OK paying OpenAI, but sometimes it is so slow that I wish I had a good local LLM. Thank you
ollama pull mannix/gemma2-9b-simpo:latest
ollama pull eramax/nxcode-cq-7b-orpo:q6
ollama pull open-orca-platypus2:13b-q5_0
these are the best ones I've tested so far, try them and see if it helps
I agree with you, for production environments I only trust GPT-4 models. I'm still trying to find a proper usage for local models because they usually ignore instructions and hallucinate like crazy.
Qwen2.5 is usually a pretty good coding model. You can give that a try if you are comfortable with the questionable ethics of its training-set acquisition and the potential copyright implications. A commercial offering, at least in theory, provides more legal safeguards here.
The new qwen2.5-coder has been performing quite well for me so far. Maybe give it a shot.
@@yacahumax1431 check out the Aider leaderboard. It uses known coding exercises to benchmark both hosted and local models using Ollama.
Is Jeffrey Way your son? Because I can watch your videos and not lose focus, not fall asleep, and keep the momentum going, and they actually make me love new technology even more. Thank you for your videos. I'm in, new sub.
Don’t know who that is. But my daughter is 5, so you probably aren’t hearing her. Glad that I can keep you awake. It has been suggested that I do ASMR.
great and thank you :)
I’ve been doing this for a minute. I have some projects that I’ve integrated SearXNG into. It’s meh, but it can get the job done.
Perplexity searches the web.
But perplexity doesn’t run ollama. Perplexica does. And so you gain most of the same security and privacy benefits.
@@technovangelist I'll check it out. I have already integrated Ollama into Emacs.
Thank you
Is there any other plug-and-play, easy-to-use Python library I can use for searching instead of a Docker image?
Hello. Thanks for the great tips and materials for using AI. The script you give in the video looks good. I would also like to use web search, but I don't know how. Where do I put the script to run it via Ollama? If possible, can you make a video on how to use scripts like this with Ollama? Thank you for the course; it is excellent!
This script runs Ollama after getting the results.
I would love to add this to my llama3.2. Alas, I am a newbie and haven't a clue what I am doing. I downloaded the three files from your GitHub link, installed Deno, and attempted to build them in Docker with 'docker build . -t main' after looking up how to create the Dockerfile. But I am lost. Is it possible for you to write up install instructions for Windows, including prerequisites?
This one uses Deno, so you need that. Or if you prefer Python, rewrite it in that.
@@technovangelist I installed deno. When I run deno main.ts I get error: Unsupported lockfile version '4'. Try upgrading Deno or recreating the lockfile at 'D:\ollama\Searchweb\deno.lock'.
deno 1.46.3 (stable, release, x86_64-pc-windows-msvc)
v8 12.9.202.5-rusty
typescript 5.5.2
I have found two of the problems I was having. 1. The env var "DENO_FUTURE" needs to be set to "1" for Deno to recognize version 4 .lock files. 2. The env var "SEARCH_URL" needs to be set to "127.0.0.1:4000/search" on my machine. If I pass "127.0.0.1:4000/search?q='some query'&format=json" to my browser, I get JSON back with the search results. The problem I am left with is that when I execute `deno run main.ts` or `deno main.ts`, I get the error
error: Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'map')
.map((result) => result.url)
Now, I don't program in TS, so the code is not revealing the problem to me. I assume my URL is somehow wrong, because other people are not having the issue. The variable "result" is throwing me for a loop, because there appears to be another variable called "results". Can you at least tell me if my SEARCH_URL is incorrect in some way?
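For what it's worth, that `TypeError` usually means the parsed JSON had no `results` array at all, often because the URL lacks the `http://` scheme or the instance isn't actually returning `format=json` for that request. A defensive version of the failing line (field names assumed to match SearXNG's JSON output) would surface the real cause instead of crashing:

```typescript
// Guarded version of `.map((result) => result.url)`: fail with a useful
// message when the response has no `results` array, instead of a TypeError.
function extractUrls(payload: unknown): string[] {
  const results = (payload as { results?: { url: string }[] } | null)?.results;
  if (!Array.isArray(results)) {
    throw new Error(
      "No 'results' array in the search response. Check that SEARCH_URL " +
        "includes the http:// scheme and that the instance allows format=json.",
    );
  }
  return results.map((result) => result.url);
}
```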
That’s the previous version of deno. Sorry.
Love your videos, man, excellent content! Also, you should definitely join me in the bald brotherhood. Once you go fully bald, you won't go back. It's glorious. 😁 Keep up the great work, sir!
It’s scary to make that leap
@@technovangelist it's definitely strange at first, and my wife hated it, but it's fantastic now. Try it out. If it goes poorly, just let it grow back out again. No worries. Good luck, you've got this!
This is great! I was looking for something similar days ago, and man, you helped me again with your content. Please keep up the awesome content.
How do I run Ollama on an Intel iGPU?
Perplexica does this with ollama from a web interface
Yup. And I have a video about it. The ollama course teaches the basics of ollama so perplexica is a bit off topic for now
Nice. Well, I managed to use voice with Ollama, with low latency... It would be nice to connect news context to a chatting agent: "Tell me about..." ...while (Sipping carrot juice notices)
How can we use this code with Python?
This is with Deno, but it should be pretty easy to translate the concepts.
@@technovangelist and what about the Tavily API? It's a pretty good tool for web search.
Great stuff
I have Ollama on my Windows desktop and have only used it for prompt creation, but this is interesting. Can't stand Docker and much prefer the good old venv way.
Docker in most cases is just so much more efficient. You really should spend a bit of time learning the basics and use it.
This should be built into ollama..
that would probably be a bit out of scope for Ollama for a while to come.
It is a good addition, but I think the developers should focus on supporting a greater variety of vision models, like Llama-3.2-11B-Vision-Instruct or Phi-3.5-vision-instruct, since LLaVA is the only vision model in Ollama. It also wouldn't hurt to make it more compatible with hardware from other brands such as AMD and Intel, not only CUDA: more options like DirectML, Olive, or ROCm, or a tutorial with ZLUDA.
Llama3.2 vision models are coming soon. There are a lot of bad implementations out there, so getting it right takes time.
And it does work with AMD and Intel Arc.
Open Web UI has the web search tool out of the box. For lazy folks like me
Hi Matt, can I suggest something? It would help you grow your channel and help us. I love your wisdom, but I can't use it because I'm a complete noob. If you do a video explaining things with a title like "Upgrade Your AI Using Web Search - The Ollama Course", you should have a very in-depth video of you doing the actions, kind of like NetworkChuck: he explains and shows, and I believe it would really be amazing. Right now you are catering to the 10,000 people who understand you, but you could easily reach 10 million with actual practical instruction... As I can see, you're not a kid; tell me, when you read a comic as a kid, did you read it and then look at the pictures, or the other way around: watch Superman kick ass, then find out what he was doing? I hope your channel grows to 2 million, because people like me need your wisdom.
Matt, I think moving away from cmd line to GUI is going to get you to a million viewers much faster.
Ollama is a developer tool that runs on the command line, with millions of users.
@@technovangelist Yes, I know I use it. I'm just saying, I think you could dramatically increase your view count. The easier it is to use things like LLMs and Ollama, the larger the audience will be. BTW, I think your channel is great.
Gradio is nice for that.
I think it's great too. And I second the GUI idea, though I use the command line for everything. When I first started learning AI last year, I had zero command-line knowledge and would have loved a soft landing travelling from GUIs to CLIs. Your video on the RAG GUIs was very interesting.
So advanced for people like me. What about a step-by-step video? 🙈
When you make videos, can you make them step by step? Otherwise it is too time-consuming.
Limiting the search results to the first five sources means the answers will always be mainstream, politically correct, mostly leftist.
I hope there will be a way to perform an honest search and collect the relevant information from objective sources.
That’s one of the nice things about SearXNG: you can control the sources.
Noice
Instructions unclear. Pc imploding