Super video 👍 I have the Apple silicon Mac version up and running with local Ollama and without Open WebUI. Will an update install Open WebUI without interfering with my local Ollama? Thanks a lot
Hey Cole. For additional integrations to the AI toolkit: add Node-RED, InfluxDB, and Grafana to the Docker container. Add data to InfluxDB via Node-RED with an MQTT node (or otherwise), then webhook to the InfluxDB API. That would be unbelievable, as your RAG could use live data. Many, many possibilities, with some visualization in Grafana for even more capability. Nobody's got such a system.
Assuming the amount of "RAG data" you can give the model is limited by the context window, I'm sceptical of how powerful RAG can be unless context windows increase in size significantly. Beyond even Gemini Pro.
It definitely depends a lot on the use case and what you are trying to pick out of your data! If you want to use RAG to summarize entire documents, then yes, the context window will be an issue. But if you want to pick out a small piece of info from a bunch of documents, that is where RAG rocks.
First off, thanks for a great tutorial. I am running this on a NUC i5 with 16GB and it is really slow; any request takes about a minute to finish. Any ideas on what the issue is? Using Open WebUI with Llama 3.1 directly I get the response in about 4-5 seconds, and testing llama3.1 through the terminal gives the same response time (4-5 sec). But running through n8n it takes too long. Any hints would be great!
You are so welcome! I'm guessing the speed difference is because there is a lot of n8n prompting under the hood to make the tool calling/RAG work. So your local LLMs are having to deal with a lot more tokens for the input.
24:16 is the voice model test. It kinda feels like text-to-speech software that was added on top, not really a voice model lol, not like ChatGPT's short and human-like answers. Needs work for sure, but could be useful to blind people.
Hey Cole, love it! I am not a developer, but I am researching a way to serve the agent as an OpenAI-compatible endpoint; this would make the integration easier. I am not sure, it's just an idea. Do you think it's doable?
@ColeMedin sure! In my head I'm thinking of a web form where a user could upload the document and enter any relevant metadata; the submit action would trigger the addition to the vector database, and probably store the file for later reference if necessary.
Can you easily take the agent Abacus AI builds and convert it into the n8n pipeline you made? It just seems so easy to make it on there, but then I can't get it to work in the n8n workflow and transfer it over to my app.
You should be able to with the Abacus API! I haven't done it yet though but that would certainly make a good video. Is there a specific error you are getting?
Great question! Yes you can easily host this somewhere with any platform like Render, DigitalOcean, AWS, etc. I have a tutorial on my channel for self hosting apps like this (using the local AI starter kit as an example actually) that will basically apply to any platform you want to use: ua-cam.com/video/259KgP3GbdE/v-deo.html
Love what's being offered, but when trying to download I'm just totally lost on the whole Postgres login/password details part. Any help would be amazing!
Quick question: I'm playing around with Open WebUI and it seems to have its own volume in Docker with a Chroma DB. If so, what's the purpose of the additional Qdrant and Postgres DBs?
@ OK. If I just install Open WebUI, it already does RAG and document uploads without the n8n workflow. I checked the volume and the data seemed to be stored in a Chroma vector database. I was asking: if I just want to set up a local LLM, is installing Open WebUI all I really need to do, or are there benefits to using Qdrant etc.?
@@martinhltr The Bolt.new fork is going strong! Actually more updates on that tomorrow! I'll still be posting other content like this each week but it doesn't mean the Bolt.new fork is going anywhere!
Great question! I love Supabase as well and for RAG it will do just as well as Qdrant until you start to get millions of records. That's when a dedicated vector DB like Qdrant or Pinecone starts to perform better than using PGVector with Supabase. But until then, I honestly would prefer Supabase because then you also have the platform for auth and SQL as well!
Been building my own framework for Ollama agents from scratch, all with ChatGPT, since I'm not a programmer lol. I need to figure out how to do this but I just don't get it. Where do I even start lmao.
That's a big task especially for a non-programmer. Good on you for diving into it! I would suggest using libraries/tools like LangChain or LlamaIndex to help you get started!
Sorry to bother you again 😅. But is there a way to integrate, say, Coolify for automated deployment, or do you have another recommendation? Sorry for the constant questioning.
Thanks Cole! I ended up doing the same as you did, but I think this solution, as stated by others online as well, has a MAJOR flaw that makes it unusable in production: there's no way for the pipe to return the document chunks and present them in a nice way, so that the user can click and see the citations from the retrieved documents. Is there something similar?
You are welcome! I totally agree with you here - what I have built is a great starting point and something for an individual to use locally, but it would have to be extended in ways like what you mentioned to be production ready. Citing sources is actually something that a basic RAG agent in N8N can't do, so you would have to set up something more custom than using the default N8N vector store retrieval tool. I'm likely going to make a tutorial on that in the future!
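To illustrate what that more custom setup could look like, here is a minimal sketch (not from the video) that queries Qdrant directly with the qdrant-client library and keeps each chunk's source so the UI can render citations. The collection name "documents" and the payload fields "text" and "source" are assumptions; adjust them to your own schema.

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

def retrieve_with_citations(query_vector: list[float], top_k: int = 4):
    hits = client.search(
        collection_name="documents",
        query_vector=query_vector,
        limit=top_k,
    )
    # Return each chunk together with where it came from, so the UI
    # can render clickable citations next to the answer.
    return [
        {"text": h.payload.get("text"), "source": h.payload.get("source"), "score": h.score}
        for h in hits
    ]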
Hey Cole, after I set everything up and got it to work with default values as shown in the video, the response I get from the pipe is something similar to this: tool.call("what is up") There is no chat, but only these responses. Is there something I am missing? Also, when clicking the "Call" button, I just get a message saying, "Permission denied using media devices". There is no popup to activate the permissions.
This is an awesome tutorial but I am running into an issue with the localhost 5678 link. It just says "It works!" when I click on it instead of bringing up n8n. When I hit the link under it for the workflow it gives me a URL-not-found error. How do I fix this?
@@ColeMedin I figured it out. I was running an Apache server on that port... very stupid haha. That being said, I have encountered another error. Whenever I query the N8N pipe I get output such as this:

import json

# Define a JSON object
json_data = {
    "alternatives": [
        {"name": "Saturday School", "description": "Serving suspension on a Saturday"},
        # Add your alternatives here...
    ]
}

# Print the first alternative from the JSON data
print(json_data['alternatives'][0]['name'])

The pipe will respond appropriately to basic text such as "hello", but whenever I ask it about the database it spits this, or something similar, out. Any idea?
Glad you figured it out! It seems like this output is more because of a smaller LLM that is confused by the n8n prompting under the hood for tool calling. I'd try using a more powerful LLM if you could! I've seen this kind of thing happen a lot when using 7b parameter models and lower.
@ColeMedin I am working on a way to save my chats on, say, ChatGPT, filter the prompts and responses, then feed them to Qdrant for learning. This will eventually work for most of the AI chat platforms. Happy for suggestions.
Hello, great video. I tried to follow everything but I think I missed a step. When I try to chat I get the following response to a hello: tool.call(query="define hello"). Can someone tell me what I missed? Thanks!
Thank you! Which model are you using? I've noticed that weaker LLMs like Llama 3.1 8b will do this sometimes because it doesn't handle the n8n prompting under the hood for RAG very well. I'd try using a more powerful model if that is an option for you!
@@ColeMedin Thanks a lot. Llama 3.2 helped. But the thing is that I use it on an M1 Mac, and I tried with locally running Ollama instead of running it in Docker, but it's still super slow, answering after 40-50 seconds for a simple hello. :(
Do you have any of these that use the local file system? I don't want to go through the trouble of creating a Google Cloud account to be able to use Google Drive. I've tried but can't seem to get it to actually work.
I think the issue is that OAuth won't work without an SSL-enabled domain, so I don't think any local instance will work with anything that needs OAuth, e.g. Google.
Actually, for those who need help with why test mode only allows one query: the webhook tab contains two modes. The test mode URL is for testing, and the production mode URL is the one you should copy into the Open WebUI valves!
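For illustration, the two URLs differ only in their path (the IDs below are placeholders): the test URL looks like http://localhost:5678/webhook-test/<your-webhook-id> and only listens while you click "Test workflow" in the n8n editor, while the production URL looks like http://localhost:5678/webhook/<your-webhook-id> and responds whenever the workflow is active.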
I haven't even gotten 30 seconds into this video and I'm so excited... You're like my fav YouTuber to learn this stuff from, but I'm a big Open WebUI user and I am SO F-ing pumped right now that you made this video... This is what I have been waiting for to allow me to connect all of the dots in my own little AI world setup.
@@lofigamervibes I'm so glad to be able to give you what you've been looking for! It's my pleasure and thank you so much for the kind words! 😁
I haven't even finished the video and I'm astounded at the rate you can consume new information, master it, script a video, and demo it in a step-by-step accessible way. Truly a unicorn and my personal YouTube hero.
Thank you so much Cameron, that means a lot to me! :D
00:05 Integrating Open WebUI for chatting with n8n AI agents
02:11 Self-hosted AI starter kit with various services packaged together
06:01 Setting up Open WebUI with n8n AI agents locally
07:56 Accessing n8n through localhost port 5678 for account setup
11:58 Configuring connections and credentials for different nodes in the n8n workflow
14:01 Using Google Drive as an example for ingesting documents and updating the Qdrant vector store
17:46 Customizing valves in n8n AI agents for specific integrations
19:34 Open WebUI allows convenient parameter customization for n8n AI agents
23:07 Open WebUI allows for easy customization and voice chat with n8n AI agents
25:01 Extension of the local AI starter kit with Open WebUI for voice chat integration
Crafted by Merlin AI.
I love your workflow and how you think about the processes and how to integrate them. I want to thank you for pulling all this together.
You bet man!!
LOVE IT. After a very quick test, including closing the microphone listening window, I tried building a push-to-talk action button. It does not quite work yet, but in further testing, my microphone stops listening while "thinking" and responding, then goes back to active listening when the response ends.
Yo, you're quick to address the Ollama issue. Props.
Dude this is so fire, this is exactly what I wanted you are a godsend.
Thanks man - I'm glad! 🔥
Thank you for creating such an outstanding tutorial! I truly appreciate your effort in making this video.
You are so welcome - thank you for the kind words!
What incredible video! Congratulations. This is the best video I saw on youtube about it
Thank you very much man!
Cole! You are the man! Thanks for this awesome integration!
You are so welcome! Thank you!
I was probably one of the few who didn't see this as a missing feature but as a chance to integrate my own web UI for chat. It originally started as a ChatGPT clone and got various features added; now I have my own model to run it on, which is dope. But great video for everyone else still watching!
That's awesome!
@@ColeMedin Not as awesome as your content, thank you for making everything so accessible. I will be looking to contribute to your bolt project. You are awesome bro
Thank you so much! I look forward to seeing your contributions man!
Wow, amazing and completely free content! I think the next step is to implement Llama 3.2 vision in this workflow!
Thank you very much! Yeah I agree!
The only thing that's kinda missing from being a starter kit is a fully worked-out agent that doesn't just handle a text-only document well, but will handle a document with pictures, text, and code. Ideally spreadsheets too. Also, how to train a model on the collected vector data. Thank you for sharing the journey!
You bet! And I totally agree with you here - developing agents to bake into the starter kit for this kind of thing is certainly one of the ways I want to expand this in the future!
Hands down you are a godsend. Thank you so much for this video. I was spinning my wheels as of late and now I can let off the brake 🚘🚀
You are so welcome! I'm glad you're moving fast now, that's awesome!
Dude! You deserve so many more subscribers!
Thank you man, that means a lot to me! :D
This was such a great video - thank you! It would be great to extend this by enabling file uploads via OpenWebUI. I'm guessing that would just mean detecting whether a file has been uploaded or not and putting it in the vector store.
It would be awesome if you replaced vanilla Postgres with Supabase in the Docker file!
Thank you so much for the support Marc! I agree that's a smart move, and I am actually planning on doing that in the future!
@ColeMedin awesome! I'm anxiously looking forward to it!
Really thank you for your awesome work! 🎉
@@k3nning_w3st You are so welcome!!
Excellent video. You read my mind. ❤
@@subasc I'm glad! Thank you very much!
Another excellent video "FORWARD SLASH, FORWARD SLASH" // Not Back \ happy days
Nice video Cole ! I hope that it will be possible to add pictures to the input prompt of OpenWebUI soon. In the stack, I would like to have image generation.
Thank you and yes that is one of the highest priority features to be added right now!
Mighty powerful, great video! I'm getting a fetch failure in the Ollama embedding at 5 min, even though the connectivity is successful. Can you demonstrate how to batch a document in case I'm running out of memory? Thanks & cheers
Thank you! If your documents are too large I would try splitting them based on paragraphs (maybe like 20 paragraphs at a time) and inserting those one at a time into the vector database.
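If you want to do that splitting outside of n8n first, here is a rough Python sketch (the 20-paragraph batch size is just the suggestion above, not a tuned value):

def batch_paragraphs(text, batch_size=20):
    # Split on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Yield ~20 paragraphs at a time so each embed/insert call stays small.
    for i in range(0, len(paragraphs), batch_size):
        yield "\n\n".join(paragraphs[i:i + batch_size])

Each yielded batch can then be embedded and inserted into the vector database one at a time.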
Cole Medin - thank you for your efforts in showing us ways to utilize Docker with n8n, Ollama, Postgres, and Qdrant... however, Google IS NOT 100% local and offline. I thought this was purely an OFFLINE LOCAL setup. I'm very new to this and so far, yours has been the BEST. I just got lost when you went to GOOGLE. I need a step-by-step for local documents WITHOUT GOOGLE. Can you help with that? Thanks. Better yet, an n8n flow that already has the local input instead of Google Docs.
You are welcome and I hear you on this! I would look into using the "local file trigger" with n8n!
Thank you so much for this. I'm very new to this setup, can you guide us on how to work with a local folder with say my Obsidian notes?
You are so welcome! You can really just switch out the Google Drive triggers for the local file trigger to work with your local files!
docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.localfiletrigger/
I would like to know this as well.
This is GOLD! Thank you so much, Cole!
Is anyone else getting |python tags| rather than the actual message from the tool when triggering the agent via chat?
You bet!!
I've had this happen before, generally with smaller models because they hallucinate since there are actually quite a bit of instructions under the hood for RAG in n8n.
Good stuff, sir! The next video could have something for RAG working with SQL command generation, where it tries to reprocess if the SQL command gets field errors.
Thank you Mike and great suggestion!
This is brilliant! Just brilliant! ❤
Thank you very much!
So grateful to you for sharing info about n8n and local LLMs. I would be extremely grateful if you could follow up with how to use cloud-based tools for embeddings.
Super excited to move forward on all this but from what I can tell my desktop machine (Mac mini M2 with 16gb ram) is just not powerful enough to do the embeddings.
I went through your incredibly helpful video from a few weeks back on setting up n8n. I got it all set up, but my machine kept timing out when it tried to do the embeddings. I WAS able to get it to work doing embeddings with INCREDIBLY short documents, i.e. a text file that was two lines long. But anything longer and it would time out after 5 minutes.
I'm super new at this but I've seen some discussion online of using cloud based options to do the embeddings? Would be super grateful for a video on that.
It's my pleasure! For cloud-based embeddings I would recommend OpenAI embeddings; you just need an OpenAI API key, and it's supported as an embedding option in n8n!
Also which embedding model are you using? There are a lot of options and you could always try another smaller one for making it quick locally!
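For reference, the equivalent call outside of n8n with the OpenAI Python SDK looks roughly like this (the model name is one common choice, not a recommendation from the video; the client reads the OPENAI_API_KEY environment variable):

from openai import OpenAI

client = OpenAI()  # uses the OPENAI_API_KEY environment variable

response = client.embeddings.create(
    model="text-embedding-3-small",  # a commonly used OpenAI embedding model
    input="Your document chunk goes here",
)
vector = response.data[0].embedding  # the embedding vector for the input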
Thanks much @@ColeMedin RE: the model I'm using, I should have mentioned that I'm using nomic-embed-text. I went through your helpful video on "Run ALL Your AI Locally in Minutes" (the video was great and I got n8n all set up with Ollama) but, as I said, my Mac mini kept timing out when I tried to set up embeddings for documents more than a few lines long.
Okay yeah, sorry, I forgot you already mentioned that! I would suggest going to the Ollama site and searching through their embedding models; there are lots of fast options to try there!
Hi Cole. Regarding additions to your project: add Node-RED, InfluxDB, and Grafana to the container.
Then input IoT data with an MQTT node in Node-RED, then webhook and/or API call from InfluxDB. That way your RAG is live data. Maybe some observability with Grafana, because it is awesome. Thanks.
This is a fantastic idea - thank you Jeffrey!!
Love your content Cole, any chance you'd do something to show how to put this in production on Railway or AWS or something?
I do cover deploying this to the cloud here!
ua-cam.com/video/259KgP3GbdE/v-deo.html
Are you able to add exo-explore/exo as a scalable service in the docker compose so we can spread the llama inference across multiple compute nodes?
@@AndyHuangCA This is a fantastic suggestion and a great next step to extend this setup - thank you! I will look into this!
Hi, Cole Medin, thank you for the awesome work. Could you do a demo showing how an LLM can answer questions about a sales database for example?
Thank you! And I appreciate the suggestion! Could you elaborate a bit more on what this use case would look like in your mind?
@@ColeMedin It's basically the same thing you did, but instead of the data source being a document, it's an Excel table with columns like product, customer, date, and sales price. The goal would be to ask for specific information about products, such as total sales in a given period.
Oh yeah, I'll actually be making a video on this in the near future with CSV/spreadsheet RAG!
Can you add Stable Diffusion to this Docker setup, connected to Open WebUI?
Thank you :-) Please tell me how I get the 'Documents' and 'Prompt' tabs on the left below Workspace.
You are welcome! I'm not sure what tabs you are referring to, is this within Open WebUI? I haven't used those!
Great video. Can you add SearXNG to allow anonymous web search from Open WebUI?
@@joepalovick1915 Thank you and I appreciate the suggestion! I'm going to create a list of improvements to make and add this to it!
Can you upload or create a video showing how to replace the Google Drive workflow to local files for truly local?
I am definitely considering making a video on that!
Hi Cole, awesome, awesome, awesome!!! I managed to configure it, thanks a lot! When I run it (a simple hello), the title of the chat is in Spanish. No big deal of course, but do you know how to get it to behave like yours, just English? Thanks again!
Thank you Matt! I'm actually really not sure why the title would be in Spanish... is there a setting on your system that sets the default language to Spanish or something like that?
Great video, thanks; looking forward to using this on my Mac. But I am a bit stuck: I ran everything as per the video and README, yet I get an error when n8n tries to start up. It seems to be complaining about an encryption key mismatch. Any ideas on what I have done wrong? I have looked everywhere.
You are welcome! For that error I would delete the credentials and recreate them, it's probably because those are the default credentials when downloading the repo.
@@ColeMedin Really appreciate your response.
Have not had a chance to try yet; I need to work out where the creds are stored so I can delete them.
Cole what if we have the original AI Starter kit already installed and OpenWeb UI installed separately? Is it possible to do an upgrade so I don't lose everything like models and custom settings in OpenWeb UI? Or should I just delete the old installs and go fresh? Thanks for the effort! Where can I support you?
Fantastic question! You could continue to use your Open WebUI instance and just point it to the N8N endpoints you want to use from the local AI starter kit! You would just use localhost for the URL instead of n8n for the webhooks, assuming Open WebUI is running directly on your machine.
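For example (the webhook ID below is a placeholder): inside the Docker network the pipe calls http://n8n:5678/webhook/<your-webhook-id>, while an Open WebUI instance running directly on your machine would use http://localhost:5678/webhook/<your-webhook-id> instead.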
Thank you for asking about supporting me! Right now I am building a community behind the scenes which I will be releasing soon. Being a part of that will be the best way to support me! It means a lot! :D
@@ColeMedin Awesome, thank you for the response! Well as soon as you get the community finished I’ll be first in line to sign up!
You bet! I appreciate it!
First off, great video set. Thank you for doing these. I will say one thing, these are making me work more than expected. I didn't realize how rusty I am on all these items until I started trying to work my way through them. Still haven't gotten it completely off the ground but working through it. Do you have something created for setting the local file triggers up in N8N instead of using Google?
Thank you very much! And that's partially on me - I am working on making this entire package easier to get up and running and work with. I don't have a video using the local file trigger instead of Google drive but I'm looking to do that in the near future!
@@ColeMedin I need to just knuckle down and figure out how to connect to Google drive and be done with it. I was doing some research and doing a local repository will be an interesting problem to tackle with n8n running in a container.
You are awesome... thanks mate.
Please, would you be able to show how to integrate LlamaParse into the n8n workflow for parsing complex PDFs? I'm struggling. Thanks.
You bet Tony!!
I haven't used LlamaParse before actually but if I ever give it a shot I'll certainly consider making a video on integrating it with n8n!
thanks mate.... much appreciated
You bet!
Great work man! I am all set up and ready to start ingesting documents!! Quick question: in your previous video you touched on the issue with the response adding in the headings. I think you mentioned we can clear that with a prompt directive or something... thanks!
Thanks man! Yeah, those kinds of weird responses are typically from smaller models that get confused by the n8n tool-calling prompting that happens under the hood. So you can prompt the model with something like "don't output tool call syntax". I know that's pretty general, but you can mess around with what gets the responses you want!
@ awesome! I’ll try that. Curious if you’ve adapted the workflow to ingest documents from a locally bound share? I would like to do this, but I think the mods to the workflow may need a bit of work since it won’t be on a Google Drive.
Guess I’m going to need to really learn n8n now. lol
I haven't yet but I know there are local file triggers to help you with that!
@@ColeMedin yeah I got the local file trigger working, but I’m not sure it can ingest .pdf documents.
Great tutorial Cole! Thanks a lot!
Unfortunately, when I run requests through WebUI I get weird answers.
Example:
Input: Hi! Can you give me some information about dogs?
WebUI Output using N8N Pipe: tool.call("dog")
Any idea about the culprit?
Thank you!
Yeah I've seen this happen before - typically it is because I'm using a smaller LLM that isn't powerful enough to handle the larger prompt for RAG. Which model are you using?
Also for Ollama models, it can be helpful to increase the context window size. That's under "Context Length" in the options for the Ollama node in n8n.
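You can sanity-check that same option outside of n8n against Ollama's REST API directly; a small sketch (the 8192 value is just an example to tune to your hardware):

import requests

# Ask a local Ollama server for a completion with a larger context window.
# "num_ctx" is Ollama's context length option - the same setting the
# "Context Length" field in the n8n Ollama node controls.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Summarize the retrieved documents...",
        "stream": False,
        "options": {"num_ctx": 8192},
    },
)
print(response.json()["response"])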
In case it helps anyone trying to make a customized model with this as well: it seems that the way it ties the session ID to the first prompt conflicts with setting a system prompt for the model, since the first part of the prompt never changes then.
Yes that is true, thank you for bringing this up!
Fantastic video! Now only the next step is missing: how to turn such a local environment into a production one, and how to release such an application into production on Azure/Google/AWS or any other hosting. Maybe this is a topic for another video; I would be the first customer for such content.
Thank you and yes that is a very needed next step so I will be making more content around that in the future! I do already have a video on deploying this to the cloud though as a starting point!
ua-cam.com/video/259KgP3GbdE/v-deo.html
But more to come to make it really production ready!
Awesome video as always, and keep bringing n8n content please ;-)) Maybe a next n8n video with an agent swarm, where multiple agents work together to accomplish a task and show the overall output in order in Open WebUI? Like searching through the web, getting back a response, then summarizing it or extracting values from it and writing them to Google Docs/Sheets, or self-hosting OnlyOffice and maybe trying to create an Excel file with that.
It would be a cool idea to just say anything in chat and have it perform actual actions ;-))
I was just about to add this as a comment. @ColeMedin you are doing amazing work, keep it up. I am just about to replace my current n8n container with yours 🤣
Thank you very much - I sure will! I love your thoughts here and I'll definitely be making content like this in the near future! Maybe not the very next video but coming up for sure.
Thanks for your great videos Cole. I find them inspirational and I am learning lots!!
I like your enthusiastic attitude; it sometimes lifts me up if I am feeling down 🙂
There is just one little thing standing between me and total victory!
What you have there to "clean the old vectors" does not seem to work with a local file trigger :-(
Probably you don't have to make a whole video just for that one node.
If you could just somehow share that bit of updated code, that would be fantastic!! And even better if it is in Python instead of JavaScript (if at all possible).
Keep going with your excellent work, you are making a positive difference during this inflection point in humanity.
Thank you so much!! Let me look into that!
Do you have a Discord or similar setup for users to talk about your various projects, help each other, and offer advice on how to improve certain functions, report bugs, etc? If not this would be an incredibly important asset for you and your work, I think!
Thank you for mentioning this! I don't yet but I am actually building up a community platform behind the scenes right now that will be released soon!
OWUI lists integration with LMStudio - would be nice to see this included in the stack (or is there too much of an overlap?)
Yeah I'd say they are super similar. But maybe there is a place for it!
Thank you for the great tutorial, it inspired me a lot.
I followed your tutorial, but I have some issues:
The AI is hallucinating even when the result of the Vector Store Tool is OK. So in my case, the Vector Store Tool has the correct information, but the AI Agent couldn't find any information...
Maybe I have missed a few steps.
What is your AI Agent System message?
Do you have a Description in the Vector Store Tool?
Thanks for the great tutorials and your help :-)
Keep on working :-)
I'm glad - thank you! Smaller LLMs will give bad results sometimes even if the right info is given from the RAG part of the workflow - I've noticed that. Which model are you using?
@@ColeMedin Ah, ok. I have used Llama 3.1 8B and Llama 3.2 3B. I'll try it with ChatGPT...
Do you have any setup steps for using Cloudflare tunnels? I have it running for n8n on a subdomain. How can I do the same for Open WebUI?
UPDATE: Never mind, I've successfully set up the SSH tunnel for Open WebUI.
Hello, I am working on a challenge to create a database with just over a thousand monographs for academic consultation in natural language (via LLM). Do you think the package presented in this video is appropriate for my purpose? Can you suggest another package or video?
Hey Jose, yeah this package would be great for this!
@@ColeMedin I really enjoyed using Ollama; it's simple and works very well with Open WebUI. However, in some research I did, I read that vLLM has much better performance than Ollama, especially for requests from multiple users, because it uses batching. I'm waiting on an order for two Nvidia L4s; I intend to use one for production and the other for testing, and if necessary I'll put both into production. I'd like to know your opinion: would it really be worth using vLLM (integrated with Open WebUI via the OpenAI API) given hardware limitations, or can Ollama and vLLM be equivalent in terms of performance?
You know I haven't experimented with vLLM before but I've heard similar claims. Honestly I'd say just test with both using the same setup and see if there is a big difference in performance!
Hi Cole, would you suggest me to use supabase or directly postgresql for vector database or is it better to use Qdrant?
I would recommend using Supabase and I'm even planning on adding it to the local AI starter kit in place of Postgres and Qdrant! It's just such an awesome platform
Thank you Cole! 😊
You bet!! :D
Thank you so much for this tutorial Cole! That being said, I do need some help. I am trying to run the “command-r” LLM instead of llama3.1. I changed the YAML file, but it seems like Docker only likes to pull the llama model, not command-r. How would you go about running the other LLM? Are there any other dependencies or alterations I need to make? Once again, thanks for your help!
You are so welcome! You can run the Ollama pull command for the command-r LLM while the container is running to pull it! You just have to access the terminal for the container (super easy to do in Docker Desktop) and run the Ollama commands like usual.
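For example, assuming the Ollama container is named "ollama" (check docker ps if yours differs), the whole thing is one command:

docker exec -it ollama ollama pull command-r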
This is amazing! Do you plan on adding the ability to add multiple agents?
This is awesome! 🎉🎉 can you add langfuse to the stack please
Thank you! Langfuse is one of the next additions I am considering!
I need to set this up for a personal study group with university material, but I would like to share it with some friends. How would you go about hosting this solution on GCP, for example? What are the minimum VM prerequisites to run this smoothly?
Sorry if this goes too far beyond the local solution, but I would really appreciate guidance on how to do this.
Fantastic question! It depends a ton on the LLM you want to use. If you want to use smaller models like Llama 3.1 8B, you probably only need a GPU with 8GB of VRAM. For something like a 32B-parameter model, you would want a GPU like the 3090 with 24GB of VRAM. There are a lot of cloud providers out there that will provide you with a machine with these kinds of specs!
Thank you, Cole, for the amazing video! I have a question: when I use Open WebUI with an n8n pipe (Webhook), the workflow runs three times. The first time it uses my chatInput from Open WebUI, but the second and third times it seems to send default messages like "Create a concise, 3-5 word title with an emoji..." and "### Task: Generate 1-3 broad tags...".
Could you explain why this happens?
Thanks again for your great work!
Oops, I found an option within Open WebUI and everything works perfectly! Thanks a lot
You bet, thank you! Glad you figured it out!
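For anyone who would rather filter those extra runs in the pipe code itself instead of through the Open WebUI setting, a rough sketch (the prefixes are assumptions based on the default Open WebUI templates quoted above):

def is_system_task(message: str) -> bool:
    # Open WebUI's auto-generated title/tag prompts start with recognizable
    # prefixes; skip them so the n8n webhook only fires for real user messages.
    prefixes = ("### Task:", "Create a concise, 3-5 word title")
    return message.lstrip().startswith(prefixes)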
Almost everything is working, but when I activate the function, it doesn't show up as an AI model, and I'm not sure what you did to make it appear there. It's active and configured, n8n is working and tested, but the function isn't showing up as a model like it does in your setup. I don't know why this is happening.
Same experience with the function. It does not show up in the list of webUI models. I created a new function and pasted the code in to create the function.
Yeah I've had this happen before but all I had to do was refresh the page and it showed up!
@@ColeMedin The name of the class: I changed it to Filter, but it does not start the n8n job.
Sorry could you clarify?
@@ColeMedin When I imported the function, even after updating everything, the model with the name N8N Pipe didn't show up. It was imported correctly, but nothing happened. In another video, I saw that it was possible to create a custom model and insert the desired function into this model, which I did. However, the imported function didn't appear in the model's properties. When editing another piece of function code, I noticed that the main class you used as Pipe was called Filter, so the WebUI understood the function, and it started appearing when creating a custom model. But even then, running the model with the customized function, N8N wasn't triggered. So, I understood that there was something wrong with this flow. That's what I tried because simply updating it wasn't working for me at all.
Phenomenal. Are all these instructions the same when self-hosting in the cloud, say with DigitalOcean? What size (GB) would you recommend buying to run this efficiently?
Thank you and yes the self hosting instructions remain the same! The hardware depends on what models you want to run with Ollama. Aside from that though you can pretty much use any DigitalOcean droplet to run this for yourself! You'll just need a good GPU to run the LLMs locally.
@@ColeMedin Thank you Cole! You really are the absolute best
You bet - thank you!!
What would be the best way to keep these separate projects up to date within their docker containers?
If there is an update to one of the containers you can stop all at once with Docker compose, pull the updates, and then restart it again without losing anything you've set up in the containers!
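For reference, that cycle is typically three commands, run from the folder containing the compose file (named volumes preserve your data across the restart):

docker compose pull
docker compose down
docker compose up -d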
Awesome work here, Mr. Medin. My question is about using a self-hosted Nextcloud instance, but I have an issue getting past authentication for a local instance of Nextcloud and Docker. Please assist if possible. Thanks in advance!
Thank you! What is the error you are seeing?
That's really awesome, thank you! I haven't tested it yet, as I'm using Open WebUI with live APIs like GPT and Gemini, plus OpenAI-compatible providers like DeepInfra and Together AI. I currently use DeepInfra for their large models like the 405b. Before starting this process, what should I change to use an API with an API token within n8n instead of local models?
You are welcome! Love all the models you are using here! You can change out the model nodes (and embedding nodes as well if you want) in the n8n workflows to use other models like GPT or Claude. They support quite a few options!
@@ColeMedin Thanks 🙏 I will try to modify it to make it work. ❤️
You bet!
Greeaat, you are the best bro! Do you know any source where I can improve my skills in prompt engineering for agents and function calling? I really need to get better at this.
Thank you Arthur! Here is my favorite resource for learning prompt engineering in a really concise way:
www.promptingguide.ai/
Damn dude, thank you so much!
@@orthodox_gentleman Haha you bet man!
You are a gem bro
Thank you Ramo!!
Can this be set up to access files from within my network instead of Google Drive? Which nodes should I use in n8n?
Hey man, I'm not an expert in containers, but my intuition is that any environment you set up that is connected to the internet is susceptible to getting hacked, so don't use your SSN as your username or turn off auth on these things. Otherwise, super cool vid!
Thanks man, and yeah, I agree! I was sort of kidding haha, just to drive home the point that it is indeed running locally. And technically you can run literally everything offline as long as your n8n workflows aren't going out to the internet for things: if you use a local file trigger instead of Google Drive, for example.
Great video. Liked and subscribed. Surprised this vid has 18k views and barely 1k likes??? Nobody tips anymore... So Open WebUI has a code editor. Is it any good? Could we use the Bolt fork on your GitHub page instead? Lmk, & keep iteratin'.
Thank you so much! Honestly 1k likes for 18k views is pretty good across UA-cam so I'm happy! Thanks though haha
Honestly I haven't played around the code editor in Open WebUI, but I would certainly like to try it out! The Bolt.new fork is great for iterating on projects initially - I would highly suggest checking it out!
Hi.
Is it possible to upload a file into your internal knowledge base and chat with that document, and also to deploy n8n with the UI in the cloud?
I will be making a video on these topics in the future! It is certainly possible!
Is there a way to also include a web interface to Postgres, like adding pgAdmin? Otherwise, awesome kit!
Yeah anything open source you could add in another container to this! I might be doing this soon actually!
Hi Cole, I have been trying to follow this, but the function repository on the Open WebUI website no longer works. Is there any chance you can make the script available somewhere else? Many thanks! :)
Shoot you mean it's totally broken? It looks okay to me.
But I also have the pipe code here!
github.com/coleam00/ai-agents-masterclass/blob/main/local-ai-packaged/n8n_pipe.py
@ Hi Cole, sorry, I forgot to update you that it came back online a few days later! Thanks anyway! On another note… I am looking for instructions on how to install Bolt onto the same VPS server where I have the AI Masterclass installed, but I can't seem to get it to work. All the videos I have seen are for local installs only. Any help much appreciated! ;)
Merry Christmas to you all!
I'd recommend installing bolt.diy with Cloudflare actually! It's the recommended approach.
thinktank.ottomator.ai/t/deploying-bolt-diy-with-cloudflare-pages-the-easy-way/2403
@ Great Stuff Cole! Thanx a million! ;)
Super video 👍 I have the Apple Silicon Mac version up and running with local Ollama and without Open WebUI.
Will an update install Open WebUI without interfering with my local Ollama? Thanks a lot!
Hey Cole. For additional integrations to the AI toolkit: add Node-RED, InfluxDB, and Grafana to the Docker stack. Feed data into InfluxDB via Node-RED with an MQTT node (or anything else), then webhook to the InfluxDB API. That would be unbelievable, as your RAG could run over live data. Many, many possibilities, with some visualization in Grafana for even more capability. Nobody's got such a system.
@@jeffreymoore1431 Boy, that is a really great idea (OK, I use all of those :)). Yes, live data would be killer. Great, great, great 👍
@@jeffreymoore1431 cool, super idea👍
Thank you and great question! This all runs in containers so it won't interfere with what you have running already!
Fantastic idea - thanks Jeffrey!
Cole! Thanks! 🤩
You bet Fredrik!!
Assuming the amount of "RAG data" you can give the model is limited by the context window, I'm sceptical about how powerful RAG is, unless context windows increase significantly in size, beyond even Gemini Pro's.
It definitely depends a lot on the use case and what you are trying to pick out of your data! If you want to use RAG to summarize entire documents, then yes, the context window will be an issue. But if you want to pick out a small piece of info from a bunch of documents, that is where RAG rocks.
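To make that concrete, here's a minimal sketch of the retrieval step, assuming a local Qdrant instance with a "documents" collection and the nomic-embed-text model (768-dimensional) pulled into Ollama; the "text" payload field is an assumption about how the chunks were stored:

import ollama
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

question = "What is the refund policy?"
query_vector = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]

# Only the top 5 most relevant chunks go into the prompt, so the context
# cost stays roughly constant no matter how large the document set grows
hits = client.search(collection_name="documents", query_vector=query_vector, limit=5)
context = "\n\n".join(hit.payload["text"] for hit in hits)
print(context)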
This is a freakin' awesome project!!!!
I'm having issues getting my NVIDIA GeForce RTX 2080 Ti to be used. Any guidance?
Thank you! Ollama should use your GPU by default. Are you saying it isn't and the LLMs are running on your CPU?
This is awesome!
Thank you very much!
First of all, thanks for a great tutorial. I am running this on a NUC i5 with 16GB, and it is running really slowly; e.g., any request takes about a minute to finish. Any ideas what the issue is? Using Open WebUI with Llama 3.1 directly, I get a response in about 4-5 seconds, and testing llama3.1 through the terminal gives the same response time (4-5 sec). But running through n8n takes far too long. Any hints would be great!
You are so welcome! I'm guessing the speed difference is because there is a lot of n8n prompting under the hood to make the tool calling/RAG work, so your local LLM has to process many more input tokens.
24:16 is the voice model test. It kinda feels like text-to-speech software added on top, not really a voice model, lol; not like ChatGPT's short, human-like answers. It needs work for sure, but it could be useful to blind people.
Yeah it's meant to be more of an example at this point, I know the Open WebUI team is working on improving it though!
Hey Cole, love it. I am not a developer, but I am researching a way to serve the agent as an OpenAI-compatible endpoint, which would make integration easier. I'm not sure, it's just an idea; do you think it's doable?
You mentioned using local files for input, but how about using a webhook to ingest the RAG data?
Could you expand more on what you mean by this? Sounds interesting for sure man!
@ColeMedin Sure! In my head I'm thinking of a web form where a user could upload a document and enter any relevant metadata; the submit action would trigger the addition to the vector database and probably store the file for reference later if necessary.
Oohhhh I gotcha, yeah I love that idea! Something I am actually doing for a client right now.
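For anyone wanting to prototype the client side of that form, a minimal sketch could look like the following; the webhook path and form fields here are hypothetical and would need to match whatever you configure on the n8n Webhook node:

import requests

# Hypothetical production webhook URL from an n8n ingestion workflow
N8N_INGEST_URL = "http://localhost:5678/webhook/ingest-document"

with open("course_notes.pdf", "rb") as f:
    response = requests.post(
        N8N_INGEST_URL,
        files={"file": ("course_notes.pdf", f, "application/pdf")},
        data={"title": "Course Notes", "source": "web form"},  # metadata fields
    )
response.raise_for_status()
print(response.json())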
Can you easily take the agent that Abacus AI builds and convert it into the n8n pipeline you made? It just seems so easy to build it there, but then I can't get it to work in the n8n workflow and transfer it over to my app.
You should be able to with the Abacus API! I haven't done it yet though but that would certainly make a good video. Is there a specific error you are getting?
You are awesome!!!
Easy to host all this somewhere? Render?
Great question! Yes you can easily host this somewhere with any platform like Render, DigitalOcean, AWS, etc. I have a tutorial on my channel for self hosting apps like this (using the local AI starter kit as an example actually) that will basically apply to any platform you want to use:
ua-cam.com/video/259KgP3GbdE/v-deo.html
Love what's being offered, but when trying to download I'm just totally lost on the whole Postgres login password details part. Any help would be amazing!
Sure thing! What part is tripping you up for the Postgres login stuff?
Quick question: I'm playing around with Open WebUI, and it seems to have its own volume in Docker with a Chroma DB. If so, what's the purpose of the additional Qdrant and Postgres DBs?
I didn't actually know Open WebUI ships with Chroma in the image; could you clarify? Also, Chroma doesn't have an integration with n8n like Qdrant does!
@ OK. If I just install Open WebUI, it already does RAG and document uploads without the n8n workflow. I checked the volume, and the data seemed to be stored in a Chroma vector database. So I was asking: if I just want to set up a local LLM, is installing Open WebUI all I really need to do, or are there benefits to using Qdrant etc.?
Chroma is going to be basically as good as Qdrant, so if it comes with Open WebUI you could just use that!
What happened to the Bolt.new fork?
@@martinhltr The Bolt.new fork is going strong! Actually more updates on that tomorrow!
I'll still be posting other content like this each week but it doesn't mean the Bolt.new fork is going anywhere!
I like Supabase; is Qdrant better?
Great question! I love Supabase as well and for RAG it will do just as well as Qdrant until you start to get millions of records. That's when a dedicated vector DB like Qdrant or Pinecone starts to perform better than using PGVector with Supabase. But until then, I honestly would prefer Supabase because then you also have the platform for auth and SQL as well!
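For reference, the retrieval query against Supabase/Postgres with pgvector is plain SQL. A minimal sketch, assuming a documents table created with an embedding vector(768) column; the table name and connection string are placeholders:

import psycopg2

conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/postgres")
query_embedding = [0.0] * 768  # replace with a real embedding of the question

# pgvector's <=> operator is cosine distance: lower means more similar
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
with conn.cursor() as cur:
    cur.execute(
        "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
        (vector_literal,),
    )
    for (content,) in cur.fetchall():
        print(content)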
Is it possible to run a larger LLM via API on Ollama in this setup? I mean, for example, a model running on RunPod serverless.
Great question and the answer is yes! You just have to change the base URL for Ollama to point to your RunPod machine.
@@ColeMedin So, if I understand correctly, in that case Ollama isn't needed locally, because I can set the API endpoint to the RunPod address in n8n?
Yeah that's right!
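And since Ollama exposes an OpenAI-compatible endpoint under /v1, you can also call the remote instance directly from code. A minimal sketch; the RunPod proxy URL and model tag below are placeholders:

from openai import OpenAI

# Point the client at the remote Ollama server instead of a local one
client = OpenAI(
    base_url="https://your-pod-id-11434.proxy.runpod.net/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="llama3.1:70b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)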
I've been building my own framework for Ollama agents from scratch, all with ChatGPT, since I'm not a programmer lol. I need to figure out how to do this, but I just don't get it. Where do I even start, lmao?
That's a big task especially for a non-programmer. Good on you for diving into it! I would suggest using libraries/tools like LangChain or LlamaIndex to help you get started!
Sorry to bother you again 😅. But is there a way to integrate, say, Coolify for automated deployment, or do you have another recommendation? Sorry for the constant questions.
You are good! I don't have content on my channel for automated deployments yet but I am planning on it! You certainly can set it up.
@@ColeMedin could even use ottodev to help. 🤓
Haha true!!
Thanks Cole! I ended up doing the same as you did, but I think this solution, as stated by others online as well, has a MAJOR flaw that makes it unusable in production: there's no way for the pipe to return the document chunks and present them in a nice way so the user can click and see citations from the retrieved documents. Is there something similar?
You are welcome! I totally agree with you here - what I have built is a great starting point and something for an individual to use locally, but it would have to be extended in ways like what you mentioned to be production ready. Citing sources is actually something that a basic RAG agent in N8N can't do, so you would have to set up something more custom than using the default N8N vector store retrieval tool. I'm likely going to make a tutorial on that in the future!
Hey Cole, after I set everything up and got it to work with default values as shown in the video, the response I get from the pipe is something similar to this: tool.call("what is up")
There is no real chat output, only these responses. Is there something I am missing?
Also, when clicking the "Call" button, I just get a message saying "Permission denied using media devices". There is no popup to grant the permissions.
I am having this issue too!
So that error happens a lot when using smaller LLMs that aren't able to handle the larger N8N prompts. I would try a bigger model if you can!
This is an awesome tutorial, but I am running into an issue with the localhost 5678 link. It just says "It works!" when I click on it instead of bringing up n8n, and when I hit the link under it for the workflow, it gives me a URL-not-found error. How do I fix this?
Thank you! Sorry you're running into that though! Is this when you try to access n8n in the browser? I haven't seen this before.
@@ColeMedin I figured it out. I was running an Apache server on that port... very stupid haha. That being said, I have encountered another error. Whenever I query the N8N Pipe, I get output such as this:
import json

# Define a JSON object
json_data = {
    "alternatives": [
        {"name": "Saturday School", "description": "Serving suspension on a Saturday"},
        # Add your alternatives here...
    ]
}

# Print the first alternative from the JSON data
print(json_data['alternatives'][0]['name'])
The pipe responds appropriately to basic text such as "hello", but whenever I ask it about the database it spits out this or something similar. Any idea?
Glad you figured it out! It seems like this output is more because of a smaller LLM that is confused by the n8n prompting under the hood for tool calling. I'd try using a more powerful LLM if you could! I've seen this kind of thing happen a lot when using 7b parameter models and lower.
@ thank you for the responses! This is a godsend after 5 hours of troubleshooting. Do you have any you’d recommend?
I love Qwen 2.5!
Hi, is it possible to modify the code to support document uploads in Open WebUI too?
The workflow doesn't support that right now but it would certainly be possible!
Just a quick one: how can I automate exporting my ChatGPT chat history and feeding it into Qdrant to learn from, if that's possible?
Good question! Typically chat memory isn't the best fit for RAG, since RAG is for lookups, not for learning from long conversations.
@ColeMedin I am working on a way to save my chats on, say, ChatGPT, filter the prompts and responses, and then feed that to Qdrant for learning. Eventually this should work for most of the AI chat platforms. Happy for suggestions.
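If you go that route, the ingestion step could look something like this minimal sketch, assuming you've already filtered the export down to (prompt, response) pairs and have nomic-embed-text (768-dimensional) available in Ollama; the collection name and payload shape are assumptions:

import uuid

import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

# Hypothetical pre-filtered prompt/response pairs from a chat export
pairs = [("What is RAG?", "Retrieval-augmented generation combines...")]

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="chat_history",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

points = []
for prompt, response in pairs:
    text = f"Q: {prompt}\nA: {response}"
    embedding = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
    points.append(PointStruct(id=str(uuid.uuid4()), vector=embedding, payload={"text": text}))

client.upsert(collection_name="chat_history", points=points)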
Hello,
Great video. I tried to follow everything, but I think I missed a step. When I try to chat, I get the following response to a "hello":
tool.call(query="define hello")
Can someone tell me what I missed?
Thanks!
Thank you! Which model are you using? I've noticed that weaker LLMs like Llama 3.1 8b will do this sometimes because they don't handle the n8n prompting under the hood for RAG very well. I'd try a more powerful model if that's an option for you!
@@ColeMedin Thanks a lot. Llama 3.2 helped. But the thing is, I'm on an M1 Mac and tried running Ollama locally instead of in Docker, and it's still super slow, answering after 40-50 seconds for a simple "hello". :(
Do you have any of these that use the local file system? I don't want to go through the trouble of creating a Google Cloud account just to use Google Drive. I've tried, but I can't seem to get it to actually work.
Seconding this: I want to be able to host all my data locally and access it with the system.
I think the issue is that OAuth won't work without an SSL-enabled domain, so I don't think any local instance will work with anything that needs OAuth, e.g. Google.
I don't right now but I know there is a huge need for this so I will be covering a totally local file system implementation in the future!
@@ColeMedin I've just seen something in the n8n docs about running n8n with a tunnel; it appears this could be their answer to going totally local while still supporting OAuth.
Nice! Have a link to this in the docs?
Why do I have to click the "Test workflow" button in n8n for the webhook to work for each query from Open WebUI?
Actually, for those who need help answering the question of why test mode only allows one query per click: the Webhook node has two URLs. The test-mode URL is just for testing; the production-mode URL (live once the workflow is activated) is the one you should copy into the Open WebUI valves!
Thanks for coming back and answering your question, that's right!