Wow, this was a contentious one. Some folks are really eager to prove that function calling does call functions, or that I was just wrong in this video. But they just say it without providing any supporting information.

So if you think that I am wrong, and what I say in this video is not 100% factual, then please provide some sort of documentation or a tutorial or a video or anything that suggests that function calling in the base OpenAI API is not exactly how I described it. Don't just point to some other product as your example. If you have any evidence, I would be happy to do an update video.

This video talks specifically about the base OpenAI API. Otherwise, simply accept that function calling doesn't call functions and instead is all about formatting the output, which is incredibly valuable. It certainly improved at that when the json format option became available later on.
Your demo scenario is a good, concise example for what goes on with any model. People who think the model is calling the functions really don't understand how the whole stack works. Maybe you should have a diagram in the video that shows the components and their respective jobs and what flows between? People need to see how much work is done on the calling side vs. the model. Otherwise, they continue to think that these things are doing all sorts of things they aren't.
To call functions you need to execute a cell in a Jupyter notebook: if the response is a function call, execute it and return the value to the script; if the response is a plain response, just return it (simple). My only problem is getting the Jupyter client to work right!
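A minimal sketch of that dispatch step (no Jupyter required), assuming a hypothetical get_weather function and illustrative JSON key names; the model only ever produces the JSON, and your code does the executing:

```python
import json

def get_weather(city: str) -> str:
    # hypothetical local function the model cannot run itself
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle_model_output(raw: str) -> str:
    """If the model emitted a function-call JSON, run it locally;
    otherwise treat the output as a normal answer."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # plain text response, pass it through
    func = TOOLS.get(call.get("function"))
    if func is None:
        return raw  # JSON, but not one of our tools
    return func(**call.get("arguments", {}))

print(handle_model_output('{"function": "get_weather", "arguments": {"city": "Paris"}}'))
```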
I was confused about function calling as I could not figure out how the model was calling a function. The model was just returning json formatted data. Thanks for the clear and concise explanation.
Don't care about the naming debate. But I want to point out that you don't pass in just a single function description; you can pass in multiple. The functions take parameters, and the model chooses what parameters to pass to which functions. Some parameters might be optional or have heuristics in their description explaining to the model when they are best used. The model does call a function by responding to you in a certain format, and you end up calling the function, but the choice of parameters is up to the model. This is not just about formatting the responses and is a great way to enrich the capabilities of an agent with "reasoning", as the results of the function calls expand the context with new extra information. When I need the model to respond in a certain format, I just ask it to do that, without function calling. When I need the model to "try" to "reason" and call the right function with the right parameters to expand context, I use function calling.
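A sketch of that multi-tool setup against a local Ollama server; the two tool schemas, the model name, and the {"tool": ..., "arguments": ...} envelope are illustrative assumptions, and the app still dispatches the chosen call itself:

```python
# Assumes a local Ollama server on the default port; schemas and model name are examples.
import json
import requests

TOOL_SCHEMAS = [
    {"name": "get_weather", "description": "Current weather for a city",
     "parameters": {"city": {"type": "string", "required": True}}},
    {"name": "flight_times", "description": "Flight times between two airports",
     "parameters": {"origin": {"type": "string"}, "destination": {"type": "string"}}},
]

SYSTEM = ("Respond with exactly one tool call as JSON of the form "
          '{"tool": <name>, "arguments": {...}}. Available tools:\n'
          + json.dumps(TOOL_SCHEMAS, indent=2))

def pick_tool(prompt: str) -> dict:
    """Ask the model which tool to use; the app still runs the tool itself."""
    r = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama2",
        "format": "json",    # force valid JSON output
        "stream": False,
        "messages": [{"role": "system", "content": SYSTEM},
                     {"role": "user", "content": prompt}],
    })
    return json.loads(r.json()["message"]["content"])

print(pick_tool("When is the earliest flight from London to Paris?"))
# e.g. {"tool": "flight_times", "arguments": {"origin": "LHR", "destination": "CDG"}}
```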
This reminds me of when people think dogs have feelings or think like humans. We take this thing that we like and assume there must be more to it. But that doesn’t make it true. The model takes input and returns a structured output to a specific schema. That’s it. Wishing or imagining it doing more doesn’t make it so.
@@technovangelist I like it yeah, but that's not what I'm saying. The message I got from the video is that it's just formatting. But I see it more as an approach/strategy for automating the interaction with the model, where we build over multiple request/response calls, allowing it to use a set of tools that builds context to solve a problem. It's more than just formatting. This is how you would build a system like Voyager, where they let ChatGPT control Minecraft and build a skills library.
But it is just formatting. It formats the output to align with a schema. That’s all it does. If you can use that to help improve your application, and it lets you organize the code better, great. But all it does is align to a schema and try to output as valid JSON. At first, it wasn't very good at that second part, so they later came out with the output format json option which forces the output to be valid json, whether or not it makes sense. The two pieces together make it all work and make it on par with format JSON in Ollama.
Thanks Matt for another great, super clear video. You are a fine teacher and I'm really happy to have AI-related creators that are not just on the hype train but actually thoughtfully looking at the tools in a grounded sort of way. Cheers, and I've been to the island where you live a couple of times. Really nice vibe there!

P.S. For other video ideas, I'd love you to explain some approaches to working with long-term memory and chat history in Ollama. It would be nice to be able to save and load past sessions like you can in ChatGPT, but beyond that to save the chat history to a vector db and then do RAG as needed for the model to recall certain things we have chatted about before, like my dog's name or my favorite band. That would be cool! Thanks
Happy to help! And thanks for those ideas. Memory is tough to get right in LLMs. Most papers have shown that just giving a model all the information confuses it. Larger context usually results in lower performance. And RAG is easy to start, but hard to go further and get right. I would love to look into this more in future videos.
You are great! No hype, no nonsense, straight to the point and super-clear! Thank you for making your videos. And sorry to hear that it took a bite from your family time.
I found this really helpful thank you. An interesting follow-up would be to explore the “instructor” lib that uses pydantic and has auto-retry logic if the LLM misbehaves.
Yes, I think function calling is very important. Maybe a video about fine tuning a model for function calling. ;-). Great video. Thanks for taking time out to share your knowledge.
@@technovangelist Yes, broadly speaking, it could be useful to investigate the topic of "fine-tuning" in relation to Ollama. So the basic questions are: how can I fine-tune a pretrained model (available in the Ollama directory), and how can I run these fine-tuned models with the Ollama runtime?
Since every attempt we have seen at fine-tuning for function calling has not improved models' ability to do function calling, I doubt I will do a video on it for a while. When they get better than models that are not fine-tuned for function calling, maybe.
This is great. I was so upset when I actually found out how function calling really works. I was thinking you would go into pydantic and the amazing instructor library; perhaps that would be a great next video.
Yeah, thought the same after I listened to the weaviate podcast on the instructor package yesterday. We might already start brainstorming a new name for it, maybe extractor?
I guess the "magic" that people desire is still more in the auto-mapping of a conversational input to a function call (whether that be local or on the OpenAI side). If Llama or OpenAI cannot perform a task I want (or hundreds of tasks), and I can write a function which helps with that, OpenAI or Llama can automatically parse natural language to find the correct input to one or more functions I define, take the output, and give some natural-language result. I think this is maybe what people conflate with "function calling", but in either case, it may be incredibly useful for the developer and user to have this OpenAI or Llama intermediary handle the magic or dirty work of mapping natural language to parameters and the correct function.
Totally agree. I actually add an LLM agent in my flows to check that the response matches the desired JSON and can be parsed; if it can't be, it re-queries the previous agent with extra prompting to say, make sure you follow the JSON. This also works with models without function calling built in.
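A sketch of that check-and-re-query loop against a local Ollama server; the endpoint, model name, and repair prompt are assumptions:

```python
# Assumes a local Ollama server; the model name and repair prompt are illustrative.
import json
import requests

def ask(messages):
    r = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama2", "stream": False, "format": "json", "messages": messages})
    return r.json()["message"]["content"]

def ask_with_repair(messages, retries=2):
    """Ask once, then re-query with extra prompting if the JSON can't be parsed."""
    reply = ask(messages)
    for _ in range(retries):
        try:
            return json.loads(reply)
        except json.JSONDecodeError as err:
            messages = messages + [
                {"role": "assistant", "content": reply},
                {"role": "user", "content": f"That was not valid JSON ({err}). "
                                            "Respond again with only valid JSON."}]
            reply = ask(messages)
    return json.loads(reply)  # let it raise if the model still can't comply
```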
Function calling has great benefits. If the output of a function is required to complete the user's prompt, it will output the function's name and wait until it gets the response from the function, which we have to provide from our API. Then it processes that data and creates a proper result. It's really easy to set up function calling in the playground tool.
Since you mention function calling in the playground, I assume you are talking about the Assistants API. This video refers to the core OpenAI Function Calling feature, not the similarly named feature in the Assistants API. The two things are related but different.
I’ve been using function calling in a project where I ‘sync’ a chatbot conversation with dynamically loaded page content. The idea is to display relevant supporting content on screen during a chat conversation. I have it working perfectly with GPT4 (my current project makes over 20 function calls during a typical conversation) but it is flaky when using GPT3.5. For me, though, function calling has been a game changer in terms of what I can create as a web developer.
Hi Sir, among all the videos from people who don't understand what they present, yours is a breath of fresh air. I know this is 8 months old. I am currently trying to add some "functions" in Ollama; it works when I use full Python, but I would like to see the correct answer in a web UI. My plan is the following: we already have a lot of information from our observability stack. I would like to grab some info from Grafana, for example, to answer "what are the last 5 alerts for the server BLA_BLA_BLA".
When I first started programming agent systems, the first problem was consistent outputs for parsing. I've had this issue since day 1, so it's great it's becoming standard. I really like that you can give llama.cpp a BNF-form grammar as well.
Thank you for sharing your insights on the 'function calling' feature. I appreciate your perspective and the effort to demystify this concept for a wider audience. I'd like to offer a clarification on the function calling feature as it pertains to AI models, particularly in the context of OpenAI's API, to enrich our understanding.

Function calling, as it is implemented by OpenAI, does indeed involve the model in the process of generating responses that incorporate the outcomes of predefined functions. The key distinction lies in how 'calling a function' is conceptualized. Rather than formatting the output for external functions to be executed by the user's application, function calling enables the model to leverage internal or predefined functions during its response generation process. This means the model dynamically integrates the results of these functions into its output.

The feature is named 'function calling' because it extends the model's capability to 'call' upon these functions as part of forming its responses, not because the model executes functions in the traditional programming sense where code is run to perform operations. This naming might seem a bit abstract at first, but it accurately reflects the model's enhanced capability to internally reference functions to enrich its responses.

Understanding function calling in this light highlights its innovative approach to making model outputs more dynamic and contextually rich. It's not just about formatting data but about embedding the function's utility directly into the model's processing pipeline. This feature opens up new possibilities for integrating AI models with complex data processing and analysis tasks, directly within their response generation workflow.

I hope this explanation clarifies the intent and functionality behind the 'function calling' feature. It's indeed a powerful tool that, when understood and utilized correctly, significantly broadens the scope of what AI models can achieve in their interactions.
Thanks for sharing this interesting interpretation of function calling. But I think you are confusing two separate topics. I am referring to Function Calling, which is one of the core capabilities of the OpenAI API. But I think you are specifically referring to the Function Calling sub-feature of the Assistants API. I see you have a few assistants available, so you are probably familiar with that API. These two things are different, though it is doubly annoying that OpenAI has given them the same name. OpenAI acknowledges that the features are similar but different.

In the core feature, which this video addresses, the model doesn't know what the function actually does, so it cannot execute it. Review the API and their documentation, and it clearly talks about only returning values that abide by the schema. I agree that it is a powerful tool, but you should understand what it does.

I have not dealt with the Assistants API, so I cannot confirm whether your assessment is valid there. I have to assume that it is completely accurate.

As an interesting, completely unrelated side note, I think I saw somewhere that you are based in Nieuwegein. When I first moved to the Netherlands, I stayed at a tiny hotel in the city center there. Not the Van der Valk, but a tiny one in the center. The company I was working for, Captaris, had an office in Vianen and coworkers lived in Nieuwegein and Houten and other places. Beautiful area. I ended up living on the Prinsengracht at Utrechtsestraat in Amsterdam for a couple of years and then bought a place closer to the Vondelpark for the rest of my 10 years there. Since then, I lived in Boston and now near Seattle.
@@technovangelist Just makes things even more confusingly named 🤣 I *WILL* grant you that point! I have 40-odd years of experience in IT, and this must be the most convoluted naming of features ever.

Just to set the record straight:

1. There is a feature in the OpenAI API -- which both ChatGPT-4 (I gave it the transcript!) and I thought this was referring to. The example for that feature is being able to ask a GPT/LLM what the weather will be like. For that, it needs your location and has to call a weather service with that location. All of that is in the documentation of the OpenAI API (which is in itself confusing and badly written). For that feature to work, you will also have to include a system prompt, and the function call has built-in parameter descriptions so the GPT will know what it is and when to use it.

2. When creating GPTs we have Actions. These come down to the same thing as (1) and, given a *LOT* of coding experience, will work. I created a GPT that is able to take notes by calling an app I wrote for that, discovered it would require confirmation for each note it takes and also that I'd have to pay the bill for the server to run said app, and abandoned it for now.

3. In the OpenAI API version of GPTs we have something called Assistants, which should be the same as (2) above, but now you are paying per token as well. I obviously did not even experiment with that.

Confused? You won't be after this episode of ... Soap! (the VERY old TV parody, not the protocol). And yes, I am Dutch, nice to see you've been here; likewise, I've worked and lived in Michigan for a while. And a few other countries.
You are certainly not one for short, concise replies. I can't tell if you are realizing what it does or doubling down to continue to claim that function calling doesn't do what the OpenAI docs say it does and instead actually calls the function. But anytime you ask a model what a new technology does, rather than actually verifying with the source, you are bound to run into questionable answers. I love this stuff, but it can still be incredibly stupid.
I went in to verify something just now, but all of openai's services have been down for an hour. on the plus side, my 'function calling' in ollama has worked just fine the whole time.
As someone who was new to the AI thing, I thought I was too dumb and must be missing something in how function_calling does not actually mean the model calls a function. Thanks for putting the fundamentals out loud.
It's been 4 months, and the OpenAI docs are now as clear as it gets: "In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code."
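A sketch of exactly what that doc excerpt describes, assuming the v1-style openai Python SDK; the get_weather schema and model name are illustrative, and note the API only hands back a name and arguments:

```python
# Assumes the v1-style openai Python SDK; schema and model name are illustrative.
import json
from openai import OpenAI

client = OpenAI()
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]}}}]

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools)

call = resp.choices[0].message.tool_calls[0]   # the model's request; nothing was run
print(call.function.name)                      # "get_weather"
print(json.loads(call.function.arguments))     # {"city": "Paris"} -- your code calls it
```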
Great stuff, and you have the perfect accent for my mid-western ears. Thank you. I will say that how I've used OpenAI Function Calling is different than this. The way I understand it, OpenAI Function Calling is closer to what LangChain's LangGraph workflows are doing, where you define 'tools' as different workflow steps and let LangGraph decide which tool to call based on the prompt. By giving OpenAI the list of functions I can call locally (tools), and a description of when each function should be called, OpenAI decides for me when to call the specific function. This allows me to utilize the power of NLP and perform specific tasks in my code. I call it where AI and Automation meet --> OpenAI is the AI and my code is the Automation.
I see the OpenAI compatible API as potentially beneficial from a product development standpoint because it’s easy to quickly build a scalable proof of concept on OAIs infrastructure and then theoretically take the time to set up and swap to a cheaper/fine-tuned/open source model down the line. I know that second part is getting easier and easier every day though and I’m definitely not up to date with the latest options for actually using something like Mixtral8x7B in production along those different axes of price, ease of setup, reliability, scalability, etc. Great video, and looking forward to hearing more of your thoughts.
This is a great video, I will say though the fading between the code makes it hard to follow. Also seems to go a bit fast to actually see what changed. However I did just find your channel and am loving it.
I have been playing a lot with CrewAI and custom "tools" using OpenAI, and I've been able to rack up tokens and spending. Spinning up CrewAI against a local Ollama LLM keeps my costs at zero. That's when I quickly butted up against the lack of "functions" or "tools" in Ollama's interface and it's killing me. Thank you for this video, it was eye-opening. I suspect by the time I re-implement all my custom tools with json schema ... Ollama or CrewAI-tools will have gone ahead and implemented the features anyways. ha
Wow! By far the BEST explanation of this feature so far. So from my understanding, this is not even something "new"? You could have just asked the model to return the data in a certain format all along. No idea why all the other videos spend hours trying to explain this when all you need is 5 minutes...
Well, not quite. The feature was added about 8 months ago. There are two requirements: tell the model to output as JSON in the prompt, and use the format: json parameter in the API call. Optionally, also provide a schema and maybe an example for best results.
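A minimal sketch of that recipe against Ollama's /api/generate endpoint; the schema, prompt wording, and model name are just examples:

```python
# Assumes a local Ollama server; schema, prompt, and model name are examples.
import json
import requests

schema = {"city": "string", "temperature_c": "number", "conditions": "string"}
prompt = ("Describe the weather in Rome as JSON matching this schema, "
          "and output only JSON:\n" + json.dumps(schema))   # requirement one: ask for JSON

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama2",
    "prompt": prompt,
    "format": "json",   # requirement two: force the output to be valid JSON
    "stream": False,
})
print(json.loads(r.json()["response"]))
```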
@@technovangelist Sure, but how exactly is that a "feature"? Why would I not be able to use even old models like ChatGPT 3 to ask it to return the response in a certain json format (schema)? Sounds to me like it is just a "technique" to be able to script better.
Any app that relies on a model to respond can’t really use the output unless it’s formatted as the app expects. To get around problems here folks were having to rely on kludges like practice and guardrails. It was a significant problem.
I'm new to this format json thing, but it's crazy to parse more or less pre-formatted markdown-like strings. Currently, though, both ways seem to be equally consistent. I have to test this with Llama 3; maybe it works better now. Thx for the video.
I completely agree with you! The term function calling is cool, but a little misleading. "Format JSON" is less cool, but more correct! However, given that so many people are using the function calling feature of the OpenAI API, it would be great if you could make Ollama's function calling compatible with the OpenAI API! I am definitely looking forward to it.
Call it "Function Capable Formatting" (or at least I will) whether done by AI or human. The actual functions can be called via any CLI tooling... whether done by AI or human. I am just waiting for all of the Mamba backed GUI webchat apps called "***GPT" for the traffic hype, as if OPENai wasn't bad enough. There's always Apple to "protect your privacy". LOL. Thank you for the clarification. Now I can read the docs.
Thank you for making this video that clarifies and demonstrates the nature of function calling using AI! It seems to me that if someone was using OpenAI to do function calls, it would be because they want the functions censored by their systems. Services like these could be exploited by developers creating websites that can trigger function calls, and may be a security risk to a user. That may be the reason OpenAI moderates them? However, as your video makes clear, it is fairly simple for anyone to do something like an uncensored function call, so does it make sense to have a service for it at all? Especially if it makes use of the already existing moderation-tuned models. Fantastic tutorial and interesting commentary, I'm looking forward to more!
Awesome video mate, I see you are very active with your community. I created this GPT called -authority-forge-question-extractor. You can download your video's comments, upload the file to this GPT, and extract all the questions your community is asking you. This could assist in finding content to create around pain points your own community has. For example, I found this useful question from 3 days ago: "Can you please release a video on MMLU and openAI or open source LLM". Could make for some content. Best of luck, keep up the great work.
The Functions API and, lately, the Tools API are much easier to configure than using low-level primitives such as inlining everything in system messages, taking up context space. With the OpenAI Tools API you can specify multiple tools (aka functions), and the AI can respond to a request with multiple tool invocations that can be performed in serial or parallel, thus making for much richer and more powerful interactions. Hopefully we will soon get something similar for Ollama 😊
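A sketch of handling several tool invocations from one response with the OpenAI Tools API (v1-style Python SDK); the two local tools, their schemas, and the model name are assumptions, and the app runs every call the model requests:

```python
# Assumes the v1-style openai Python SDK; tools and model name are made up.
import json
from openai import OpenAI

client = OpenAI()
LOCAL_TOOLS = {"get_weather": lambda city: f"Sunny in {city}",
               "get_time":    lambda city: f"12:00 in {city}"}

tools = [{"type": "function", "function": {
            "name": name,
            "description": f"{name} for a given city",
            "parameters": {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]}}}
         for name in LOCAL_TOOLS]

messages = [{"role": "user", "content": "What are the weather and local time in Tokyo?"}]
resp = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
msg = resp.choices[0].message
messages.append(msg)

# The model may request several tool calls at once; run each and return its result.
for call in msg.tool_calls or []:
    args = json.loads(call.function.arguments)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": LOCAL_TOOLS[call.function.name](**args)})

final = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
print(final.choices[0].message.content)
```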
I'm so happy there is a better way to do this other than straight-up begging the model. Can you talk a bit about how this works under the hood? Specifically, whether this eliminates all the CoT the model does, or whether we are just clipping the JSON at the end?
The strength of the "OpenAI way of doing things" is that you can include many different functions and have the model choose the relevant function based upon the prompt. This is where it's slightly more complicated than in your example. They will have fine-tuned their models on many examples of complex function calling, all using the same prompt format, hence the reason you would use their format. Of course, you could tell the model in your prompt to choose the relevant type of output depending on what it thinks it should do... but by doing it your way you don't benefit from the model already being used to a specific pattern/syntax. I do have to agree though: "function calling" is a bad name for it; "function requesting" would be better.
I am not sure I see the strength of the OpenAI way, because you are describing what Ollama does as well, apart from needing to fine-tune for function calling. You can, and many folks do, supply a bunch of functions, and the model will choose the functions needed and output the parameters required.
@@technovangelist Yep, we're in agreement. There are various leaderboards out there that try to measure how well models can cope with choosing a relevant function, GPT-4 is consistently on top, so while it might seem a bit long winded, their methods do appear to work well. Any LLM could basically do the technique, but the more heavily trained on examples of function calling with a specific format, the better, especially as things get more complicated, obviously your example is easy compared to a typical real world example that an 'agent' might face.
In this case the strength of the model has little to do with it. Plus no benchmarks have been made that do an adequate comparison of models. The benchmarks are demonstrably wrong most of the time.
Excellent video. Sound is great. I would recommend replacing the "stop motion" for showing code blocks as it's easy to lose sight of what's going on which is a bit of shame as the video is really well made.
There are some 'startup founders' that refuse to share their idea without an NDA. They think the idea is the big thing. All the work that goes into making it real in the way you imagined it is the hard work. You should share the idea with everyone to get feedback.
Totally agree! Your client and feedback are your best friend! I saw a lot of content about Y Combinator. Do you have any contact where I can connect with startups looking for partners/developers?
Yes, please. That's the whole point of having Ollama available through an API. A cool requirement is autocorrect. Let's say, when I ask for an answer passing a JSON schema, the function tries to parse the response into that schema. If there is a problem, then the function asks the LLM to provide the correct format. If the function is still not able to parse the response, it should throw an exception that contains the original response, the problem, and possibly the result from the fix attempt.
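A sketch of that autocorrect behaviour; the ask() callable is assumed to send a prompt to a local model and return its text, and the exception fields mirror the requirements above:

```python
# The ask() callable is an assumption: it sends a prompt to a model and returns its text.
import json

class SchemaParseError(Exception):
    def __init__(self, original, problem, fix_attempt=None):
        super().__init__(problem)
        self.original = original          # the raw model response
        self.problem = problem            # what went wrong
        self.fix_attempt = fix_attempt    # the model's corrected attempt, if any

def parse_with_autocorrect(ask, prompt, schema):
    raw = ask(f"{prompt}\nRespond only with JSON matching: {json.dumps(schema)}")
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        fixed = ask(f"This was supposed to be valid JSON but failed with {err}:\n"
                    f"{raw}\nReturn only the corrected JSON.")
        try:
            return json.loads(fixed)
        except json.JSONDecodeError:
            raise SchemaParseError(raw, str(err), fixed)
```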
Hi! I truly appreciate your videos. Regarding the output format for responses from LLM engines, I think it's a fundamental feature. It's very useful for me in my duties, since I have to gather and join data from a bunch of sources and formats and reshape it into a common one. I hope Ollama keeps doing this and adopts whatever standard format comes about, in order to integrate with the ecosystem around LLMs. By the way, I would really appreciate it if you could talk about integration of Ollama with MemGPT and Autogen or CrewAI; mostly I'm interested in Ollama and MemGPT. Thanks for your videos.
Hi Matt, thanks for your video on using Ollama! I appreciate your style. Since you invited your followers to give feedback and suggestions, here are some of my proposals for new videos:

- Function Calling Schemas Integration. I agree that "function calling" might not be the perfect name, but what is your proposal? You didn't specify it in the video (unless I missed it). Also, you criticized the potentially inefficient OpenAI schema, but you didn't show the Ollama schema (in the case of multiple functions). The Python example you provided to produce a single JSON for a "function calling" is clear to me, but it might be useful to produce a new video explaining how to manage multiple function schemas. Consider a classic domain-specific case of a specialized assistant that "decides" to call a function (another commonly used term is "tool") among many (e.g., customer care, weather reports, etc.), using the Ollama way you mentioned in this video.

- Local GPUs vs. Cloud GPU Services. You've already published some videos about these topics, but it would be useful, at least for me as I'm not a GPU expert, to have a video about when and how to use local GPUs versus using emerging cloud services to power the Ollama local instance.

By the way, on the application use-case side, my article "A Conversational Agent with a Single Prompt?" (available on the social network starting with L...) might be of slight interest. Thanks again, Giorgio
Great video, thank you. While it’s nice to have the OpenAI API in case you have an app that requires it, I much prefer the native library. My language of choice is Python, but there’s no accounting for taste :) I’d like to see you expand on Function Calling or Format JSON in two ways:

1. What Open Source LLMs handle that well? What fallbacks do you build in to ensure they respond in well-formatted JSON?

2. How do you combine Format JSON with RAG? Say in the use case where the RAG contains data that you want to query based on parameters through functions?
Two great ideas. A lot of folks want to know which LLM to use, and covering that topic exactly is also good. Combining format json with RAG is also an interesting idea.
The first part of your statement is correct in that the model doesn’t call the function. But the second part isn’t required. In fact in most cases it’s faster and better if the model isn’t involved in the second half. The problem is that a lot of folks think the model calls the function which isn’t true as you recognized.
@@technovangelist I have to admit that I thought at first that I would give the function to the model and it would just use it, but then I guess they would name it "function using" 😂. By the way, thanks for the video; your way of explaining is great.
Liked the video. If I understood what you said correctly, you said that ollama didn't support "json function" and that was OK since it's "more complicated and doesn't offer any benefit.". I wish you'd explain how to achieve similar results in ollama.
Hi. Sorry, that is definitely not what I meant to say. Sorry if that's what came across. What it doesn't support so far is using the OpenAI-compatible API to do function calling. The actual feature is there, but doing it the way OpenAI requires you to make the calls is not. I think the way Ollama does it is far easier to work with and think about. The whole video was about how to achieve function calling with Ollama.
Great video. I see now how to use Ollama to get back a specific schema from the LLM when requesting it. However, how do I provide Ollama with a list of schemas with descriptions of their use, and have the LLM respond with a normal message plus a potentially empty list of schemas with their tool names for my code to handle, for the LLM to then convert into a natural-language response? The killer feature of "function calling" is allowing the LLM to assert what functions (if any) need calling. A proper schema is just icing on the cake.
The way I have seen others do it, the model just returns the one schema for the output. And a proper schema is critical if you want to actually call the function; otherwise you need to do a lot of error handling. But generally, if you need the model to decide which functions to run, you have probably done something wrong in your code.
Thanks for this explanation. Allow me to suggest a subject. Computer Vision using Convolutional NN is so different from Natural Language Processing that I’m curious to know how OpenAI, Google and others integrate the 2 in their LLMs, making such models multimodal.
Wow, such a good tip. No more regex parsing. Can you make a video on the easiest way to create a RAG pipeline with Ollama that allows you to create different datasets? So that you can (as you can today) specify the model to interact with AND the RAG/embedding dataset?
Thanks a lot, I was confused about it and I thought the model was actually calling a function. However, what is the feature called that allows the LLM to grab data from an external source, like an API?
There is no such feature. No model is able to access anything outside of itself. Anything like that is a developer writing something that accesses the external info and then passes it to the model.
Would you suggest using a custom schema for tool calling regardless of whether the chat function (ChatOpenAI, ChatOllama, etc.) has a default one? Also, is there a way to see the exact prompt that the actual LLM is getting? I am trying to see what ChatGroq is passing (the final form of the prompt) to the respective LLMs, because its default function calling (tool.bind) works really well and I want to get ideas to use with ChatOllama.
Thanks for explaining function calls. I think a better name is function formatting. It's nice to finally understand what the big deal is. I played with your code to better understand it and found that haversine seems to be bad (installed via pip, so it's the latest?). But ignoring that, this JSON stuff seems to really tax the system: 10 minutes for a response on a fairly good laptop, using the GPU.
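If the pip-installed haversine package is misbehaving, the formula itself is small enough to inline; a sketch in plain Python (kilometres, mean Earth radius assumed):

```python
# Haversine great-circle distance, assuming a mean Earth radius of 6371 km.
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

print(round(haversine_km(51.5074, -0.1278, 48.8566, 2.3522)))  # London -> Paris, ~344
```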
There is a problem I ran into: if you have a system prompt that does not tell it to output JSON, it seems to fight with the regular output. It shouldn't make much difference, but the team knows about it.
Output formatting is a really necessary feature. BUT the LangChain library already does that; LangChain's StructuredOutputParser does the same. Why use the "function calling" feature?
Hi, Matt. Great video as usual; I always enjoy the way you put your videos together, simply awesome. Today I do have a request as it relates to this topic: LangChain has a framework called ollamafunctions. It looks good, but I have not been able to get it to work. Can you try this out?
No. ... Just kidding, though that probably doesn't come over as well as if I said it. I have been building all my examples in one directory called videoexamples (I am a pro at naming). I have been meaning to push that up. will try to do that today. Wife's sick so I am with my daughter much of the day.
Thanks for this. I was thinking I hated the thumbnail but it was so late I just wanted to sleep so I would fix in the morning. Guess no fixing is required.
It would be great for us not-yet-quite-so-experienced users of the various api's to understand why you think the OpenAI api is so painful to use with functions. It's more verbose but you probably need that to support more complex scenarios with multiple functions and multiple schemas. I mean, OpenAI does have a certain level of experience with these things, they don't add complexity for no reason.
But it doesn’t support more complex scenarios. They were first with a lot of things and made the api overly complex with no benefit to the user. Newer products can come in and simplify knowing what we have all learned since. There will be another upstart in months or years that will allow the same cases and more in an even simpler way. The same was true building gui applications. When windows came out building apps was hard. So much boilerplate you had to write. But in the time since it has gotten simpler while allowing for more complex applications. It’s more complex because they don’t have the benefit of hindsight.
That last sentence is key. Their experience in this instance is probably their biggest weakness which comes up because there are so many alternatives. They have all this cruft in their api that they can’t get rid of without breaking a lot of apps. Being first is really hard if you aren’t really careful about designing the endpoints people interact with. Everyone else gets to benefit from the good parts of your experience and not worry about the bad.
Great video and explanation. It would be great to know how to get the missing parameters to add to the « function calling » when the user doesn't provide all of them. Does the LLM manage it? Does the programmer manage it? Maybe an idea for a video ;)
I think function calling capable models are fine-tuned on function calling data; it's not just format: json. Anyway, I never expected the model to call the API, as the important thing is not the call but the data. Also, it works with your own APIs, so how could the model call them? Has anybody even been confused about that?
I think it's like layers of an onion: format json is a layer, providing the schema is another layer, telling it to output as JSON is another, then few-shot examples, then a function calling fine-tune. All are layers that help complete the feature, and all of them make it more capable and reliable. Maybe with more of some layers, you can leave others out. Getting the data out, as you say, is the important part. Thanks. I hear so many folks say that what Ollama is doing is not function calling because it doesn't call the function. But what was important here was to define function calling as OpenAI applies it. And then Ollama does it just as well, and depending on how you look at it, in some cases a bit better.
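A sketch of stacking those layers in a single Ollama chat request; the schema, example, and model name are illustrative, and each layer is marked in a comment:

```python
# Assumes a local Ollama server; the schema, example, and model name are illustrative.
import json
import requests

schema = {"function": "string", "arguments": "object"}
example = {"function": "get_weather", "arguments": {"city": "Berlin"}}

system = ("Respond only with JSON.\n"                    # layer: tell it to use JSON
          f"Use this schema: {json.dumps(schema)}\n"     # layer: provide the schema
          f"Example: {json.dumps(example)}")             # layer: a few-shot example

r = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama2", "stream": False,
    "format": "json",                                    # layer: force valid JSON
    "messages": [{"role": "system", "content": system},
                 {"role": "user", "content": "What's the weather in Oslo?"}]})
print(r.json()["message"]["content"])
```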
@@technovangelist Thanks for responding. I actually asked GPT-4 what function calling is, and it answered that it is the capability of the model to actually call external APIs. I guess the difference is in the context: as a developer using GPT, the name function calling makes sense as a preparation step before you call the function. Maybe with Ollama the context is more from a user perspective, where they expect the function to be called. Thank you again for the conversation.
I would assume it's answering based on what the words imply rather than any specific knowledge of the API. I think that's the way any model would answer. Models say a lot of things that aren't always true.
Yeah, totally! OpenAI calling it "function calling" had me and my friends like, "What the heck?" It's a good marketing name but kinda ignorant! I don't like when marketers manipulate things like that. Another thing I still can't get my head around is "GPTs"; again, the marketing makes it sound like you are building your own GPT, but in reality it's just what Ollama nicely calls a Modelfile! Anyway, besides asking the main model to return JSON in the system message, another option is to use the main large model to get the initial response and then a smaller, fine-tuned model to extract structured info and generate the JSON. Mixing two models can do the job, but again, it's not "function calling". Good content though - I just subscribed!
The model effectively decides when to call the function, and which function is appropriate at the time, by formatting the output for the particular function. If I give it a query where taking a headless web screenshot is appropriate, then the output is formatted for that particular function. And the output does NOT particularly need to be in JSON either, depending on how the function is described to the OpenAI API and how you're choosing to detect that the model "wants" to call a particular function.

Respectfully, I think "function calling" is pretty appropriate, and it really seems like there's a misunderstanding of what function calling does here. Not trying to be negative, just pointing out that "just formatting" is missing the point. Of course the model doesn't have access to actually execute code on your local machine. But the "formatting" decided upon (by the model deciding which function to use) is essentially doing the exact same thing.

Based on scrolling through the comments here, disappointingly I don't expect you to be very receptive to this feedback. Just my $0.02, and I stand by it. Function calling has a lot of really great uses, and I think "just formatting," while technically true, sorely misses the point. You can (and I do, not just guessing here) effectively give the LLM a set of tools, trigger those tools based on output from the LLM, and provide the output from those tools back to the LLM if needed. That's pretty impressive I think.

All that being said, I really do enjoy your videos. Thank you for sharing, and I look forward to the next one.
Everything you said here describes what ollama does. Function calling is just formatting. That is the entire point of function calling. People try to say it’s more but even open ai disagrees. I am very receptive to anyone who has anything backing up the opinion that it’s more than just formatting. Please. Show something. Anything. Ollama can be given a list of tools and output the appropriate json to allow the user to call whichever function was chosen. But it is just formatting.
@@technovangelist I’ll be presenting my master thesis in a couple of weeks and totally planning on using the term (it’s about practical applications of LLMs in finance)
Agree function calling is a confusing name. But also think ollama falls a bit short here compared to openai. Defining a schema in text as part of the system prompt feels a little imprecise to me. llama.cpp has a feature called Grammars which allows precisely defining a schema in a way the LLM can reliably reproduce. If Ollama allowed a way of defining your own custom grammar as part of the API, or a passing a JSON schema which it converts to grammar, then I think it would be comparable to openai for function calling use cases.
So your comment isn’t about ollama vs OpenAI because they essentially pass it in the context just as ollama does. But you are comparing it to the grammars of llamacpp. I think the reasoning here is that creating the grammars is extremely error prone. If someone came up with a way to do the grammars without using that horrible syntax that llamacpp is using to define them then I think that could be successful.
@@technovangelist My comment was about grammars, but I maintain Ollama doesn't compare with OpenAI function calling in other ways. For example, if a user asks "when is the earliest flight from London to Paris", with OpenAI function calling there is a specified mechanism for ChatGPT to understand that a function can be called to get this information, and then to interpret the return value from that function before responding with a human-readable answer to the user's question. The equivalent mechanism doesn't exist in Ollama. As for grammars: yeah, they are ugly, unintuitive, an all-round bad dev experience. But they are a powerful feature of llama.cpp nonetheless. I believe if we had access to setting grammars, devs would be able to build tooling around Ollama that brings it closer to OpenAI functions.
I think this is another place where the Assistants API is confusing things. The OpenAI function calling core feature won't get the info you mention. You can call it yourself, then pass the output back to the model again for further processing. There is no difference with Ollama.

As for grammars, come up with a good way of defining the grammars and I am sure the feature will be merged in. The OpenAI compatible API was a community-provided PR.

I don't know if this is what the team is thinking, but I attended a talk by Joe Beda a few years ago. He mentioned how sorry he was that they made YAML the tool used for configuring k8s, and that they should have done it right so no one had to touch that. They were also annoyed that every project feels like it needs to be in Go because k8s was in Go. It was in Go because k8s was just a starter project at first to learn Go. But that is unrelated to this.

Designing an API from scratch has to be done very carefully because mistakes made early on have to stick around. I think the team is being very cautious about what to add. But if you or someone else comes up with something that keeps it simple, it will get merged.
@@technovangelist I understand how function calling works and that ChatGPT doesn't call the function itself. But it DOES have a mechanism to instruct the host app to call said function and parse a subsequent return value. If you believe Ollama has an equivalent, then I challenge you in a future video to create an example where you can ask Ollama for flight times and it returns a human-readable response containing data obtained from a function call.
I misunderstood your comment; I apologize for that. So, there is nothing stopping you from doing this in Ollama as well. It does not have the ability to instruct the host app to call a function, unless instruct simply means to say, hey dev, run this function. It is up to the dev to take the output of the model, then call the function, then send the output back to the model for further processing, and then return the final value to the user. Or maybe there is logic the dev has written to go back and forth a few times. There is nothing stopping you from doing the exact same thing in Ollama. But I am happy to create another video that does that round trip a few more times.
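A sketch of that round trip with a local Ollama server, using the flight-times example from this thread; the endpoint, model name, JSON envelope, and the flight_times stub are all illustrative assumptions, and the dev makes every call:

```python
# Assumes a local Ollama server; model name, JSON envelope, and flight_times are made up.
import json
import requests

def chat(messages, force_json=False):
    body = {"model": "llama2", "stream": False, "messages": messages}
    if force_json:
        body["format"] = "json"
    r = requests.post("http://localhost:11434/api/chat", json=body)
    return r.json()["message"]["content"]

def flight_times(origin, destination):
    # stand-in for a real flight API the model cannot reach on its own
    return {"earliest": "06:45", "origin": origin, "destination": destination}

system = ('Reply only with JSON like {"tool": "flight_times", '
          '"arguments": {"origin": ..., "destination": ...}}.')
question = "When is the earliest flight from London to Paris?"

# step one: the model picks the tool and the arguments
call = json.loads(chat([{"role": "system", "content": system},
                        {"role": "user", "content": question}], force_json=True))
result = flight_times(**call["arguments"])     # step two: the dev calls the function

# step three: send the result back for a human-readable answer
answer = chat([{"role": "user", "content":
                f"Question: {question}\nTool result: {json.dumps(result)}\n"
                "Answer the question in plain English."}])
print(answer)
```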
Ok, but how do you get the LLM to output a json ONLY when it thinks a command should be run? Excellent video by the way, this will help a lot with my journey with ollama :-)
Hi Matt, with the frequent releases of Ollama, which installation option on Ubuntu allows the easiest management like upgrading to the latest version or integrating with other tools? I have a dedicated machine. I can run it with Docker or without Docker.
I think just running the install script each time makes the most sense. It’s simple and it works and is easy. Some of the community installers aren’t always updated quickly.
Hi, and thanks for the video! I made a chatbot with Ollama and a Mixtral persona, but I can't figure out how to format the output when it gives a code example; maybe it's just a matter of iterating over the string chat output. Sorry if it's a simple question. I just think maybe it would be easier with this function calling (youuuu).
Hmm, mmlu is one of the many benchmarks out there. I haven't looked at this one specifically but I am generally not a fan of any benchmark that open sources the source questions. If I can see it, then a model researcher can incorporate those questions into the training, potentially making the model perfect in the eyes of a benchmark but garbage for real life. But I will take a look. Thanks.
I'm not sure why you're catching flak. It was a fantastic video/explanation...and, of course, you're correct in describing it as you did.
This was awesome, definitely cleaner than pre-loading with a text file of instructions for the model.
You don’t need Jupyter for anything
Love the way you add complexity to the answer; it makes the whole concept less scary. Thanks for sharing.
This is awesome thanks, and just what I needed. Returning consistent JSON is critical for building applications on top of local LLMs.
Yes, it's a very important part of building any decent application with LLMs.
I admit that until now I misunderstood function calling. Now I get it. I think "format JSON" is the better name. Thumbs up, Matt!
Interesting. Thanks for the idea
@@technovangelist Waiting for that video.
Damn, this is the type of content, the style of content I pay my internet bill for. The simplicity when you walk through it, no nonsense. Insane!
This was really eye opening, thanks for enlightening me! Love your channel! Keep up the good work!
Most accurate name I can think of is "generate function arguments". "Format JSON" could be a beautifier, or transforming a different format to JSON.
@@jesongaming2945 I haven’t tried any of this yet, but your approach seems smart and generalisable to me
Matt - brilliant video as always, thank you. Experimenting here with "function calling" so this is of huge benefit. Appreciated.
Glad it was helpful!
this video is literally worth a million bucks (someone is gonna make a million dollar app, haha). TY Matt for sharing with us freely! Nice job!
I like the pace and tone of your videos
Yup. Pretty amazing what you can do with it
Ho Sir, among the videos from some people they don't understand what they present, your one is a deep fresh air. I know this is 8 months old. I am currently trying to add some "functions" in Ollama, working when I use full python, but I would like to see the correct answer in web UI. My plan is the folliwing: we have already a lot of information from observability stack. I would like to grab some info from Grafana for example to answer "what are the last 5 alerts for the server BLA_BLA_BLA".
When I first started programming agent systems, the first problem was consistent outputs for parsing, I had this issue since day 1, so it's great it's becoming standard. Really like that you can give llamacpp bdnf form grammar as well.
Thank you for sharing your insights on the 'function calling' feature. I appreciate your perspective and the effort to demystify this concept for a wider audience. I'd like to offer a clarification on the function calling feature as it pertains to AI models, particularly in the context of OpenAI's API, to enrich our understanding.
Function calling, as it is implemented by OpenAI, does indeed involve the model in the process of generating responses that incorporate the outcomes of predefined functions. The key distinction lies in how 'calling a function' is conceptualized. Rather than formatting the output for external functions to be executed by the user's application, function calling enables the model to leverage internal or predefined functions during its response generation process. This means the model dynamically integrates the results of these functions into its output.
The feature is named 'function calling' because it extends the model's capability to 'call' upon these functions as part of forming its responses, not because the model executes functions in the traditional programming sense where code is run to perform operations. This naming might seem a bit abstract at first, but it accurately reflects the model's enhanced capability to internally reference functions to enrich its responses.
Understanding function calling in this light highlights its innovative approach to making model outputs more dynamic and contextually rich. It's not just about formatting data but about embedding the function's utility directly into the model's processing pipeline. This feature opens up new possibilities for integrating AI models with complex data processing and analysis tasks, directly within their response generation workflow.
I hope this explanation clarifies the intent and functionality behind the 'function calling' feature. It's indeed a powerful tool that, when understood and utilized correctly, significantly broadens the scope of what AI models can achieve in their interactions.
Thanks for sharing this interesting interpretation of function calling. But I think you are confusing two separate topics. I am referring to Function Calling, which is one of the core capabilities of the OpenAI API. But I think you are specifically referring to the Function Calling sub-feature of the Assistants API. I see you have a few assistants available, so you are probably familiar with that API. These two things are different, though it is doubly annoying that OpenAI has given them the same name. OpenAI acknowledges that the features are similar but different.
In the core feature, which this video addresses, the model doesn't know what the function actually does, so cannot execute it. Review the API and their documentation, and it clearly talks about only returning the values that abide by the schema. I agree that it is a powerful tool, but you should understand what it does.
I have not dealt with the Assistants API, so I cannot confirm whether your assessment is valid there. I have to assume that it is completely accurate.
As an interesting, completely unrelated side note, I think I saw somewhere that you are based in Nieuwegein. When I first moved to the Netherlands, I stayed at a tiny hotel in the city center there. Not the Van der Valk, but a tiny one in the center. The company I was working for, Captaris, had an office in Vianen and coworkers lived in Nieuwegein and Houten and other places. Beautiful area. I ended up living on the Prinsengracht at Utrechtsestraat in Amsterdam for a couple of years and then bought a place closer to the Vondelpark for the rest of my 10 years there. Since then, I lived in Boston and now near Seattle.
@@technovangelist Just makes things even more confusingly named 🤣 I *WILL* grant you that point! I have 40-odd years of experience in IT and this must be the most convoluted naming of features ever.
Just to set the record straight:
1. There is a feature in the OpenAI API -- which both ChatGPT-4 (I gave it the transcript!) and I thought this was referring to. The example for that feature is being able to ask a GPT/LLM what the weather will be like. For that, it requires your location and needs to call a weather service with that location. All of that is in the documentation of the OpenAI API (which is in itself confusing and badly written). For that feature to work, you also have to include a system prompt, and the function call has built-in parameter descriptions so the GPT will know what it is and when to use it.
2. When creating GPTs we have Actions. These come down to the same thing as (1) and, given a *LOT* of coding experience, will work. I created a GPT that is able to take notes by calling an app I wrote for that, discovered it requires confirmation for each note it takes and also that I'd have to pay the bill for the server to run said app, and abandoned it for now.
3. In the OpenAI API version of GPTs we have something called Assistants, which should be the same as (2) above, but now you are paying per token as well. I obviously did not even experiment with that.
Confused? You won't be after this episode of ... Soap! (the VERY old TV parody, not the protocol).
And yes, I am Dutch, nice to see you've been here, likewise I've worked and lived in Michigan for a while. And a few other countries.
You are certainly not one for short, concise replies. I can't tell if you are realizing what it does or doubling down on the claim that function calling doesn't do what the OpenAI docs say it does and instead actually calls the function. But any time you ask a model what a new technology does rather than verifying with the source, you are bound to run into questionable answers. I love this stuff but it can still be incredibly stupid.
I am not sure what the confusion is... I wrote code where OpenAI literally calls my function on my server? @@technovangelist
I went in to verify something just now, but all of openai's services have been down for an hour. on the plus side, my 'function calling' in ollama has worked just fine the whole time.
As someone who was new to the whole AI thing, I thought I was just too dumb and missing something about how function_calling does not actually mean the model calls a function. Thanks for saying the fundamentals out loud.
Thanks Matt - Appreciate the real world python code example.
Also, major props for tackling the confusion regarding the term "function calling".
This was really awesome!! Was confused out of my mind after reading in OpenAI documentation but this really cleared it up.
It's been 4 months, and the OpenAI docs are now as clear as it gets: "In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions. The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code."
Yup. And yet folks still think I got it wrong. An update is coming tomorrow.
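For reference, here is a minimal sketch of the flow those docs describe, assuming the OpenAI Python SDK (the v1-style client) and an API key in the environment. The weather function and its schema are illustrative placeholders, not the video's code; the API only hands back the function name and JSON arguments, and the last few lines are your own code doing the actual call.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# An illustrative local function; the model never executes this itself.
def get_current_weather(location: str, unit: str = "celsius") -> str:
    return json.dumps({"location": location, "temperature": 22, "unit": unit})

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose a function: it returns only the name and arguments
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(call.function.name, args, "->", get_current_weather(**args))  # our code does the call
else:
    print(msg.content)  # the model answered directly instead
```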
Great video! I think it would be useful to include the use of the stop sequence in this context.
Thanks for silently staring at me 10 seconds, I needed it.
this video really helped me understand what function calling is
Great stuff, and you have the perfect accent for my mid-western ears. Thank you. I will say that how I've used OpenAI Function Calling is different than this. The way I understand OpenAI Function Calling is closer to what LangChain's LangGraph workflows are doing, where you define 'tools' as different workflow steps and let LangGraph decide which tool to call based on the prompt. Giving OpenAI the list of functions I can call locally (tools), and when each function should be called based on a description, OpenAI decides for me when to call the specific function. This allows me to utilize the power of NLP and perform specific tasks in my code. I call it where AI and Automation meet --> OpenAI is the AI and my code is the Automation.
yup, that’s just the expansion on what I showed. There is a lot more you can do here, but I wanted to cover the basics
Great explanations. This demystified a lot for me.
I see the OpenAI compatible API as potentially beneficial from a product development standpoint because it’s easy to quickly build a scalable proof of concept on OAIs infrastructure and then theoretically take the time to set up and swap to a cheaper/fine-tuned/open source model down the line. I know that second part is getting easier and easier every day though and I’m definitely not up to date with the latest options for actually using something like Mixtral8x7B in production along those different axes of price, ease of setup, reliability, scalability, etc.
Great video, and looking forward to hearing more of your thoughts.
This is a great video, I will say though the fading between the code makes it hard to follow. Also seems to go a bit fast to actually see what changed. However I did just find your channel and am loving it.
I have been playing a lot with CrewAI and custom "tools" using OpenAI, and I've been able to rack up tokens and spending. Spinning up CrewAI against a local Ollama LLM keeps my costs at zero. That's when I quickly butted up against the lack of "functions" or "tools" in Ollama's interface and it's killing me. Thank you for this video, it was eye-opening. I suspect by the time I re-implement all my custom tools with json schema ... Ollama or CrewAI-tools will have gone ahead and implemented the features anyways. ha
What do you mean? Ollama has the feature. Or are you asking about something else?
@@technovangelist , I might be confusing things with Langchain_community.tools.
Wow! By far the BEST explanation of this feature so far. So from my understanding, this is not even something "new"? You could have just asked the model to return the data in a certain format all along. No idea why all the other videos spend hours trying to explain this when all you need is 5 minutes...
Well, not quite. The feature was added about 8 months ago. There are two requirements: tell the model to output as JSON in the prompt and use the format: json parameter in the API call. Optionally also provide a schema and maybe an example for best results.
@@technovangelist Sure, but how exactly is that a "feature"? Why would I not be able to use even old models like ChatGPT 3 to ask it to return the response in a certain json format (schema)? Sounds to me like it is just a "technique" to be able to script better.
Any app that relies on a model to respond can’t really use the output unless it’s formatted as the app expects. To get around problems here folks were having to rely on kludges like practice and guardrails. It was a significant problem.
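A minimal sketch of those two requirements using the Ollama route described above, assuming the `ollama` Python package and a local Ollama server; the model name and the schema are placeholders.

```python
import json
import ollama  # assumes the `ollama` Python package and a local Ollama server

# Requirement 1: tell the model in the prompt to respond as JSON, with a schema for best results.
system = (
    "You are a helpful assistant. Respond ONLY as JSON matching this schema: "
    '{"city": "string", "latitude": "number", "longitude": "number"}'
)

# Requirement 2: set the format parameter to json in the API call itself.
resp = ollama.chat(
    model="llama2",  # placeholder model name
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    format="json",
)

data = json.loads(resp["message"]["content"])
print(data["city"], data["latitude"], data["longitude"])
```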
Good explanation. I was also confused by this when I was looking into it.
I'm new to this format json thing, but it's crazy to parse more or less pre-formatted markdown-like strings. Currently, both ways seem to be equally consistent. I have to test this with Llama 3; maybe it works better now. Thx for the video.
I completely agree with you! The term "function calling" is cool, but a little misleading. "Format JSON" is less cool, but more correct! However, given that so many people are using the function calling feature of the OpenAI API, it would be great if you could make Ollama function calling compatible with the OpenAI API! I am definitely looking forward to it.
Call it "Function Capable Formatting" (or at least I will) whether done by AI or human. The actual functions can be called via any CLI tooling... whether done by AI or human. I am just waiting for all of the Mamba backed GUI webchat apps called "***GPT" for the traffic hype, as if OPENai wasn't bad enough. There's always Apple to "protect your privacy". LOL.
Thank you for the clarification. Now I can read the docs.
Thank you for making this video that clarifies and demonstrates the nature of function calling using AI! It seems to me that if someone was using OpenAI to do function calls, it would be because they want the functions censored by their systems. Services like these could be exploited by developers creating websites that can trigger function calls, and may be a security risk to a user. That may be the reason OpenAI moderates them? However, as your video makes clear, it is fairly simple for anyone to do something like an uncensored function call, so does it make sense to have a service for it at all? Especially if it makes use of the already existing moderation-tuned models.
Fantastic tutorial and interesting commentary, I'm looking forward to more!
Awesome video mate, I see you are very active with your community. I created a GPT called -authority-forge-question-extractor
You can download your video's comments, upload the file to this GPT, and extract all the questions your community is asking you. This could assist in finding content to create around pain points your own community has.
Like I found this useful question.
3 days ago: "Can you please release a video on MMLU and openAI or open source LLM"
Could make for some content. Best of luck keep up the great work.
sounds pretty cool. Too bad it's on OpenAI but I’ll check it out.
The Functions API and lately the Tools API are much easier to configure than using low-level primitives such as inlining everything in system messages, which takes up context space. With the OpenAI Tools API you can specify multiple tools (aka functions) and the AI can respond to a request with multiple tool invocations that can be performed in serial or parallel, making for much richer and more powerful interactions. Hopefully we will soon get something similar for Ollama 😊
We have had more powerful function calling in ollama for months
I'm so happy there is a better way to do this other than straight up begging the model. Can you talk a bit about how this works under the hood? Specifically, does this eliminate all the CoT the model does, or are we just clipping the JSON off the end?
The strength of the "OpenAI way of doing things" is that you can include many different functions and have the model choose the relevant function based upon the prompt.
This is where it's slightly more complicated than in your example.
They will have fine-tuned their models on many examples of complex function calling, all using the same prompt format, hence the reason why you would use their format.
Of course, you could tell your prompt to choose the relevant type of output depending on what it thinks it should do... but by doing it your way you don't benefit from the model already being used to a specific pattern/syntax.
I do have to agree though - "function calling" is a bad name for it "function requesting" would be better.
I am not sure I see the strength of the OpenAI way because you are describing what ollama does as well. Apart from needing to fine tune for function calling. You can, and many folks do, supply a bunch of functions and the model will choose the functions needed and output the parameters required.
@@technovangelist Yep, we're in agreement. There are various leaderboards out there that try to measure how well models can cope with choosing a relevant function, GPT-4 is consistently on top, so while it might seem a bit long winded, their methods do appear to work well. Any LLM could basically do the technique, but the more heavily trained on examples of function calling with a specific format, the better, especially as things get more complicated, obviously your example is easy compared to a typical real world example that an 'agent' might face.
In this case the strength of the model has little to do with it. Plus no benchmarks have been made that do an adequate comparison of models. The benchmarks are demonstrably wrong most of the time.
Thank you, Matt, this was very helpful. Can you please make videos on function calling and agent workflows with LangChain using Ollama 🔥
great video Matt. Could you maybe do a video on what you consider to be good use cases (at least at this time and date)?
Excellent video. Sound is great. I would recommend replacing the "stop motion" for showing code blocks as it's easy to lose sight of what's going on which is a bit of shame as the video is really well made.
Had to watch again to figure out what you mean. I don’t have any stop motion. Can you rephrase?
Amazing content Matt. I can’t stop having ideas 💡 to build apps. I need to stick with the top ones and start building. 🎉
Ideas is the easy part. Follow thru is the tough one.
Yes I will! :) and if you have any interest in doing something, I am a Golang developer looking to follow thru! :) @@technovangelist
There are some 'startup founders' that refuse to share their idea without an NDA. They think the idea is the big thing. All the work that goes into making it real in the way you imagined is the hard work. You should share the idea with everyone to get feedback.
Totally agree! Your client and feedback are your best friend! I saw a lot of content about Y Combinator. Do you have any contact where I can connect with startups looking for partners/developers?
I don't. I have worked at companies that went through YC as well as TechStars on the East Coast, but I don't have contacts.
Yes, please. That's the whole point of having Ollama available through an API. A cool requirement is autocorrect. Let's say I ask for an answer, passing a JSON schema. Then the function tries to parse the response against that schema. If there is a problem, the function asks the LLM to provide the correct format. If the function is still not able to parse the response, it should throw an exception that contains the original response, the problem, and possibly the result of the fix attempt.
I read this a few times and not sure if you are asking a question or saying something else. Sorry.
Great video - very informative and easy to understand.
extremely important!
Thumbs up, Matt. keep up with the good work.
Just discovered that LangChain offers an OpenAI-compatible wrapper API for function calling 😊
Hi! I truly appreciate your videos. About the output format for the response from LLM engines: I think it's a fundamental feature. It's very useful for me in my duties, since I have to gather and join data from a bunch of sources and formats and reshape them into a common one. I hope Ollama keeps doing this and adopts any standard format that comes along, in order to integrate with the ecosystem around LLMs. By the way, I would really appreciate it if you could talk about integrating Ollama with MemGPT and Autogen or CrewAI; mostly I'm interested in Ollama and MemGPT. Thanks for your videos.
More autogen, crew ai, and memgpt. I’d also like to cover some AI agents in other languages. Thanks.
Hi Matt,
Thanks for your video on using Ollama! I appreciate your style.
Since you invited your followers to give feedback and suggestions, here are some of my proposals for new videos:
- Function Calling Schemas Integration
I agree that "function calling" might not be the perfect name, but what is your proposal? You didn't specify it in the video (unless I missed it). Also, you criticized the potentially inefficient OpenAI schema, but you didn't show the Ollama schema (in the case of multiple functions). The Python example you provided to produce a single JSON for a "function calling" is clear to me, but it might be useful to produce a new video explaining how to manage multiple function schemas. Consider a classic domain-specific case of a specialized assistant that "decides" to call a function (another commonly used term is "tool") among many (e.g., customer care, weather reports, etc.). I mean using the Ollama way you mentioned in this video.
- Local GPUs vs. Cloud GPU Services
You've already published some videos about these topics, but it would be useful, at least for me as I'm not a GPU expert, to have a video about when and how to use local GPUs versus using emerging cloud services to power the Ollama local instance.
By the way, on the application use-case side, my article "A Conversational Agent with a Single Prompt?" (available on the social network starting with L...) might be of slight interest.
Thanks again,
Giorgio
Hold that first thought …. For about 5 more hours.
Super video! Function calling is great for getting updated info on some topics.
thank you, an excellent video. having the example helped a lot!
This is great! Subscribed, keep em coming!
Great video, thank you. While it’s nice to have the OpenAI API in case you have an app that requires it, I much prefer the native library. My language of choice is Python, but there’s no accounting for taste :)
I’d like to see you expand on Function Calling or Format JSON in two ways:
1. What Open Source LLMs handle that well? What fallbacks do you build in to ensure they respond in well-formatted JSON?
2. How do you combine Format JSON with RAG? Say in the use case where the RAG contains data that you want to query based on parameters through functions?
Two great ideas. A lot of folks want to know which LLM to use, and covering that topic exactly is also good. Combining format json with RAG is also an interesting idea.
Love your presenting style, very calming
Awesome video. Is this basically the same as using pydantic classes to force json output?
You are the second mention of pydantic.
I wasn't aware of this feature before now...
it "calls" by sending a JSON to you, your code excutes the function, then responds to the call with response so it's literally accurate.
The first part of your statement is correct in that the model doesn’t call the function. But the second part isn’t required. In fact in most cases it’s faster and better if the model isn’t involved in the second half. The problem is that a lot of folks think the model calls the function which isn’t true as you recognized.
@@technovangelist I have to admit I thought at first that I would give the function to the model and it would just use it, but then I guess they would have named it "function using" 😂.
By the way, thanks for the video; your way of explaining is great.
Liked the video. If I understood what you said correctly, you said that ollama didn't support "json function" and that was OK since it's "more complicated and doesn't offer any benefit.". I wish you'd explain how to achieve similar results in ollama.
Hi. Sorry, that is definitely not what I meant to say, and sorry if that's what came across. What it doesn't support so far is using the OpenAI-compatible API to do function calling. The actual feature is there, but doing it the way OpenAI requires you to make the calls is not supported. I think the way Ollama does it is far easier to work with and think about. The whole video was about how to achieve function calling with Ollama.
Great video. I see now how to use Ollama to get back a specific schema from the LLM when requesting it. However, how do I provide Ollama with a list of schemas with descriptions of their use, and have the LLM respond with a normal message plus a potentially empty list of schemas with their tool names for my code to handle, which the LLM can then convert into a natural language response?
The killer feature of "function calling" is allowing the LLM to assert which functions (if any) need calling. A proper schema is just icing on the cake.
The way I have seen others do it, the model just returns the one schema for the output. And the proper schema is critical if you want to actually call the function, otherwise you need to do a lot of error handling. But generally, if you need the model to decide which functions to run, you have probably done something wrong in your code.
Thanks for this explanation. Allow me to suggest a subject.
Computer Vision using Convolutional NN is so different from Natural Language Processing that I’m curious to know how OpenAI, Google and others integrate the 2 in their LLMs, making such models multimodal.
Wow, such a good tip. No more regex parsing. Can you make a video on the easiest way to create a RAG pipeline with Ollama that allows you to create different datasets, so that you can (as you can today) specify the model to interact with AND the RAG/embedding dataset?
Can you use a weather API for function calling? We should be able to prompt the model with any question about the weather in any location.
Sure
Great Video!
Is the sample demo code posted somewhere?
Also is this calling OpenAI here or are you calling a local model (and which one)?
Thanks a lot, I was confused about it and thought the model was actually calling a function. However, what is the feature called that allows the LLM to grab data from an external source, like an API?
There is no such feature. No model is able to access anything outside of itself. Anything like that is a developer writing something that accesses the external info and then passes it to the model.
Thanks Matt. How do you make an LLM execute a function and get back the value (say, the current time) as the LLM's response? Is it possible?
No LLMs actually run any functions. It’s always some other piece of code that actually runs the function.
Would you suggest using a custom schema for tool calling regardless of whether the chat class (ChatOpenAI, ChatOllama, etc.) has a default one? Also, is there a way to see the exact prompt that the actual LLM is getting? I am trying to see what ChatGroq is passing (the final form of the prompt) to the respective LLMs, because its default function calling (tool.bind) works really well and I want to get ideas to use with ChatOllama.
This sounds like a langchain question. I don’t have an opinion here other than trying to keep it simple and avoid it.
Thanks for explaining function calls. I think a better name is function formatting. It's nice to finally understand what the big deal is. I played with your code to better understand it and found that haversine seems to be bad (pip installed, so it's the latest?). But ignoring that, this JSON stuff seems to really tax the system: 10 minutes for a response on a fairly good laptop, using the GPU.
Thanks. Yeah function formatting is more appropriate
There is a problem I ran into: if you have a system prompt that does not tell it to output JSON, it seems to fight with the regular output. It shouldn't make much difference, but the team knows about it.
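On the haversine hiccup mentioned above: if the pip package misbehaves, the distance math is easy to inline. This is a generic, dependency-free sketch of the haversine formula, not the exact code from the video.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Amsterdam to Seattle is roughly 7,800 km
print(round(haversine_km(52.37, 4.90, 47.61, -122.33)))
```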
Output formatting is a really necessary feature. BUT the LangChain library already does that; LangChain's StructuredOutputParser does the same. Why use the "function calling" feature?
For a lot of the things people build, LangChain and LlamaIndex and the others aren't needed.
"Deterministically formatted output" isn't as sexy a name, I guess. Thanks for the video!
Hi, Matt. Great video as usual; I always enjoy the way you put together your videos... simply awesome. Today I do have a request as it relates to this topic: LangChain has something called ollamafunctions. It looks good, but I have not been able to get it to work. Can you try it out?
I hadn’t looked at it yet. Will check it out.
You're 100% correct.
Great video. Thanks for clearing up the concept in Ollama. Appreciate the full explaination. Any chance of the code in a github gist?
No. ... Just kidding, though that probably doesn't come over as well as if I said it. I have been building all my examples in one directory called videoexamples (I am a pro at naming). I have been meaning to push that up. will try to do that today. Wife's sick so I am with my daughter much of the day.
Bro that thumbnail is fire lol that llama was great
Thanks for this. I was thinking I hated the thumbnail, but it was so late I just wanted to sleep, so I figured I would fix it in the morning. Guess no fixing is required.
No worries! I loved it!@@technovangelist
It would be great for us not-yet-quite-so-experienced users of the various APIs to understand why you think the OpenAI API is so painful to use with functions. It's more verbose, but you probably need that to support more complex scenarios with multiple functions and multiple schemas. I mean, OpenAI does have a certain level of experience with these things; they don't add complexity for no reason.
But it doesn’t support more complex scenarios. They were first with a lot of things and made the api overly complex with no benefit to the user. Newer products can come in and simplify knowing what we have all learned since. There will be another upstart in months or years that will allow the same cases and more in an even simpler way. The same was true building gui applications. When windows came out building apps was hard. So much boilerplate you had to write. But in the time since it has gotten simpler while allowing for more complex applications. It’s more complex because they don’t have the benefit of hindsight.
That last sentence is key. Their experience in this instance is probably their biggest weakness which comes up because there are so many alternatives. They have all this cruft in their api that they can’t get rid of without breaking a lot of apps. Being first is really hard if you aren’t really careful about designing the endpoints people interact with. Everyone else gets to benefit from the good parts of your experience and not worry about the bad.
Great video and explanation. It would be great to know how to get the missing parameters to add to the « function calling » when the user doesn’t provide all of them. Does the LLM manage it? Does the programmer manage it? Maybe an idea for a video ;)
You still need to do all your usual error handling. That was often the big problem with OpenAI’s implementation.
I think function calling capable models are fine-tuned on function calling data; it's not just format: json. Anyway, I have never expected the model to call the API, as the important thing is not the call but the data. Also, it works with your own APIs, so how could the model call them? Has anybody even been confused about that?
I think it's like layers of an onion. Format json is a layer. Providing the schema is another layer. Telling it to output as JSON is another. Few-shot examples. A function calling fine-tune. All layers that help complete the feature, all of which make it more capable and reliable. Maybe with more of some layers, you can leave some out. Getting the data out, as you say, is the important part. Thanks.
I hear so many folks say that what ollama is doing is not function calling because it doesn't call the function. But what was important here was to define function calling as openai applies it. And then ollama does it just as well, and depending on how you look at it, in some cases a bit better.
@@technovangelist Thanks for responding. I actually asked GPT-4 what function calling is, and it answered that it is the capability of the model to actually call external APIs. I guess the difference is in the context: as a developer using GPT, the name function calling makes sense as a preparation step before you call the function. Maybe with Ollama the context is more from the user's perspective, where they expect the function to be called. Thank you again for the conversation.
I would assume it's answering based on what the words imply rather than any specific knowledge of the API. I think that's the way any model would answer. Models say a lot of things that aren't always true.
Yeah, totally! OpenAI calling it "function calling" had me and my friends like, "What the heck?" It's a good marketing name but kinda ignorant! I don't like when marketers manipulate things like that. Another thing I still can't get my head around is "GPTs"; again, the marketing aspect makes it sound like you are building your own GPT, but in reality it's just what Ollama nicely calls a Modelfile! Anyway, besides asking the main model to return JSON in the system message, another approach is to use the main large model to get the initial response and then a smaller fine-tuned model to extract structured info and generate the JSON. Mixing two models can do the job, but again it's not "function calling". Good content though - I just subscribed!
Amazing video as always
The model effectively decides when to call the function, and which function is appropriate at the time by formatting the output for the particular function. If I give it a query where taking a headless web screenshot is appropriate, then the output is formatted for that particular function. And the output does NOT particularly need to be in json either, depending on how the function is described to the openai API and how you're choosing to detect that the model "wants" to call a particular function. Respectfully, I think "function calling" is pretty appropriate, and it really seems like there's a misunderstanding of what function calling does here. Not trying to be negative, just pointing out that "just formatting" is missing the point. Of course the model doesn't have access to actually execute code on your local machine. But the "formatting" decided upon (by the model deciding which function to use) is essentially doing the exact same thing.
Based on scrolling through the comments here, disappointingly I don't expect you to be very receptive to this feedback. Just my $0.02, and I stand by it. Function calling has a lot of really great uses, and I think "just formatting," while technically true, sorely misses the point.
You can (and I do, not just guessing here,) effectively give the LLM a set of tools, trigger those tools based on output from the LLM, and provide the output from those tools back to the LLM if needed. That's pretty impressive I think.
All that being said, I really do enjoy your videos. Thank you for sharing, and I look forward to the next one.
Everything you said here describes what ollama does. Function calling is just formatting. That is the entire point of function calling. People try to say it’s more but even open ai disagrees. I am very receptive to anyone who has anything backing up the opinion that it’s more than just formatting. Please. Show something. Anything. Ollama can be given a list of tools and output the appropriate json to allow the user to call whichever function was chosen. But it is just formatting.
I prefer the term semantic mapping for this feature
Yeah. I could see that being a better term for what OpenAI offered. But we got what we got.
@@technovangelist I’ll be presenting my master thesis in a couple of weeks and totally planning on using the term (it’s about practical applications of LLMs in finance)
thank you, great video!
Glad you liked it!
Agree function calling is a confusing name. But I also think Ollama falls a bit short here compared to OpenAI. Defining a schema in text as part of the system prompt feels a little imprecise to me. llama.cpp has a feature called Grammars which allows precisely defining a schema in a way the LLM can reliably reproduce. If Ollama allowed a way of defining your own custom grammar as part of the API, or passing a JSON schema which it converts to a grammar, then I think it would be comparable to OpenAI for function calling use cases.
So your comment isn’t about ollama vs OpenAI because they essentially pass it in the context just as ollama does. But you are comparing it to the grammars of llamacpp. I think the reasoning here is that creating the grammars is extremely error prone. If someone came up with a way to do the grammars without using that horrible syntax that llamacpp is using to define them then I think that could be successful.
@@technovangelist my comment was about grammars, but I maintain Ollama doesn't compare with OpenAI function calling in other ways. For example, if a user asks "when is the earliest flight from London to Paris", with OpenAI function calling there is a specified mechanism for ChatGPT to understand that a function can be called to get this information, and then interpret the return value from that function before responding with a human readable answer to the user's question. The equivalent mechanism doesn't exist in Ollama.
As for Grammars: yeah, they are ugly, unintuitive, and an all-round bad dev experience. But they are a powerful feature of llama.cpp nonetheless. I believe if we had access to setting grammars, devs would be able to build tooling around Ollama that brings it closer to OpenAI functions.
I think this is another place where the Assistants API is confusing things. The OpenAI function calling core feature won't get the info you mention. You can call it yourself, then pass the output back to the model again for further processing. There is no difference with Ollama.
As for grammars, come up with a good way of defining the grammars and I am sure the feature will be merged in. The OpenAI compatible API was a community provided PR. I don't know if this is what the team is thinking, but I attended a talk by Joe Beda a few years ago. He mentioned how sorry he was that they made YAML the tool used for configuring k8s, and they should have done it right so no one had to touch that. They were also annoyed that every project feels like it needs to be in go because k8s was in go. It was in go because k8s was just a starter project at first to learn go. But that is unrelated to this.
Designing an API from scratch has to be done very carefully because mistakes made early on have to stick around. I think the team is being very cautious on what to add. But if you or someone else comes up with something that keeps it simple, it will get merged.
@@technovangelist I understand how function calling works and that ChatGPT doesn't call the function itself. But it DOES have a mechanism to instruct the host app to call said function and parse a subsequent return value. If you believe Ollama has an equivalent, then I challenge you in a future video to create an example where you can ask Ollama for flight times and it returns a human readable response containing data obtained from a function call.
Misunderstood your comment, I apologize for that. So, there is nothing stopping you from doing this in Ollama as well. It does not have the ability to instruct the host app to call a function, unless instruct simply means to say, hey dev, run this function. It is up to the dev to take the output of the model, then call the function, then send the output back to the model for further processing, and then return the final value to the user. Or maybe there is logic the dev has written to go back and forth a few times. There is nothing stopping you from doing the exact same thing in Ollama. But I am happy to create another video that does that round trip a few more times.
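For anyone curious what that dev-side round trip can look like, here is a rough sketch assuming the `ollama` Python package and its JSON format option; the get_flight_times function, its data, and the prompt wording are all made up for illustration.

```python
import json
import ollama  # assumes the `ollama` Python package and a local Ollama server

# A hypothetical local "tool"; the model never runs this, our code does.
def get_flight_times(origin: str, destination: str) -> str:
    fake_schedule = {("LHR", "CDG"): ["06:40", "09:15", "12:30"]}  # made-up data
    return json.dumps(fake_schedule.get((origin, destination), []))

system = (
    "When the user asks about flights, reply ONLY with JSON like "
    '{"tool": "get_flight_times", "origin": "<IATA code>", "destination": "<IATA code>"}.'
)

# Step 1: the model only outputs which function to call and with what arguments.
first = ollama.chat(
    model="llama2",  # placeholder model name
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "When is the earliest flight from London to Paris?"},
    ],
    format="json",
)
req = json.loads(first["message"]["content"])

# Step 2: our code, not the model, actually calls the function.
result = get_flight_times(req["origin"], req["destination"])

# Step 3: hand the result back so the model can phrase a human readable answer.
final = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content":
        f"The available flight times from London to Paris are {result}. "
        "In one sentence, when is the earliest flight?"}],
)
print(final["message"]["content"])
```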
Nice, thank you.
Great videos! Thanks!
Glad you like them!
Ok, but how do you get the LLM to output a json ONLY when it thinks a command should be run?
Excellent video by the way, this will help a lot with my journey with ollama :-)
The model doesn’t run the command. If you want JSON out, then format json is what you want.
@@technovangelist I understand. thanks!
Do the llamas spit out the function calls?
Hi Matt, with the frequent releases of Ollama, which installation option on Ubuntu allows the easiest management like upgrading to the latest version or integrating with other tools? I have a dedicated machine. I can run it with Docker or without Docker.
I think just running the install script each time makes the most sense. It’s simple and it works and is easy. Some of the community installers aren’t always updated quickly.
@@technovangelist PERFECT. Thank you, good sir. I am so looking forward to your musings over the next year. Take good care of yourself.
How to integrate gorilla openfunctions in ollama?
hmmm, i don't know anything about that. I'll take a look. Thanks for the comment.
Great videos, thanks a lot. I will also investigate and practice. Thanks again.
Great video! Cheers!
Many thanks!
What would be a better name?
The names I mentioned in the video or many of the names suggested in the comments here are great
Hi and thanks for the video! I made a chatbot with Ollama and a Mixtral persona, but I can't figure out how to format the output when it gives a code example; maybe it's just a matter of iterating over the string chat output. Sorry if it's a simple question. I just think maybe it would be easier with this function calling (youuuu).
Can you please release a video on MMLU and openAI or open source LLM
Hmm, mmlu is one of the many benchmarks out there. I haven't looked at this one specifically but I am generally not a fan of any benchmark that open sources the source questions. If I can see it, then a model researcher can incorporate those questions into the training, potentially making the model perfect in the eyes of a benchmark but garbage for real life. But I will take a look. Thanks.