Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3

Поділитися
Вставка
  • Опубліковано 26 чер 2024
  • Code : github.com/samwit/agent_tutor...
    🕵️ Interested in building LLM Agents? Fill out the form below
    Building LLM Agents Form: drp.li/dIMes
    👨‍💻Github:
    github.com/samwit/langchain-t... (updated)
    github.com/samwit/llm-tutorials
    ⏱️Time Stamps:
    00:00 Intro
    01:27 Phi-3 Model Blog
    02:01 Gorilla Paper
    02:12 Function Calling Leaderboard
    03:00 Code Time
    03:02 Set up Llama 3 with Ollama
    03:45 Set up Prompt Template
    05:31 Get a JSON Output from Llama 3
    08:50 Get a Structured Responses using Ollama Functions
    11:44 Phi-3 Model Demo
    12:57 Tool Use and Function Calling Sample
    15:09 Trying Tool Use and Function Calling with Phi-3
  • Наука та технологія

КОМЕНТАРІ • 71

  • @OliNorwell
    @OliNorwell Місяць тому +16

    I would recommend adding 'Langchain' to the title of the video, most of this is very langchain specific, for those specifically searching for that.

    • @samwitteveenai
      @samwitteveenai  Місяць тому +5

      Very good point. Added! Thanks!

    • @marilynlucas5128
      @marilynlucas5128 Місяць тому

      @@samwitteveenai Great experiment you're running there but please consider using lm studio's new cli as well in your subsequent videos instead of ollama all the time. Also can you try using Anima's air llm library so you can run the llama 3 70B locally using layered inference?

    • @samwitteveenai
      @samwitteveenai  Місяць тому +1

      I haven't heard of Anima's air llm library but will check it out

    • @theh1ve
      @theh1ve Місяць тому

      Lm studio isn't as 'open' as ollama so would restrict the use cases to just personal use.

  • @jiyuhen
    @jiyuhen Місяць тому +3

    Thank you for doing this with Ollama, this was an really good explanation and helped me a lot!

  • @jonm691
    @jonm691 24 дні тому +1

    Great video, very informative, and filled some gaps. Thank you

  • @rickymehra1104
    @rickymehra1104 Місяць тому +3

    Thank u sharing video of ollama with phi3 to run locally, hope u would come up wid more such videos to use ollama locally for different tasks. Pls mk more videos on phi3, llama3 with ollama.

  • @chriskingston1981
    @chriskingston1981 Місяць тому

    Ah really needed this, I kept feeling, I want to learn function calling with llama3. Feels so good to use a local model with function calling, and langchain made it really easy to do. Love to experiment with it now, thank you so much for this video❤️❤️❤️ and thanks to langchain for making it easy to do function calling ❤️❤️❤️

  • @sven262
    @sven262 Місяць тому

    Thank you so much. Super helpful.

  •  Місяць тому +4

    Excellent as usual! For Phi3 3.8b latest it works fine with:
    prompt_phi = PromptTemplate.from_template(
    """{context}
    Human: {question}
    AI:"""
    )
    Otherwise you will get validation errors.
    All the best Sam!

    • @gregorychatelier2950
      @gregorychatelier2950 Місяць тому

      I got validation errors both with llama3 and phi3. Worse, the LLM was answering wrong, it returned Alex.
      Changing the prompt solved it. I tried Mistral v0.3, and it works too.
      Sam, I wonder where you found the recommanded prompt formats ?
      Also I would appreciate a video on how you handle validation errors as they may occur from time to time.

    •  Місяць тому

      @@gregorychatelier2950 Hello it seems that the Llama3 prompt format had been altered a bit a few weeks after model's release (Reddit). To be double checked...

  • @aa-xn5hc
    @aa-xn5hc Місяць тому

    Amazing video 🙏🏻
    Currently using crewai

  •  Місяць тому

    Very useful, thanks!

  • @mshonle
    @mshonle Місяць тому

    For local models I’ve found it’s helpful to at extra context at the very end of the prompt, in the assistant reply section (not the instruction section), kicking things off with “Sure, here is your JSON:” and then adding markdown syntax for preformatted text and then letting one of the end symbols be the final three backticks to close the markdown. It’s also helpful to write a custom grammar (like with llama.cpp) to constrain output to a specific schema even. (Depending on your setup this could slowdown inference if the constrained generation part isn’t running on the GPU.)

  • @eduardovernier7628
    @eduardovernier7628 Місяць тому

    Very cool! I've been using the instructor library with pydantic for structured output and had a lot of success on openai models, but it didn't work very well with local llms. I'll definitely try out your approach!

  • @andyma1146
    @andyma1146 Місяць тому +1

    Thanks for the video! I'd like to see an example of using DSPy to optimize a local model so that it can use tools more reliably. I'm actually not sure if this would work but I'd like to find out. 😃

  • @hienngo6730
    @hienngo6730 Місяць тому +1

    Thank you for the informative videos as always. One note: if you want to run things all locally and want a lot better throughput, running the models using vLLM and serving the API with vLLM's OpenAI-compatible server is definitely the way to go. If you have a 24 GB VRAM GPU like a 3090 or 4090, you can run a GPTQ or AWQ quantized model, or just the full FP16 model and serve a large number of concurrent clients. With batching, you can get thousands of tokens per second in aggregate for responses if you run a lot of parallel clients.

    • @jay-dj4ui
      @jay-dj4ui Місяць тому

      linux only, and I am not sure it has enough performance like that. Multiple API calling contiusely sounds great. just not sure....

    • @marilynlucas5128
      @marilynlucas5128 Місяць тому

      You can run the llama 3 70b model with as little as 4gb gpu using Anima's air llm library which enables layered inference.

    • @hienngo6730
      @hienngo6730 Місяць тому

      @@marilynlucas5128 I've never used this library before, what kind of tokens per second speed can you get? For reference, using LLaMA-3 70B with exllamav2 quantization at 2.4bpw on a single 4090, you can get around 36 tokens/second. With 2x4090s and 5.0bpw quantization, you get around 18 t/s.

  • @alx8439
    @alx8439 Місяць тому +2

    The biggest issue with function calling is that the way everyone suggests to use it is not very viable / economical, if you want your model to choose one out of many functions to call. I'll elaborate: in order for LLM to pick a function to use, you need to announce all those tools in advance and make sure it hasn't forgotten them, if you're going into multy turn chat. This means more context will be used just to make model aware about all these extra tools you want it to use and less context will be available for responses. There's probably some semantic router needs to be introduced in-between to give model only those tools which might be relevant to current question

    • @brianmorin5547
      @brianmorin5547 Місяць тому

      100% my experience as well. In fact, I’ve only had success doing function calling by putting it at the individual run level rather than at the model level and only calling a single function that will be needed

    • @tonyrungeetech
      @tonyrungeetech Місяць тому

      I have a video doing exactly this with a library called semantic router and crew-ai!

  • @kenfink9997
    @kenfink9997 Місяць тому

    Great video as always! In future videos, could you please show how to do this with Ollama and langchain running on separate computers? I'd like to develop on Laptop or Colab with just inference running on my Desktop PC. And since Ollama doesnt currently do API keys, how do we secure the inference server and access it from a Colab notebook?
    Thanks!!

  • @Shiroikage98
    @Shiroikage98 Місяць тому

    would love if you can explain this using the ollama python package. As someone else said this is very specific to langchain and i just cant find good information on how to use function calling with ollama.

  • @CraftPit
    @CraftPit Місяць тому

    Phi3 excels at creative language tasks, surpassing even GPT-4 in my tests. GPT-4 itself ranks Phi3's lyrics higher :)

  • @Carnivore69
    @Carnivore69 Місяць тому

    Great video. I was hoping this would give me a reason to try LangChain vs my own prompt/post-parsing for a web ui, but I'm actually getting better results than this demonstrates. I'm using llama3-8B via LM Studio. I think until these guys get their sh*t together and create a standard for output, this is going to be similar to the browser wars (standards). At the very least, they should all conform to current markdown standards or accept a config/spec for default output. Whoever comes out with an open source competitive model that does this is going to be the clear leader... for me anyway.
    ...And if such a model exists, please point me to it!! :)

  • @user-iu5ue4bv8q
    @user-iu5ue4bv8q Місяць тому

    Thanks for sharing this, how can I use this json output funcution call format to combine the langchian agent functuion call framework , which. Use the llm.blind_tool to replace the llm=ChatOpenAI()? Will this work? Thanks

  • @comfixit
    @comfixit Місяць тому

    I have found Phi-3 truly impressive for its size, getting good results even for general inquiries. I almost wonder if you could just use Phi-3 if you don't need a super refined response. It's so light on resources comparatively for an LLM.

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      Agree it is a nice model especially when you consider its size

  • @jay-dj4ui
    @jay-dj4ui Місяць тому

    Hi<
    Is that because we try to give it as much more accurate and better machine-readable input, so the model does not have to 'think' too much that it can follow the correct format like JSON and some basic function, and it can meet some complex requirements also. The way is more efficient and energy-saving.

  • @pokeastuff
    @pokeastuff 14 днів тому

    Which version of Phi 3 are you using? I'm having trouble replicating your results for the structured_output example as Phi 3 is not returning any "tool_calls".

  • @MukulYadav-pw9se
    @MukulYadav-pw9se Місяць тому

    wow Sam!!!, this video is really helpful but i am facing challenge in running it on server as the response is not coming within 1 min and i am getting 504 Gateway Timeout error, i have used ollama docker image to install ollama but i am not able to find how to increase gateway timeout to 10 mins instead of default 1 min.
    Can you please help if you have faced such issue?

  • @svenvanwier7196
    @svenvanwier7196 Місяць тому

    I see you use a mac mini, could you talk more about what model and OS setup?
    Thinking of fun things to do with my 2011 2ghz i7 16gb ddr3 ram, a local something on my network if I could.

  • @pensiveintrovert4318
    @pensiveintrovert4318 Місяць тому

    I have been running gpt-pilot with Llama3-70b-instruct.Q5_K_M for a couple of weeks. The biggest problem I have, as far as I understand, is not function calling but rather the stability of the framework. It starts developing a bunch of files, but when I provide feedback, it may abandon the old files instead of correcting them, and starts creating a new set of files. Basically makes a mess.

  • @sumanthbalakrishnan285
    @sumanthbalakrishnan285 Місяць тому

    How do I incorporate function calling with follow up questions and memory. Say a user asks “what is the weather”. The model should be able ask “what place are you requesting for” and say the user replies “California”
    It should then make the function call with the mentioned arguments. Please let me know which direction I should look in order to achieve this.

  • @kallebysantos5167
    @kallebysantos5167 Місяць тому

    Is possible to fine tune a small language model for function call?
    For example, if we look to BERT models that perform zero-shot classification we can pass a set of labels to it, so maybe is possible to use a similar approach to get a very performatic model just for function calling, since LLMs are very huge and almost every time requires a GPU. I know that phi3 is very small but in my machine it takes like 3Gb of GPU.

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      Yes very possible to do the key is getting the dataset and most people aren't making their datasets for this public.

  • @MeirMichanie
    @MeirMichanie Місяць тому

    Thanks for the code and the explanation. In order to be usable, you should be able to execute the function feed the info back into the history of the conversation with the result of the function and then the llm should be able to use the results from the function to write the last message.
    For instance, lets say that the weather tool responds with just the temperature and nothing else, then the LLM should be able to respond back 'in Singapore the current temperature is ..." and in the same language as it was asked from the user.

    • @superstippi
      @superstippi Місяць тому

      Absolutely agreed. It seems to be very hard to find information on how to do exactly that. The Phi-3 chat template doesn't seem to introduce a dedicated role for a function call result. So if it seems to be the "user" replying with a function call result, why would the model figure that it needs to phrase that into a coherent message? Also, I fail to get sensible output when there is more than one function declared and the model is supposed to be free to use a tool or reply directly. Often, I get long chunks of what appears to be training data appended to the initial reply.

  • @maths_physique_informatiqu2925
    @maths_physique_informatiqu2925 23 дні тому

    when i try to execute the code , it shows this error : langchain_community.chat_models.ollama.ChatOllama() got multiple values for keyword argument 'format' (type=type_error) , any solution please (i didn't change any thing from the code in github link) ?

  • @swapnil0402
    @swapnil0402 21 день тому

    Function calling is very difficukt. I am trying to do a POST api call with Llama3:8B, Ollama and CrewAI. My use case is i get a text string of OCR data and then i need to map certain foelds from that OCR to a JSON and send that JSON to the POST api to save rhe rransaction. It is way way difficult to build it. But if Langchain tool can solve woth Cloaed models like GPT-4 rhen it can u lock a good enterprise value

  • @harshkesharwani8730
    @harshkesharwani8730 Місяць тому

    How to use chatOllama along with function calling. i want to pass messages along with functions same as open ai v1/chat/completions api provides.

  • @DakshSripada
    @DakshSripada 21 день тому

    Now how do i get the actual output from the function?

  • @peterdecrem5872
    @peterdecrem5872 Місяць тому

    What was the name of the paper that shifts the probabilities to get json as response more likely?

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      Can check it out here github.com/1rgs/jsonformer

  • @madhudson1
    @madhudson1 Місяць тому

    all looked well and good until you try feeding a question into the 'agent' that doesn't relate directly to: "get the current weather in a given location".
    I thought the whole point of function calling/tooling was to present the LLM with the opportunity to use tooling if necessary.

  • @alx8439
    @alx8439 Місяць тому

    At last someone finds a good use for agents - to give them some tasks you want accomplished and give loose it free overnight to use internet :)

    • @willjohnston8216
      @willjohnston8216 Місяць тому +1

      I don't understand how this demonstrated using agents overnight on the Internet? I'd really like to know how to do that. What did I miss?

    • @alx8439
      @alx8439 Місяць тому

      @@willjohnston8216 Mr. Witteven just mentioned this as a possible implication. I was just glad more people to turn their minds into some real world use cases for agentic flows - like giving a topic for your agent and let it research it, find products / software, which you would never find in ads, do some data gathering and processing for you, providing helpful summaries on a hot topics you never have time to investigate properly yourself, etc etc etc

  • @kaushiklade
    @kaushiklade Місяць тому

    Hey, thats very helpful to understand how to run these models locally.
    Can u/anyone tell, how to actually do actual function call and pass that response to llm? Is it possible without LangGraph???
    I want llm to decide which tool to call, once he decide that, llm should do entity extraction and then invoke tool, then returns ans back to llm and gives it to user. This was easy with AgentExecutor in OpenAI examples.
    Similar thing possible in Ollama?

  • @AIvetmed
    @AIvetmed Місяць тому

    has someone tried to load the models other than using ollama like the huggingface transformer pipeline or in other words I would love to know how torun these models in Linux based servers like databricks where I am unable run ollama application in the background like in my windows PC?

    • @MavVRX
      @MavVRX Місяць тому

      Ollama already supports windows

    • @AIvetmed
      @AIvetmed Місяць тому

      @@MavVRX for Linux based servers like databricks server

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      I made a Llama3 review deep dive video and show loading that in HF Transformers there in a colab

  • @RobBominaar
    @RobBominaar Місяць тому

    Well, actually, where are the functions? I only see a Json string.

  • @harshkesharwani5621
    @harshkesharwani5621 Місяць тому

    Can I use function calling with llama.cpp?

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      in theory yes but might need to mess with how to get it accept them etc.

    • @harshkesharwani5621
      @harshkesharwani5621 Місяць тому

      How one can pass multiple functions and let model decide to use particular one. Does it supports multiple functions

    • @MavVRX
      @MavVRX Місяць тому

      The bind function takes in an array of functions so you can simply add the additional functions to the array separated by commas. E.g. [f1, f2]

    • @harshkesharwani5621
      @harshkesharwani5621 Місяць тому

      But how to use function calling along with chat message like user, system and assistance role

  • @StephenRayner
    @StephenRayner Місяць тому

    You are not using latest version. It’s now called “bind” not bind_tools

    • @samwitteveenai
      @samwitteveenai  Місяць тому

      I am using the latest langchain-experimental 0.58 the bind is used in the main function calling with prop models for the OllamaFunction they still have it as bind_tools. If I am missing something send me a link.

  • @AnthonyGarland
    @AnthonyGarland 15 днів тому

    hmmm. my hobby is pizza as well. :)

  • @hightidesed
    @hightidesed Місяць тому

    very cool, but this is kind of useless unless you can mix text responses and function calling with the same prompt

  • @Anthony-dj4nd
    @Anthony-dj4nd Місяць тому

    This is like the reverse of crypto mining. Lol😅

  • @meca_p
    @meca_p Місяць тому

    I hope you to make react agent tutorial with ollamafunction..!