Talk to your CSV & Excel with LangChain

  • Published 4 Oct 2024
  • Colab: drp.li/nfMZY
    In this video, we look at how to use LangChain Agents to query CSV and Excel files. This allows you to have all the searching power of a tool like Pandas but done through natural language using an LLM to help.
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/sam...
    github.com/sam...
    #LangChain #BuildingAppswithLLMs
  • Science & Technology

COMMENTS • 182

  • @bseddonmusic1
    @bseddonmusic1 1 year ago +12

    You are producing great content that's showing me how to exploit GPT. Thanks.

  • @rickeras
    @rickeras 1 year ago +6

    A good idea for a new video might be LangFlow, a GUI-based tool for LangChain.

  • @1MinuteFlipDoc
    @1MinuteFlipDoc 1 year ago +1

    i was not aware of this -- cool!
    Welcome to LangChain
    LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an API, but will also:

  • @joelwilson9079
    @joelwilson9079 1 year ago +5

    Great stuff Sam. Quick question - How do we improve the model if it answers a question incorrectly? Is there a "training" mechanism or reward function to let them know it was incorrect?

    • @samwitteveenai
      @samwitteveenai 6 months ago

      (Just seeing this now.) Not really. You can fine-tune the LLM for this task, but that isn't a guarantee.

  • @mahenderp2017
    @mahenderp2017 1 year ago

    Good article with a workable example. Great work.

  • @abbuu_
    @abbuu_ 1 year ago +5

    hey Sam, great video and content in general, just a quick question, how would you go about adding short term memory to a chain with Dataframe/CSV? The dataframe or csv agents have no parameter for MemoryBuffer. There are ways to read the csv or dataframe using a separate loader, but how do you incorporate it into a chain with an llm, prompt and most importantly, a memory buffer? I am trying to make it remember the questions I asked before (memory in the same chat instance, not historically - e.g. when you correct a question the llm does not understand, "I meant X")
    Thanks much

    • @generic-youtube-user
      @generic-youtube-user 1 year ago +1

      Hey, I am also looking for similar functionality. Did you find anything for it? Apparently we can use the Conversational Memory Buffer, but it seems it doesn't integrate well with this csv_agent.

    • @jayrn4596
      @jayrn4596 10 months ago +1

      Hello guys. I am also working on a similar use case. Any solution you guys found?

    • @alperenyuksel7184
      @alperenyuksel7184 6 months ago

      Hello guys. I am also working on a similar use case. Any solution you guys found?

  • @RedCloudServices
    @RedCloudServices 1 year ago +2

    Sam can you make a video showing how to get a reply as a Plotly chart? or a PyVis with networkx graph?

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      One of my previous vids shows getting replies as triples, which you can use in NetworkX. I might look at making something more advanced like that.

  • @kennethleung4487
    @kennethleung4487 1 year ago

    Great stuff Sam. Looks like those legacy Excel spreadsheets with macros and multiple indexes still require plenty of cleaning and preprocessing before we can use any agent on them

    • @samwitteveenai
      @samwitteveenai 1 year ago

      Yes treating the doc as a spreadsheet/table and not a csv file is actually quite different. The spreadsheet way is being baked into Google Sheets and Excel so I wonder how much of a market there is for an open source system. Would love to hear your opinion.

  • @surajkhan5834
    @surajkhan5834 1 year ago +4

    Can you please tell me how we can use Pinecone with this to store memory?

    • @eshvarbalaji3950
      @eshvarbalaji3950 3 months ago

      If you figured it out, please tell me. I am intrigued.

  • @vikkasgoel2465
    @vikkasgoel2465 1 year ago +1

    Hi Sam,
    Great and very helpful video, thanks.
    I have a question.
    My CSV has many columns, and there is another CSV that contains the definition of each column. How do I handle such a case and still be able to ask questions on the CSV?
    Vikkas

    • @samwitteveenai
      @samwitteveenai 1 year ago

      You could try feeding that info in via the prompt. Just try to keep it concise.
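
A minimal sketch of the "feed it in via the prompt" suggestion above, using only the Python standard library. It assumes the definitions CSV has two columns named `column` and `definition`; those names, the sample rows, and the helper `build_column_glossary` are hypothetical and not from the video. Long definitions are truncated so the prefix stays concise.

```python
import csv
import io


def build_column_glossary(definitions_csv: str, max_chars: int = 80) -> str:
    """Turn a two-column CSV (column, definition) into a short prompt prefix."""
    reader = csv.DictReader(io.StringIO(definitions_csv))
    lines = ["Column definitions for the dataframe:"]
    for row in reader:
        name = row["column"].strip()
        meaning = row["definition"].strip()
        # Keep each definition short so the prompt does not grow unbounded.
        if len(meaning) > max_chars:
            meaning = meaning[: max_chars - 3].rstrip() + "..."
        lines.append(f"- {name}: {meaning}")
    return "\n".join(lines)


# Hypothetical contents of the definitions file.
definitions = """column,definition
cust_id,Unique customer identifier
ltv,Lifetime value of the customer in USD
"""

glossary = build_column_glossary(definitions)
print(glossary)
```

The resulting text would be prepended to the agent's prompt or to each question; how that is wired in depends on the LangChain version in use.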

  • @rohiniayyalraj7532
    @rohiniayyalraj7532 5 months ago +1

    What if the Excel file has multiple sheets? Will it work?

  • @TienPham-rx6gk
    @TienPham-rx6gk 1 year ago +3

    Hi Sam, thank you for this great tutorial. If possible, can you also show us how to use HuggingFace models for the CSV agent? Also, do you have any recommendation for which LLMs from HuggingFace are great for this kind of task? Look forward to hearing from you soon.

    • @ibrahim-sf9od
      @ibrahim-sf9od 10 months ago

      Hey hi @TienPham-rx6gk
      did you find any solution?
      I am looking for an open source pre-trained model too which can do this task?
      did you find any on hugging face?

  • @rickmoni4598
    @rickmoni4598 1 year ago +1

    Is it possible to use Matplotlib or Seaborn to display a data visualization as additional output after we query the data? Do you think this would work?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      Yeah possibly better to try doing it as a custom tool with an OpenAI Function

  • @BerwinSingh
    @BerwinSingh 8 months ago +1

    Hey sam,
    Great video! Can i achieve the same using Mistral or Llama 2?

    • @samwitteveenai
      @samwitteveenai 8 months ago +1

      with some of the finetunes of Mistral you should be able to get some ok results.

    • @BerwinSingh
      @BerwinSingh 8 months ago

      @samwitteveenai thanks. Will try it out

  • @ibrahim-sf9od
    @ibrahim-sf9od 9 months ago +1

    Hey hi sam, I have one main question.
    Is there any open-source model where I can do the same thing ?
    or is there any open-source even close to doing what you have done here ? maybe I can fine tune and use that.

  • @shuntianli9651
    @shuntianli9651 1 year ago +1

    What is the strategy for handling large CSV files, for example over 800K?

  • @ambresh009
    @ambresh009 4 months ago

    The videos are great. Very helpful. I've a question. After loading the csv file using CSVLoader, which custom chain/agent I can use? Can you share some insights on that? Share any reference/notebook if possible.

  • @Player-oz2nk
    @Player-oz2nk 8 months ago +1

    🎯 Key Takeaways for quick navigation:
    00:00 🗂️ *Introduction to LangChain for querying CSV and Excel files*
    - Overview of using LangChain with OpenAI models to extract data from CSV and Excel files.
    01:25 🔒 *Security considerations for CSV agent*
    - The CSV agent runs a Python agent under the hood, caution advised for prompt injection attacks.
    02:22 🛠️ *Setting up the CSV agent with OpenAI language model*
    - How to create a CSV agent and configure it to minimize hallucination by setting the temperature to zero.
    03:48 📊 *Understanding the CSV agent's prompt and scratch pad*
    - Explanation of the CSV agent's prompt structure and the use of a scratch pad for iterative language model calls.
    05:14 🤔 *Asking the CSV agent simple and complex questions*
    - Demonstrating the CSV agent's ability to answer simple queries like row counts and more complex ones involving data filtering.
    07:32 🔄 *Using LangChain with Excel files and custom agents*
    - Converting Excel files to CSV for use with LangChain and the possibility of creating custom agents for specific tasks.
    09:22 🎓 *Conclusion and practical applications of LangChain*
    - Summarizing the capabilities of LangChain for non-technical users to query data and the invitation for feedback and subscription.
    Made with HARPA AI

  • @matthew_berman
    @matthew_berman 1 year ago

    Fantastic video, Sam. I’m going to try this but use a pdf instead.

    • @samwitteveenai
      @samwitteveenai 1 year ago

      I have some chat your docs vids coming, but they keep getting delayed by LLMs getting released every day :D

    • @matthew_berman
      @matthew_berman 1 year ago

      @samwitteveenai are you just using pure langchain for it?

  • @jacksheen2574
    @jacksheen2574 1 year ago +1

    Great video Sam … I had one question - Could you please tell me how to change the agent.agent.llm_chain.prompt.template ? I will be very grateful to you if you can help me out as I am just starting to learn LangChain

  • @adriangabriel3219
    @adriangabriel3219 1 year ago +1

    could you make a video on how to correctly use a csv_agent in langchain with alpaca? I have tried the approach you showed with Alpaca and it doesn't seem to produce good results at all, so I would be curious to see how you go about it

  • @clray123
    @clray123 1 year ago +2

    Would it be capable of doing (complex) joins between SQL tables to answer arbitrary predicate logic questions using a database?

    • @kyoungd
      @kyoungd 1 year ago

      Probably not, but give it a few years. Scary.

    • @thebirdhasbeencharged
      @thebirdhasbeencharged 1 year ago

      To some extent if you make it aware of the tables, I've had more luck with text2sql

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      you can use the SQL Agent for that so you get SQL queries and not pandas etc. I might make a vid of that soon.

  • @EdenProvision-ISR
    @EdenProvision-ISR 1 year ago +1

    Hey, I ran into an issue which I found quite weird. create_csv_agent worked for me as in the video, but then suddenly I started getting an error while running the same code as before on the same file. The error was a token limit error. It's only a 157-row CSV file and, again, it worked before on the same file, but suddenly, even after restarting the kernel and reloading everything, it will not query because of this error. Has anyone run into this weird issue?

    • @westonbeck9436
      @westonbeck9436 1 year ago

      I have this issue as well but I have not been able to resolve it. Did you ever find a solution?

  • @kakaraparthiphani9983
    @kakaraparthiphani9983 7 months ago

    Good video.
    I have a question: you used a dataset where all the columns are integers. What if the columns contain strings or characters?

  • @madhu1987ful
    @madhu1987ful 7 months ago

    Great video. BTW, I could not extract the prompt from the agent using the code specified in this video. It threw an error.

  • @glansingColt
    @glansingColt 1 year ago +1

    how can i only print out the final answer?

  • @I_Lemaire
    @I_Lemaire 1 year ago +1

    This can affect the jobs of many data workers and analysts. How can they best protect themselves?

    • @samwitteveenai
      @samwitteveenai 1 year ago +4

      I think, like in many areas, the need for people with a surface level of knowledge may decline, but there will still be a need for people with deep knowledge.

    • @dasman9187
      @dasman9187 1 year ago

      @samwitteveenai How deep though? Didn't GPT 4 just pass a medical licensing exam with flying colors? I think you could potentially pivot into areas that have to do with AI, because undoubtedly many new jobs will be created from this. Many people will be left behind though.

  • @Aidev7876
    @Aidev7876 9 months ago

    This is exactly what I needed, but can I use something more secure than LangChain? For example, Voiceflow on top of ChatGPT? My customer is very sensitive about data protection. Thanks a lot for answering.

  • @Passe1811
    @Passe1811 1 year ago

    Could it be that the CSV agent always summarizes the text? I have this "Comment" field in my CSV and when I asked for the value of that field in one of the rows, it returned a summary of that comment, not the comment itself 🤔.
    The original comment: The products arrived in good condition, but the delivery was delayed more than expected and the customer service did not provide me with a clear solution regarding the matter.
    The value returned by the agent: The products arrived in good condition, but the delivery was very slow.

  • @RS-vu5um
    @RS-vu5um 1 year ago

    Great Video. Your sessions are super

  • @harryfinn8460
    @harryfinn8460 1 year ago +1

    Excellent video Sam. I too have a question: let's say I wanted to add to the csv_agent prompt, i.e. tell it how it should handle date periods like "last week", i.e. specify that it should use today as the end of the period and ignore all future dates. Is there any way to extend the csv_agent, or do you have to write a custom agent?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      You could probably do this just by overwriting the Prompt to add it in there. See how I get the prompt to show what it is and then just assign it to that variable.

    • @harryfinn8460
      @harryfinn8460 1 year ago

      @samwitteveenai thanks Sam, that's exactly what I did! Appreciate you commenting back mate.

    • @benebento9572
      @benebento9572 1 year ago

      Me too
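
The prompt-overwriting idea in this thread can be sketched without touching LangChain at all: build the extra date rules as plain text and append them to whatever template the agent exposes. In the sketch below, `date_rules` and `base_template` are hypothetical stand-ins (the real template text and the attribute it lives on vary between LangChain versions), and treating "last week" as the seven days ending today is one assumed reading of the question.

```python
from datetime import date, timedelta


def date_rules(today: date) -> str:
    """Return prompt text that pins down relative date periods."""
    week_start = today - timedelta(days=7)
    return (
        f"Today's date is {today.isoformat()}. "
        f'Interpret "last week" as {week_start.isoformat()} to '
        f"{today.isoformat()} inclusive. "
        "Ignore any rows dated after today."
    )


# Hypothetical stand-in for the agent's existing prompt template.
base_template = "You are working with a pandas dataframe in Python named `df`."

# A fixed date is used here so the output is reproducible;
# in practice date.today() would be passed instead.
custom_template = base_template + "\n\n" + date_rules(date(2024, 10, 4))
print(custom_template)
```

Computing the dates in Python and passing them in as text avoids relying on the model to know what today's date is.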

  • @kunalmundada8754
    @kunalmundada8754 1 year ago +2

    I approached this slightly differently by converting CSV/Excel files into SQL tables (named after the CSV file), then using the SQL agent instead of the CSV agent, as GPT is well-trained for SQL queries.
    There is one downside: the SQL tables do not have the correct schema for the columns. Do you see any other issues arising out of it?

    • @clray123
      @clray123 1 year ago +1

      What do you mean it does not have the correct schema? All SQL columns have names and data types.

    • @samwitteveenai
      @samwitteveenai 1 year ago +9

      I think the key thing with all of these is to experiment and see what works best for your own situation. I may make a video of the SQL Agent as well, it is also very cool.

    • @JesseDahirKanehl
      @JesseDahirKanehl 1 year ago +1

      I would love to do this as well since I'm well versed in SQL and all our data is in SQL server. It would be nice to use Wolfram alpha or JavaScript libraries to generate charts or nice looking tables if the user of our chat bot wants it

    • @njokedestay7704
      @njokedestay7704 1 year ago

      @samwitteveenai I'll be waiting for that 👍👍👍

    • @angelo3108
      @angelo3108 1 year ago

      This is a wonderful idea. How long would creating this take?
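
A rough sketch of the CSV-to-SQL approach discussed in this thread, using only the standard library's `csv` and `sqlite3` modules. It also shows one way to address the schema concern: guessing a column type from the values instead of storing everything as text. The helper names, the `expenses` table, and the sample data are all made up for illustration; a real SQL agent would be pointed at the resulting database rather than running the query by hand.

```python
import csv
import io
import sqlite3


def infer_sql_type(values):
    """Pick INTEGER, REAL or TEXT depending on whether every non-empty value parses."""
    non_empty = [v for v in values if v != ""]
    for sql_type, caster in (("INTEGER", int), ("REAL", float)):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def load_csv_into_sqlite(conn, table_name, csv_text):
    """Create a table named after the CSV and insert all of its rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = [infer_sql_type([r[i] for r in data]) for i in range(len(header))]
    columns = ", ".join(f'"{name}" {t}' for name, t in zip(header, types))
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Store empty cells as NULL rather than as empty strings.
    cleaned = [[v if v != "" else None for v in r] for r in data]
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', cleaned)
    return dict(zip(header, types))


# Made-up sample data.
expenses_csv = """category,amount,note
food,12.50,lunch
food,7.50,coffee
travel,40,train
"""

conn = sqlite3.connect(":memory:")
schema = load_csv_into_sqlite(conn, "expenses", expenses_csv)
avg_food = conn.execute(
    "SELECT AVG(amount) FROM expenses WHERE category = ?", ("food",)
).fetchone()[0]
print(schema, avg_food)
```

Because the aggregate runs inside the database over every row, the LLM only has to produce the SQL text, not see the data itself.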

  • @bourbe
    @bourbe 11 months ago

    Hello, I am wondering about something: when we use a CSV agent, don't we need to use embeddings, a vector database, or memory? I am currently confused.

  • @oscarsotelo898
    @oscarsotelo898 1 year ago

    Great work. I had a question, What could be the problem that it only counts 5 records when I have 200?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      It might be limited to only sending that many back to the LLM, not sure about this as I did it quite a while ago.
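
One possible explanation for the "5 instead of 200" count, offered as an assumption rather than something confirmed in this thread: agents of this kind usually show the model only a short preview of the table (for example the first five rows), and the model can answer from that preview instead of running a count over the full data. The sketch below uses only the standard library and a made-up 200-row file to show the gap between the two numbers.

```python
import csv
import io


def make_csv(n_rows):
    """Build a small synthetic CSV with an id and a value column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "value"])
    for i in range(n_rows):
        writer.writerow([i, i * 2])
    return buf.getvalue()


csv_text = make_csv(200)
rows = list(csv.DictReader(io.StringIO(csv_text)))

preview = rows[:5]  # what a head()-style preview in the prompt would contain
rows_in_preview = len(preview)
rows_in_file = len(rows)  # what actually executing a count would return

print(rows_in_preview, rows_in_file)
```

If this is the cause, phrasing the question so that the agent has to execute code (for example asking for the length of the dataframe) may help.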

  • @p.v.chaitanya655
    @p.v.chaitanya655 1 year ago +1

    which LLM are you using?
    gpt3.5 or gpt4

    • @samwitteveenai
      @samwitteveenai 1 year ago

      from memory that was davinci 3 or 3.5. the code should show it.

  • @RudolphMatongo
    @RudolphMatongo 2 months ago

    hello there, thank you for this interesting video. I am trying to replicate this notebook but I am getting errors when I try to view the agent prompt template using this line agent.agent.llm_chain.prompt.template.
    It looks like the library has changed considerably in the time since this video was posted
    Any help would be appreciated to be able to do this step

  • @stlo0309
    @stlo0309 1 year ago

    Hi Sam. brilliant tutorial for doing exactly what the video title says. I do have a question, what actual LLM does the agent call when we simply say OpenAI(temperature=0) without specifying any model parameter?

    • @sankalpyadav373
      @sankalpyadav373 1 year ago

      Has the ChatGPT API become paid? It shows that the limit has been reached. Do you face the same problem?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      when I recorded that video (a few months ago) I think it was text-davinci-003, it is probably the same with ChatOpenAI being used for the other OpenAI models.

  • @souviksen7286
    @souviksen7286 6 months ago

    Sam, really great demonstration on langchain CSV agents but I am getting the error OutputParserException while running the code in notebook in vs code to chat with my csv file not containing huge data only 1 sheet of 22 rows using langchain create_csv_agent, AzureOpenAI.
    How can I solve this error, Sam could you or anyone out there please give me the solution for this issue with detailed explanation?
    Please revert to me for more details on this.
    Thanks.

    • @samwitteveenai
      @samwitteveenai 6 months ago

      they have updated LangChain so the code on this is about 1 year old unfortunately. I will try to make a new version of the video soon.

  • @violasong6592
    @violasong6592 1 year ago

    Very nice tutorial! Thanks! I have a question tho, how do we ask questions to multiple csv files? or even multiple csv files + some txt/pdf documents?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      you can have multiple indexes and query each of them.

  • @kevinehsani3358
    @kevinehsani3358 1 year ago

    Thanks for the great video. I think you already have done pandasAI video. Would you recommend using that in place of an agent from langchain?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Good question. I think the Pandas AI is more if you are using it for personal use but LangChain if making an app for others etc. Can check the prompts from both and see what works best for you and use those as well.

  • @NileshKumarPandey-vr7pw
    @NileshKumarPandey-vr7pw 4 months ago

    How can I persist that CSV in a vector DB and get a similar kind of response? Please help.

  • @leslietientcheu4025
    @leslietientcheu4025 4 months ago

    What about querying a CSV file without using the CSV agent? Please help.

  • @31Miko
    @31Miko 6 months ago

    In a database of cars, would LangChain be able to compare cars on everything about them (brand, series, model, HP, option list, etc.) to give me a good comparison car, for example a Mercedes A-Class to an Audi A3 or something like that?
    Series and model would be an input from myself for which car could compare to what, and some it should solve itself by comparing body types etc., but the option list is not normalised across different car producers. Would vector embedding be needed for that?
    Or is a different model a better solution? For example BERT?
    I would be grateful for a response, thank you.

  • @angelo3108
    @angelo3108 1 year ago

    This is wonderful. How long would creating this app take? you made it look easy!

    • @samwitteveenai
      @samwitteveenai 1 year ago

      writing the backend is not that complicated if you look at the Colab code I provided.

    • @angelo3108
      @angelo3108 1 year ago

      @samwitteveenai thank you so much. Will check it out and get back to you. Again, thanks for sharing your knowledge.

    • @angelo3108
      @angelo3108 1 year ago

      Hi Sam.. Are you available for consultation?

  • @AkhRamy
    @AkhRamy 1 year ago

    How scalable is this to large data sets, or to databases with multiple tables?

  • @guilhermeveiga9345
    @guilhermeveiga9345 1 year ago

    Thanksss man, great vid

  • @hummingbird8125
    @hummingbird8125 1 year ago +2

    Please change the davinci model to the ChatGPT model (gpt-3.5-turbo) for this tutorial, as it is better and 10x cheaper.

    • @구본천-k9z
      @구본천-k9z 1 year ago +1

      How do I change the davinci model to gpt-3.5-turbo? When I pass the model_name='gpt-3.5-turbo' parameter to the create_csv_agent function, I get an error. Could you teach me?

  • @ashishkr.229
    @ashishkr.229 7 months ago

    Can i give you my csv assignment?... I've to submit by tomorrow and I don't know how to do😢

  • @KiritiRayChoudhury
    @KiritiRayChoudhury 1 year ago

    Sam any idea how to have this on multiple csv files

  • @구본천-k9z
    @구본천-k9z 1 year ago +1

    Thank you for your informative video. I have a question for you.
    I followed your method to conduct queries and responses for the product information in my online store's csv file. However, it consumed too many tokens for just a few questions, as shown below: text-davinci, 17 requests - 42,525 prompt + 2,142 completion = 44,667 tokens. I'm wondering if converting the csv file into embedded vector values could reduce the number of tokens used in queries. I'd like to know your opinion on what can be done when the tokens used for queries and responses are excessively high.

    • @samwitteveenai
      @samwitteveenai 1 year ago +3

      Interesting, what types of queries were you doing? If it was things like listing all the products etc. and that was more than 4k tokens, yes, you will have an issue; if it was just getting Pandas queries, it shouldn't have that kind of issue. You are right, you could use a vector store and do it that way. I have a few videos showing things like that coming out soon.

    • @adolforangel1045
      @adolforangel1045 1 year ago

      Hey 구본천, great questions.
      What have you found to be best for an optimal token consumption? I started using embeddings for questions but then got to know agents and started using them. Using this agent method and asking 5 questions on a 15,000 rows table, the consumption was $0.14 USD; not that optimal. Appreciate your feedback!
      And thanks Sam Witteveen for such great content!

    • @sarveswarnaidu717
      @sarveswarnaidu717 1 year ago

      Looking for this solution
      @samwitteveenai any documentation to achieve this?

    • @PaulBenthamcom
      @PaulBenthamcom 1 year ago

      @samwitteveenai Sam, could you point me in the direction of your videos using a vector store with the pandas agent? Or indicate when you might have some videos out on it? I'm currently comfortable with the Pandas agent and adjusting the prompt but it gets expensive!
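
The token figures quoted at the top of this thread can be checked with a few lines of arithmetic. The per-1K-token price below is an assumption used only for illustration (it is not stated anywhere in the thread), so substitute the current rate for whichever model you use; `token_cost` is a hypothetical helper.

```python
def token_cost(prompt_tokens, completion_tokens, usd_per_1k_tokens):
    """Return (total tokens, estimated cost in USD) for a flat per-token price."""
    total = prompt_tokens + completion_tokens
    return total, total / 1000 * usd_per_1k_tokens


ASSUMED_USD_PER_1K = 0.02  # assumed price, not taken from the thread

# Figures quoted in the comment: 17 requests, 42,525 prompt + 2,142 completion tokens.
total_tokens, cost_usd = token_cost(42_525, 2_142, ASSUMED_USD_PER_1K)
tokens_per_request = total_tokens / 17

print(total_tokens, round(cost_usd, 2), round(tokens_per_request))
```

Prompt tokens dominate the total here (roughly 95%), which suggests that shortening what is sent on each agent step matters more than shortening the answers.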

  • @benebento9572
    @benebento9572 1 year ago

    Hello Sam, when will you make a video about reading csv, pdf or txt data using free LLMs? It would be interesting to learn using alternatives to chatgpt/openai.

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      They need to be fine-tuned, or you need to find prompts that get them to stay consistent. Most will not work for tools etc.

  • @boopalanm5206
    @boopalanm5206 11 months ago

    Hi Sam, I had one question: how can we chat with an .xlsx or .xls file?

  • @maanyajain6105
    @maanyajain6105 1 year ago

    Hi Sam, is there any open source LLM that we could use for the same??

  • @Arocksum
    @Arocksum 1 year ago

    What is the name of the OpenAI model you used inside this video ?

  • @rajatkumarsinha2159
    @rajatkumarsinha2159 1 year ago

    Awesome video!!
    Can you/anyone guide me how to load CSV file for question answering using Dolly2.0 with langchain??

    • @samwitteveenai
      @samwitteveenai 1 year ago

      I wouldn't use Dolly as it is very out of date now and the LLaMA 2 models are much better.

  • @damianogarofoli165
    @damianogarofoli165 1 year ago

    Nice video!
    I have a question though: is it possible to replicate the code or the idea using a different LLM like Bloom, OPT or GPTNeoX?

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      Yes, but it won't work with the standard versions of those models because they don't do well with these tasks. I did one No OpenAI vid and I plan another later this week, looking at what models can do what etc.

  • @Freedomwithfinance-cha
    @Freedomwithfinance-cha 1 year ago

    Hi @Sam - One more question: Can i refine the prompt of the agent?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      Yes, you can change all the prompts, and you should tune them depending on the model you are using.

  • @rafaelprudencioleite7291
    @rafaelprudencioleite7291 1 year ago

    Great video! Is there a notebook that shows how to use Alpaca Llama to talk to a CSV or any other data file like JSON?

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      I made one and it didn't work well out of the box, so I need to finetune an Alpaca to do it. Will try to do that this weekend.

    • @rafaelprudencioleite7291
      @rafaelprudencioleite7291 1 year ago

      @samwitteveenai thanks so much!

  • @supriyodey2461
    @supriyodey2461 10 months ago

    How do we add past conversations as memory to the agent?

  • @MeanGeneHacks
    @MeanGeneHacks 1 year ago

    What would be cool would be if we could visualize the data using matplotlib

    • @samwitteveenai
      @samwitteveenai 1 year ago

      this is an interesting direction a few people have mentioned and since I suck at writing Matplotlib code I probably will look into it :D

  • @MrPsycic007
    @MrPsycic007 1 year ago +2

    Can we try to do something similar with open-source LLMs such as alpacalora or gpt4all?

    • @knoopx
      @knoopx 1 year ago

      been playing with this, no success so far but surely coming very soon.

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      good question I did try this on Alpaca and was hoping to show that as a follow up video but it wasn't good enough out of the box. That said it should be doable by finetuning the model first. I will have another go at it when I get some time.

    • @pwned1111
      @pwned1111 1 year ago

      @samwitteveenai fine tune it on various pandas queries ?

    • @madakuse
      @madakuse 1 year ago

      Waiting for this. Will be fantastic.

    • @knoopx
      @knoopx 1 year ago +1

      Got tools and data QA working but the context size (2048) limits significantly the amount of text you can feed. And it's slow, even on 4bit. We need a non Llama based one for this to be useful.

  • @joseluisbeltramone599
    @joseluisbeltramone599 1 year ago

    Thanks for the great video, Sam. I was doing analytics on a pandas DF using the LangChain agent and came across the model’s tokens limit. Is there any way to overcome it?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      you can use the 16k context model for 3.5-turbo which is 4x longer than the normal 3.5 model

    • @joseluisbeltramone599
      @joseluisbeltramone599 1 year ago

      @samwitteveenai I'll try. Thank you again Sam!

  • @kaustubhshingana
    @kaustubhshingana 1 year ago

    How can we load multiple files ?

  • @kartikeychouhan1738
    @kartikeychouhan1738 1 year ago

    Can we use other language models like LLaMA or Alpaca for reading a CSV like this?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      most don't have enough reasoning for doing that.

  • @xjp
    @xjp 1 year ago

    Possible for the agent to query data from 2 csv files instead?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      yes but will need to change some of the internal code

  • @FFF0007
    @FFF0007 1 year ago

    Awesome content! Simple and effective. Congrats :) (Small question: is it possible to use an alternative to OpenAI for this task, such as SelfHostedPipeline or SelfHostedHuggingFaceLLM? Thanks in advance.)

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes you can, but often models like Alpaca etc. weren't trained on instructions that allow this to work, so it would need finetuning.

    • @FFF0007
      @FFF0007 1 year ago

      @samwitteveenai great to know, thanks. I am going to watch your finetuning video first :)

    • @snat2100
      @snat2100 1 year ago

      Hi @samwitteveenai, do you have any tips / links on how to build an instruction dataset from CSV tables to fine-tune LLMs like Alpaca?

  • @micbab-vg2mu
    @micbab-vg2mu 1 year ago

    Thank you :)

  • @sanakmukherjee3929
    @sanakmukherjee3929 1 year ago

    Nice explanation. Can you help me add this to a custom csv dataset.

    • @samwitteveenai
      @samwitteveenai 1 year ago

      custom csv should work just fine.

    • @sanakmukherjee3929
      @sanakmukherjee3929 1 year ago

      @samwitteveenai yes, I found that, but how do I access ConversationBufferMemory with it?

  • @ranausman143
    @ranausman143 1 year ago

    Does it send / upload your CSV data somewhere? I explicitly wanted to know about data privacy.

    • @samwitteveenai
      @samwitteveenai 1 year ago

      Not your full file, but if you use OpenAI like this then some of the data will be included in the prompt.

  • @theh1ve
    @theh1ve 1 year ago

    Could you do this but not using chatGPT? I would need to use a local LLM is that at all possible?

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      yes but you would probably need to finetune the local model for this task.

  • @fiellin
    @fiellin 1 year ago

    any idea to process multiple csv/excel data on it?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      you could run it multiple times and then merge the outputs to a summary chain. This would require making a custom agent etc.
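
A bare-bones sketch of the "run it multiple times, then merge" idea from the reply above, with no LangChain dependency. Everything here is hypothetical: `ask_each`, `build_summary_prompt`, and the file names are invented for illustration, and `count_rows` is only a stand-in for whatever per-file agent call you would actually make.

```python
import csv
import io
from typing import Callable, Dict


def ask_each(files: Dict[str, str], question: str,
             ask_one: Callable[[str, str], str]) -> Dict[str, str]:
    """Run the same question against every file and collect the answers."""
    return {name: ask_one(text, question) for name, text in files.items()}


def build_summary_prompt(question: str, answers: Dict[str, str]) -> str:
    """Assemble the per-file answers into one prompt for a final summarising call."""
    lines = [f"Question: {question}", "Per-file answers:"]
    lines += [f"- {name}: {answer}" for name, answer in sorted(answers.items())]
    lines.append("Combine these into one final answer.")
    return "\n".join(lines)


def count_rows(csv_text: str, question: str) -> str:
    # Stand-in for a real per-file agent call; it ignores the question
    # and simply counts the data rows.
    n = sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))
    return f"{n} rows"


# Made-up file contents keyed by file name.
files = {"a.csv": "x\n1\n2\n", "b.csv": "x\n3\n"}
question = "How many rows are there?"

answers = ask_each(files, question, count_rows)
summary_prompt = build_summary_prompt(question, answers)
print(summary_prompt)
```

The summary prompt would then be sent to the LLM once, so the per-file work and the merging step stay separate and easy to debug.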

  • @chinmaybhat9636
    @chinmaybhat9636 1 year ago

    HI @Sam Witteveen I am getting Rate Limit Error Can you guide me how to do that ?

    • @samwitteveenai
      @samwitteveenai 1 year ago

      That sounds like an OpenAI issue. Leave it and try a bit later; sometimes their API has issues.

  • @GMCvancouver
    @GMCvancouver 1 year ago

    I have private documents (Excel &CSV)I can't share it with openai , is there anyway to do it as private GPT ?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes you can try some of the open source models. I am going to revisit this in some more vids soon.

    • @GMCvancouver
      @GMCvancouver 1 year ago

      @samwitteveenai Many thanks Sam, that would change my life. I have plenty of CSV & Excel files, and existing LLMs like groovy and snoozy from gpt4all are unable to read CSV & Excel correctly. It would be great to have a tutorial video ☺️

  • @JiandiDong
    @JiandiDong 1 year ago

    is there a limit for the size of the csv file?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Possibly, but the way it works, as long as the CSV can be loaded into memory, pandas queries can be run on it.

  • @MaxKamrani
    @MaxKamrani 1 year ago

    what about large csv ?

  • @techieinside1277
    @techieinside1277 1 year ago

    could you make a video on using langchain and llama to connect llama to the internet? maybe using alpaca13b or alpaca7b?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      I am looking into this, the challenge is to do it well LLaMa needs to be trained on a unique dataset. Still working on it

    • @techieinside1277
      @techieinside1277 1 year ago

      @samwitteveenai I see. What about something like Vicuna?

  • @AshishKumarRajak-xg7il
    @AshishKumarRajak-xg7il 7 months ago

    do i have to add open ai api key myself?

  • @rohitchan007
    @rohitchan007 1 year ago

    🔥🔥🔥🔥

  • @MrShivrajansingh
    @MrShivrajansingh 6 months ago

    I tried this but the result is not satisfactory

    • @samwitteveenai
      @samwitteveenai 6 months ago

      I need to revisit tabular data with these again soon, there are lots of new ways to approach it. I think this vid is close to a year old now

    • @MrShivrajansingh
      @MrShivrajansingh 6 months ago

      @samwitteveenai Thank you so much for your reply, I really appreciate it. The issue seems to stem from LangChain's processing, where it embeds document data and searches for the closest matching data before reconverting it into text. This can lead to errors, particularly with logical answers. For instance, calculating the average expenses for specific categories like food is problematic. This is because the process requires access to the entire CSV dataset, and LangChain struggles to retrieve specific data if the corresponding keyword is missing from the CSV.

    • @samwitteveenai
      @samwitteveenai 6 months ago

      @MrShivrajansingh often for this kind of thing it is better to just treat the CSV as a SQL DB and use the LLM to just write SQL queries.

  • @PokemonParadise2010
    @PokemonParadise2010 5 months ago

    Can this do graphing too?

    • @samwitteveenai
      @samwitteveenai 5 months ago

      Graphing in what sense? Plots? You could make an LLM write the code for a plot. If you are talking about knowledge graphs, then yes, but in a different way.

    • @PokemonParadise2010
      @PokemonParadise2010 5 months ago

      @samwitteveenai So like if I ask "Make me a line plot showing the trend of xyz from 2005 - 2010 using the Plotly library" (assuming I have that data ofc!), I would want it to make me a line graph using Plotly.

  • @satishkumar-ir9wy
    @satishkumar-ir9wy 1 year ago

    Nice content, I have few queries:
    1. if i use OpenAI API key, is it like my organization's data will get exposed?
    2. Can you make video on how to develop a model to extract question answers from my Organization's data (available in CSV in Excel format only).
    In my case i want to create the similar question answering bot or web app with my organization's data.
    Anyone has any idea about that.

    • @samwitteveenai
      @samwitteveenai 1 year ago

      Anything you pass in the prompt will be data OpenAI has access to. So be careful.

    • @satishkumar-ir9wy
      @satishkumar-ir9wy 1 year ago

      @samwitteveenai thanks for the quick response.
      Can you guide me on creating a ChatGPT-like chatbot to answer queries based on my Excel data?

    • @tusharbokade8378
      @tusharbokade8378 1 year ago

      @satishkumar-ir9wy Hey, I am also looking to create a similar chatbot. Were you able to create one?

  • @chrisweeks8789
    @chrisweeks8789 1 year ago

    Is it possible with alpaca models?

    • @samwitteveenai
      @samwitteveenai 1 year ago +2

      not with the straight Alpaca model. I have tried it and didn't get good results. But I am working on a finetuned version of Alpaca to do it.

    • @chrisweeks8789
      @chrisweeks8789 1 year ago

      @samwitteveenai I shall sub and eagerly wait for its arrival

  • @priyashn5715
    @priyashn5715 1 year ago

    How can I add custom template/prompts?

  • @patrickmihalcea6480
    @patrickmihalcea6480 1 year ago

    Lol now try doing it in typescript

  • @MrSaixxx
    @MrSaixxx 10 months ago

    I need a JSON response.

  • @ranati2000
    @ranati2000 5 months ago

    agent.agent.llm_chain.prompt.template
    AttributeError: 'RunnableAgent' object has no attribute 'llm_chain'

    • @samwitteveenai
      @samwitteveenai 5 months ago

      This is over a year old and they have updated LangChain since then. I will make an update at some point.

    • @RudolphMatongo
      @RudolphMatongo 2 months ago

      did you ever find a way to fix this issue?