LangChain101: Connect Google Drive Files To OpenAI

  • Published Feb 3, 2025

COMMENTS • 96

  • @temozarela
    @temozarela 1 year ago +6

    I'm so obsessed, going through all of these videos one by one. No better way to spend my Saturday, especially when things work!
    Thanks for your amazing contribution!

  • @adamsardo
    @adamsardo 1 year ago +2

    Appreciate what you've been doing and the time you've spent helping the community :)

  • @davidwu3247
    @davidwu3247 1 year ago +1

    awesome vid. can't wait till GPT4 is out and we can use google drive photos/text as multimodal input

  • @moreshk
    @moreshk 1 year ago +6

    Might be a bit silly to ask, but it would be useful if you could provide some guidance on how to set up the credentials JSON. I have been fumbling with it.

  • @rossgalvanofficial
    @rossgalvanofficial 1 year ago

    Thank you for sharing this, very interested.

  • @fliu5282
    @fliu5282 1 year ago +3

    Python + LangChain + Html basic coding = Big Future = Prompt Engineering

  • @briandao975
    @briandao975 1 year ago +1

    Awesome video, thank you. Do you have a video on how to utilize embeddings in the sample scenario? I'd like to create something similar but have a lot of docs. Also, is there a way to refresh the embeddings automatically or on a schedule? For example, if a doc gets updated, how does that get handled?

    • @eracton
      @eracton 1 year ago

      Did you figure that out?

  • @weipingwu7852
    @weipingwu7852 1 year ago +1

    Thanks very much! I have a question: I want to restrict my documents to internal company use only. If I use LangChain, can other parties, including OpenAI, see my documents? Thanks

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yes, if you use OpenAI as your LLM then they can see your data. Check out their data retention policies for more information.
      You could do a self hosted LLM for privacy reasons but that is more set up

    • @Iammikelovin
      @Iammikelovin 1 year ago

      Hi, can you recommend info on self hosted LLM? Can I use OpenAI and basically not have them retain my data? Or do I have to use another LLM?

  • @bladeplays6425
    @bladeplays6425 1 year ago +1

    One use case that I would love to see is how this performs on Excel/Google Sheets Data. Given event/log data from a website or a mobile app and documentation on what activity each event type in the log represents, does the model know how to answer questions about frequent (or user-specific) app activity?

  • @carlosterrazas5091
    @carlosterrazas5091 1 year ago

    Great content, just a question about the security of the information. Do you know whether, used this way, ChatGPT will see the information as if you had entered it on their platform? My concern is that if you use it for private documents, the info will end up in the ChatGPT database for everyone to see. Thanks

  • @badrinarayanans355
    @badrinarayanans355 7 months ago

    Great Insights

  • @VictorCardonan
    @VictorCardonan 1 year ago +5

    Hello, thank you for the videos. They are really interesting. I have two questions:
    1) Why are you not using embeddings in this case?
    2) Would it make sense, and is it possible, to save the state of the summarizer so you don't have to redo the whole process from scratch if you have 1000+ documents?
    Thank you

    • @MK-jn9uu
      @MK-jn9uu 1 year ago +1

      I was thinking the same thing..

    • @EstherL-wd9yx
      @EstherL-wd9yx 1 year ago +1

      @DataIndependent - My main question is #2: How can we build a database of documents so that the knowledge db grows and we don't have to redo all of the processing from scratch?
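On question #2 in this thread, one possible approach is to key saved results by a hash of each document's text, so unchanged documents are skipped and edited ones are reprocessed. This is only a minimal sketch using the standard library; the `summarize` callable and the `summary_cache.json` file name are hypothetical stand-ins, not something shown in the video.

```python
import hashlib
import json
from pathlib import Path


def content_key(text: str) -> str:
    """Return a stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def summarize_with_cache(docs, summarize, cache_path="summary_cache.json"):
    """Summarize each document, reusing saved results for unchanged text.

    docs: dict mapping a document name to its text.
    summarize: any callable taking text and returning a summary
               (hypothetical stand-in for an LLM call).
    """
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    results = {}
    for name, text in docs.items():
        key = content_key(text)
        if key not in cache:
            # New or edited document: its hash is unseen, so process it.
            cache[key] = summarize(text)
        results[name] = cache[key]
    path.write_text(json.dumps(cache))
    return results
```

Running this on a schedule would pick up edited documents, because their text hashes differently. The same idea should carry over to embeddings by storing vectors instead of summaries.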

  • @AizzatAffero
    @AizzatAffero 1 year ago

    Once LangChain has read all of it, does it keep the data for when we reopen it?

  • @RussellDeming
    @RussellDeming 1 year ago

    Definitely interested in implementing in my business

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! What domain are you in? How are you thinking about using it?

  • @blocksystems202
    @blocksystems202 1 year ago

    You're amazing - thanks for sharing.

  • @ivantan222
    @ivantan222 1 year ago

    4:00 That's a pretty short summary of the long text, is there any parameter to make it longer?

    • @DataIndependent
      @DataIndependent 1 year ago

      You can see here the prompt that is being used to generate this summary
      github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/stuff_prompt.py
      Under the hood it's just a prompt with your text in it. You could adjust the prompt manually (not by using the chain, but doing your own prompt) to get a longer one.

    • @ivantan222
      @ivantan222 1 year ago

      @@DataIndependent ah okay, thanks a lot for your info.
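The "do your own prompt" suggestion in this thread could look something like the sketch below. The wording of the prompt and the `min_sentences` parameter are made up for illustration; the function only builds the text you would then send to whichever LLM you use.

```python
def build_summary_prompt(text: str, min_sentences: int = 8) -> str:
    """Build a summarization prompt that asks for a longer summary."""
    return (
        f"Write a detailed summary of the following text in at least "
        f"{min_sentences} sentences:\n\n"
        f"{text}\n\n"
        "DETAILED SUMMARY:"
    )
```

Raising `min_sentences` is a request, not a guarantee; the model may still answer more briefly.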

  • @ujjwalgupta1318
    @ujjwalgupta1318 1 year ago

    Aren't this and the directory loader doing a similar sort of thing?

  • @DheerSinghDel
    @DheerSinghDel 1 year ago

    Can u exactly explain the path of credentials folder assuming that I am working with GoogleColab and drive folder path where ipynb file is residing my drive at /ColabNotebooks/LangChain/drivetest.ipynb

    • @DataIndependent
      @DataIndependent 1 year ago

      I would put this question into chatgpt and have it work with you on the details.
      It requires knowledge about your setup which I don't have

  • @federicogiacomarra
    @federicogiacomarra 1 year ago

    Not sure if this is explained elsewhere, can you retrieve the source document somehow together with the answer?

  • @leticiaromanbernal4151
    @leticiaromanbernal4151 1 year ago

    Hi, I would like to know if there's any possibility to connect Google Sheets from my Google Drive account as it does with Google Doc. Please help me. Thanks a lot :)

    • @DataIndependent
      @DataIndependent 1 year ago

      big time - you can use langchains drive loader python.langchain.com/docs/modules/data_connection/document_loaders/integrations/google_drive

  • @vinosamari
    @vinosamari 1 year ago

    Please do a map-reduce video

    • @DataIndependent
      @DataIndependent 1 year ago

      Here's a video explaining the different chain_types
      ua-cam.com/video/f9_BWhCI4Zo/v-deo.html

  • @AnkurChauhan-n3z
    @AnkurChauhan-n3z 1 year ago

    Hi Greg, I am getting an error while trying to connect Google Drive files to OpenAI. The error is below:
    ValueError: Client secrets must be for a web or installed app. Could you please help me resolve this error? I am using Azure credentials.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Because Azure and Google Drive are run by different companies the credentials won't work.
      Try getting google credentials

    • @AnkurChauhan-n3z
      @AnkurChauhan-n3z 1 year ago

      @@DataIndependent Thanks Greg 😇
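For anyone hitting the same ValueError: a Google OAuth client file downloaded from the Cloud console has a top-level "installed" (desktop app) or "web" key, and the error in this thread typically appears when neither is present, as with credentials from another provider. A small check like the following can confirm which kind of file you have before passing it to the loader. It is a sketch with a made-up function name and only inspects the top-level keys.

```python
import json


def client_secret_kind(path: str) -> str:
    """Return "installed" or "web" for a Google OAuth client file.

    Raises ValueError if the file has neither top-level key.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    for kind in ("installed", "web"):
        if kind in config:
            return kind
    raise ValueError(
        f"{path} has top-level keys {sorted(config)}; expected 'installed' or 'web'"
    )
```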

  • @wardaraees4887
    @wardaraees4887 1 year ago

    I want to ask questions about my Excel files or a dataset in CSV format (not a text file), or maybe about a table that is the result of a SQL query on SQL Server. Is it possible to upload that file to Google Drive the same way, or is this method just for text files?
    Or is there a direct way to ask questions to my SQL table with OpenAI?

    • @DataIndependent
      @DataIndependent 1 year ago

      Check out the LangChain documentation on how to query SQL databases, it's very doable.
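For the CSV part of this question, one way to make the data queryable with SQL is to load it into SQLite first. The sketch below uses only the standard library; the table name `data` is arbitrary and every column is stored as text. Turning a natural-language question into SQL with an LLM is a separate step that is not shown here.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text: str, table: str = "data") -> sqlite3.Connection:
    """Load CSV text into an in-memory SQLite table (all columns as TEXT)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    return conn
```

After loading, a query such as `conn.execute('SELECT COUNT(*) FROM data WHERE city = ?', ("Paris",))` runs against the CSV rows like any other table.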

  • @photon2724
    @photon2724 1 year ago

    Another fantastic tutorial! although, what is the credentials.json file? and how can i get my own?

    • @DataIndependent
      @DataIndependent 1 year ago

      Thanks! That is on the google side of the house.
      developers.google.com/workspace/guides/create-credentials

    • @anishmanandhar1203
      @anishmanandhar1203 1 year ago

      @@DataIndependent And what do we do with it? How do we get the .json file?

  • @nsitkarana
    @nsitkarana 1 year ago

    Nice video. I have one follow up - when i do any kind of interaction with openai (for instance the doc from google drive) or in the other video where i chunk/embed local documents, how safe are the personal documents. in other words, how safe is it to use openai for personal documents ? does anyone have any idea on that.

  • @manyavarshney4399
    @manyavarshney4399 1 year ago

    Hello, can you resolve my error? I gave credentials path and it got executed. But when I loaded document, it displayed "Access blocked to the Google Drive API"

    • @DataIndependent
      @DataIndependent 1 year ago

      Have you googled it? that sounds like a google credential issue

  • @rahuliitm
    @rahuliitm 1 year ago +1

    Great tutorial. Absolutely loving it. I'm trying to read a gitbook and summarise it but apparently there's a prompt context length limit.
    "This model's maximum context length is 4097 tokens, however you requested 7592 tokens"
    Not sure where I can set the token limit

    • @jmanhype1
      @jmanhype1 1 year ago

      yea thats why hes selling his service to fill in the gaps

    • @DataIndependent
      @DataIndependent 1 year ago +2

      Nice! Yes there is a context limit for prompts. Check out either my video on asking a question to a 300 page book or else my "work arounds for prompt limit" video

    • @DataIndependent
      @DataIndependent 1 year ago +4

      Nothing to sell here - happy to help with any questions you have though
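On the context-length error quoted at the top of this thread: the limit belongs to the model, so it cannot simply be raised. The usual workaround is to split the text into pieces that fit, summarize each piece, then summarize the summaries. Below is a rough sketch of the splitting step only. The 4-characters-per-token figure is a crude assumption rather than a real tokenizer, and the budget is kept well under 4097 to leave room for the prompt wording and the reply.

```python
def split_into_chunks(text: str, max_tokens: int = 3000, chars_per_token: int = 4):
    """Split text on paragraph breaks into chunks under a rough token budget.

    chars_per_token is a crude estimate, not a real tokenizer.
    """
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is cut into fixed-size pieces.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

For accurate counts you would swap the character estimate for the tokenizer that matches your model.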

  • @TreiGamer
    @TreiGamer 1 year ago +1

    Hey Data Independent, I'm new to Python and coding in general but AI has been the push I need to really dig into this. I got Jupyter running locally, is there a recommended resource you'd point me towards for bringing your code into it?

    • @TreiGamer
      @TreiGamer 1 year ago +2

      Haha never mind, I figured it out. I just asked GPT 🤣
      Love your content.

    • @DataIndependent
      @DataIndependent 1 year ago +3

      Nice! That's great. What I was going to say is:
      Easiest - Copy and paste the code from the github link in the description into your jupyter notebook
      More Robust - Git clone the repo so you can stay up to date with future changes as well

    • @TreiGamer
      @TreiGamer 1 year ago +2

      I did the git clone method. Thank you.

  • @Iammikelovin
    @Iammikelovin 1 year ago

    Hello, I have just started watching a few of your vids, they're super interesting and really well explained, thanks! Q: The source files, in my case several PDF docs, are confidential, and my idea is to create an internal Q&A. What about privacy? Does LangChain or OpenAI potentially have access to them? Does it add them to its "brain"? Or is it completely private? Thanks again

    • @bagamanocnon
      @bagamanocnon 1 year ago +1

      Data sent through the OpenAI APIs, such as the questions fed to the LLM and the answers it outputs (what OpenAI calls prompts and completions, respectively), will be stored on their servers for 30 days before being purged. Per their policy, only a limited number of OpenAI employees, those monitoring for abuse, will have access to the data. Enterprise customers might even have the option to opt out of having their data stored at all. Look up the OpenAI API usage policies. I can't paste a link here.
      Using their embeddings service also exposes your data to OpenAI.
      The demo in this video doesn't use embeddings (it reads the text directly), but you almost always want to create a vector index with embeddings for your knowledge base (kb), especially if it consists of hundreds or thousands of documents. A vector index lets you retrieve only the relevant passages to hand to the LLM instead of the whole raw text. Cheers.

    • @DataIndependent
      @DataIndependent 1 year ago

      Agree! and if you don't want OpenAI to have your data then you should be using a local model

  • @ahsanahmad3193
    @ahsanahmad3193 1 year ago +1

    Should have shown the structure of credentials file. Maybe add in comment.

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Great video and as founder of startup need this tool! Is there a way not to access Google drive but like Synology Nas (which we use), that will be really really helpful

    • @DataIndependent
      @DataIndependent 1 year ago

      Thank you! I've never heard of Synology. For it to integrate it would either take a custom data loader from LangChain/Unstructured or you'd need to export the files you'd want to another spot.

    • @cgtinc4868
      @cgtinc4868 1 year ago

      @@DataIndependent Thanks! It's just a brand of external NAS. Maybe you could do a video on a local hard drive, so that we can just change the path to wherever the source documents are :)

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Sorry for the noob question: where do I place the "../../desktop_credetnaisl.json" file? I admit I'm a non-coder, just following along with your video.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! You can place your credentials file wherever you want.
      By default your program will usually look in a root folder, but you can tell it to look wherever you need.
      If your credentials were in the same folder as your script you could do "credentials.json" without going up/down from any folder
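To make the "../../" part concrete: each ".." means "go up one folder" from the folder the path is resolved against. The helper below is a sketch with a made-up name. It resolves a relative path against a base folder and fails early with the full absolute path if the file is not there, which tends to be easier to debug than an error from deep inside the loader.

```python
from pathlib import Path


def resolve_credentials(relative_path: str, base_dir: Path) -> Path:
    """Resolve a credentials path relative to base_dir and confirm it exists."""
    path = (base_dir / relative_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file at {path}")
    return path
```

In a notebook, `resolve_credentials("../../credentials.json", Path.cwd())` would look two folders above the notebook's working directory.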

    • @cgtinc4868
      @cgtinc4868 1 year ago +1

      ​@@DataIndependent Thanks! wrote to you in Twitter as well

  • @adamtemple8677
    @adamtemple8677 1 year ago

    Is it still limited by the prompt token limits, or can you use an entire G-Drive and chat with all your documents?

  • @ahmadzaimhilmi
    @ahmadzaimhilmi 1 year ago

    Still studying this langchain module. I'm looking to chain a series of questions, i.e. use result from a question to generate the next question.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice, that would likely be an agent. What's the example you want to do?

    • @ahmadzaimhilmi
      @ahmadzaimhilmi 1 year ago

      @@DataIndependent A business plan aims to develop a research plan for a thesis. The research plan needs to find a research gap, which means an unexplored area in the existing literature. Otherwise, the research would be repetitive and unoriginal. This is a difficult part that involves a lot of writing and concentration. It might take around nine months to finish this part if one is very committed. To do this, one has to go through hundreds of papers and learn about the methods, materials, standards and challenges of similar research. There is a technique for doing this, but an LLM simplifies it a lot. My approach is to use BERT or another tool to get relevant keywords from the papers and build on them for the research plan. This way, the researcher spends less time on the writing part and can focus on doing the experiment.
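For a fixed series of steps like the one described in this thread, a plain loop that feeds each answer into the next question may be enough before reaching for an agent. The sketch below is framework-free: `ask` is a hypothetical stand-in for whatever LLM call you use, and the `{previous}` placeholder is just a convention chosen for this example.

```python
def chain_questions(first_question: str, follow_up_templates, ask):
    """Ask a series of questions, feeding each answer into the next one.

    follow_up_templates: strings containing "{previous}".
    ask: any callable mapping a question to an answer
         (hypothetical stand-in for an LLM call).
    """
    answers = [ask(first_question)]
    for template in follow_up_templates:
        question = template.format(previous=answers[-1])
        answers.append(ask(question))
    return answers
```

For example, the first question could extract keywords from a paper, and a follow-up template such as "Given these keywords: {previous}, which areas look unexplored?" would build on that result.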

  • @joelmartinez7628
    @joelmartinez7628 1 year ago

    Still skeptical about opening our internal information to GPT-3. The information will definitely be used for training, and internal information could become public once fed to GPT-3. Am I wrong to ask whether they have a plan where the data can be used for training but not exposed as public information?

    • @DataIndependent
      @DataIndependent 1 year ago

      I totally agree - It's a problem that will need to get solved. I actually tweeted about this same question here: twitter.com/GregKamradt/status/1627338667936337921
      AFAIK this isn't on the roadmap for them yet but I hope I'm wrong

    • @VictorCardonan
      @VictorCardonan 1 year ago

      Why don't you use GPT4All, which can be installed locally and doesn't send any data outside? It won't be as good or as straightforward, but it can give you a good result.

  • @ezequielmelillan1708
    @ezequielmelillan1708 1 year ago

    Hi man, thanks for sharing, this is amazing. Can you make a video using alpaca/llama integration with LangChain? Is it possible to use embeddings with those open-source AI?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yep it's very possible you just need to swap out your embeddings model

  • @haisai4159
    @haisai4159 1 year ago

    Amazing tutorial! Beginner here: can you do this for Google Sheets, and in a Google Colab notebook instead of a Jupyter notebook? Thank you!

    • @DataIndependent
      @DataIndependent 1 year ago

      What's the use case you'd want to run through

    • @AmineBELALIA
      @AmineBELALIA 1 year ago

      @@DataIndependent I have the same problem. I have a list of product specifications (2000 specs) and I want to build a chatbot that can answer customer questions about these products and explain the technical details of each spec by searching the internet (the Google Sheet doesn't have this level of detail).

  • @coachfrank2808
    @coachfrank2808 1 year ago +1

    Nice!

  • @HumzaAslam-i8l
    @HumzaAslam-i8l 1 year ago

    How do I get my credentials path from google?

    • @DataIndependent
      @DataIndependent 1 year ago

      *You* give your credentials path to google.
      This guide may help googleapis.dev/python/google-auth/latest/user-guide.html

  • @frankrobert9199
    @frankrobert9199 1 year ago

    great

  • @neon_Nomad
    @neon_Nomad 1 year ago

    What about nextCloud or syncthing?

    • @DataIndependent
      @DataIndependent 1 year ago

      Could you link me to the examples you'd want to see?

  • @learnapplybuild
    @learnapplybuild 1 year ago

    Please make a video on onedrive

  • @abdoualgerian5396
    @abdoualgerian5396 1 year ago

    The only bad thing about your content is the distracting background music. Not everyone can concentrate when more than one sound is playing at once.

  • @zes7215
    @zes7215 1 year ago

    wrg

  • @ryanonvr2267
    @ryanonvr2267 1 year ago

    ---> 76 with open(self.token_path, "w") as token:
    77 token.write(creds.to_json())
    79 return creds
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\info\\.credentials\\token.json' (even though the cred file is correct somewhere else.)
    :( newb

    • @DataIndependent
      @DataIndependent 1 year ago

      You can do two things
      1) Make sure your cred file is in the location your script is looking for (I'm guessing it's the directory you mentioned above)
      2) Tell your script to look elsewhere. This would be the location of your creds file wherever you would like it. I usually do it in my same folder or a parent folder above.
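One more thing worth checking for the traceback in this thread: it fails while opening token.json for writing, and Python's `open(..., "w")` raises FileNotFoundError when the parent folder does not exist. So a likely cause is that the `.credentials` folder itself is missing, even if the credentials file is fine somewhere else. A minimal sketch of creating the folder first (the helper name is made up):

```python
from pathlib import Path


def ensure_token_dir(token_path: Path) -> Path:
    """Create the parent folder of token_path so the file can be written."""
    token_path.parent.mkdir(parents=True, exist_ok=True)
    return token_path
```

Calling `ensure_token_dir(Path.home() / ".credentials" / "token.json")` before loading would create the folder named in the error. Whether your loader accepts a different token location is something to confirm in its documentation.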