I'm so obsessed going through all of these videos one by one. No better way to spend my Saturday, especially when things work!
Thanks for your amazing contribution!
Appreciate what you've been doing and the time you've spent helping the community :)
Awesome vid. Can't wait till GPT-4 is out and we can use Google Drive photos/text as multimodal input.
Big time! That is going to be super cool.
Might be a bit silly to ask, but it would be useful if you could provide some guidance on how to set up the credentials JSON. I have been fumbling with it.
same
Have you found a way to set up the credentials? I put my credentials.json at the correct path but it still says not found.
Thank you for sharing this, very interested.
Python + LangChain + basic HTML coding = Big Future = Prompt Engineering
Nice
Awesome video, thank you. Do you have a video on how to utilize embeddings in this sample scenario? I'd like to create something similar, but I have a lot of docs. Also, is there a way to refresh the embeddings automatically or on a schedule? For example, if a doc gets updated, how does that get handled?
Did you figure that out?
Thanks very much! I have a question: I want to restrict my documents to internal company use only. If I use LangChain, can other parties, including OpenAI, see my documents? Thanks!
Yes, if you use OpenAI as your LLM then they can see your data. Check out their data retention policies for more information.
You could run a self-hosted LLM for privacy reasons, but that is more setup.
Hi, can you recommend info on self-hosted LLMs? Can I use OpenAI and basically not have them retain my data? Or do I have to use another LLM?
One use case that I would love to see is how this performs on Excel/Google Sheets Data. Given event/log data from a website or a mobile app and documentation on what activity each event type in the log represents, does the model know how to answer questions about frequent (or user-specific) app activity?
Great content, just a question about the security of the information. Do you know if, this way, ChatGPT will see the information as if you entered it on their platform? My concern is that if you use it for private documents, the info will end up in ChatGPT's database for everyone to see. Thanks!
Great Insights
Hello, thank you for the videos. They are really interesting. I have two questions:
1) Why are you not using embeddings in this case?
2) Would it make sense, and is it possible, to save the state of the summarizer so you don't have to do the whole process from scratch if you have 1000+ documents?
Thank you
I was thinking the same thing..
@DataIndependent - My main question is #2: How can we build a database of documents so that the knowledge DB grows and we don't do all of the processing from scratch?
Once LangChain reads all of it, does it store the data for when we reopen it again?
Definitely interested in implementing in my business
Nice! What domain are you in? How are you thinking about using it?
You're amazing - thanks for sharing.
My pleasure!
4:00 That's a pretty short summary of the long text. Is there any parameter to make it longer?
You can see here the prompt that is being used to generate this summary
github.com/hwchase17/langchain/blob/master/langchain/chains/summarize/stuff_prompt.py
Under the hood it's just a prompt with your text in it. You could adjust the prompt manually (not by using the chain, but by writing your own prompt) to get a longer one.
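For anyone who wants to try it, here's a minimal sketch of passing a custom prompt into the summarize chain itself; the prompt wording and the 500-word target are my own illustrative choices, not from the video:

```python
from langchain import OpenAI, PromptTemplate
from langchain.chains.summarize import load_summarize_chain

# A custom prompt that explicitly asks for a longer summary.
# The 500-word target is just an example; tune it to taste.
template = """Write a detailed summary (roughly 500 words) of the following:

{text}

DETAILED SUMMARY:"""
prompt = PromptTemplate(template=template, input_variables=["text"])

llm = OpenAI(temperature=0)
chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
summary = chain.run(docs)  # `docs` = the documents loaded earlier
```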
@@DataIndependent ah okay, thanks a lot for your info.
Aren't this and the directory loader doing a similar sort of thing?
Can you explain exactly the path of the credentials folder, assuming that I am working in Google Colab and the ipynb file resides in my Drive at /ColabNotebooks/LangChain/drivetest.ipynb?
I would put this question into ChatGPT and have it work with you on the details.
It requires knowledge about your setup, which I don't have.
Not sure if this is explained elsewhere, but can you somehow retrieve the source document together with the answer?
Hi, I would like to know if there's any possibility to connect Google Sheets from my Google Drive account as it does with Google Doc. Please help me. Thanks a lot :)
Big time - you can use LangChain's Drive loader: python.langchain.com/docs/modules/data_connection/document_loaders/integrations/google_drive
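For reference, a minimal sketch of that loader; the folder ID and credential paths are placeholders for your own, and exactly which file types (Docs, Sheets, PDFs) get picked up depends on your LangChain version:

```python
from langchain.document_loaders import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id="YOUR_DRIVE_FOLDER_ID",     # placeholder: the ID from the folder's URL
    credentials_path="credentials.json",  # your OAuth client secrets file
    token_path="token.json",              # cached token, created on first run
)
docs = loader.load()
print(f"Loaded {len(docs)} documents")
```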
Please do a map-reduce video
Here's a video explaining the different chain_types
ua-cam.com/video/f9_BWhCI4Zo/v-deo.html
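Until then, a minimal sketch of what switching the chain type looks like (assuming the docs have already been split into chunks):

```python
from langchain import OpenAI
from langchain.chains.summarize import load_summarize_chain

llm = OpenAI(temperature=0)

# "stuff"      - put all the text into one prompt (fails past the context limit)
# "map_reduce" - summarize each chunk, then summarize the summaries
# "refine"     - summarize chunk 1, then refine that summary with chunk 2, etc.
chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = chain.run(split_docs)  # `split_docs` = your pre-chunked documents
```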
Hi Greg, I am getting an error while trying to connect Google Drive files to OpenAI and the error is below:
ValueError: Client secrets must be for a web or installed app. Could you please help me resolve this error? I am using Azure credentials.
Because Azure and Google Drive are run by different companies, the credentials won't work.
Try getting Google credentials.
@@DataIndependent Thanks Greg 😇
I want to ask questions of my Excel files, or of a dataset in CSV format (not a text file), or maybe of a table from SQL Server that is the result of a SQL query. Is it possible to upload that file to Google Drive the same way, or is this method just for text files?
Or is there any direct way to ask questions of my SQL table with OpenAI?
Check out the LangChain documentation for how to query SQL databases; it's very doable.
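As a rough sketch of what that looks like with LangChain's SQL chain (the connection URI is a placeholder; SQL Server works too via its SQLAlchemy driver):

```python
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

# Connect to any SQLAlchemy-compatible database.
db = SQLDatabase.from_uri("sqlite:///example.db")  # placeholder URI

llm = OpenAI(temperature=0)
chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

# The chain writes the SQL, runs it, and phrases the answer in English.
chain.run("How many rows are in the orders table?")
```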
Another fantastic tutorial! Although, what is the credentials.json file, and how can I get my own?
Thanks! That is on the Google side of the house.
developers.google.com/workspace/guides/create-credentials
And what do we do with it? How do we get the .json file? @@DataIndependent
Nice video. I have one follow-up: when I do any kind of interaction with OpenAI (for instance with the doc from Google Drive), or as in the other video where I chunk/embed local documents, how safe are the personal documents? In other words, how safe is it to use OpenAI for personal documents? Does anyone have any idea about that?
Hello, can you help resolve my error? I gave the credentials path and it executed, but when I loaded the document, it displayed "Access blocked to the Google Drive API".
Have you googled it? That sounds like a Google credential issue.
Great tutorial. Absolutely loving it. I'm trying to read a gitbook and summarise it but apparently there's a prompt context length limit.
"This model's maximum context length is 4097 tokens, however you requested 7592 tokens"
Not sure where I can set the token limit
Yea, that's why he's selling his service to fill in the gaps.
Nice! Yes there is a context limit for prompts. Check out either my video on asking a question to a 300 page book or else my "work arounds for prompt limit" video
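The usual workaround, sketched minimally (chunk sizes are just illustrative): split the text so each piece fits in the window, then use a map_reduce-style chain rather than one giant prompt.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the GitBook text into chunks that fit the ~4,097-token window,
# then hand the chunks to a map_reduce summarize chain.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
split_docs = splitter.split_documents(docs)  # `docs` from your GitBook loader
```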
Nothing to sell here - happy to help with any questions you have though
Hey Data Independent, I'm new to Python and coding in general but AI has been the push I need to really dig into this. I got Jupyter running locally, is there a recommended resource you'd point me towards for bringing your code into it?
Haha never mind, I figured it out. I just asked GPT 🤣
Love your content.
Nice! That's great. What I was going to say is:
Easiest - Copy and paste the code from the GitHub link in the description into your Jupyter notebook
More Robust - Git clone the repo so you can stay up to date with future changes as well
I did the git clone method. Thank you.
Hello, I have just started watching a few of your vids; they're super interesting and really well explained, thanks! Q: The source files, in my case several PDF docs, are confidential, and my idea is to create an internal Q&A. What about privacy? Do LangChain or OpenAI potentially have access to the docs? Do they get added to the model's "brain"? Or is it completely private? Thanks again.
Data used through the OpenAI APIs, like the questions fed to the LLM and the answers outputted by the LLM (what OpenAI calls prompts and completions, respectively), will be stored on their servers for 30 days before being purged. Per their policy, only a limited number of employees within OpenAI itself - only those who are monitoring it for abuse - will have access to the data. Enterprise customers might even have the option to totally opt out of having their data stored at all. Look up the OpenAI API usage policies; I can't paste a link here.
Using their embeddings service also exposes your data to OpenAI.
The demo in this video doesn't use embeddings (it reads the text directly), but you almost always want to create a vector index with embeddings for your knowledge base (KB), especially if it consists of hundreds or thousands of documents. With a vector index you retrieve just the relevant chunks and send only those to the LLM, instead of the raw text of everything. Cheers.
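To make that concrete, a minimal sketch of building a vector index for a large KB (the chunk sizes and the FAISS choice are just common defaults, not from the video):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Chunk the documents, embed each chunk, and store the vectors.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)  # `docs` from your loader

index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# At question time, only the most similar chunks get sent to the LLM.
relevant = index.similarity_search("What does the report say about Q3?", k=4)
```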
Agree! And if you don't want OpenAI to have your data, then you should be using a local model.
You should have shown the structure of the credentials file. Maybe add it in a comment.
Great video, and as the founder of a startup I need this tool! Is there a way to access not Google Drive but something like a Synology NAS (which we use)? That would be really, really helpful.
Thank you! I've never heard of Synology. For it to integrate it would either take a custom data loader from LangChain/Unstructured or you'd need to export the files you'd want to another spot.
@@DataIndependent Thanks! It's just a brand of external NAS. Maybe you could do a video on a local hard drive, where we can just change the path to wherever the source documents are :)
Sorry for the noob question, but where do I place the "../../desktop_credetnaisl.json"? I admit that I am a non-coder, just following your video along the way.
Nice! You can place your credentials file wherever you want.
By default your program will usually look in its root folder, but you can tell it to look wherever you need.
If your credentials were in the same folder as your script, you could just use "credentials.json" without going up/down from any folder.
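A tiny sketch of what those relative paths mean in practice (the filenames are placeholders):

```python
from pathlib import Path

# "credentials.json"        -> same folder the script runs from
# "../credentials.json"     -> one folder up
# "../../credentials.json"  -> two folders up
creds = Path("../../credentials.json").resolve()
print(creds, creds.exists())  # absolute path, and whether the file is really there
```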
@@DataIndependent Thanks! I wrote to you on Twitter as well.
Is it still limited by the prompt token limits, or can you use an entire G-Drive and chat with all your documents?
Did you figure this out?
Still studying this LangChain module. I'm looking to chain a series of questions, i.e., use the result from one question to generate the next question.
Nice, that would likely be an agent. What's the example you want to do?
@@DataIndependent A business plan aims to develop a research plan for a thesis. The research plan needs to find a research gap, which means an unexplored area in the existing literature; otherwise, the research would be repetitive and unoriginal. This is a difficult part that involves a lot of writing and concentration. It might take around nine months to finish this part if one is very committed. To do this, one has to go through hundreds of papers and learn about the methods, materials, standards, and challenges of similar research. There is a technique for doing this, but an LLM simplifies it a lot. My approach is to use BERT or another tool to get relevant keywords from the papers and build on them for the research plan. This way, the researcher spends less time on the writing part and can focus on doing the experiment.
Still skeptical about opening our internal information to GPT-3. The information will definitely be used for training, and internal information becomes effectively public once fed to GPT-3. Am I wrong to ask whether they have a plan where they can use the data for training but not expose it as public information?
I totally agree - It's a problem that will need to get solved. I actually tweeted about this same question here: twitter.com/GregKamradt/status/1627338667936337921
AFAIK this isn't on the roadmap for them yet but I hope I'm wrong
Why don't you use GPT4All, which can be installed locally and doesn't send any data outside? It won't be as good or as straightforward, but it can give you a good result.
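For the curious, a minimal sketch of pointing LangChain at a local GPT4All model instead of OpenAI; the model path is a placeholder for whichever weights you've downloaded:

```python
from langchain.llms import GPT4All

# Runs entirely on your machine; no data leaves it.
llm = GPT4All(model="./models/ggml-gpt4all-j.bin")  # placeholder path to local weights

print(llm("Summarize the plot of Romeo and Juliet in two sentences."))
```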
Hi man, thanks for sharing, this is amazing. Can you make a video on alpaca/llama integration with LangChain? Is it possible to use embeddings with those open-source AIs?
Yep, it's very possible; you just need to swap out your embeddings model.
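A minimal sketch of that swap; the sentence-transformers model name is just a common default, not a recommendation from the video:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Open-source embeddings computed locally, so nothing is sent to OpenAI.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

index = FAISS.from_documents(chunks, embeddings)  # `chunks` = your split documents
```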
Amazing tutorial! Beginner here: can you do this for Google Sheets, and in a Google Colab notebook instead of a Jupyter notebook? Thank you!
What's the use case you'd want to run through?
@@DataIndependent I have the same problem. I have a list of product specifications (2000 specs) and I want to build a chatbot that can answer customer questions about these products and explain the technical details of each spec by searching the internet (the Google Sheet doesn't have this level of detail).
Nice!
Thanks!
How do I get my credentials path from Google?
*You* give your credentials path to Google.
This guide may help googleapis.dev/python/google-auth/latest/user-guide.html
great
What about Nextcloud or Syncthing?
Could you link me to the examples you'd want to see?
Please make a video on OneDrive.
The only bad thing about your content is the distracting background music; not all people can concentrate on a mixture of more than one voice.
wrg
---> 76 with open(self.token_path, "w") as token:
77 token.write(creds.to_json())
79 return creds
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\info\\.credentials\\token.json' (even though the cred file is correct somewhere else.)
:( newb
You can do two things
1) Make sure your cred file is in the location your script is looking for (I'm guessing it's the directory you mentioned above)
2) Tell your script to look elsewhere. This would be the location of your creds file wherever you would like it. I usually do it in my same folder or a parent folder above.