Create Your Own ChatGPT with PDF Data in 5 Minutes (LangChain Tutorial)

  • Published 3 Oct 2024

COMMENTS • 313

  • @LiamOttley · 1 year ago · +4

    Leave your questions below! 😎
    📚 My Free Skool Community: bit.ly/3uRIRB3
    🤝 Work With Me: www.morningside.ai/
    📈 My AI Agency Accelerator: bit.ly/3wxLubP

  • @moses5407 · 1 year ago · +2

    Golden! Clear, concise info and a notebook! If it's too fast for some viewers, I'd remind them that they can always slow down the playback speed.

  • @borisbadinoff1291 · 1 year ago · +12

    👏👏 Hey Liam, your five-minute tutorial is fantastic! Kudos, and thanks for putting in the effort to produce it. Your app is exactly what any knowledge worker is craving: we all have gigabytes of PDF files in some folder named "READ", "TO READ" or "__TO READ" (so it stays at the top of the root :), but never get to them (probably distracted by all the productivity tutorials we love to watch). A bot that can read that stuff for us, so we can continue to wing it, is a true godsend. :D

  • @naturallydope247 · 1 year ago · +4

    This was definitely one of your better videos. You explained LangChain well, and I’m glad you used the Colab notebook instead of Jupyter or Replit.

  • @guilhermeveiga9345 · 1 year ago · +2

    Thought it would be just another video on the subject, but you summarized it in an awesome way! Great vid! Congrats

  • @ryanjames3907 · 1 year ago · +2

    Thank you for your time, effort and generosity.
    I wish very good things for you.

  • @stefano94103 · 1 year ago · +6

    Excellent! Thank you for your hard work putting these together.

    • @LiamOttley · 1 year ago

      My pleasure! Thanks for watching

    • @AlbyTheMovieCreator · 1 year ago · +2

      This video was copied from beginning to end from the Prompt Engineering channel.

    • @stefano94103 · 1 year ago

      @AlbyTheMovieCreator Oh wow, I totally didn't know that. Thanks for the heads-up! SMH😒

  • @1Esteband · 1 year ago · +7

    Thank you, it worked perfectly despite an error during the pip install:
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.

  • @AndrewSheves · 1 year ago

    Liam, this is a great tutorial, thank you. What I really liked was the explanation of what is happening behind the scenes: anyone (even a non-developer like me) can cut and paste the code, but knowing what the commands are doing is super helpful.
    The explanations in the Colab are great, and I took your advice and stole your code. The chatbot was up and running in a few hours (remember: non-developer), but that included building a separate UI. Great work, thank you!

    • @aradinac · 1 year ago · +3

      Can I ask whether you paid for the OpenAI key or did it with the free trial? I'm encountering this error: RateLimitError: You exceeded your current quota, please check your plan and billing details.

    • @AndrewSheves · 1 year ago

      @aradinac I used a paid OpenAI key.

    • @csss142 · 10 months ago

      @AndrewSheves Which one did you buy?

    • @miguelmunoz4135 · 9 months ago

      @aradinac I have the same error because my account isn't paid. If you found another solution, please let us know.

  • @konstantinrebrov675 · 1 year ago · +2

    Wonderful tutorial. Thank you!

  • @walkingwchris · 1 year ago

    Appreciate your hustle bro

  • @featherly4267 · 1 year ago · +2

    Brother, can you make a video on how to use AutoGPT for beginners? 😊

  • @LaxBau · 1 year ago

    You're awesome, Liam!!

  • @gabijazza1220 · 1 year ago

    Cheers, this is a brilliant video. Looking forward to making a bespoke AI.

  • @chandrachoodR · 1 year ago

    That's a fantastic video, right to the point, and thanks for the code as well.

  • @CK-ho7gj · 1 year ago

    Awesome tutorial. Cheers Liam

  • @rishabpoddar3866 · 1 year ago · +5

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.

  • @SedhuujGorem · 8 months ago · +28

    The best tool for this is ua-cam.com/video/bcK7LldB3dk/v-deo.html
    I like some of the transitions, but sometimes they're a bit too much and seemingly random. Since we use these persistent elements that transition across pages to indicate some kind of relationship between the previous and the next states, some of your transitions confuse me because I can't immediately see what the relationship is.
    For example, at 1:23 the selectable tiles (which weren't selected) transition into two switches. Does that mean anything? Are they related in some way? I see this as random and a bad use of the design language. However, at 3:14 I like the transition from switches to the ticks on a paper; that makes sense to me. Epic presentation, though.

  • @justingu9541 · 1 year ago

    Thank you for sharing such excellent content. This is great guidance, and I hope you continue to share more! If there's anything I can do, please let me know!

  • @sganesh07 · 1 year ago

    Thanks Liam, neat and fast as always. Could you post a similar video doing the same thing with LlamaIndex, please? I thought that was easier.

  • @antonpictures · 1 year ago · +1

    Hi there! As a fellow filmmaker, I find the concept of generative agents fascinating. I'm curious: what specific types of agents are you interested in exploring in your video? Additionally, have you thought about incorporating some real-world examples of SimCity-like models, such as the ones developed by Stanford, to help illustrate the concept to your audience? Looking forward to hearing more about your project! George Anton

  • @SimonStJohn · 1 year ago · +4

    Hey Liam! Awesome. Could you do one that scrapes data from a blog or website for a chatbot embedded in that blog?

  • @TheRealUncensoredTravelGuide · 11 months ago

    This is good timing, because I just completed a Data Analysis course via IBM and Vanderbilt's Prompt Engineering course. I created my first smart bot for my dad’s website on Sunday.
    I’d like to load RFP contractor documents so I can take the 88 pages and easily question parts of a bid.

  • @GiovaDuarte · 1 year ago · +8

    This is great. Is it possible to retrieve images from the PDF? I have a PDF with many graphics that help understand the content. Do you have any ideas as to how I can provide images as part of the conversation?

    • @MCroppered · 1 year ago

      What type of graphics are you talking about?

    • @quinnherden · 1 year ago · +1

      You could leverage LangChain's agent feature set to use computer vision to analyze your images.

    • @GiovaDuarte · 1 year ago

      @MCroppered The PDF I have has embedded images, and I was wondering how I could recall them during a conversation.

    • @GiovaDuarte · 1 year ago

      @quinnherden I will research this. Thanks!

    • @gaben7 · 1 year ago · +2

      @GiovaDuarte If you figure out how to bring images into the conversation, please let us know how.

  • @AmitKumar-ct8df · 1 year ago · +2

    A couple of things I observed:
    1. It's not free, because integration with OpenAI is required.
    2. It is too slow. For a two-page PDF it takes around 10-20 seconds to respond when I am on a 48GPU machine.

    • @shanamin1561 · 1 year ago

      Is there any way I can test it for free? I used a PDF with only one page and it says "You exceeded your current quota, please check your plan and billing details"

    • @TheAmit4sun · 1 year ago · +1

      @shanamin1561 You can't. All of its smartness depends on OpenAI's embeddings, and those are not free.

  • @noteniceu · 1 year ago · +5

    Can you feed it multiple PDFs at the same time, like a group of 300, or would you have to run each one individually?

  • @joma4284 · 1 year ago · +1

    A beginner's question: if everything happens locally, why do we need an OpenAI API key?

  • @vicentesoto1628 · 1 year ago

    Liam, your content is unreal.
    Some of the best I've seen so far.
    This is hard knowledge.
    You are brilliant.
    What do you mean by 512 tokens in every chunk? Characters?
    I'll be waiting for a detailed masterclass.
    Vicente
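
On the 512-token question: a token is a subword unit produced by the model's tokenizer, not a character. OpenAI's rule of thumb is that one token is roughly four characters of common English text. A text splitter can measure chunk size with whatever length function you give it, so the same `chunk_size` means characters when the function is `len` and tokens when it is a token counter. The sketch below is hypothetical. `split_into_chunks` is a simplified greedy splitter, and `count_tokens` counts whitespace-separated words as a crude stand-in for a real tokenizer such as GPT-2's or tiktoken.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words.

    Real tokenizers (e.g. GPT-2's BPE or tiktoken) split text into subword
    pieces, so their counts are usually higher than this word count.
    """
    return len(text.split())


def split_into_chunks(
    text: str,
    chunk_size: int,
    length_function: Callable[[str], int] = len,
) -> List[str]:
    """Greedily pack words into chunks whose measured length stays <= chunk_size.

    With length_function=len the limit is in characters; with count_tokens it is
    in (approximate) tokens. A single word longer than chunk_size still becomes
    its own, oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_function(candidate) > chunk_size:
            chunks.append(" ".join(current))  # close the full chunk
            current = [word]                  # start a new one with this word
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Attention is all you need " * 20  # 100 words, about 520 characters

by_chars = split_into_chunks(text, chunk_size=100, length_function=len)
by_tokens = split_into_chunks(text, chunk_size=100, length_function=count_tokens)
print(len(by_chars), len(by_tokens))
```

With `chunk_size=100`, the character-based split produces several chunks of at most 100 characters each. The word-count split fits all 100 words into a single chunk. That difference is why it matters which unit the length function counts.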

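  • @codyclarke3665 · 7 months ago

    For anyone running into rate-limiting issues, I had success using exponential backoff on the request. I replaced the embedding and vector-database lines at the start of step 2 with the following:
    # Get embedding model and build the vector database with exponential backoff
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS
    from tenacity import (
        retry,
        stop_after_attempt,
        wait_random_exponential,
    )  # for exponential backoff

    embeddings = OpenAIEmbeddings()

    # Retry the step that actually sends requests to the OpenAI API (embedding the chunks);
    # constructing OpenAIEmbeddings() by itself makes no API calls, so retrying only that has no effect
    @retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
    def build_db_with_backoff():
        return FAISS.from_documents(chunks, embeddings)

    db = build_db_with_backoff()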

  • @omountassir · 1 year ago

    Freaking Great Content! Keep Rocking 💯

  • @shtookaralph5205 · 1 month ago

    Thanks for the awesome video, Liam! Do you need a GPU to do something like this? It also looks like a GPT license/subscription is required?

  • @vverboX · 1 year ago

    I would love to see a video that helps me deploy such a chatbot (created in Colab) on a webpage.

  • @JerryTrade28 · 1 year ago

    Will you be sharing the Marcus Aurelius database you created previously? I was really looking forward to that.

  • @joepbaks · 1 year ago · +2

    Thanks a lot, man. I've been trying to get this to work other ways for days. This was so easy; great tutorial. How would you transfer something like this to a user-friendly UX/UI?

  • @MichielVermandel · 1 year ago · +2

    Thanks for the great video! One question: which OpenAI model is used to retrieve the answer? Is it gpt-35-turbo or ada or...? Where is it defined?

  • @luigiseven · 1 year ago

    Awesome work

  • @zoumanakeita8016 · 1 year ago

    Straightforward and concise! Great explanation.
    How do you extract the exact page number where the answer was found?
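
One way to get page numbers is to chunk the PDF page by page and store each chunk's page number alongside it. Whichever chunk the search returns then tells you its page. LangChain's `PyPDFLoader` follows this pattern and returns one document per page with the page number in its metadata. The sketch below shows the idea in plain Python with made-up page text. The naive shared-word `score` is a stand-in for embedding similarity, not how the real search ranks chunks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    page: int  # 1-based page number the chunk came from


def chunk_pages(pages: List[str], words_per_chunk: int = 50) -> List[Chunk]:
    """Split each page separately, so no chunk spans two pages."""
    chunks: List[Chunk] = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), words_per_chunk):
            piece = " ".join(words[start:start + words_per_chunk])
            chunks.append(Chunk(text=piece, page=page_number))
    return chunks


def score(query: str, text: str) -> int:
    """Naive relevance: number of shared lowercase words (stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def answer_with_page(query: str, chunks: List[Chunk]) -> Chunk:
    """Return the best-matching chunk; its .page says where the answer is."""
    return max(chunks, key=lambda c: score(query, c.text))


pages = [
    "The Transformer relies entirely on attention mechanisms.",
    "The model was trained on eight GPUs for twelve hours.",
]
best = answer_with_page("how many gpus was the model trained on", chunk_pages(pages))
print(f"Found on page {best.page}: {best.text}")
```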

  • @GALTechEnterprises-m7c · 1 year ago

    Thank you, keep going.

  • @coinhawk · 1 year ago

    Great job! I'll run this on my writings, book collection and code snippets, and build an awesome MeKnowledgeBase 😎

  • @quangdinhdota2388 · 1 year ago · +2

    Amazing video. I have a question: can your notebook (code) run with multiple PDF files?

  • @escoladetecnologia · 1 year ago · +6

    Thanks, but it would be interesting to do this without the OpenAI API. It is paid, and analyzing PDFs with it would be very expensive for large projects. It could be another Hugging Face model. I am trying to do something in this direction; if you have any ideas, let me know!

    • @utsabkundu27 · 1 year ago

      Yeah, I am also trying to build something like that without using the OpenAI API.

    • @suhaglal6526 · 1 year ago

      Me too

    • @nihonkeizaishinbun2254 · 1 year ago

      Did you find anything interesting? For large PDF projects, the price is very important.

    • @misalambasta · 6 months ago

      Anyone found something?

  • @yiyuanzhang6335 · 1 year ago

    We'll need a video on how to do this for multiple PDFs.

  • @armandocapogrossi6689 · 1 year ago

    Thanks, very good content. Just a question to understand the market better: did I misinterpret your hourly rate at $997/45 mins?

  • @tibz11c · 5 months ago

    Great work! Can we do this with a local or smaller language model?

  • @timtensor6994 · 11 months ago · +1

    Great tutorial! Can it be modified to support multiple PDFs?

  • @willyjauregui6541 · 3 months ago

    Out of complete ignorance: is LangChain the best method currently available to improve the performance of our LLM chatbots?
    If not, what is, and what other methods are out there that I may be missing?
    Thanks for answering.

  • @mic9657 · 5 months ago

    Those biceps too! 💪

  • @Ramp_cat_7 · 1 year ago

    This is amazing! Can you teach us mindai?

  • @Finalform77 · 1 year ago · +3

    Hi Liam, I am getting an 'authentication error' when running section 2 of the code, "Embed text and store embeddings". I haven't changed anything yet; I'm just running it as is. Any suggestions?

  • @audacityhour3104 · 2 months ago

    I appreciate the tutorial, but the lack of dark mode is total eye torture.

  • @tuwayne3624 · 1 year ago · +1

    Thank you, I've learned a lot from your channel. I'm curious about the differences between LlamaIndex and LangChain. Maybe I'm still a beginner in AI and don't quite understand.

  • @minhe9008 · 1 year ago · +2

    Great tutorial! I have hundreds of research papers in PDF format. Can I use this approach to build a vector DB and then chat with ChatGPT? Is there a limit to the size of the DB? Any pitfalls to avoid? Thanks!

  • @TheUselessgeneration · 1 year ago · +1

    I can't wait until we can expand this to all documents. I assume that is what Microsoft 365 Copilot will do.

  • @InsightConsulting-w6i · 1 year ago

    This is great, Liam, thank you for sharing. What's a simple automated way to deploy this code to a basic online application or chat page?

  • @harshavardhan7097 · 1 year ago · +1

    Can I do it in a Jupyter notebook rather than using Colab?

  • @ayanbahukhandi1869 · 1 year ago · +6

    Can I do it with multiple PDFs? Like, for each PDF I'd just chunk every page?

    • @ahmedmiftah8308 · 1 year ago

      Merge the PDFs.

    • @audacityhour3104 · 2 months ago

      @ahmedmiftah8308 What's the point in merging the PDFs if chunking is going to break them up anyway?
      Each section could just as well be another PDF, which makes sense anyway.
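
Merging isn't required. Each PDF can be chunked on its own, with its file name stored next to every chunk, and all the chunks then go into one vector store. The sketch below uses a dict of made-up extracted text (`documents`, with hypothetical file names) in place of real PDF loading.

```python
from typing import Dict, List, Tuple

# Hypothetical extracted text, keyed by PDF file name
documents: Dict[str, str] = {
    "handbook.pdf": "Vacation requests go through the HR portal. " * 30,
    "contract.pdf": "Payment is due within thirty days of invoice. " * 30,
}


def chunk_text(text: str, words_per_chunk: int = 40) -> List[str]:
    """Fixed-size, non-overlapping word chunks."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]


def build_corpus(docs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Chunk every file independently and tag each chunk with its source file."""
    corpus: List[Tuple[str, str]] = []
    for source, text in docs.items():
        for chunk in chunk_text(text):
            corpus.append((source, chunk))
    return corpus


corpus = build_corpus(documents)
sources = sorted({source for source, _ in corpus})
print(len(corpus), sources)
```

LangChain's PDF loaders record the file path under the `source` key of each document's metadata, so the same pattern carries over. Load each file, split it, and add all the resulting chunks to one store.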

  • @InnocenceVVX · 1 year ago

    So essentially you calculate semantic similarity of the stored vectors and the asked question, then provide the 4 most similar vectors as context in the prompt?
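
That matches the retrieval step. The question is embedded and compared with every stored chunk vector, and the closest chunks are pasted into the prompt (LangChain's `similarity_search` returns 4 by default). Below is a sketch of the ranking using cosine similarity. It uses made-up 3-dimensional vectors in place of real embeddings, which are much larger: OpenAI's `text-embedding-ada-002` returns 1536 dimensions. LangChain's FAISS wrapper ranks by Euclidean distance by default. For unit-length vectors such as OpenAI's embeddings, that produces the same order as cosine similarity.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: Sequence[float], store: List[Tuple[str, List[float]]], k: int = 4) -> List[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings
store = [
    ("chunk about attention", [0.9, 0.1, 0.0]),
    ("chunk about training", [0.1, 0.9, 0.0]),
    ("chunk about authors", [0.0, 0.2, 0.9]),
    ("chunk about results", [0.2, 0.7, 0.1]),
    ("chunk about datasets", [0.1, 0.5, 0.5]),
]
query_vec = [0.0, 0.1, 1.0]  # pretend embedding of "Who created transformers?"

# Keep the 2 closest chunks and place them in the prompt as context
context = top_k(query_vec, store, k=2)
prompt = "Answer using this context:\n" + "\n".join(context) + "\n\nQuestion: Who created transformers?"
print(context)
```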

  • @tspang1977 · 1 year ago · +5

    Hi Liam, great video. I do have a question: in the following code, I notice that we don't have to explicitly turn the "query" into embeddings before it searches the vector DB. Is it because the function "similarity_search" internally calls the OpenAI embeddings to embed the query?
    query = "Who created transformers?"
    docs = db.similarity_search(query)
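
That is how LangChain's vector stores behave. The store keeps the embedding object it was built with, and `similarity_search` embeds the query string with it before searching. `similarity_search_by_vector` is the variant for a vector you have already computed. The toy class below mimics that design so the flow is visible. `ToyVectorStore` and `fake_embed` are hypothetical, and `fake_embed` is a letter-frequency vector, not a real embedding model.

```python
import math
from typing import Callable, List

Vector = List[float]


def fake_embed(text: str) -> Vector:
    """Hypothetical embedding: unit-normalised counts of the letters a-z."""
    counts = [float(text.lower().count(chr(ord("a") + i))) for i in range(26)]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


class ToyVectorStore:
    """Stores texts with their vectors and remembers the embedding function."""

    def __init__(self, texts: List[str], embed: Callable[[str], Vector]):
        self.embed = embed
        self.texts = texts
        self.vectors = [embed(t) for t in texts]  # documents embedded once, up front

    def similarity_search_by_vector(self, query_vec: Vector, k: int = 4) -> List[str]:
        # Vectors are unit length, so the dot product equals cosine similarity
        scores = [sum(q * v for q, v in zip(query_vec, vec)) for vec in self.vectors]
        order = sorted(range(len(self.texts)), key=lambda i: scores[i], reverse=True)
        return [self.texts[i] for i in order[:k]]

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # The query is embedded here with the SAME function used for the documents
        return self.similarity_search_by_vector(self.embed(query), k=k)


db = ToyVectorStore(["zzz buzz", "aaa banana"], fake_embed)
docs = db.similarity_search("a banana bandana", k=1)
print(docs)
```

Embedding the query inside the store guarantees that queries and documents live in the same vector space. Mixing two different embedding models would make the similarity scores meaningless.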

  • @ElijahTheProfit1 · 1 year ago · +8

    Hey Liam, how many PDFs can I use this on? I have 1000+ instructional documents on an information system I use and have been trying to create a chatbot with this database embedded for quick question answering. Would I have to combine all the PDFs? Can I put them all through vectorization? What are your thoughts?

    • @LiamOttley · 1 year ago · +16

      Getting lots of questions like this; I'll make a video on chatting over many PDFs this week 👍🏻

    • @sammiller9855 · 1 year ago · +4

      @LiamOttley It would also be cool if it could handle other file types, such as EPUB, DOC and Markdown files.

    • @neoszhane · 1 year ago

      Liam, is there a way to use LangChain with open-source models such as Vicuna?

    • @ElijahTheProfit1 · 1 year ago

      @LiamOttley Just subbed and turned on notifications!! Love that you're responsive, and the videos are great, man!

    • @quangdinhdota2388 · 1 year ago

      @LiamOttley Waiting for your next video.

  • @TheSacredGrove · 1 year ago

    Cool AF!

  • @andym9565 · 1 year ago

    Great video! Would be cool to create a video similar with Apify and LangChain.

  • @aipy5147 · 1 year ago · +2

    Great video! I was wondering why it is a private chatbot when you're using an OpenAI key and sending the information to GPT-3.5. How can you secure sensitive data with your method? Thank you for sharing your knowledge.

    • @lubeckable · 9 months ago

      By self-hosting a custom open-source LLM such as Llama or Mistral.

  • @ShriyaShah-tk3nr · 3 months ago · +1

    I'm facing this issue: module 'openai' has no attribute 'error'

  • @LordPBA · 1 year ago · +1

    There is a little error: in the embedding section, OpenAIEmbeddings is not defined. If I am not wrong, just add a line -> import openai
    (I wonder, out of 115,760 views, how many have really done the tutorial XD)

  • @flyinonminds6415 · 1 year ago · +1

    Great video! Is it possible to add more than one PDF with that code? Would it be possible to provide code for multiple PDFs? Thank you.

  • @maxdranitsa · 6 months ago · +1

    Liam, is there an option to make the assistant always use the data that has been uploaded to the knowledge base? It doesn't read the KB files every time, and it uses links that don't even exist.
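
A common mitigation, whatever tool hosts the bot, is to put the retrieved chunks directly into the prompt and instruct the model to answer only from them and to admit when the answer isn't there. This reduces invented answers such as non-existent links, but does not eliminate them. Below is a minimal sketch. `build_grounded_prompt` and its instruction wording are assumptions, not a feature of any particular assistant product.

```python
from typing import List


def build_grounded_prompt(question: str, chunks: List[str]) -> str:
    """Hypothetical helper: wrap retrieved chunks in instructions restricting the answer to them."""
    # Number the sources so the model can cite them
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using ONLY the numbered sources below.\n"
        "Cite the source number for each claim. Do not invent URLs or links.\n"
        'If the sources do not contain the answer, reply "I don\'t know."\n\n'
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase.", "Shipping takes 5 business days."],
)
print(prompt)
```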

  • @ganashayoutube · 1 year ago

    awesome bro

  • @Iatalksbrasil · 1 year ago

    Great video! It helped me round out my knowledge of prompting best practices!

  • @JJBoi8708 · 1 year ago · +1

    What is a good way to split text in a textbook PDF where each page has two columns, with text on the left and right sides?

  • @chatbotsvideochatbotsforwe1207

    Very Good👍

  • @Miya-ub5qn · 1 year ago · +4

    Thank you very much for this great video!!! One question: in the "Create chat bot with chat memory (OPTIONAL)" part, I received the following message: "DeprecationWarning: on_submit is deprecated. Instead, set the .continuous_update attribute to False and observe the value changing with: mywidget.observe(callback, 'value').
    input_box.on_submit(on_submit)" Why? Would you be able to fix it?

    • @ranjitherusa7139 · 1 year ago · +1

      I am having the same issue.
      Should the optional segment be in the same .py program?

  • @studentjntuh · 3 months ago

    NameError Traceback (most recent call last)
    in ()
    1 # Get embedding model
    ----> 2 embeddings = OpenAIEmbeddings()
    3
    4 # Create vector database
    5 db = FAISS.from_documents(chunks, embeddings)
    NameError: name 'OpenAIEmbeddings' is not defined

  • @JohnAlexanderEcheverryOcampo

    Thanks

  • @mr.pantherpanther1013 · 9 months ago · +1

    Hey Liam, at 3:22 you said we can upload PDF data by entering the PDF name. But what if we have more PDFs, for example five?

  • @johnjoesafatso · 1 year ago

    Amazing content. Thank you!
    Is there a way to do this with PDFs that have graphics and images?

  • @xViperCodes · 1 month ago

    I am creating a chatbot to help employees. I have a 220-page contract PDF that I need my chatbot to answer questions about accurately. The issue is that fine-tuning on the data doesn’t produce accurate outputs. Would this be a good way to achieve this?

  • @siddhantmohanty1578 · 7 months ago

    Hi,
    THANK YOU for sharing your knowledge. Could you please let me know how many PDFs we can train using this technique? And does this LLM remember which PDFs it has been trained on, or do we have to train the LLM before running each query?

  • @Pppljssbs · 1 year ago

    As a beginner coding my first-ever plug-in, how long would it take to develop a high-quality plug-in?

  • @sahansathsara7106 · 1 year ago

    This is great! But how much does it cost?

  • @aradinac · 1 year ago · +1

    Can I ask whether you paid for the OpenAI key or did it with the free trial? I'm encountering this error: RateLimitError: You exceeded your current quota, please check your plan and billing details.

  • @jitenbhalavat5738 · 9 months ago

    What if we have multiple PDFs and want to fetch the answer from them?
    For example: I have 20 PDFs, and if I ask one question, it should fetch the answer from whichever PDF contains it (the correct one, obviously) and show it to me as output.

  • @marcosemeria97 · 1 year ago · +1

    Can you suggest alternatives to OpenAI for embeddings and LLMs? Their APIs are too expensive.

  • @jaimeat · 1 year ago · +1

    How can I implement this chatbot with my custom DB, long-term memory and access from my phone?? Any guide?

    • @jaimeat · 1 year ago

      This would be very interesting

  • @vrynstudios · 1 year ago

    I'm a noob here. Is it possible to embed it on a site? If I embed it, is it standalone, or does it still use GPT API calls and incur costs?

  • @monicadesai7928 · 5 months ago

    Can you share a video on voice-based search in a PDF document?

  • @vukradovic172 · 5 months ago

    Excellent

  • @bene88597 · 1 year ago

    You got my mail, buddy. GJ

  • @denizkapteina2151 · 1 year ago · +1

    Thanks for the super video. I have a question: in the overview you show that ChatGPT 3.5 is used, i.e. that the query is finally processed by GPT-3.5. But I can't find any reference to it in the code. Where is my mistake?

    • @LiamOttley · 1 year ago · +1

      The default LLM for LangChain's "OpenAI()" is text-davinci-003, and for "ChatOpenAI()" it's gpt-3.5-turbo, I believe.

  • @ikjb8561 · 1 year ago · +2

    Can you use .txt files instead of PDFs? Great video and content. Thanks!

    • @noteniceu · 1 year ago

      Good question

    • @creneemugo94 · 1 year ago

      Here’s how to do it with text files:
      ua-cam.com/video/c0YDDSWr3t0/v-deo.html

  • @opticchill2706 · 8 months ago

    I'm sorry to ask, but can you make a video on putting a chatbot into a website?

  • @themotivationhub1355 · 1 year ago · +1

    Yo, I’ve made plugins but don’t know how to test them. Can you give some ideas? (I don’t have access to plugins yet; I’m on the waitlist.)

  • @georgekokkinakis7288 · 1 year ago

    Can you explain how we could use LLMs other than OpenAI's? For example, can we use MosaicML's MPT-7B?

  • @qwerto-ye5pe · 1 year ago

    Hi! I just wanted to ask: what licenses are used in this project? Are they commercial-friendly?

  • @suriyakrishnan5177 · 1 year ago

    Can you explain this same example using Express.js? No other tutorial has used Express.js to illustrate this example.

  • @we-hb4ni · 1 year ago · +3

    Is there a limit to the number of PDF chunks you can add to the vector DB?

    • @LiamOttley · 1 year ago · +2

      Not necessarily. If you cram it full of thousands of chunks, I'd assume recall just gets slower and slower and uses more resources on your system. It's best to set up different indexes for different information or use namespaces (a Pinecone feature).
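
Below is a plain-Python sketch of the routing idea behind separate indexes or namespaces. Each query is scored only against the chunks in the namespace you name, so unrelated material never competes. `NamespacedStore` is hypothetical and is not Pinecone's client API. The shared-word count stands in for vector similarity.

```python
from collections import defaultdict
from typing import Dict, List


class NamespacedStore:
    """Hypothetical store that keeps one list of chunks per namespace."""

    def __init__(self) -> None:
        self.namespaces: Dict[str, List[str]] = defaultdict(list)

    def add(self, namespace: str, chunks: List[str]) -> None:
        self.namespaces[namespace].extend(chunks)

    def search(self, namespace: str, query: str, k: int = 4) -> List[str]:
        # Only the chunks in the requested namespace are scored;
        # an unknown namespace simply yields no results
        candidates = self.namespaces.get(namespace, [])
        terms = set(query.lower().split())
        ranked = sorted(candidates, key=lambda c: len(terms & set(c.lower().split())), reverse=True)
        return ranked[:k]


store = NamespacedStore()
store.add("hr", ["holiday policy allows 25 days", "sick leave needs a note"])
store.add("legal", ["contract term is 12 months", "liability is capped"])
print(store.search("hr", "how many holiday days", k=1))
```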

  • @rolandowise · 1 year ago

    Is anybody running into this installation issue: DEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg

  • @michaeldblake · 1 year ago

    I'd love to figure out how to do this.

  • @TheSimoncio · 1 year ago

    What about using another open-source LLM instead of GPT? Thank you!

  • @tommycondon1918 · 1 year ago

    Could you do it using a Gradio interface and importing the openai module?

  • @frosti7 · 1 year ago

    What solution can dynamically add data to, or extract it from, a database for an LLM?
    Like company information that employees can access.