Langchain PDF App (GUI) | Create a ChatGPT For Your PDF in Python

  • Published 28 Sep 2024

COMMENTS • 525

  • @alejandro_ao
    @alejandro_ao  8 months ago

    💬 Join the Discord Help Server: link.alejandro-ao.com/981ypA
    ❤ Buy me a coffee (thanks): link.alejandro-ao.com/YR8Fkw
    ✉ Join the mail list: link.alejandro-ao.com/o6TJUl

    • @legendsanimexy8217
      @legendsanimexy8217 7 months ago +1

      Do I need to pay for the ChatGPT API to build this project?

  • @blackhat965
    @blackhat965 1 year ago +39

    I built something similar a while back. You should teach people how to store vectors locally. That would drastically reduce the cost.

    • @atultiwari88
      @atultiwari88 1 year ago +4

      Hi, I am also trying to make a Q&A chatbot based on a single custom document of about 500 pages, but the vectors are quite costly for me. Can you please help with a workaround? I have explained my use case in the comments too, but so far no answer. Thank you.

    • @achmadnorcholis2387
      @achmadnorcholis2387 1 year ago +1

      I also want to know about it 😅

  • @seanfeng5448
    @seanfeng5448 1 year ago +37

    I really appreciate the tiny details and explanations like "since we're using langchain you have to set the variable like this" etc. Very helpful in understanding "why" in addition to "how". Cheers

    • @alejandro_ao
      @alejandro_ao  1 year ago +7

      Thanks! I try to keep it as approachable as possible so everyone can learn it 😊

    • @monarch9586
      @monarch9586 1 year ago

      @@alejandro_ao
      Hi, can you help me resolve this error: No module named 'altair.vegalite.v4'? I am not able to start Streamlit.
      Also, can you make a video on environment setup? Please, I really need to submit the project within one week.

  • @jamesallison9725
    @jamesallison9725 1 year ago +1

    After updating python to the latest version and updating my PyCharm installation, this tutorial ran perfectly. Alejandro is an excellent presenter. Thanks for this terrific tutorial!

  • @XCIII-c2p
    @XCIII-c2p 1 year ago +4

    Incredible job. Thank you very much for this project. I have been learning Python for no more than 2 weeks and, with your help, I was able to complete this project in my free time. Keep up the amazing work, Sir. Much respect from the Caribbean 🌴

  • @alejandro_ao
    @alejandro_ao  1 year ago +2

    Hey there! Let me know what you want to see next 👇

    • @Stopinvadingmyhardware
      @Stopinvadingmyhardware 1 year ago

      Thanks for the video. I was being told I had to convert them to CSV, which made no sense to me.

  • @chiefwiki
    @chiefwiki 1 year ago +61

    I've watched numerous GPT for PDF YouTube tutorials, but yours stands out for its clarity, conciseness, and thorough explanations. Thanks, Alejandro! I've subscribed and eagerly await more content from you.

    • @alejandro_ao
      @alejandro_ao  1 year ago +2

      You’re far too kind! Thanks :) I’m glad you found it useful! I’ll keep making videos like this 💪🏼

    • @goforit5
      @goforit5 1 year ago +2

      I agree! I’ve been watching videos for months and this one is so clear for someone who was not a coder previously. I’ve subscribed too and will go back and watch prior videos. Thanks Alejandro!

    • @handler007
      @handler007 1 year ago

      Tip: your API key is a firewall hole

    • @python360
      @python360 1 year ago

      100% agree with @chiefwiki ! Great tutorial and appreciate the process diagram ✈

    • @mohameddarwish1121
      @mohameddarwish1121 1 year ago

      Me too!

  • @elad3958
    @elad3958 1 year ago

    I was just thinking about creating a dashboard to visualize $ spent on requests. You are ahead of the game

  • @ricochetism
    @ricochetism 1 year ago +1

    Some may experience issues doing this, as there are a few things I believe Alejandro assumed were already done. This is especially true for getting past the first phase, which IMO is launching the initial app in your local environment.
    1. You may need to add a .toml file with your OpenAI keys. You have to add a folder in your project folder titled .streamlit with a file inside it titled secrets.toml
    2. You may have to add import openai in your app's file
    3. If you have issues with the firewall (I did), you'll need to add the port:
    Open the Windows Defender Firewall settings:
    Go to the Control Panel.
    Search for "Windows Defender Firewall" and open it.
    It's a bit involved, so it's best to ask GPT how to allow access to the specified port by configuring Windows Defender Firewall.
    After these issues it was a breeze for me.
    Any help needed, just ask me.
    Cheers from Ireland
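
A minimal sketch related to the key setup above, assuming the key ends up in an environment variable named OPENAI_API_KEY (for example loaded from a .env file). The helper name get_api_key is hypothetical and is not part of the tutorial's code; it only shows how to fail early with a readable message when the key is missing.

```python
import os


def get_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is missing.

    `get_api_key` is a hypothetical helper for illustration only.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env or secrets file")
    return key


# Usage with a placeholder value (not a real key)
key = get_api_key({"OPENAI_API_KEY": "sk-placeholder"})
```

Calling it once at startup surfaces a missing key immediately instead of as a validation error deep inside a request.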

  • @homegr0wn
    @homegr0wn 1 year ago

    Thank you, thank you, thank you! Your video solidified everything for me. I knew the different pieces, but not how they all meshed together. Now I can finally continue with the project I was working on. Great content. Subscribed.

  • @rayalgar
    @rayalgar 1 year ago +7

    Alejandro, thanks for this. I’m getting a little more familiar with Python with every video. I believe there is a big opportunity for someone like you to help ‘no coders’ like me get the most from LLMs.
    It would be helpful to know how multiple documents could be uploaded and what the limits of the LLM are before the responses begin to fail completely or degrade in quality.

    • @dxpdigital5343
      @dxpdigital5343 1 year ago

      I always see people who don’t know sht telling somebody how this is a great opportunity for someone to teach them for free. How about to learn it on your own? “Non-coders” you mean people who jump on bandwagons. If Tylenol became cool tomorrow you should also be a doctor. Maybe you should be more grateful and *gasp* humble. Clearly you would never take the time to learn how to make something secure, so you’re basically just wanting to set up some lame thing where you have put literally no time into. People like you literally ruined crypto, altering its commonality from being useful as utility to being some worthless not-even-penny-stocks.

    • @coolmcdude
      @coolmcdude 1 year ago +1

      You can't really call yourself a 'no coder' anymore if you follow enough videos like this

    • @alejandro_ao
      @alejandro_ao  1 year ago +2

      i'm working on a video about that indeed!

    • @rayalgar
      @rayalgar 1 year ago +1

      @@coolmcdude You’re right - interesting, the labels we give ourselves.

    • @monarch9586
      @monarch9586 1 year ago

      @@alejandro_ao Hi, can you help me resolve this error: No module named 'altair.vegalite.v4'? I am not able to start Streamlit.
      Also, can you make a video on environment setup? Please, I really need to submit the project within one week.

  • @sanjayojha1
    @sanjayojha1 11 months ago

    Awesome, and thank you for not using Colab or Jupyter. Very few people are doing actual LLM coding; most just skim through a Colab notebook.

  • @BadBite
    @BadBite 1 year ago

    The best step-by-step explanation. Please keep it up! I subscribed expecting to learn much more. Thank you

  • @trojan6897
    @trojan6897 1 year ago

    Thanks buddy for this awesome content waiting for more videos on langchain and keep explaining everything like you did so that it helps all to understand in detail.

  • @mraaroncruz
    @mraaroncruz 1 year ago +3

    Just a note on cost: The embedding also costs money and for a large enough PDF, you might end up with most of your cost coming from this. Especially if you are loading multiple pdfs and not just chatting with one in one session.
    I'm sure you can wrap that embedding step in that callback function (I have never tried so I'm not positive).
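
A rough sketch of the cost point above. The per-1K-token price and the "about 4 characters per token" rule of thumb are both assumptions passed in as parameters, not figures from the video; check the current pricing page before relying on any number.

```python
def estimate_embedding_cost(text, usd_per_1k_tokens, chars_per_token=4.0):
    """Very rough cost of embedding `text` once.

    Both `usd_per_1k_tokens` and `chars_per_token` are assumptions the
    caller must supply or accept; real token counts depend on the tokenizer.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens / 1000 * usd_per_1k_tokens


# Hypothetical example: a PDF that extracts to ~2 million characters,
# with a made-up price of $0.0001 per 1K tokens.
pdf_text = "x" * 2_000_000
cost = estimate_embedding_cost(pdf_text, usd_per_1k_tokens=0.0001)
print(f"estimated embedding cost: ${cost:.4f}")
```

The estimate applies every time the document is re-embedded, which is why re-processing the same PDFs in each session adds up.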

    • @MiltonFilhoDev
      @MiltonFilhoDev 1 year ago +1

      Hey, thanks for the clarification, that was my biggest question. Is there any way to generate the embeddings locally and avoid paying that extra cost?

    • @horikatanifuji5038
      @horikatanifuji5038 1 year ago +1

      @@MiltonFilhoDev Let me know if you find out the answer to this, I want to know as well

    • @horikatanifuji5038
      @horikatanifuji5038 1 year ago

      @@MiltonFilhoDev Actually, I think if you have a locally running LLM and ask it if the user question relates to the content at each index of the array, save the positives by asking the LLM to give you a code word when it is in fact relatable and while extracting text from the response if your code word is positive, then it will mark the text at that point in the array as related to the user question. It will probably not be as effective as the fakebook algorithm, but it should work at the very least.

    • @grabani
      @grabani 1 year ago

      Others who have built similar opensource projects have used online embedding solutions like Pinecone.

    • @mraaroncruz
      @mraaroncruz 1 year ago +5

      @@grabani Pinecone stores the embeddings, it doesn't create them. In the video Alejandro is using FAISS, which is an in-memory embedding store. The benefits are that it is very fast and free. The downside is that it is not persisted between server restarts. Though you could dump it into a pickle file and reload it on server restarts if you want to persist it.
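
The "dump it into a pickle file and reload it" idea can be sketched as follows. TinyVectorStore is a toy stand-in written for this illustration, not FAISS and not the tutorial's code; the vectors here are hand-made rather than produced by an embedding model. Only plain lists are pickled so the file does not depend on where the class is defined.

```python
import math
import pickle
import tempfile
from pathlib import Path


class TinyVectorStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self, vectors=None, texts=None):
        self.vectors = vectors or []
        self.texts = texts or []

    def add(self, vector, text):
        self.vectors.append(list(vector))
        self.texts.append(text)

    def search(self, query, k=1):
        """Return the k texts whose vectors are most cosine-similar to query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(range(len(self.vectors)),
                        key=lambda i: cosine(query, self.vectors[i]),
                        reverse=True)
        return [self.texts[i] for i in ranked[:k]]


def save_store(store, path):
    # Persist only built-in types so loading does not need the class pickled.
    with open(path, "wb") as f:
        pickle.dump({"vectors": store.vectors, "texts": store.texts}, f)


def load_store(path):
    with open(path, "rb") as f:
        data = pickle.load(f)
    return TinyVectorStore(data["vectors"], data["texts"])


store = TinyVectorStore()
store.add([1.0, 0.0], "chunk about cats")
store.add([0.0, 1.0], "chunk about dogs")

path = Path(tempfile.mkdtemp()) / "store.pkl"
save_store(store, path)        # e.g. after embedding the PDF
restored = load_store(path)    # e.g. on the next server start
```

Reloading avoids paying for the embeddings again after a restart. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.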

  • @marklumba1928
    @marklumba1928 1 year ago

    Having watched several GPT for PDF YouTube tutorials, I must say that yours stands out due to its clear, concise, and comprehensive explanations. Thank you, Alejandro! I have subscribed to your channel and eagerly look forward to your future content.
    By the way, I'm in the process of developing my own Langchain app by replicating the code and incorporating additional features. Could you please guide me on replacing the OpenAI API key in your .env.example file?

  • @jdjdjdjdjbdnsnsnsj-bs9pg
    @jdjdjdjdjbdnsnsnsj-bs9pg 1 year ago

    I made it. I feel like a data scientist now

  • @sebastiansanchez4866
    @sebastiansanchez4866 1 year ago

    I really appreciate your videos, great walkthrough. Thanks.

  • @JordanBaumgardner
    @JordanBaumgardner 1 year ago

    HA! My Bass instructor just assigned your intro song for me to practice ; )

  • @btscheung
    @btscheung 1 year ago

    Nice tutorial video, and thanks for the shout out to my work!

    • @alejandro_ao
      @alejandro_ao  1 year ago

      hello there! thank you for your amazing work! 💪

  • @yusufyilmaz2038
    @yusufyilmaz2038 1 year ago

    Using FAISS part was very good. I would like to see the alternatives and different approaches also. You might consider making different advanced level videos also. Thank you.

    • @alejandro_ao
      @alejandro_ao  1 year ago

      hey there, thanks for your comment. sure thing. as soon as i get back from vacation i’ll be covering more advanced topics! i’ll be posting surveys soon to see which topics you guys prefer. stay tuned :)

  • @TropicalCoder
    @TropicalCoder 1 year ago

    Pretty wild! So cool - thanks

  • @sivasingireddi343
    @sivasingireddi343 5 months ago

    Hello Alejandro, it's a pretty awesome tutorial for beginners and I have a couple of questions; if you are able to address them it will be very helpful.
    1. Suppose my PDF contains images, tables and graphs: how can I convert those into vector embeddings and query them?

  • @sasukeuchiha-ck4hy
    @sasukeuchiha-ck4hy 11 months ago +1

    Is this also working with other programming languages like JS?

  • @polly28-9
    @polly28-9 4 months ago

    Thanks for the video! Well done! I want to know how to make the chatbot return a list of results: not only one result, but a list of relevant answers to the input question. I do not know what to change: search index, search parameters, metric_type or what? Can you help me? Thanks!

  • @kishore785
    @kishore785 8 months ago

    Awesome. As next step, how to get citation of sources/file names? Any references/material that I can use?

    • @alejandro_ao
      @alejandro_ao  8 months ago

      Absolutely! LangChain changed a bit in the latest release, but try to follow their walkthrough to see how to do this. I am uploading a video version of this next week (finally!)
      python.langchain.com/docs/get_started/quickstart

  • @michaelkoltai
    @michaelkoltai 1 year ago

    It seems to me that this system does not “hallucinate” at all. If it still does, can you demonstrate it? If not, then why do Bard, ChatGPT and the others?

  • @Kryptikoo
    @Kryptikoo 1 year ago

    Thank you for sharing and explaining.

  • @jacoblapkin960
    @jacoblapkin960 1 year ago

    If I’m using 3.5-turbo, is there any way to add a system prompt? Where would I add this?

  • @nguyenngochai6245
    @nguyenngochai6245 1 year ago +1

    Great work. Like and Subscribed. Thank you again for sharing this project. I am looking forward to seeing more of your videos.
    Perhaps, others may have already asked these, but I would like to know if:
    - For PDF files that are non-English languages, would FAISS embedding and the model work as effectively?
    - In the case of multiple pdf files or a large pdf file, is there a way to reduce the embedding cost? (like using non-open AI models, or other embedding methods)
    Thank you for your time.

  • @fatyyyyyyyy
    @fatyyyyyyyy 1 month ago

    i like the way u explain , thanks a LOT !!!!!

  • @programmingwithshobhit6792
    @programmingwithshobhit6792 1 year ago

    Hey!, keep these videos man!

  • @reverse_meta9264
    @reverse_meta9264 1 year ago

    I don't understand how this line works
    embeddings = OpenAIEmbeddings()
    Why is no input variable defined? does it always just assume you have defined a variable named chunks and apply it to that or does it apply it to the most recently defined variable? I would have expected it to be something like this:
    embeddings = OpenAIEmbeddings(chunks)
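
A sketch of the pattern behind that line, under the assumption that the constructor only creates a configured embedder object and that the chunks are handed over later, when the knowledge base is built (in LangChain that is typically a call such as FAISS.from_texts(chunks, embeddings)). ToyEmbeddings and build_index below are hypothetical stand-ins, not LangChain classes, and the "embedding" is fake.

```python
class ToyEmbeddings:
    """Hypothetical stand-in for an embeddings class: holds settings only."""

    def __init__(self, dimensions=3):
        # No text is needed here; the object just remembers its configuration.
        self.dimensions = dimensions

    def embed_documents(self, texts):
        # Fake embedding: the text length repeated `dimensions` times.
        # A real model would return learned vectors instead.
        return [[float(len(t))] * self.dimensions for t in texts]


def build_index(texts, embedder):
    """The texts arrive here, not in the embedder's constructor."""
    return list(zip(texts, embedder.embed_documents(texts)))


embeddings = ToyEmbeddings()                # no chunks passed
chunks = ["first chunk", "second chunk!"]
index = build_index(chunks, embeddings)     # chunks are passed at this step
```

So nothing is inferred from variable names: the same embedder object can be reused for any list of texts, and for the user's question later on.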

  • @matthewbonfield956
    @matthewbonfield956 1 year ago

    Great video!! I'm having a problem with exceeding my quota for my OpenAI API key. I don't pay for premium, is it expected that one would exceed their free quota by calling the API for reading a PDF?

  • @dtaulik
    @dtaulik 1 year ago

    Thanks for such a project tutorial. However, I was wondering if we could use any other free LLM for this project instead of OpenAI?

  • @R69-u6q
    @R69-u6q 1 year ago

    thank you, man,

  • @yazanrisheh5127
    @yazanrisheh5127 1 year ago

    What GPT model are you using in the video?

  • @yourweebtv8733
    @yourweebtv8733 6 months ago

    After uploading the PDF file i got this error : "AxiosError: Request failed with status code 403"

  • @bryanvillalobos757
    @bryanvillalobos757 1 year ago

    Hello, excellent work. If I need it to return a JSON separating some specific values from the PDF, how would you do it?

  • @arslanabid2245
    @arslanabid2245 1 year ago

    01:04 but what if we want the chatbot to answer that different question using its own knowledge. how can we do it ?

  • @toddstumpf2566
    @toddstumpf2566 6 months ago

    Thanks!

    • @alejandro_ao
      @alejandro_ao  6 months ago

      you're awesome man, thank you!!

  • @udoigogoi6126
    @udoigogoi6126 1 year ago

    getting an error: index = faiss.IndexFlatL2(len(embeddings[0]))
    IndexError: list index out of range

  • @InkapradnyaPalupi
    @InkapradnyaPalupi 1 year ago

    Hello, sorry for asking: how do I make the website not throw an error? Thank you.

  • @WandyLau
    @WandyLau 1 year ago

    Great video

  • @udaydahiya7454
    @udaydahiya7454 1 year ago +1

    Hi, I love this video and the entire project you made. I have one request for the Github Repository. Could you please specify what license this project uses, and whether it is the standard MIT license or not, as I could not find any information regarding that.

    • @alejandro_ao
      @alejandro_ao  1 year ago

      thanks mate! i'm glad you liked it! 🔥 i didn't think of adding a license to it as i don't really think of adding extra code to it. i'd rather that the linked repo remains as it is rn, so that students have the exact same code as in the video! but feel free to fork it and use it as you want. then you can share what you build on as an issue!

    • @ricochetism
      @ricochetism 1 year ago

      @@alejandro_ao I believe if no license is mentioned it reverts to the default one. Try searching for the default license.

  • @TheAzerue
    @TheAzerue 1 year ago

    Good video. One question: how do I save the embeddings? And if I want to improve my knowledge base over time, do I need to create the knowledge base from scratch each time, or can I add to it in real time?

  • @alii4334
    @alii4334 1 year ago

    Should we do all the backend work (file chunking and file embedding) every time a user sends a query?

  • @PhaniVenkatsai
    @PhaniVenkatsai 1 year ago

    Hi
    I am using the gpt-3.5-turbo model.
    I have uploaded the first 200 lines of the Python Wikipedia data,
    but somehow I am getting an answer for "who is the creator of java".
    I think it is taking data from the internet.
    Could you please help me with this?
    I just want data from my Chroma DB only.
    Here is my code:
    # Imports assume the LangChain 0.0.x package layout
    import sys
    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma
    persist_directory = 'db'
    embedding = OpenAIEmbeddings()
    vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
    llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.7)
    chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vectordb.as_retriever())
    user_question = sys.argv[1]
    response = chain.run(user_question)

  • @Ewakaa
    @Ewakaa 1 year ago

    How many of us left the conversation when he said track how much we will be spending

  • @virtualviewing360
    @virtualviewing360 1 year ago

    Working through your code, and very new to Python etc. When I go to run the code I get an error:
    Traceback (most recent call last):
    File "C:\Users\danie\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "C:\Users\danie\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
    File "C:\Users\danie\AppData\Local\Programs\Python\Python310\Scripts\streamlit.exe\__main__.py", line 4, in <module>
    File "C:\Users\danie\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\__init__.py", line 55, in <module>
    from streamlit.delta_generator import DeltaGenerator as _DeltaGenerator
    File "C:\Users\danie\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\delta_generator.py", line 43, in <module>
    from streamlit.elements.arrow_altair import ArrowAltairMixin
    File "C:\Users\danie\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\elements\arrow_altair.py", line 36, in <module>
    from altair.vegalite.v4.api import Chart
    ModuleNotFoundError: No module named 'altair.vegalite.v4'
    Am I doing something stupid? Can't find anything online. BTW, love your videos, very instructional and well paced.

  • @rizkiananda352
    @rizkiananda352 1 year ago

    Could you explain what set of skills I should have to be able to follow your explanation and make the app? I have not learned Python yet. Could you list what I can learn before practicing what you have shown?

    • @alejandro_ao
      @alejandro_ao  1 year ago

      hey, it's great that you are getting into programming! it really can be life-changer.
      so i would suggest that you start with regular python for beginners courses. you can probably get around it quite quickly. just be sure to understand the basics like variables, lists, dictionaries, loops.
      once you finish with loops, start coding some of these tutorials line by line (don't copy and paste, that's important). even if you don't understand some lines, you can always look up what they do or ask chatgpt when you have a doubt (it's great for explaining code).
      and don't worry about not understanding some things. nobody knows everything!!

  • @SoftYoda
    @SoftYoda 1 year ago

    How do we know which OpenAI LLM we are using (is it GPT-2, 3, 3.5, 4)?
    Wouldn't it be possible to make the price visible in Streamlit too?

  • @kosisochukwuhillary9130
    @kosisochukwuhillary9130 8 months ago

    what about a pdf that contains tables. would it analyze it properly?

    • @alejandro_ao
      @alejandro_ao  8 months ago

      totally! but i recommend to load tables in a csv or markdown format instead of PDF. It's easier for the model to read this way because tables are represented in pure text.

  • @sovopl
    @sovopl 1 year ago

    Great tutorial! I'm curious about generating questions based on the book's content. It seems like I would need to feed the complete book content into the GPT model to achieve that? What are your thoughts on this?

    • @alejandro_ao
      @alejandro_ao  1 year ago

      thanks man! great idea. just make sure to store the vector store in a database like pinecone so that you don't have to pay for the embeddings every time you need them (they can be the most expensive part)

    • @sovopl
      @sovopl 1 year ago

      @@alejandro_ao I think the vector store is not a suitable tool for generating questions. I have a project in mind to create an exam simulation that generates questions from PDF documents, and I realize that passing the entire content to an OpenAI model may be expensive. Therefore, I want to explore alternative solutions. One option I'm considering is using only the table of contents, index, or glossary of the document or book to generate questions. This could potentially reduce the amount of data needed for processing and result in a more cost-effective solution.

  • @scottregan
    @scottregan 1 year ago

    Is it possible for me to use GPT-4 if I am willing to pay for it? Like if I sign up for the GPT-4 thing (not the chat-plus which I already have) will I get a GPT-4 API for use in a custom bot built with langchain??

    • @alejandro_ao
      @alejandro_ao  1 year ago

      hey mate, you need to sign up to their waiting list. in their api docs, under gpt4 model, there is the link to join the waitlist. but you need to have created an account already because they ask you for your account id (organisation id)

  • @류정현-l9u
    @류정현-l9u 1 year ago

    That's an awesome clip. Thank you. 🥰
    I have a question.
    If I want to get a longer answer, how can I do that?

  • @David-yq2dp
    @David-yq2dp 1 year ago

    Does any of this require admin rights on Windows?

  • @mohameddarwish1121
    @mohameddarwish1121 1 year ago

    How do I insert an image label?

  • @PerfectionInDetail
    @PerfectionInDetail 1 year ago +1

    Hi Alejandro, is it possible to receive information from a database such as Firebase instead of a PDF? If so, which lines of code would I need to change?

    • @alejandro_ao
      @alejandro_ao  1 year ago +1

      Hey Wilson, thanks for the comment. It is possible indeed. It requires a different parsing method. I'll be publishing a video about that next week!

    • @PerfectionInDetail
      @PerfectionInDetail 1 year ago +2

      @@alejandro_ao Oh thank you Alejandro! Can you also ensure there is memory there so that when you have a conversation with the AI it remembers the conversation and keeps things in context?

    • @eddiegarcia9520
      @eddiegarcia9520 1 year ago +1

      @@PerfectionInDetail Good question. Eager for response :)

  • @Blackfeet
    @Blackfeet 4 months ago

    Genius.

  • @saasfamily2831
    @saasfamily2831 1 year ago

    Hi Alejandro, thank you for the great project. I am hitting this error and am not able to find any good answer on how to resolve it. This is happening on the FAISS line: Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details

  • @aryangautam7506
    @aryangautam7506 1 year ago +1

    The app does not recognize my API key. It shows errors while validating it. I need help.
    Anyone?

  • @citrussoundworks
    @citrussoundworks 1 year ago +9

    If you are getting an error that says "ModuleNotFoundError: No module named 'altair.vegalite.v4'" - you have to downgrade your Altair to version 4.1.0 as version 5 is giving the error. You can do this by running the command "pip install altair==4.1.0"

  • @dw8200
    @dw8200 1 year ago +20

    Thank you so much for the very nice tutorial.
    OpenAI has "gpt-3.5-turbo" available which is much cheaper and should have better performance than its default model here, we just need to specify the model as this:
    llm = OpenAI(model_name="gpt-3.5-turbo")

    • @SoftYoda
      @SoftYoda 1 year ago

      what is the default model used ?

    • @JacobSamro
      @JacobSamro 1 year ago

      @@SoftYoda It's text-davinci-003

    • @lucasfescina
      @lucasfescina 1 year ago +1

      Isn't there any free model?

    • @scottregan
      @scottregan 1 year ago

      Is it possible for me to use GPT-4 if I am willing to pay for it? Like if I sign up for the GPT-4 thing (not the chat-plus which I already have) will I get a GPT-4 API for use in a custom bot built with langchain??

    • @dw8200
      @dw8200 1 year ago +3

      as of 2023-06-27, LangChain has set the default model to gpt-3.5-turbo. No need to do the above setting.
      There are other open-source models available right now, such as flan-alpaca-large.
      llm = HuggingFaceHub(repo_id="declare-lab/flan-alpaca-large", model_kwargs={"temperature":0, "max_length":512})

  • @alejandro_ao
    @alejandro_ao  1 year ago +7

    Langchain is a framework that allows you to build apps with LLMs like ChatGPT or GPT4All🔥
    Edit: Be careful with adding super long documents as the embedding also has a cost. Pricing is minimal but it can scale up pretty quickly if you embed long texts. Here are OpenAI's embeddings prices: openai.com/pricing#embedding-models
    Let me know if you want a detailed course on Langchain 💪

    • @yaseenkhan-oq4ih
      @yaseenkhan-oq4ih 1 year ago +1

      Regarding your ChatGPT video about maths: how can we copy and paste into Word?

    • @sean9901
      @sean9901 1 year ago

      yes !! i find the documentation on their website hard to use for creating own apps, a nice series on how to use Langchain for more applications would be nice (details on how to use agents, models, tools, ...) so others can do experimental apps. There are other videos on youtube explaining how to in general but not in detail like you did with this one. Kudos

    • @alejandro_ao
      @alejandro_ao  1 year ago

      @@sean9901 there are definitely more videos about langchain and its features coming very soon 😎

    • @alejandro_ao
      @alejandro_ao  1 year ago

      @@yaseenkhan-oq4ih i’m actually adding that feature to the extension, so be sure to install it from the chrome web store to get the updates ;)

    • @jalalabadass
      @jalalabadass 1 year ago

      Is this why you used FAISS instead of OpenAI embeddings? Would OpenAI embeddings given better results though?

  • @Stingray-le1pq
    @Stingray-le1pq 5 months ago +1

    Thanks for the video, it's been very helpful. However, I keep getting the following error: 'AttributeError: module 'openai' has no attribute 'error'', especially when I run these lines: chain = load_qa_chain(llm=llm, chain_type="stuff")
    response = chain.run(input_documents=docs, question=user_question).
    Not sure if i'm missing something. Thanks!

  • @Anonymous-lw1zy
    @Anonymous-lw1zy 1 year ago +14

    Fabulous job! Great use of the diagram to provide the high level view. I like that you kept the code simple and focused on the main objective, and covered each part step-by-step. I also like that you used FAISS, since it is free.

    • @alejandro_ao
      @alejandro_ao  8 months ago

      I'm glad you appreciated it! The new release of LangChain is out now btw!

  • @appsbusiness899
    @appsbusiness899 1 year ago +1

    Can you please make same video with open source (i.e free) LLM? Thanks.

  • @yamani3882
    @yamani3882 1 year ago +2

    If I am getting charged by using my openai key then I automatically lose interest since I can’t scale this without worrying about cost.😢

    • @alejandro_ao
      @alejandro_ao  1 year ago

      Don't let that discourage you! Rather think of ways to monetize your app :) In this example, if you charge your users $30 a month, they would need to make ~1500 requests per month to break even. If they do fewer than that, all the rest is profit for you ;)
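
The break-even arithmetic above can be sketched like this. The per-request cost of $0.02 is an assumption: it is simply what $30 divided by ~1500 requests implies, not a measured figure.

```python
def break_even_requests(monthly_price_usd, cost_per_request_usd):
    """Requests per month at which API cost equals the subscription price."""
    return monthly_price_usd / cost_per_request_usd


def monthly_profit(monthly_price_usd, cost_per_request_usd, requests):
    """Profit per user for a given number of requests (negative means a loss)."""
    return monthly_price_usd - cost_per_request_usd * requests


# Assumed cost per request, implied by the $30 / ~1500 requests example.
COST_PER_REQUEST = 0.02

limit = break_even_requests(30.0, COST_PER_REQUEST)
light_user = monthly_profit(30.0, COST_PER_REQUEST, requests=500)
heavy_user = monthly_profit(30.0, COST_PER_REQUEST, requests=2000)
print(f"break-even at about {limit:.0f} requests per month")
```

A user making 500 requests leaves a margin, while one making 2000 costs more than they pay, so a usage cap or tiered pricing may be worth considering.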

  • @SomanshuDubey
    @SomanshuDubey 1 year ago +1

    RateLimitError: You exceeded your current quota, please check your plan and billing details. I am getting this message please how to solve it

    • @WesFang
      @WesFang 1 year ago

      I'm getting the same

  • @kennardplays3325
    @kennardplays3325 1 year ago +2

    First of all thank you so much for making this tutorial. This is one of the best tutorials I've seen on this topic. So I've edited your code to be able to use Google's flan-t5-xl model using Hugging Face API, but I always encountered "ValueError: Error raised by inference API: Model google/flan-t5-xl time out" time out error. Do you have any idea on how I can fix this? I'm still new in coding too so It'll be great if you can help with this issue. Thanks again!

  • @GeorgeTrialonis
    @GeorgeTrialonis 1 year ago +1

    Hi Alejandro. Thanks for the tutorial. I would like to ask you this: My prompt to 'Ask-your-PDF' is this: "What is the name of the girl in the beginning of the story?" and I get this response: InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 10716 tokens (10460 in your prompt; 256 for the completion). Please reduce your prompt; or completion length. Why? Surely, the prompt is very very short. Thank you.

    • @GeorgeTrialonis
      @GeorgeTrialonis 1 year ago

      Following more tests with other PDF documents, which proved successful, I concluded that there is nothing wrong with the code but with the initial document I tested. It contained a number of 'funny' characters or the encoding was not appropriate.

    • @alejandro_ao
      @alejandro_ao  1 year ago

      Hello George, I suspect that it might have been an issue with the text splitting. When you ask a question to the LLM, actually the full prompt that is sent to the LLM contains not only your question but also the chunks of text that are related to it. You can try reducing the size of the chunks, for example. I hope this helps!
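
A small sketch of the token budget described above, using the numbers from the error message (4097-token limit, 10460 prompt tokens, 256 completion tokens). The number of retrieved chunks (4) and the question length (20 tokens) are assumptions for illustration, and the prompt template's own overhead is ignored.

```python
def fits_context(prompt_tokens, completion_tokens, context_limit):
    """True if the prompt plus the reserved completion fit in the model's window."""
    return prompt_tokens + completion_tokens <= context_limit


def max_tokens_per_chunk(context_limit, completion_tokens, question_tokens, num_chunks):
    """Largest chunk size (in tokens) so that `num_chunks` chunks plus the
    question and the reserved completion still fit. Template overhead is ignored."""
    available = context_limit - completion_tokens - question_tokens
    return max(available // num_chunks, 0)


# Numbers from the error message in this thread
print(fits_context(10460, 256, 4097))

# Assumed: 4 retrieved chunks and a 20-token question
budget = max_tokens_per_chunk(4097, 256, question_tokens=20, num_chunks=4)
print(budget)
```

If the chunks are larger than this budget, the combined prompt overflows even though the question itself is short, which matches the suggestion to reduce the chunk size.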

  • @roguesecurity
    @roguesecurity 1 year ago +3

    Thanks for this video, it was the most well-explained video I have found on YouTube so far.
    One question: how does this knowledge base created using FAISS perform against vector databases like Pinecone for semantic search?

  • @andreyseas
    @andreyseas 1 year ago +8

    You are a really good teacher. Love how you explain things step by step without overwhelming. Keep it up!

    • @alejandro_ao
      @alejandro_ao  1 year ago +1

      Thanks, it means a lot!

    • @monarch9586
      @monarch9586 1 year ago

      @@alejandro_ao
      Hi, can you help me resolve this error: No module named 'altair.vegalite.v4'? I am not able to start Streamlit.
      Also, can you make a video on environment setup? Please, I really need to submit the project within one week.

  • @nogool111
    @nogool111 1 year ago +1

    Can I use the information in the knowledgebase to write a paragraph or summarize information rather than "question and answer"? And how can i do that? Thank you.

  • @thehogchop
    @thehogchop 1 year ago +1

    Great tutorial, very thorough. I'm getting an error however, I wonder if anyone else has encountered it?
    "ImportError: cannot import name 'LLM' from partially initialized module 'langchain.llms.base' (most likely due to a circular import)"

    • @laptopuser5198
      @laptopuser5198 1 year ago

      I had this error, but I believe it was fixed by 'pip install openai'. I did finally get it running.

  • @Dj_Nizzo
    @Dj_Nizzo 1 year ago +1

    Is there a way to have AI do all of this for me?

    • @alejandro_ao
      @alejandro_ao  1 year ago

      you can always get help from ChatGPT and GH Copilot. But I recommend learning to do it yourself before using those tools extensively!

  • @ReddSpark
    @ReddSpark 1 year ago +4

    Holy smokes - the conciseness of your code given all that it's doing behind the scenes is breathtaking! Impressive!

  • @sudipshrestha5633
    @sudipshrestha5633 1 year ago +1

    What's the IDE you used? It would be helpful if you had spent a minute explaining how you set up your project. Great video.

    • @alejandro_ao
      @alejandro_ao  1 year ago +1

      hey there, i use vs code. i might make a video in the future on how to set up a project for building an app with source control and virtual environments!

  • @abemindepth
    @abemindepth 16 days ago

    Hey bro, some type of "insufficient_quota" error is occurring for me with OpenAI. What shall I do? Please answer soon, I want to learn this quickly. Also, the OpenAI LLM import isn't highlighted in green in my editor, so it doesn't look like it's imported.

  • @ovskihouse5270
    @ovskihouse5270 8 months ago +1

    I love this content.. thanks for sharing

  • @almahdibakkali8007
    @almahdibakkali8007 1 year ago

    What color is the sky? - How to add custom templates to ConversationalRetrievalChain?
    This is actually important to add. If you ask about an unrelated topic, like "What color is the sky", it will still give an answer related to the video...
    I tried combine_docs_chain_kwargs={"prompt": prompt} with no success. Also, I can't make the code work with LLMChain, like:
    `chat_prompt = ChatPromptTemplate.from_messages(
        [system_message_prompt, human_message_prompt]
    )
    chain = LLMChain(llm=chat, prompt=chat_prompt)`

  • @大黄-f2x
    @大黄-f2x 1 year ago

    Why this error??
    InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 5457 tokens (5201 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
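
This error means the question plus the retrieved chunks plus the reserved completion exceed the model's context window. One possible workaround, sketched below, is to keep only as many retrieved chunks as fit a token budget. The names `estimate_tokens` and `fit_chunks`, the `overhead_tokens` allowance, and the "about 4 characters per token" heuristic are all assumptions for illustration; a real tokenizer would give exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def fit_chunks(
    chunks: list[str],
    question: str,
    max_context: int = 4097,      # context limit quoted in the error message
    completion_tokens: int = 256,  # completion size quoted in the error message
    overhead_tokens: int = 200,    # hypothetical allowance for the prompt template
) -> list[str]:
    """Keep the leading chunks (assumed sorted by relevance) that fit the budget."""
    budget = max_context - completion_tokens - overhead_tokens - estimate_tokens(question)
    selected = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected
```

Passing the filtered list to the question-answering step instead of every retrieved chunk keeps the prompt under the limit. Using smaller chunks when splitting the PDF has a similar effect.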

  • @hillmanlai3270
    @hillmanlai3270 7 months ago

    First of all, thanks for your clear explanation. Very impressive tutorial.
    I followed your code and encountered an error, "AttributeError: module 'tiktoken' has no attribute 'model'", when I reached the "show user input" section:
    user_question = st.text_input("Ask a question about your PDF:")
    if user_question:
        docs = knowledge_base.similarity_search(user_question)
        st.write(docs)
    I received the same error when I ran the code cloned from your git repository. How do I fix this?

  • @会飞的猪-s8d
    @会飞的猪-s8d 1 year ago

    How do I fix this error? InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 5329 tokens (5073 in your prompt; 256 for the completion). Please reduce your prompt; or completion length. Thanks.

  • @tushaar9027
    @tushaar9027 1 year ago

    Hi Alejandro, great video man, superb step-by-step explanation. I am trying to use Azure OpenAI with this and changed the llm to come from AzureOpenAI:
    llm = AzureOpenAI(
        deployment_name="",
        model_name="",
    )
    Getting the below Error from AzureOpenAI any idea how to solve this?
    InvalidRequestError: Too many inputs. The max number of inputs is 1. We hope to increase the number of inputs per request soon

  • @Snakeeater9210
    @Snakeeater9210 1 year ago

    Thanks so much for the course. Sorry there is something that I'm very confused about. Where is the API key used here? api_key = os.getenv('OPENAI_API_KEY'). api_key is never used right yet it works. I'm confused.
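
The key still works without being passed around because the OpenAI client falls back to the `OPENAI_API_KEY` environment variable when no key is given explicitly, and loading the `.env` file places that variable in the process environment. The sketch below mimics that lookup order. `resolve_api_key` is a hypothetical helper written for illustration, not the library's actual code.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, otherwise fall back to the environment.

    Hypothetical illustration of the common "argument first, then
    OPENAI_API_KEY" lookup pattern.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No API key found: pass one or set OPENAI_API_KEY")
    return key
```

So a line like `api_key = os.getenv('OPENAI_API_KEY')` is only useful as a check that the variable is set. The local variable itself does not need to be handed to anything.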

  • @gerardorosiles8918
    @gerardorosiles8918 1 year ago +1

    Are the embeddings computed locally when the FAISS object is initialized or are the embeddings done using the OpenAI API? (e.g. you upload your chunks and then get back the embeddings)
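
With OpenAI embeddings the vectors are computed remotely: the chunks are sent to the API and the vectors come back. FAISS then stores and searches those vectors locally. A brute-force sketch of that local search step follows. It assumes cosine similarity as the metric and uses made-up function names; FAISS itself uses optimized index structures and may be configured with a different distance.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

Only the embedding of the question needs another API call at query time. The comparison against the stored chunk vectors runs entirely on the local machine.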

  • @Dave-nz5jf
    @Dave-nz5jf 1 year ago

    Sooo .. i rewrote this app to use a local LLM , and while it works, it's incredibly slow . But when I use a generic script to load and query the model -- it's lightning fast. Are the embeddings my problem? While it's slow I can see Langchain is pegged at 100% cpu .. so I suspect it may have something to do with the conversation.

  • @PritamBhakta-m8d
    @PritamBhakta-m8d 1 year ago

    ImportError: cannot import name 'BaseLLM' from partially initialized module 'langchain.llms.base' (most likely due to a circular import). I am getting this error. Can anyone tell me what the issue is?

  • @sukriadnansangadji8232
    @sukriadnansangadji8232 1 year ago

    Thanks for your good explanation, Alejandro. I have a question about how to apply the Langchain PDF app to Arabic literature.

  • @monarch9586
    @monarch9586 1 year ago

    @alejandro_ao Hi, can you help me resolve this error: No module named 'altair.vegalite.v4'? I am not able to start Streamlit.
    Also, can you make a video on environment setup? Please, I really need to submit the project within one week.

  • @MohitKumar-gp6nr
    @MohitKumar-gp6nr 1 year ago

    I have some JSON files which I want to use as a chatbot data source. How do I store the JSON information in Chroma DB using embeddings and then retrieve it based on the user query? I googled a lot but did not find any answers.

  • @yashruparelia8662
    @yashruparelia8662 1 year ago

    I used this same code to understand it, but when I call the OpenAIEmbeddings function it shows an error like "you exceeded your current quota, please check your plan and billing details". What is this error about? Please let me know so that I can solve it.

  • @prazyraj1735
    @prazyraj1735 5 months ago

    I have a use case with different types of documents. I can parse the documents using langchain document loaders, but there are also images in these documents. I want to store them as metadata so that, if an answer is generated from a context chunk, the related image is shown as well. Please help.

  • @dark_legions2227
    @dark_legions2227 1 year ago

    Awesome. Can you please make one where you upload a Python script and ask ChatGPT questions about the script?

  • @Sports-Made
    @Sports-Made 1 year ago

    This is great, but the cost is too high. OpenAI is milking everyone now because 6 months down the line we'll have cheaper options. I appreciate what OpenAI is doing, but their pricing model is off, and if they don't fix it they will be in trouble sooner or later.

  • @AshokkumarA-u3z
    @AshokkumarA-u3z 1 year ago

    Hi brother, I've watched your videos; they are amazing and well explained. I need some help, as I am new to this. I am trying to upload a huge data file. Can you suggest any other API that will accept a PDF with a very large amount of text?

  • @ilducedimas
    @ilducedimas 1 year ago

    Aren't you re-creating the chunk embeddings on each call? It seems redundant and costly API-wise. What do you think?
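
One way to avoid paying for the same embeddings repeatedly is to cache the built index under a hash of the PDF's bytes, so the expensive step runs only once per distinct file. The sketch below is an assumption-laden illustration: `get_or_build_index` and `build_index` are hypothetical names, and the cache is any dict-like object that outlives a single run (in Streamlit, `st.session_state` could play that role, since the script reruns on every interaction).

```python
import hashlib
from typing import Any, Callable, MutableMapping


def get_or_build_index(
    pdf_bytes: bytes,
    build_index: Callable[[bytes], Any],
    cache: MutableMapping[str, Any],
) -> Any:
    """Return a cached index for this exact file, building it only on a miss.

    build_index is a hypothetical callable that extracts the text, splits it,
    and embeds the chunks (the step that costs API calls).
    """
    key = "index-" + hashlib.sha256(pdf_bytes).hexdigest()
    if key not in cache:
        cache[key] = build_index(pdf_bytes)
    return cache[key]
```

With this in place, asking several questions about the same uploaded file reuses the stored index, and only a different file triggers a new round of embedding calls.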

  • @mainakmukhrjee6328
    @mainakmukhrjee6328 11 months ago

    Could you pls show how to use open source LLMs and embeddings just at the end of the vid real quick? It will really help students like me