Langchain + ChatGPT + Pinecone: A Question Answering Streamlit App
- Published 17 Jun 2023
- In this exciting video tutorial, I walk you through creating a Streamlit application that lets you search and query PDF documents effortlessly. Using cutting-edge technologies such as Pinecone and an LLM (OpenAI's ChatGPT), I guide you step by step in harnessing the potential of these tools.
By leveraging Pinecone as a vector database and search engine, we enable lightning-fast search across PDF documents. Additionally, we employ an LLM to enhance the search functionality with question-answering capabilities, making your app even more versatile and intelligent.
To ensure smooth data preprocessing, chains, and other essential tasks, we utilize the incredible Langchain framework. With its powerful features, Langchain simplifies and streamlines the development process, enabling you to focus on building an exceptional PDF query search app.
Whether you're a beginner or an experienced developer, this tutorial provides a comprehensive guide to building your own Streamlit app with Pinecone, an LLM, and Langchain. Join me as we dive deep into natural language processing and create a game-changing application together!
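As a rough illustration of the preprocessing step mentioned above (splitting PDFs into chunks before embedding them), here is a minimal character-based splitter in plain Python. The chunk size and overlap are hypothetical defaults, not values taken from the video:

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into overlapping fixed-size chunks, mimicking
    a character-based text splitter used before embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc_text = "x" * 2500
chunks = split_into_chunks(doc_text)
# 2500 chars with a step of 800 -> chunks starting at 0, 800, 1600, 2400
```

Each chunk would then be embedded and upserted into the vector database.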
Don't forget to like, share, and subscribe to stay updated on the latest advancements in AI/ML.
GitHub Repo: github.com/AIAnytime/QA-in-PD...
OpenAI API: platform.openai.com/account/a...
Langchain Doc: python.langchain.com/docs/get...
Pinecone Vector DB: www.pinecone.io/
Streamlit Chat Repo: github.com/AI-Yash/st-chat
LLM Playlist: • Large Language Models
#ai #python #coding
Your Queries:-
pinecone ai tutorial
pinecone ai memory
embeddings from language models
langchain
langchain tutorial
langchain agent
langchain chatbot
langchain tutorial python
chatgpt
chatgpt explained
chat gpt
chatgpt how to use
chatgpt tutorial
question answering in artificial intelligence
question answering nlp
question answering app
streamlit tutorial
streamlit python
streamlit web app
thanks so much for posting this - it's been very helpful!
Just wanted to ask about the doc_preprocessing function: I sometimes get "ValueError: zero-size array to reduction operation maximum which has no identity" when trying to run Streamlit.
I first got the error when I downloaded a Google Sheet (containing text) as a PDF. So I deleted that file and retried with a Google Docs file downloaded as a PDF, and Streamlit loaded and worked fine.
But if I have both of the above-mentioned files, the error recurs. I'm assuming it must have something to do with the data type of the Google Sheet-based PDF messing with the DirectoryLoader module. But it's interesting that it ends up being a zero-size array.
Just wondering if you had any insights into the issue?
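One plausible cause, consistent with the description above: the sheet-derived PDF yields no extractable text, and an empty document later produces a zero-size array somewhere in the pipeline. A hedged workaround is to filter out empty documents after loading, before embedding. This sketch uses plain dicts to stand in for loaded document objects:

```python
def filter_empty_docs(docs):
    """Drop documents whose extracted text is empty; an empty document
    can surface downstream as 'zero-size array to reduction operation
    maximum which has no identity'."""
    kept, skipped = [], []
    for doc in docs:
        if doc.get("page_content", "").strip():
            kept.append(doc)
        else:
            skipped.append(doc)
    return kept, skipped
```

Logging the `skipped` list would confirm whether the sheet-based PDF is the one coming back empty.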
Yo bro, great video! However, I got an error: 'batch size exceeds maximum'. Does that mean I used too many documents? And how can I fix that?
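For context on this error: vector databases like Pinecone reject upserts whose batch exceeds a per-request limit, so the usual fix is to upsert in smaller batches rather than sending all vectors at once. A minimal sketch (the commented `index.upsert` call and the batch size of 100 are assumptions; check Pinecone's documented limits):

```python
def batched(items, batch_size=100):
    """Yield successive slices so each upsert request stays under
    the service's maximum batch size."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# vectors = [...]  # (id, embedding) tuples prepared for upsert
# for batch in batched(vectors, batch_size=100):
#     index.upsert(vectors=batch)  # hypothetical Pinecone index handle
```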
In the scenario of conversational bots, how do you limit the token consumption of the entire conversation?
For example, once consumption reaches 1,000 tokens, it would prompt that the tokens for this conversation have been used up.
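One simple way to implement the limit described above is to accumulate token counts per conversation (taken from the API response's usage field, or estimated with a tokenizer) and refuse further turns once the budget is spent. A minimal sketch with hypothetical names:

```python
class TokenBudget:
    """Track cumulative token usage across a conversation
    and refuse turns once a fixed limit is reached."""
    def __init__(self, limit=1000):
        self.limit = limit
        self.used = 0

    def consume(self, tokens):
        """Return True and record usage if within budget,
        False if this turn would exceed the limit."""
        if self.used + tokens > self.limit:
            return False  # budget exhausted; tell the user
        self.used += tokens
        return True
```

Before each turn, call `consume()` with that turn's token count and show a "tokens used up" message when it returns False.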
Great, thanks for the video. Do you know how I can make it show the sources it consulted? For example, showing the links the information was extracted from (in the case of web scraping)?
Please look at my other videos in the LLM playlist. I have shown source citation, etc. For web scraping examples, look at the OpenAI functions and Langchain agent videos.
Thank you!!! Great resource. Pinecone has moved to a serverless model, and apparently there has been quite a bit of movement in the Langchain packages. Would it be possible for us to have an updated script as of Apr 2024? Otherwise, I would be very interested in a private meeting to discuss this. Would greatly appreciate it!
Great video, thanks! Would it work if we replaced OpenAI with LaMini-LM, in order to run it on a CPU?
What timing, @jorgerios4091... I just finished recording the exact same video: Langchain + Sentence Transformers + Chroma DB + LaMini-LM. The video will be available tomorrow, end to end with a workflow explanation. Please subscribe to the channel after tomorrow's video if you like it.
@@AIAnytime I'm in Sir! Big thanks!
Here we go: ua-cam.com/video/rIV1EseKwU4/v-deo.html
This one is good, but I have one question: if a PDF has information in table format, will it still be able to retrieve data from it?
Hi Akshay, it should retrieve from tables too. You can also check out TAPAS, which works well on tables and is open source. Find it here: huggingface.co/docs/transformers/model_doc/tapas
@@AIAnytime What about text documents like .txt or .docx files?
So we only use OpenAI for generating embeddings, and Pinecone for storing the embeddings and querying results?
Hi Abdul, in this case yes! OpenAI's text-embedding-ada-002 for embeddings, through the Langchain integration. Pinecone acts as a vector DB with many features: it stores information in lower-dimensional spaces, has built-in semantic search capabilities, implements similarity algorithms like cosine, etc. It is also faster when you compare it to traditional mechanisms.
And the LLM is used to produce human-like output once the relevant information is retrieved via the embeddings.
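For reference, the cosine similarity mentioned above is just the normalized dot product between two embedding vectors; the vector database computes it at scale over millions of vectors, but the formula itself is small:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors:
    dot(a, b) / (|a| * |b|), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0; orthogonal vectors score 0, which is why semantically similar chunks rank highest at query time.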
@@AIAnytime Can you build a generative chatbot without using an OpenAI API key, storing the embeddings somewhere free?
thanks heaps for this tutorial. Are you able to add from langchain.prompts import (
ChatPromptTemplate,
MessagesPlaceholder,
SystemMessagePromptTemplate,
HumanMessagePromptTemplate
)
and
from langchain.chains import ConversationChain
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
import streamlit as st
from streamlit_chat import message
and get the app you build in this tutorial to have buffer memory and retrieve answers only from the /data folder (corpus of PDFs), or is this not possible? I can't find a video that explains how to QA my own corpus and use buffer memory.
Thank you for this video. Would this work to generate Q&A from a PDF?
Watch my Question Answer Generator video. In the LLM playlist.
@@AIAnytime Thank you. If I want to store vectors on Supabase, is it the same process as with Pinecone?
Hi, I am getting this error in your code. Can you please check this?
, in partition
elements = partition_pdf(
NameError: name 'partition_pdf' is not defined. Did you mean: 'partition_xml'?
A few steps if you are getting the partition_pdf not found error:
1. Check your Unstructured version. You need to install: pip install unstructured==0.7.12
2. If the above doesn't work, do: pip install langchain==0.0.251
Can you make a video on Langchain streaming responses using RetrievalQA and Pinecone?
Very good idea. I am working on it. Will post soon.... Thanks
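Until that video is out, the general shape of streaming is: the LLM emits tokens one at a time, and a callback pushes each token into the UI as it arrives instead of waiting for the full answer. A library-agnostic sketch of that pattern (the function and callback names here are illustrative, not LangChain's actual API):

```python
def stream_response(tokens, on_token):
    """Deliver tokens one at a time via a callback, the pattern a
    streaming callback handler uses to push partial output into a UI,
    then return the assembled full response."""
    parts = []
    for tok in tokens:
        on_token(tok)   # e.g. append to a Streamlit placeholder
        parts.append(tok)
    return "".join(parts)
```

In LangChain this role is played by a callback handler passed to the LLM; the retrieval step runs first, then the generated tokens stream through the handler.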