Thank you so much for sharing the details, Pradip! Your informative YouTube videos have been incredibly helpful. Great job on putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!
My pleasure!
Thank you Pradip. This is the first video of yours I have watched and it's amazing and excellent. It's a great tutorial to learn from, keep it up...
Thanks and welcome
Where is the coffee button :) great work! let me know if you offer paid one on one tutorials
Hi, thanks. No, I don't offer one-on-one coaching. Nowadays I'm very busy with freelancing work.
Adorable, thanks #PradipNichite for your time and efforts👍👍👍
Thanks for your efforts. Waiting for a chromadb with langchain video, similar to your previous one (pinecone with langchain).
Just Published: ua-cam.com/video/5NG8mefEsCU/v-deo.html
@Pradip Nichite: Please help me with the queries below.
1. Is chromadb good for prod, given that it uses an in-memory db?
2. What is the limit on the number of docs we can store in a chromadb collection?
3. I have tried using Pinecone through langchain. I can see my stored index in the hosted application, but I am not able to retrieve the embeddings from it and use them. E.g. Pinecone.from_existing_index() doesn't seem to work.
4. Is there any other vector db that has a hosted application?
Can we query our CSV dataset as well? I tried creating vector embeddings, but when querying I am not able to get an exact answer. For example, I have an airlines dataset and I give a prompt like "how many customers are frequent flyer". In response I expect an overall count, but it gives me the name and id of one particular customer.
Any thoughts?
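A count like this is an aggregate over every row, which a nearest-neighbour lookup cannot produce: it only returns the few rows closest to the prompt. A minimal sketch of computing the count directly from the CSV instead, assuming a hypothetical `frequent_flyer` column with yes/no values (the sample rows and column names below are made up for illustration and are not from the commenter's dataset):

```python
import csv
import io

# Hypothetical sample standing in for the airline CSV; the column names are assumptions.
SAMPLE_CSV = """customer_id,name,frequent_flyer
1,Asha,yes
2,Ben,no
3,Chen,yes
4,Dara,yes
"""


def count_frequent_flyers(csv_text: str, column: str = "frequent_flyer") -> int:
    """Count the rows whose `column` value is 'yes' (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row[column].strip().lower() == "yes")


frequent_flyer_count = count_frequent_flyers(SAMPLE_CSV)
print(frequent_flyer_count)
```

Semantic search remains useful for "find rows like this one" questions; counts, sums and averages are usually better served by a direct query over the table.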
Thanks Pradip for the great tutorial! I have a couple of questions. 1. Is chromadb the best for semantic search? Can you please do a comparison video for vector dbs? 2. These semantic searches return too many unwanted and unrelated results. How can I filter them out? Kindly help.
What use case do you have? You can try a different embedding model and apply a threshold on the semantic score.
@@FutureSmartAI thanks for the response! For example, I have a requirement to get a set of survey questions relevant to a group of people who can be described as "Single men at the age of 45". The vector search gives higher priority to questions such as "What is the age of your children?" than to "What is your age?" from the list of questions I send as part of the prompt.
Kindly let me know how I can control this.
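The threshold idea suggested in the reply can be sketched as a post-filter on the query result. This assumes the result has the nested-list shape pasted in the last comment of this thread (`ids`, `documents`, `distances`, one inner list per query) and that a smaller distance means a closer match. The helper name and the cutoff of 1.0 are hypothetical and would need tuning for a real embedding model:

```python
def filter_by_distance(results: dict, max_distance: float) -> list:
    """Return (id, document, distance) tuples for the first query,
    keeping only hits whose distance is at most max_distance."""
    ids = results["ids"][0]
    documents = results["documents"][0]
    distances = results["distances"][0]
    return [
        (doc_id, document, distance)
        for doc_id, document, distance in zip(ids, documents, distances)
        if distance <= max_distance
    ]


# Result values copied from the n_results=2 comment in this thread.
sample_results = {
    "ids": [["id2", "id1"]],
    "distances": [[0.8069301247596741, 1.648103952407837]],
    "documents": [["This is a document about car", "This is a document about cat"]],
}

kept = filter_by_distance(sample_results, max_distance=1.0)
print(kept)
```

A threshold only removes weak matches. It will not by itself stop "age of your children" from outranking "your age", which is more likely to need a different embedding model or metadata filters.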
Thanks so much Pradip for this great video and explanation of ChromaDb! The code file that you shared seems to be missing the section where you store the db in a local directory. Could you also explain how to do an upsert when the document corpus is very large?
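import chromadb
from chromadb.config import Settings

# Settings-based persistence as used by older Chroma releases;
# newer releases (0.4+) use chromadb.PersistentClient(path="pet_db") instead.
client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="pet_db"  # Optional, defaults to .chromadb/ in the current directory
))
Here persist_directory="pet_db" is the local directory where the db is stored.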
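For the large-corpus part of the question, one common approach is to send documents in fixed-size batches rather than in a single call. A minimal sketch, not the video author's method: `batched`, `add_in_batches` and `FakeCollection` are hypothetical names, and `FakeCollection` only stands in for a real Chroma collection so the example runs without chromadb installed. The keyword arguments mirror the `collection.add(documents=..., ids=...)` call used elsewhere in this thread:

```python
from typing import Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def add_in_batches(collection, ids, documents, batch_size: int = 100) -> int:
    """Call collection.add once per batch and return the number of batches sent."""
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    batch_count = 0
    for id_batch, doc_batch in zip(batched(ids, batch_size), batched(documents, batch_size)):
        collection.add(ids=list(id_batch), documents=list(doc_batch))
        batch_count += 1
    return batch_count


class FakeCollection:
    """Stand-in that records calls; a real Chroma collection would be passed instead."""

    def __init__(self):
        self.calls = []

    def add(self, ids, documents):
        self.calls.append((ids, documents))


fake = FakeCollection()
batches = add_in_batches(
    fake,
    ids=[f"id{i}" for i in range(250)],
    documents=[f"doc {i}" for i in range(250)],
    batch_size=100,
)
print(batches)
```

Newer Chroma releases also appear to offer an `upsert` method with the same keyword arguments; if your installed version has it, it could replace `add` in the loop. Batching also makes it easier to log progress and see which batch an ingestion stops on.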
How can I change the similarity metric used in a query,
e.g. cosine, Euclidean, etc.?
Where is the db created on disk, and how can I give it a user-specified path?
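To illustrate why the choice of metric matters, here is a small pure-Python comparison of squared Euclidean (L2) distance and cosine distance on toy vectors. In Chroma the metric is, as far as I know, chosen when the collection is created (via collection metadata such as `{"hnsw:space": "cosine"}`) rather than per query; treat that as an assumption to check against the docs for your installed version. The vectors below are made up:

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sensitive to vector length as well as direction."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # points the same way as the query, but is longer

print(squared_l2(query, same_direction))
print(cosine_distance(query, same_direction))
```

The two vectors point in the same direction, so cosine distance treats them as identical while the L2 distance does not. This is why rankings can change when the metric is switched.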
Thank you Pradip for a great video. What is the limitation of Chroma DB (ie number of MBytes, number of documents, etc)? I tried to index 2000+ PDF files for semantic search, however, it always stopped at PDF #273. No error message was given.
Thank you Pradip! Besides normal PDF/txt extraction, hopefully we will soon see a video on extracting data from tables and storing it in the DB?
We can do that. You can use libraries like pdfplumber to extract the tables and then GPT to answer questions from them.
Do you have any particular use case in mind that I can cover?
@@FutureSmartAI, you could consider a PDF query tool for QA on tabular data, especially PDFs like 10K and 10Q documents. Thanks in advance #Pradip
Nice video. Which is the best open source vector DB? The Pinecone free tier is really limited.
Hi, among open source options I have mostly used only Chroma DB, and it's working well. I am slowly exploring others.
Thanks for the video. Question: how do I get the vectors from chroma db when my documents are already stored in the persisted db?
You mean you already have a db?
@@FutureSmartAI yes, I already have the db.
Great 👍
Thanks for the visit
Hi sir. I have a CouchDB database. Can I create a vector db for it?
Hi Pradip, it's a basic question, but I need to instruct my team: can you please let me know which tool you used to run the code?
Google Colab, bro
I'm getting this error, buddy, can you help with it? "Expected each value in the embedding to be a int or float, got"
idea_collection_emb = client.create_collection("idea_collection_emb")
idea_collection_emb.add(
    documents=documents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=ids
)
were you able to solve it?
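The error message above is cut off, so the actual cause is unknown. One plausible cause (an assumption, not confirmed in the thread) is that `embeddings` is not a flat list of plain Python numbers per document, for example array objects from an embedding model or vectors wrapped in an extra list. A minimal sketch of normalising the input before calling `add`, with `to_float_lists` as a hypothetical helper name:

```python
def to_float_lists(embeddings):
    """Convert each embedding to a plain list of Python floats.

    Raises ValueError when an embedding is not a flat sequence of numbers,
    e.g. when a vector is wrapped in an extra list.
    """
    cleaned = []
    for index, vector in enumerate(embeddings):
        # Array-like objects (e.g. NumPy arrays) expose tolist(); plain lists do not.
        if hasattr(vector, "tolist"):
            vector = vector.tolist()
        try:
            cleaned.append([float(value) for value in vector])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"embedding {index} is not a flat list of numbers") from exc
    return cleaned


embeddings = to_float_lists([(1, 2), [0.5, 0.25]])
print(embeddings)
```

Passing the cleaned list as `embeddings=` guarantees one flat list of floats per document, and the `ValueError` points at the offending index when the input is nested too deeply.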
Sir, how do I set it up as a Pinecone alternative and use it with gpt 3.5 turbo as in your previous video?
Check this : ua-cam.com/video/5NG8mefEsCU/v-deo.html
@Pradip How do I store and retrieve data in vector format?
Hi, can you elaborate on exactly what you want to do?
Where do we get the Pets data?
It's a sample file generated using ChatGPT.
I am facing issues with chromadb installation on Ubuntu.
You know you know you know you know you know
But I was hoping for semantic search
Hi, what are you expecting from semantic search? Maybe you can check my other tutorials, where I explain embeddings, semantic search, QnA, etc.
@Pradip Nichite can you remove like 99% of "you know"s from your recordings and re-upload please? perhaps 1% of the time it is ok or better remove them 100%
Thanks for suggestions
hater
Hi
If I give n_results=2, then I am getting all docs
{'ids': [['id2', 'id1']],
'distances': [[0.8069301247596741, 1.648103952407837]],
'metadatas': [[{'category': 'vehicle'}, {'category': 'animal'}]],
'embeddings': None,
'documents': [['This is a document about car',
'This is a document about cat']]}