Semantic Search with Open-Source Vector DB: Chroma DB | Pinecone Alternative | Code

Pradip Nichite


Published 10 Dec 2024

COMMENTS • 49

@Andromeda26_ 1 year ago ⁺³
Thank you so much for sharing the details, Pradip! Your informative YouTube videos have been incredibly helpful. Great job on putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!
@FutureSmartAI 1 year ago ⁺¹
My pleasure!
@mukundhachar303 3 months ago ⁺¹
Thank you, Pradip. This is the first time I have watched one of your videos, and it is amazing and excellent. These are great tutorials to learn from, keep making them...
@FutureSmartAI 2 months ago
Thanks and welcome
@avishsharma8852 1 year ago ⁺¹
Where is the coffee button :) Great work! Let me know if you offer paid one-on-one tutorials.
@FutureSmartAI 1 year ago
Hi, thanks. No, I don't offer one-on-one coaching. Nowadays I am very busy with freelancing work.
@zaheerbeg4810 1 year ago ⁺¹
Adorable, thanks #PradipNichite for your time and efforts👍👍👍
@moralstorieskids3884 1 year ago ⁺¹
Thanks for your efforts. Waiting for a video on ChromaDB with LangChain, similar to your previous video (Pinecone with LangChain).
@FutureSmartAI 1 year ago
Just Published: ua-cam.com/video/5NG8mefEsCU/v-deo.html
@shinycaroline3722 8 months ago
@Pradip Nichite: Please help me with the queries below.
1. Is ChromaDB good for production, since it uses an in-memory DB?
2. What is the limit on the number of docs we can store in a ChromaDB collection?
3. I have tried using Pinecone through LangChain. I can see my stored index in the hosted application, but I am not able to retrieve the embeddings from it and use them, e.g. Pinecone.from_existing_index() doesn't seem to work.
4. Is there any other vector DB that has a hosted application?
@SatyendraJaiswal-hz1cb 5 months ago
Can we query our CSV dataset as well? I tried creating vector embeddings, but when querying with a prompt I am not able to get an exact answer. For example, I have an airlines dataset and I give a prompt like "how many customers are frequent flyer". In response I expect an overall count, but it gives me the name and ID of one particular customer.
Any thoughts?
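A vector search returns the stored rows closest to the query text; it does not aggregate over the whole dataset, so a counting question like the one above is usually better answered by computing over the table directly. A minimal sketch using only the standard library, where the sample data, the column name `frequent_flyer`, and the function name `count_frequent_flyers` are all made up for illustration:

```python
import csv
import io

# Hypothetical sample data; the real airline dataset's columns may differ.
SAMPLE_CSV = """customer_id,name,frequent_flyer
1,Asha,yes
2,Ben,no
3,Chen,yes
"""


def count_frequent_flyers(csv_text):
    """Count rows whose (assumed) frequent_flyer column equals 'yes'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(
        1 for row in reader
        if row["frequent_flyer"].strip().lower() == "yes"
    )


total = count_frequent_flyers(SAMPLE_CSV)
print(total)
```

Semantic search remains useful for "find rows like this one" questions; counts and sums need a pass over the full data.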
@prasanosara1944 1 year ago ⁺²
Thanks, Pradip, for a great tutorial! I have a couple of questions. 1. Is ChromaDB the best for semantic search? Can you please do a comparison video of vector DBs? 2. These semantic searches return too many unwanted and unrelated results; how do I filter them out? Kindly help.
@FutureSmartAI 1 year ago
What use case do you have? You can try a different embedding model and apply a threshold on the semantic score.
@prasanosara1944 1 year ago
@@FutureSmartAI Thanks for the response! For example, I have a requirement to get a set of survey questions relevant to a group of people who can be described as "Single men at the age of 45". The vector search gives higher priority to questions such as "What is the age of your children?" than to "What is your age?" from the list of questions I send as part of the prompt.
Kindly let me know how I can control this.
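The thresholding idea suggested in this thread can be sketched as a small post-filter over a query result. This assumes the result has the nested shape that a Chroma `collection.query()` call returns (one inner list per query); `filter_by_distance` is a hypothetical helper name, and the cutoff `1.0` is an arbitrary value that would need tuning for a given embedding model and distance metric.

```python
def filter_by_distance(results, max_distance):
    """Keep only hits whose distance is at or below max_distance.

    `results` is assumed to hold parallel nested lists under the keys
    "ids", "documents" and "distances", one inner list per query.
    """
    kept = []
    for ids, docs, dists in zip(
        results["ids"], results["documents"], results["distances"]
    ):
        kept.append([
            {"id": i, "document": d, "distance": s}
            for i, d, s in zip(ids, docs, dists)
            if s <= max_distance  # smaller distance means a closer match
        ])
    return kept


# Sample values in the same shape as the query output quoted in this thread.
sample = {
    "ids": [["id2", "id1"]],
    "documents": [["This is a document about car",
                   "This is a document about cat"]],
    "distances": [[0.8069301247596741, 1.648103952407837]],
}

# 1.0 is an illustrative cutoff, not a recommended value.
hits = filter_by_distance(sample, max_distance=1.0)
print(hits[0])
```

A threshold only removes weak matches; it does not change how the remaining results are ranked, so the ordering problem described above would still call for a different embedding model or a reranking step.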
@vinven7 1 year ago ⁺¹
Thanks so much, Pradip, for this great video and explanation of ChromaDB! The code file that you shared seems to be missing the section where you store the DB in a local directory. Can you also explain how to do an upsert with the client if the document corpus is very large?
@FutureSmartAI 1 year ago
import chromadb
from chromadb.config import Settings
client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="pet_db"  # Optional, defaults to .chromadb/ in the current directory
))
Here persist_directory="pet_db" is the local directory.
@idveernegi1021 8 months ago
How can I change the similarity search algorithm used in a query?
For example cosine, Euclidean, etc.
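To illustrate why the choice of metric matters, the sketch below compares squared Euclidean (L2) distance with cosine distance on toy two-dimensional vectors, in plain Python. As far as I know, Chroma fixes the metric per collection at creation time (through collection metadata such as `{"hnsw:space": "cosine"}`) and not per query; treat that as an assumption to check against the documentation for your installed version. The vectors and function names here are made up for the example.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


query = [1.0, 0.0]
same_direction_far = [10.0, 0.0]     # points the same way, but much longer
nearby_other_direction = [1.0, 1.0]  # close in space, different direction

# Assumption (not verified here): in Chroma the metric would be chosen with
# something like create_collection(name, metadata={"hnsw:space": "cosine"}).
l2_far = squared_l2(query, same_direction_far)
l2_near = squared_l2(query, nearby_other_direction)
cos_far = cosine_distance(query, same_direction_far)
cos_near = cosine_distance(query, nearby_other_direction)
print(l2_far, l2_near, cos_far, cos_near)
```

Under L2 the nearby vector wins, while under cosine distance the same-direction vector wins, so the two metrics can rank the same documents differently.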
@SoundTamilan 7 months ago
Where will the DB storage be created, and how can it be set to a user-given path?
@test12345265 1 year ago
Thank you, Pradip, for a great video. What are the limitations of Chroma DB (i.e. number of MBytes, number of documents, etc.)? I tried to index 2000+ PDF files for semantic search; however, it always stopped at PDF #273. No error message was given.
@quantadotonium3654 1 year ago ⁺¹
Thank you, Pradip! Hopefully we will soon see, besides normal PDF/TXT extraction, how to extract data from tables and store it in the DB?
@FutureSmartAI 1 year ago
We can do that. You can even use libraries like pdfplumber to extract the tables and then GPT to answer questions from them.
Do you have any particular use case in mind that I can cover?
@zaheerbeg4810 1 year ago
@@FutureSmartAI, you can consider a PDF query tool for QA on tabular data, especially PDFs like 10-K and 10-Q documents. Thanks in advance #Pradip
@karamjittech 1 year ago
Nice video. Which is the best open-source vector DB? The Pinecone free tier is really limited.
@FutureSmartAI 1 year ago ⁺¹
Hi, among open-source options I have mostly used only Chroma DB, and it is working well. I am slowly exploring others.
@henkhbit5748 1 year ago
Thanks for the video. Question: how do I get the vectors from Chroma DB when my documents are already stored in the persisted DB?
@FutureSmartAI 1 year ago
You mean you already have a DB?
@henkhbit5748 1 year ago
@@FutureSmartAI Yes, I already have the DB.
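For the question in this thread about reading vectors back from an existing DB, one option is the collection's `get()` method with `include=["embeddings"]`, since stored vectors are not returned unless requested. The sketch assumes `collection` is a Chroma collection already opened from the persisted directory; `fetch_embeddings` is a hypothetical helper, and `_FakeCollection` is only a stand-in so the example runs without a database.

```python
def fetch_embeddings(collection, ids=None):
    """Return {id: embedding} for records already stored in the collection."""
    # include=["embeddings"] asks for the stored vectors explicitly.
    found = collection.get(ids=ids, include=["embeddings"])
    return dict(zip(found["ids"], found["embeddings"]))


class _FakeCollection:
    """Stand-in that mimics the shape of a get() result; not part of Chroma."""

    _data = {"id1": [0.1, 0.2], "id2": [0.3, 0.4]}

    def get(self, ids=None, include=None):
        wanted = list(ids) if ids is not None else list(self._data)
        out = {"ids": wanted}
        if include and "embeddings" in include:
            out["embeddings"] = [self._data[i] for i in wanted]
        return out


vectors = fetch_embeddings(_FakeCollection(), ids=["id2"])
all_vectors = fetch_embeddings(_FakeCollection())
print(vectors)
```

Passing `ids=None` fetches every record, which may be slow or memory-heavy on a large collection.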
@satheeshthangaraj5614 1 year ago ⁺²
Great 👍
@FutureSmartAI 1 year ago
Thanks for the visit
@ravitejavemula333 1 year ago
Hi sir. I have a CouchDB database. Can I create a vector DB for it?
@chet3118 1 year ago
Hi Pradip, it's a basic question, but I need to instruct my team: can you please let me know which tool you used to run the code?
@kien3848 1 year ago ⁺¹
Google Colab, bro
@OlaPraveenMishra 1 year ago
I am getting this error, buddy: "Expected each value in the embedding to be a int or float, got". Can you help with this?
idea_collection_emb = client.create_collection("idea_collection_emb")
idea_collection_emb.add(
    documents=documents,
    embeddings=embeddings,
    metadatas=metadatas,
    ids=ids
)
@kamalinichauhan4407 10 months ago
Were you able to solve it?
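The error quoted above went unanswered in the thread. One possible cause, stated here as an assumption and not as a diagnosis, is that `embeddings` holds array or tensor objects (or non-float scalar types) where plain lists of Python numbers are expected. A minimal sketch of a conversion step, with `to_plain_embeddings` as a hypothetical helper name:

```python
def to_plain_embeddings(embeddings):
    """Convert array-like embeddings to a list of lists of Python floats."""
    # Array and tensor types (e.g. NumPy, PyTorch) expose .tolist().
    if hasattr(embeddings, "tolist"):
        embeddings = embeddings.tolist()
    plain = []
    for vector in embeddings:
        if hasattr(vector, "tolist"):
            vector = vector.tolist()
        # float() turns ints and float-like scalars into built-in floats.
        plain.append([float(value) for value in vector])
    return plain


# Mixed input: a list with ints and a tuple, standing in for model output.
raw = [[1, 2, 3.5], (0, 1, 0)]
clean = to_plain_embeddings(raw)
print(clean)
```

The helper expects one vector per document; passing a single flat vector instead of a list of vectors raises a `TypeError`, which is itself a common source of this kind of validation error.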
@shrutinathavani 1 year ago
Sir, how do I set it up as a Pinecone alternative and use it as in your previous video with gpt-3.5-turbo?
@FutureSmartAI 1 year ago
Check this: ua-cam.com/video/5NG8mefEsCU/v-deo.html
@TheRealWakanda 1 year ago
@Pradip How do I store and retrieve data in vector format?
@FutureSmartAI 1 year ago
Hi, can you elaborate on exactly what you want to do?
@mwanthidaniel1254 1 year ago
Where do we get the Pets data?
@FutureSmartAI 1 year ago
It's a sample file generated using ChatGPT.
@ShrutikaGulave 1 year ago
I am facing issues with the chromadb installation on Ubuntu.
@notmeno4881 1 year ago
You know you know you know you know you know
@avishsharma8852 1 year ago
But I was hoping for semantic search
@FutureSmartAI 1 year ago
Hi, what are you expecting from semantic search? Maybe you can check my other tutorials, where I explained embeddings, semantic search, QnA, etc.
@wasimsalafi 1 year ago ⁺¹
@Pradip Nichite Can you remove like 99% of the "you know"s from your recordings and re-upload, please? Perhaps 1% of the time it is OK, or better, remove them 100%.
@FutureSmartAI 1 year ago
Thanks for the suggestions
@biniyam106 11 months ago
hater
@rageshantony2182 1 year ago
Hi
If I give n_results=2, then I am getting all docs
{'ids': [['id2', 'id1']],
'distances': [[0.8069301247596741, 1.648103952407837]],
'metadatas': [[{'category': 'vehicle'}, {'category': 'animal'}]],
'embeddings': None,
'documents': [['This is a document about car',
'This is a document about cat']]}
