Semantic Search with Open-Source Vector DB: Chroma DB | Pinecone Alternative | Code

  • Published 10 Dec 2024

COMMENTS • 49

  • @Andromeda26_
    @Andromeda26_ 1 year ago +3

    Thank you so much for sharing the details, Pradip! Your informative YouTube videos have been incredibly helpful. Great job on putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!

  • @mukundhachar303
    @mukundhachar303 3 months ago +1

    Thank you, Pradip. This is the first time I have watched one of your videos, and it's amazing and excellent. These are great tutorials to learn from; keep making them...

  • @avishsharma8852
    @avishsharma8852 1 year ago +1

    Where is the coffee button :) Great work! Let me know if you offer paid one-on-one tutorials.

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      Hi, thanks. No, I don't offer one-on-one coaching. Nowadays I am very busy with freelancing work.

  • @zaheerbeg4810
    @zaheerbeg4810 1 year ago +1

    Adorable, thanks #PradipNichite for your time and efforts👍👍👍

  • @moralstorieskids3884
    @moralstorieskids3884 1 year ago +1

    Thanks for your efforts. Waiting for a ChromaDB with LangChain video, similar to your previous video (Pinecone with LangChain).

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      Just Published: ua-cam.com/video/5NG8mefEsCU/v-deo.html

  • @shinycaroline3722
    @shinycaroline3722 8 months ago

    @Pradip Nichite: Please help me with the queries below.
    1. Is ChromaDB good for production, since it uses an in-memory DB?
    2. What is the limit on the number of docs we can store in a ChromaDB collection?
    3. I have tried using Pinecone through LangChain. I can see my stored index in the hosted application, but I am not able to retrieve the embeddings from it and use them, e.g. Pinecone.from_existing_index() doesn't seem to work.
    4. Is there any other vector DB that has a hosted application?

  • @SatyendraJaiswal-hz1cb
    @SatyendraJaiswal-hz1cb 5 months ago

    Can we query our CSV dataset as well? I tried creating vector embeddings, but when querying with a prompt I am not able to get an exact answer. For example, I have an airlines dataset and I give a prompt like "how many customers are frequent flyers", so in response I am expecting an overall count, but it gives me the name and ID of one particular customer.
    Any thoughts?

  • @prasanosara1944
    @prasanosara1944 1 year ago +2

    Thanks, Pradip, for a great tutorial! I have a couple of questions. 1. Is ChromaDB the best for semantic search? Can you please do a comparison video for vector DBs? 2. These semantic searches are giving too many unwanted and unrelated results. How do I filter them out? Kindly help.

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      What use case do you have? You can try a different embedding model and apply a threshold on the semantic similarity score.

    • @prasanosara1944
      @prasanosara1944 1 year ago

      @@FutureSmartAI Thanks for the response! For example, I have a requirement to get a set of survey questions relevant to a group of people who can be described as "Single men at the age of 45". The vector search gives higher priority to questions such as "What is the age of your children?" than to "What is your age?" from the list of questions I send as part of the prompt.
      Kindly let me know how I can control this.
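
The threshold suggestion in the reply above could be sketched roughly as follows. This is only an illustration: `filter_by_distance` is a hypothetical helper, the result dictionary is assumed to have the nested-list shape of a Chroma query result for a single query text, and the cutoff of 1.0 is an arbitrary value that would need tuning for a real embedding model.

```python
from typing import Any, Dict, List


def filter_by_distance(result: Dict[str, Any], max_distance: float) -> List[Dict[str, Any]]:
    """Keep only hits whose distance is at or below max_distance.

    `result` is assumed to hold one inner list per query text,
    e.g. result["ids"][0] for the first (and only) query.
    """
    ids = result["ids"][0]
    distances = result["distances"][0]
    documents = result["documents"][0]
    hits = []
    for doc_id, distance, document in zip(ids, distances, documents):
        # A smaller distance means a closer match, so drop anything above the cutoff.
        if distance <= max_distance:
            hits.append({"id": doc_id, "distance": distance, "document": document})
    return hits


# Sample values in the shape of a query result quoted elsewhere in these comments.
sample = {
    "ids": [["id2", "id1"]],
    "distances": [[0.8069301247596741, 1.648103952407837]],
    "documents": [["This is a document about car", "This is a document about cat"]],
}

# max_distance=1.0 is an illustrative cutoff, not a recommended value.
kept = filter_by_distance(sample, max_distance=1.0)
print(kept)
```

A cutoff like this only removes weak matches. It would not by itself fix a ranking problem such as the survey-question example, where a different embedding model or a rephrased query may matter more.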

  • @vinven7
    @vinven7 1 year ago +1

    Thanks so much, Pradip, for this great video and explanation of ChromaDB! The code file that you shared seems to be missing the section where you store the DB in a local directory. Can you also explain how to do an upsert if the document corpus is very large?

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      import chromadb
      from chromadb.config import Settings
      client = chromadb.Client(Settings(
          chroma_db_impl="duckdb+parquet",
          persist_directory="pet_db",  # Optional, defaults to .chromadb/ in the current directory
      ))
      Here persist_directory="pet_db" is the local directory.
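
The second part of the question, upserting a very large corpus, is not covered by the reply. One common approach is to send the documents in fixed-size batches rather than in a single call. The sketch below assumes the real collection exposes an `upsert(ids=..., documents=...)` method; `batched` is a hypothetical helper, and `FakeCollection` is a stand-in so the example runs without chromadb installed. The batch size of 4 is arbitrary.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def batched(
    ids: Sequence[str], documents: Sequence[str], batch_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (ids, documents) slices of at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield list(ids[start:end]), list(documents[start:end])


class FakeCollection:
    """Hypothetical stand-in for a Chroma collection, used only for this sketch."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def upsert(self, ids: List[str], documents: List[str]) -> None:
        # Insert new ids and overwrite existing ones.
        for doc_id, document in zip(ids, documents):
            self.store[doc_id] = document


collection = FakeCollection()
ids = [f"doc{i}" for i in range(10)]
documents = [f"text of document {i}" for i in range(10)]

batch_sizes = []
for batch_ids, batch_docs in batched(ids, documents, batch_size=4):
    collection.upsert(ids=batch_ids, documents=batch_docs)
    batch_sizes.append(len(batch_ids))

print(batch_sizes)  # [4, 4, 2]
```

With a real collection, only the `FakeCollection()` line would change. Batching also makes it easier to log progress and to see which batch a silent failure occurs in.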

  • @idveernegi1021
    @idveernegi1021 8 months ago

    How can I change the similarity search algorithm used in a query,
    e.g. cosine, Euclidean, etc.?
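
For readers unsure how the two metrics named in this question differ, here is a small pure-Python comparison. It does not call chromadb. As far as I know, Chroma chooses the metric when the collection is created (for example through collection metadata such as `{"hnsw:space": "cosine"}`) rather than per query, and its default is squared L2, but treat both points as assumptions to check against the documentation for your installed version.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]

print(squared_l2(a, b))       # 2.0
print(cosine_distance(a, b))  # 1.0
```

For unit-length vectors the two are related by squared L2 = 2 × cosine distance, so they rank results identically; they differ only when embeddings are not normalized.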

  • @SoundTamilan
    @SoundTamilan 7 months ago

    Where is the DB storage created, and how can it be set to a user-given path?

  • @test12345265
    @test12345265 1 year ago

    Thank you, Pradip, for a great video. What are the limitations of Chroma DB (i.e. size in MB, number of documents, etc.)? I tried to index 2000+ PDF files for semantic search; however, it always stopped at PDF #273. No error message was given.

  • @quantadotonium3654
    @quantadotonium3654 1 year ago +1

    Thank you, Pradip! Besides normal PDF/TXT extraction, hopefully we will soon see a video on extracting data from tables and storing it in the DB?

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      We can do that. You can even use libraries like pdfplumber to extract the tables and then GPT to answer questions from them.
      Do you have any particular use case in mind that I can cover?

    • @zaheerbeg4810
      @zaheerbeg4810 1 year ago

      @@FutureSmartAI, you could consider a PDF query tool for QA on tabular data, especially PDFs like 10-K and 10-Q documents. Thanks in advance #Pradip

  • @karamjittech
    @karamjittech 1 year ago

    Nice video. Which is the best open-source vector DB? Pinecone's free tier is really limited.

    • @FutureSmartAI
      @FutureSmartAI 1 year ago +1

      Hi, among open-source options I have mostly used only Chroma DB, and it's working well. I am slowly exploring others.

  • @henkhbit5748
    @henkhbit5748 1 year ago

    Thanks for the video. Question: how do I get the vectors from Chroma DB when my documents are already stored in the persisted DB?

  • @satheeshthangaraj5614
    @satheeshthangaraj5614 1 year ago +2

    Great 👍

  • @ravitejavemula333
    @ravitejavemula333 1 year ago

    Hi sir. I have a CouchDB database. Can I create a vector DB for it?

  • @chet3118
    @chet3118 1 year ago

    Hi Pradip, it's a basic question, but I need to instruct my team. Can you please let me know which tool you used to run the code?

    • @kien3848
      @kien3848 1 year ago +1

      Google Colab, bro

  • @OlaPraveenMishra
    @OlaPraveenMishra 1 year ago

    I am getting this error, buddy: "Expected each value in the embedding to be a int or float, got". Can you help with this?
    idea_collection_emb = client.create_collection("idea_collection_emb")
    idea_collection_emb.add(
        documents=documents,
        embeddings=embeddings,
        metadatas=metadatas,
        ids=ids
    )
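
The error message above is cut off, so the type that was actually received is unknown. One plausible cause, offered only as a guess, is that `embeddings` holds array or tensor objects, or values that are not plain Python numbers. A minimal sketch of normalizing them before calling `add` follows; `to_float_lists` is a hypothetical helper and the sample input is made up for illustration.

```python
from typing import Any, Iterable, List


def to_float_lists(embeddings: Iterable[Any]) -> List[List[float]]:
    """Convert each embedding to a plain list of Python floats."""
    cleaned = []
    for vector in embeddings:
        # NumPy arrays and many tensor types expose tolist(); plain lists do not.
        if hasattr(vector, "tolist"):
            vector = vector.tolist()
        # float() raises if a value cannot be interpreted as a number.
        cleaned.append([float(value) for value in vector])
    return cleaned


# Made-up input mixing ints, numeric strings, and a tuple.
raw = [[1, "0.5", 2.0], (3, 4, 5)]
embeddings = to_float_lists(raw)
print(embeddings)  # [[1.0, 0.5, 2.0], [3.0, 4.0, 5.0]]
```

If the conversion itself raises, the offending value appears in the traceback, which is usually more informative than the truncated message.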

  • @shrutinathavani
    @shrutinathavani 1 year ago

    Sir, how do I set it up as a Pinecone alternative and use it as in your previous video with GPT-3.5 Turbo?

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      Check this : ua-cam.com/video/5NG8mefEsCU/v-deo.html

  • @TheRealWakanda
    @TheRealWakanda 1 year ago

    @Pradip How do I store and retrieve data in vector format?

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      Hi, can you elaborate on exactly what you want to do?

  • @mwanthidaniel1254
    @mwanthidaniel1254 1 year ago

    Where do we get the pets data?

  • @ShrutikaGulave
    @ShrutikaGulave 1 year ago

    I am facing issues with the chromadb installation on Ubuntu.

  • @notmeno4881
    @notmeno4881 1 year ago

    You know you know you know you know you know

  • @avishsharma8852
    @avishsharma8852 1 year ago

    But I was hoping for semantic search

    • @FutureSmartAI
      @FutureSmartAI 1 year ago

      Hi, what are you expecting from semantic search? Maybe you can check my other tutorials, where I explained
      embeddings, semantic search, Q&A, etc.

  • @wasimsalafi
    @wasimsalafi 1 year ago +1

    @Pradip Nichite Can you remove about 99% of the "you know"s from your recordings and re-upload, please? Perhaps 1% of the time it is OK, or better, remove them 100%.

  • @rageshantony2182
    @rageshantony2182 1 year ago

    Hi
    If I give n_results=2, then I am getting all docs:
    {'ids': [['id2', 'id1']],
     'distances': [[0.8069301247596741, 1.648103952407837]],
     'metadatas': [[{'category': 'vehicle'}, {'category': 'animal'}]],
     'embeddings': None,
     'documents': [['This is a document about car',
                    'This is a document about cat']]}