PostgreSQL as VectorDB - Beginner Tutorial

  • Published 20 Dec 2023
  • Want to get started with freelancing? Let me help: www.datalumina.com/data-freel...
    Need help with a project? Work with me: www.datalumina.com/consulting
    🔗 Links in this video
    github.com/daveebbelaar/langc...
    github.com/pgvector/pgvector
    dev.to/confidentai/why-we-rep...
    👤 Connect with me on LinkedIn
    / daveebbelaar
    👋🏻 About Me
    Hey there, my name is @daveebbelaar and I work as a freelance Data Scientist / AI Engineer and run a company called Datalumina. You've stumbled upon my YouTube channel, where I give away all my secrets when it comes to working with data. If you want to learn more about what I do, then head over to www.datalumina.com/

COMMENTS • 37

  • @ConnorLeech · 7 days ago

    In the video you are creating the data from text files, but it seems like a main advantage of having it in your Postgres DB is being able to use/query the data in your tables.
    I'd love to see how to build a full-text search or something similar from data stored in regular Postgres tables!
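This is a minimal sketch of the kind of query this comment asks about. It uses PostgreSQL's built-in full-text search: to_tsvector, plainto_tsquery, ts_rank and the @@ match operator. The table name documents, the column content and the 'english' text-search configuration are assumptions for illustration and do not come from the video. The hypothetical helper only builds the SQL text and its bound parameters, which you would pass to a driver such as psycopg.

```python
from __future__ import annotations

# Hypothetical schema: a table "documents" with a text column "content".
FULL_TEXT_SEARCH_SQL = """
SELECT id,
       content,
       ts_rank(to_tsvector('english', content), plainto_tsquery('english', %s)) AS rank
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
ORDER BY rank DESC
LIMIT %s
"""


def build_full_text_search(query: str, limit: int = 5) -> tuple[str, tuple[str, str, int]]:
    """Return (sql, params) for a ranked full-text search over documents.content.

    The user's search text is bound through %s placeholders (psycopg style),
    never interpolated into the SQL string itself.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return FULL_TEXT_SEARCH_SQL, (query, query, limit)
```

With psycopg you would call cur.execute(*build_full_text_search("vector databases")) and then fetch the rows. On larger tables, a common refinement is to store the tsvector in its own column with a GIN index, so it is not recomputed for every row on every query.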

  • @fabsync · 1 month ago · +1

    A new fan here! It would be great to see a video where you use Streamlit or something else to create a search with pgvector (full-text search).

  • @gr8tbigtreehugger · 6 months ago

    Thanks for this! I was leaning towards pgvector and your video convinced me so!

  • @bin4ry_d3struct0r · 6 months ago · +1

    One of the things I learned in the past few months working with RAG-based LLMs is that it's definitely not one size fits all. The quality of inference depends on the embedding algorithm as well as the indexing and retrieval mechanism of the vector database.
    This was a great video!
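To make the retrieval half of that point concrete, here is a plain-Python sketch of the three distance measures pgvector exposes as SQL operators: <-> for Euclidean (L2) distance, <#> for negative inner product and <=> for cosine distance. It also includes an exact, brute-force top-k search. The sketch mirrors the math only. It is not pgvector's implementation, and the function names are illustrative.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Vector = Sequence[float]


def _check_dims(a: Vector, b: Vector) -> None:
    # pgvector raises an error when compared vectors have different dimensions.
    if len(a) != len(b):
        raise ValueError(f"different vector dimensions {len(a)} and {len(b)}")


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance, the value pgvector's <-> operator returns."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """Dot product negated, as pgvector's <#> operator returns it (smaller = closer)."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity, the value pgvector's <=> operator returns."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        # This sketch refuses zero vectors, for which cosine is undefined.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def exact_top_k(
    query: Vector,
    rows: Sequence[tuple[str, Vector]],
    k: int,
    distance: Callable[[Vector, Vector], float] = cosine_distance,
) -> list[tuple[str, Vector]]:
    """Exact (brute-force) nearest-neighbour search over (id, embedding) rows.

    Every row is scored, so the cost grows linearly with the number of rows.
    """
    return sorted(rows, key=lambda row: distance(query, row[1]))[:k]
```

Because an exact search scores every row, its cost grows with the table. Approximate indexes such as HNSW and IVFFlat are designed to avoid that cost, in exchange for possibly missing some true nearest neighbours.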

  • @abhishekchopda4100 · 5 months ago

    Great Video! Helped me in my work! Thanks :)

  • @myhificloud · 6 months ago

    Clean solution. This is helpful, thank you for this.

  • @tushaar9027 · 18 days ago

    Hi Dave, this is a great video, thanks for sharing the knowledge. I really liked the idea of using PostgreSQL. Can you please make a video on setting up Postgres on Azure?

  • @krunkey · 1 month ago

    Thanks for the video. I'll be trying PGVector! Do you know of any good alternative to OpenAI embeddings that can be run locally?

  • @Michael-jl7wn · 5 months ago

    How would this work if you were using more structured data that needed to be stored in columns and rows?

  • @EmilioGagliardi · 6 months ago · +1

    This was super interesting. Do you have a video that explains your pgvector setup (do you install the database locally or do you have a cloud account)? I'd love to have a setup where I can view my document collections and embeddings in my editor like that. I use VS Code right now, so I'm not sure... Good stuff!

    • @daveebbelaar · 6 months ago

      I talk about this near the end of the video

  • @erwinl7794 · 5 months ago

    What about an open-source vector store like Qdrant?

  • @say.xy_ · 6 months ago

    Hi Dave, I’m also using pgvector but the outputs are not that good. Could you make a video on improving the performance of a RAG pipeline with LangChain and pgvector? Thanks.

  • @touma4659 · 1 month ago

    thank you💖💖

  • @eyemazed · 6 months ago

    The thing that bothers me about using Postgres for RAG is that the vector search works fine, but its full-text search capabilities are severely handicapped. It doesn't support partial or fuzzy matching, so you can't really do a nice reciprocal rank fusion between resources retrieved by multiple channels (vector + full text). I'm going to try Elasticsearch next, as I've previously worked with it and it's really good at full-text search (TF/IDF, fuzzy search, partial search, stemming...), and the newer versions also support vector search. The downside is having to sync Elastic with your main DB all the time...
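Reciprocal rank fusion, mentioned in this comment, merges several ranked result lists. Each document receives a score equal to the sum, over the lists it appears in, of 1 / (k + rank). The sketch below assumes that each retrieval channel (for example a pgvector similarity query and a full-text query) has already returned a list of document IDs in ranked order. The value k = 60 is the constant commonly used for RRF, and the function name is illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists into one list of (doc_id, score), best first.

    Rank 1 is the top hit in each input list. A document missing from a
    list simply gets no contribution from that list.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Only ranks enter the formula, so the raw scores of the two channels (a cosine distance and a text-search rank, say) never have to be put on a common scale.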

  • @anand-st7mo · 2 months ago

    Bro, did you do any indexing?

  • @MaliciousCode-gw5tq · 1 month ago

    I have a follow-up question: if, let's say, one chapter of a book has a total word count of 3k, will it be able to store all 3k words?
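Storing the full text of a 3k-word chapter in a Postgres text column is not a problem in itself. The usual constraint is the embedding model's maximum input length. For that reason, long documents are commonly split into overlapping chunks, with one row (chunk text plus embedding) per chunk. The sketch below assumes that approach and splits on words. chunk_words and its default sizes are illustrative choices, not taken from the video, and production pipelines often count tokens instead of words.

```python
from __future__ import annotations


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so text near a boundary
    appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the defaults, a 3,000-word chapter yields 7 chunks. Each chunk would be embedded and stored as its own row.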

  • @jennymelia · 5 months ago

    LOL Dave, I was googling whether I can use Postgres somehow instead of Pinecone and your video popped up 🤣🤣👍🏽👍🏽👍🏽 Love it!

    • @daveebbelaar · 5 months ago · +1

      Haha, you're becoming a true engineer, Jenny. Those are some pretty serious Google searches haha. Let me know if you need further help!

    • @jennymelia · 5 months ago

      @daveebbelaar For sure, dude! 🤌🏽 Trying to get to that coder level 😂😂😂

  • @SigAiOC-ke3ss · 6 months ago · +1

    I didn't fully understand it from the video, but are you comparing times between Pinecone on a remote host vs. Postgres run locally?

    • @daveebbelaar · 6 months ago

      Not only processing time (because I know that's not a truly fair comparison), but also ease of use and data management.

    • @SigAiOC-ke3ss · 6 months ago · +2

      @daveebbelaar I get that, but in a production environment it makes a big difference, especially when you think of use cases. I would be curious to see a comparison between a cloud-hosted Postgres and Pinecone, or between the locally hosted Postgres and something like Chroma.

  • @izzatirfan2794 · 2 days ago

    Great!! I enjoy watching your videos. I tried the code from your GitHub hands-on, but I am facing an error: ModuleNotFoundError: No module named 'pgvector_service'. Then I tried to pip install pgvector_service, but this occurred:
    ERROR: Could not find a version that satisfies the requirement pgvector_service (from versions: none)
    ERROR: No matching distribution found for pgvector_service
    Do you have any ideas how to overcome this?

  • @henkhbit5748 · 5 months ago

    Thanks for showing pgvector. Weaviate is also free and can be run locally using Docker. I agree, I am for open source.

  • @DanielWeikert · 6 months ago

    How do you update the vector store (e.g. replace outdated data)?
    br

    • @gr8tbigtreehugger · 6 months ago

      Just update the outdated data like you would in any db.
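As the reply says, updating works as in any database. The one extra step is recomputing the embedding whenever the text changes. The sketch below shows an upsert that keeps content and embedding in sync. So that it runs without a server, it uses Python's built-in sqlite3 as a stand-in. PostgreSQL supports the same INSERT ... ON CONFLICT (id) DO UPDATE SET ... = EXCLUDED.... form, and there the embedding column would be a pgvector vector rather than JSON text. fake_embed is a hypothetical placeholder for a real embedding model.

```python
from __future__ import annotations

import json
import sqlite3


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model."""
    return [float(len(text)), float(len(text.split()))]


def create_store() -> sqlite3.Connection:
    """In-memory table; in Postgres the embedding would be a vector(n) column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id TEXT PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " embedding TEXT NOT NULL)"  # JSON-encoded list of floats in this sketch
    )
    return conn


def upsert_document(conn: sqlite3.Connection, doc_id: str, content: str) -> None:
    """Insert a document, or replace its content and embedding if the id exists."""
    embedding = json.dumps(fake_embed(content))  # re-embed on every change
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO documents (id, content, embedding) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "content = excluded.content, embedding = excluded.embedding",
            (doc_id, content, embedding),
        )
```

Keeping the re-embedding inside the same write as the text update means a row never holds new content alongside a stale vector.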

  • @MichaelHoughton_ · 6 months ago · +1

    Could you put the vectors inside Firebase? That’d be epic

    • @3wcdev878 · 6 months ago

      Nope, Firebase has a limit, tried it.

    • @MichaelHoughton_ · 6 months ago

      @3wcdev878 Dang, that’s unfortunate

  • @3wcdev878 · 6 months ago

    But you tested it with a small dataset; most relational databases get slower as they grow.

  • @gilbertb99 · 6 months ago

    Pinecone is managed, isn't it? There are more reasons why enterprises would use and pay for it. For simple side projects, then yeah, pgvector locally makes sense.

  • @greendsnow · 6 months ago

    pgvector is the WORST performing vector db according to all comparison charts.
    You need to tell people if you're sponsored by Supabase, otherwise this is not ethical.

    • @daveebbelaar · 5 months ago · +3

      Can you share some more insights on this? And no, I am not sponsored or affiliated with Supabase.