"How to give GPT my business knowledge?" - Knowledge embedding 101

  • Published 22 Jun 2024
  • A step-by-step guide to creating your own knowledge base embedding, from preparing knowledge data to retrieval-augmented generation (RAG)
    🔗 Links
    - Follow me on twitter: / jasonzhou1993
    - Join my AI email list: www.ai-jason.com/
    - My discord: / discord
    - Finetune LLM video: • "okay, but I want GPT ...
    - No code alternative: relevanceai.com/
    - Github repo: github.com/JayZeeDesign/Knowl...
    ⏱️ Timestamps
    0:00 What is Knowledge embedding?
    4:21 Core business use cases
    5:52 Step1 Prep knowledge data
    6:25 Step2 Create embedding
    8:34 Step3 Similarity search
    9:55 Step4 Retrieval augmented generation (RAG)
    12:23 Step5 Deploy
    14:49 No code alternatives
    👋🏻 About Me
    My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
    #gpt #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #langchain #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #embedding #openaiembeddings #wordembeddings
  • Science & Technology

COMMENTS • 312

  • @AIJasonZ
    @AIJasonZ  10 months ago +57

    A few people asked, “Why vectorise only one column instead of the whole CSV?”
    Adding a bit more explanation here:
    Vectorising is mainly for search, and the column you vectorise can be thought of as the “index” or “id” of the dataset; the data it returns will still be the full question/answer pair.
    The reasons I vectorise only one column:
    1. It saves cost: vectorising uses an embedding model, so every token we vectorise costs money.
    2. It increases accuracy: in this case I only want to search past customer emails, not sales responses. Searching both columns might return the wrong answer, e.g. a search for “interested in learning more” could return the pair “client: stop sending me emails; sales: understood, let us know if you are interested in learning more in future!”
    Hope this helps!
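
A minimal sketch of the point above, assuming a toy bag-of-words vector in place of a real embedding model. The sample pairs and the names `embed`, `build_index`, and `search` are hypothetical and are not taken from the video's repo. Only the customer message is embedded, while the full question/answer pair is kept as the payload that the search returns:

```python
import math
import re
from collections import Counter

# Hypothetical sample data: (customer_message, sales_response) pairs.
PAIRS = [
    ("Please stop sending me emails.",
     "Understood, let us know if you are interested in learning more in future!"),
    ("I am interested in learning more about pricing.",
     "Happy to help! Here is our pricing overview."),
]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_index(pairs, key):
    """Embed only the text selected by `key`; keep the whole pair as payload."""
    return [(embed(key(pair)), pair) for pair in pairs]

def search(index, query):
    """Return the full pair whose indexed text is most similar to the query."""
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[0]))[1]

# Index on the customer message only (the "id" column).
customer_only = build_index(PAIRS, key=lambda p: p[0])
best = search(customer_only, "interested in learning more")

# For comparison: index on both columns joined together.
both_columns = build_index(PAIRS, key=lambda p: p[0] + " " + p[1])
noisy = search(both_columns, "interested in learning more")

print("customer-only index ->", best[0])
print("both-column index   ->", noisy[0])
```

With this toy data, the customer-only index returns the pricing enquiry, while the both-column index is pulled toward the "stop sending me emails" pair because the sales reply happens to contain the query words. A real embedding model will score differently, so treat this as an illustration of the indexing choice rather than a benchmark.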

    • @ozfish17
      @ozfish17 10 months ago +1

      It seems embedding enriches your search query. How about the answers? In your example, do you 'train' the LLM with Q&A pairs?

    • @AIJasonZ
      @AIJasonZ  10 months ago +1

      @ozfish17 yep, it returns both parts of the Q&A pair!

    • @Taskade
      @Taskade 9 months ago +1

      Jason, brilliant step-by-step guide on knowledge embedding! Your breakdown of the process was super insightful. I'm curious about how AI Agents in Langchain perform, especially in long-running scenarios. Hope you'll consider diving into that topic in the future. Keep up the stellar content!

    • @sandeepbansal1195
      @sandeepbansal1195 8 months ago

      So if you want the output response email to be generated by the LLM based on a specific tone, why wouldn't the 2nd column be a part of vectorizing the dataset?

    • @csss142
      @csss142 7 months ago

      Hey Jason! What would be the best way to do this with financial PDFs? I want to ask questions and get accurate insights from the large documents. Would using embeddings be best or the fine tuning from your other video? Thanks! @AIJasonZ

  • @psychxx7146
    @psychxx7146 11 months ago +32

    Small channels like this are the ones that hold the most value.

  • @Helpsmallbusinesses
    @Helpsmallbusinesses 10 months ago +94

    In 2 minutes and 54 seconds you explained what is vectoring better than any other video online. You made it easy. Thank you!

  • @SaminYasar_
    @SaminYasar_ 11 months ago +4

    Keep it up man probably one of the only channels with incredible value

  • @_arman_
    @_arman_ 8 months ago +5

    Man... you have a serious gift for teaching! This is super helpful. Thanks.

  • @fuxxs5994
    @fuxxs5994 11 months ago +20

    I really love your style, first explaining the theory and then demonstrating it by an example

  • @muhammadanasazambhatti2772
    @muhammadanasazambhatti2772 10 months ago +6

    Thank you very much! Nobody explained Embedding and Vectorization like this! Thank you again!

  • @shivamroy1775
    @shivamroy1775 10 months ago +8

    Absolutely great video, I loved that you took the time to explain everything in theory and then went on to give a detailed walkthrough of the code. Please keep posting such videos !

  • @sidavidsin
    @sidavidsin 11 months ago +27

    Thanks for sharing your knowledge with us, your channel is literally a gold mine of information. Keep doing what you're doing, Jason!

  • @davidkwon1233
    @davidkwon1233 11 months ago +3

    one of the best channels out there, really appreciate your content!

  • @Optable
    @Optable 10 months ago +2

    You have helped the community so much with this valuable content. Keep it up my friend, i'll be watching!

  • @funkyboodah
    @funkyboodah 4 months ago +3

    man you have a really rare ability to explain super complicated things in a very simple way and organize the information so it's even more clear. Bravo and thank you

  • @humadi2001
    @humadi2001 3 months ago +1

    I've watched many videos on this topic and I can say that your simple examples have covered most of what I need to know. Thanks Jason.

  • @half_way_expert
    @half_way_expert 11 months ago +3

    Another great video! Thanks Jason, keep up the excellent work

  • @koen.mortier_fitchen
    @koen.mortier_fitchen 11 months ago +2

    Thanks for your work Jason. You're one of the best, and I follow tons.

  • @stepkurniawan
    @stepkurniawan 11 months ago +2

    yo bro.. i really like when you explain all the step-by-step and all relevant tools out there! thank you!

  • @JJ-vq8mu
    @JJ-vq8mu 9 months ago +2

    Great job, and I really appreciate you sharing your knowledge. Looking forward to open LLM content.

  • @normanluismadrid422
    @normanluismadrid422 9 months ago +3

    this is virtual gold, mad props to jason for clearly describing complex topics and even showing practical application, saved me hours of research lol, it'd be great if you can touch up on the various services out there that offer AI services that embed, and how they compare in performance, pros / cons etc.

  • @michalf16
    @michalf16 11 months ago +1

    Love your content good sir, tuned for all next videos you are the leader

  • @jasonfinance
    @jasonfinance 11 months ago +3

    the best video about embedding ive seen; thank you!

  • @TheDestint
    @TheDestint 9 months ago +2

    This is super duper helpful man ! Great work and thanks !

  • @christhornham
    @christhornham 2 months ago

    Outstanding. Your ability to explain complicated topics is incredible. Thank you.

  • @wojpaw5362
    @wojpaw5362 7 months ago

    Absolutely outstanding. I liked, subscribed and shared. Best explanation of knowledge embedding I have come across!!!!

  • @PlectrumShorts
    @PlectrumShorts 10 months ago +2

    Great tutorial! You covered a LOT of ground quickly, but thoroughly. Haha. Nice work.

  • @devinoutfleet1998
    @devinoutfleet1998 8 months ago +1

    Bro... you are incredibly smart and are a great teacher. This is going to provide 10x value to my users

  • @verasalem5071
    @verasalem5071 11 months ago +34

    Love your content, very easy to digest and understand. The only recommendation I would give is to use other embeddings and LLM models besides OpenAI. Mid/large-sized companies cannot use OpenAI in their environment because of legal issues around OpenAI's data retention policy. A lot of companies want to develop their own implementations, so including other models like Llama 2, Vicuna, etc. would allow you to reach a bigger audience.

    • @AIJasonZ
      @AIJasonZ  10 months ago +4

      yea great points, thanks for the recommendation! totally get that companies don't want to send any data to OpenAI LOL

    • @Ascended23
      @Ascended23 10 months ago +2

      +1 for using more open models. I love your content and the approach you take to your videos. But even though I'm not a big company I just value using systems that are open instead of closed.

  • @pietdebeer7972
    @pietdebeer7972 10 months ago +1

    I'm blown away. Thank you!!

  • @growthub8541
    @growthub8541 10 months ago +3

    So helpful! I started using relevance ai because of your videos & just as a no-code developer been able to build some sick ass LLM chains with Zapier Custom HTTP Requests.
    I have my development team even using it & it’s definitely speeding up our velocity to iterate🙌🔥

    • @AIJasonZ
      @AIJasonZ  10 months ago

      thats great to hear! 🤘

  • @aliq6709
    @aliq6709 2 months ago

    This was super helpful. Thank you, Jason!

  • @ridg2806
    @ridg2806 6 months ago

    Really high quality content, thank you Jason!

  • @gautamdawar5067
    @gautamdawar5067 10 months ago +1

    This is pure gold. Thank you so much!

  • @averagegamer9513
    @averagegamer9513 11 months ago +26

    Great video as always, Jason. Thank you for making one of the few channels with genuine AI tool videos that actually demonstrate implementation and applications rather than hyping up the content through sweet talk and then simply dropping an affiliate link.

    • @senxo.visuals
      @senxo.visuals 11 months ago +4

      This! I feel so grateful that the YouTube algorithm blessed me with Jason's channel. Beautiful explanations and clear steps.

    • @koen.mortier_fitchen
      @koen.mortier_fitchen 11 months ago +1

      Yeah, he's one of the real ones. I've asked him if he could add a GitHub repo for the code. It's the only thing this channel lacks imo.

    • @frankchangshow
      @frankchangshow 8 months ago

      @senxo.visuals same feelings here

  • @stevi32800
    @stevi32800 10 months ago +2

    I really like your video. You know how to capture people's attention. Please make more videos like this 😊

  • @kylelau1329
    @kylelau1329 10 months ago

    have been waiting for this video, Thank you!

  • @aliyousefi9735
    @aliyousefi9735 10 months ago +1

    you're the man Jason, great content!

  • @king94596511
    @king94596511 10 months ago +2

    The video is very inspiring and straightforward, a valuable lesson

  • @manideepatalukdar9201
    @manideepatalukdar9201 10 months ago +1

    Great video! Very simple to understand.

  • @robertcormia7970
    @robertcormia7970 5 months ago

    Very well done! Straightforward to follow!

  • @manojnaidu619
    @manojnaidu619 1 month ago

    Cannot be more valuable than this. Loved it 🎉

  • @nguyenvanduc2000
    @nguyenvanduc2000 1 month ago

    I have the same idea in mind. I have tons of product documents that I wish I could just ask an agent something about it instead of scrolling hundreds of word pages. I really appreciate your video man.

  • @VaibhavShewale
    @VaibhavShewale 10 months ago +1

    this is just awesome, people who had no idea now have not only the idea but also a reference

  • @rahuliyer6007
    @rahuliyer6007 7 months ago

    Came here after the fine tune model video - looking for exactly this. Thanks!

  • @kylearnold9647
    @kylearnold9647 8 months ago

    Thank you! This was incredibly helpful

  • @kurtcampher4716
    @kurtcampher4716 9 months ago

    thank you for this
    As a dev with no AI experience, you really make it easy to understand

  • @AssassinUK
    @AssassinUK 11 months ago +2

    This was 🔥🔥🔥. If I hadn't already subscribed, I would have. Excellent use case! Looking to implement this using Flowise.

  • @scratch123
    @scratch123 8 months ago

    Thanks Jason this was a great tutorial! :)

  • @xulipaTV
    @xulipaTV 11 months ago +1

    You are the man Jason!

  • @farid3101
    @farid3101 3 months ago

    I am really surprised that these tools can help so many businesses doing the low-cost and autonomous response specifically for customer service! Great video!

  • @naimneman
    @naimneman 11 months ago +2

    Amazing video Jason! Pretty useful information. I would love to see a video about GPT4All as a personal assistant for everyday life.

  • @AlessaOxygen-ot4rl
    @AlessaOxygen-ot4rl 5 months ago

    This is hilariously good. Thanks for this wonderful resource!

  • @ivant_true
    @ivant_true 2 months ago

    you make really useful videos man

  • @NickWatching
    @NickWatching 4 months ago

    Amazing explanations, thank you!

  • @chrisvienneau3366
    @chrisvienneau3366 10 months ago +1

    Great content and love the intros

  • @davide.2349
    @davide.2349 11 months ago +1

    Jason you are awesome!

  • @davidwylie8491
    @davidwylie8491 11 months ago +1

    Amazing! Thanks for sharing

  • @karankanchetty8320
    @karankanchetty8320 3 months ago

    Great job. You deserve more subscribers.

  • @shethromesh
    @shethromesh 10 months ago

    Would love to see a similar demo of knowledge search with open source models instead of OpenAI models

  • @photon2724
    @photon2724 11 months ago +19

    Anyone looking to make a great startup in AI, you have to jump on this!

    • @i_forget
      @i_forget 10 months ago

      Working on it!

    • @dragoon347
      @dragoon347 10 months ago

      Working on it now

  • @patriciodiaz2377
    @patriciodiaz2377 9 months ago +1

    Thanks a lot for the info!! Greetings from Mexico 🤙

  • @markieuanroberts
    @markieuanroberts 7 months ago +1

    Awesome explanation, thanks.

  • @MrDe0
    @MrDe0 9 months ago +1

    This is GOLD !!
    Thank You !

  • @YangYang-rh8uy
    @YangYang-rh8uy 3 months ago

    Exactly what I want, thanks Jason.

  • @adi2hot
    @adi2hot 10 months ago +1

    Fantastic content, thank you.

  • @coldestlin
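    @coldestlin 4 months ago

    When the vector search result showed up midway through, the whole flow clicked for me at once. Really great. So you take the vector search results, pass them to the LLM as part of the prompt instructions, and then let the LLM produce the answer.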
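
The flow described in this comment (take the vector search results, place them in the prompt, let the LLM answer) can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the video: `build_prompt` and `answer` are hypothetical names, the prompt wording is invented, and `fake_retrieve` and `fake_llm` are stand-ins so the sketch runs without any external service.

```python
def build_prompt(new_message, retrieved_pairs):
    """Put the retrieved Q&A pairs into the prompt as examples for the LLM."""
    examples = "\n\n".join(
        f"Customer: {question}\nReply: {answer}"
        for question, answer in retrieved_pairs
    )
    return (
        "Reply to the new customer message below, following the style of "
        "the past replies.\n\n"
        f"Past examples:\n{examples}\n\n"
        f"New customer message:\n{new_message}\n\n"
        "Reply:"
    )


def answer(new_message, retrieve, llm):
    """retrieve: query -> list of (question, answer); llm: prompt -> text."""
    pairs = retrieve(new_message)          # step 1: similarity search
    prompt = build_prompt(new_message, pairs)  # step 2: augment the prompt
    return llm(prompt)                     # step 3: generate the answer


# Hypothetical stand-ins for a vector store lookup and a model call.
def fake_retrieve(query):
    return [("Do you offer discounts?", "We offer 10% off annual plans.")]


def fake_llm(prompt):
    return f"[model output for a prompt of {len(prompt)} characters]"


reply = answer("Is there a discount for yearly billing?", fake_retrieve, fake_llm)
print(reply)
```

Passing `retrieve` and `llm` in as plain callables keeps the sketch independent of any particular vector store or model provider; swapping in real ones should not change the shape of the flow.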

  • @CyberSQUID9000
    @CyberSQUID9000 10 months ago +1

    More excellent content, thanks mate

  • @takeshikriang
    @takeshikriang 10 months ago +1

    Great video, subscribed.

  • @arunkabilan
    @arunkabilan 10 months ago +1

    Great explanation

  • @tauraik
    @tauraik 11 months ago +1

    Amazing content my guy Amazing

  • @SS-rt8oo
    @SS-rt8oo 4 months ago

    Great video, thank you

  • @camach28
    @camach28 11 months ago +8

    It would be amazing if you could make a video creating a knowledge base using long PDFs as the source, and use GPT as well to make an expert assistant on a topic.

    • @frankchangshow
      @frankchangshow 8 months ago

      Yes like if the data source is like a book and we want to search the contents in it giving relative data like “I remember this part of the book saying something like this… where was it?” … or “the book had this story … where was it and the main ideas”

  • @shrvn110
    @shrvn110 11 months ago +2

    this dude is on FIRE 🔥

  • @tahunal
    @tahunal 11 months ago +1

    Bro you are awesome.

  • @alvropena
    @alvropena 10 months ago

    Thank you for sharing!

  • @oscarcharliezulu
    @oscarcharliezulu 10 months ago

    Excellent vid thank you !

  • @user-nt2fs7qp6c
    @user-nt2fs7qp6c 8 months ago

    this is the best video on your channel.

  • @gkennedy_aiforsocialbenefit
    @gkennedy_aiforsocialbenefit 11 months ago +4

    Hey Jason thanks for the always excellent presentations and information. The Streamlit and RelevanceAI information were interesting and useful. Relevance reminds me of another great product, Flowise.

    • @frankchangshow
      @frankchangshow 8 months ago

      I don’t know if I should use stack ai, relevance ai, or flow wise. Going into decision fatigue now

  • @maciejbalasinski2419
    @maciejbalasinski2419 10 months ago +1

    Thanks for the no-code alternatives

  • @ludwigvanbeethoven61
    @ludwigvanbeethoven61 10 months ago

    I wonder why those AI channels, like yours, are not exploding. This is so important for the future what you all are doing. Only a few people get this!

  • @user-gv6ek5tg2f
    @user-gv6ek5tg2f 5 months ago

    Dude. You. Are. Awesome!

  • @KarlJuhl
    @KarlJuhl 10 months ago +3

    Great resources Jason, I will add to the flood of comments - you are a great communicator and you move at a good speed. Thanks for sharing!
    It is interesting how many langchain UI apps are being built. Relevance AI looks to be the most integrated from end to end, with such an easy deploy process.
    I am curious to know your thoughts on using a UI tool like flowise or relevance AI versus custom programming.

  • @lesteroliver911
    @lesteroliver911 11 months ago +1

    you are just amazing :)

  • @AI_Ron
    @AI_Ron 11 months ago +2

    These are gems

  • @Artificial_Noob
    @Artificial_Noob 11 months ago +7

    Great video man! I hope you can cover more "No Code Methods" for beginners like me that are not very technical! The last part of this video was GOLD for me. cheers!

  • @ayusharora2019
    @ayusharora2019 11 months ago +1

    Amazing!!

  • @andrzejpec4886
    @andrzejpec4886 10 months ago +1

    Big thank you ❤

  • @Gingeey23
    @Gingeey23 11 months ago +8

    Great video Jason, however the biggest challenge for companies will be ensuring that commercially sensitive information isn't fed into hosted LLM models due to security concerns. Would be really interested to see how you would approach this challenge, and potentially try to deploy this tool locally? keep up the good work!

    • @AIJasonZ
      @AIJasonZ  11 months ago +6

      Thanks mate! Yeah, I agree. I hear businesses talk about sensitive information a lot, especially ones with client data.
      There are 2 ways I can see to solve it now:
      1. Self-host the LLM, using an Azure self-hosted version or even open source models, so you don't send info to OpenAI.
      2. Anonymise your input/output data, so OpenAI doesn't have a clear idea that data A is from company A.
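
A rough sketch of the second option, assuming simple regex and string replacement is enough for the data at hand (real personal or commercial data usually needs a more thorough approach). The names `anonymise` and `restore`, the placeholder format, and the sample text are all hypothetical. Sensitive values are swapped for placeholders before the text leaves your environment, and the mapping stays local so the model's reply can be restored afterwards:

```python
import re

# Simple email pattern; an assumption that is good enough for this sketch.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def anonymise(text, sensitive_terms):
    """Replace emails and listed terms with placeholders; return text and mapping."""
    mapping = {}

    def placeholder(value, kind):
        # Reuse the same placeholder when the same value appears again.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"<{kind}_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    text = EMAIL_RE.sub(lambda m: placeholder(m.group(0), "EMAIL"), text)
    for term in sensitive_terms:
        text = re.sub(
            re.escape(term),
            lambda m, t=term: placeholder(t, "TERM"),
            text,
        )
    return text, mapping

def restore(text, mapping):
    """Put the original values back into text that contains placeholders."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

message = "Hi, this is Jane from Acme Corp, reach me at jane@acme.com."
safe_message, mapping = anonymise(message, ["Acme Corp"])
print(safe_message)                      # placeholders instead of sensitive values
print(restore(safe_message, mapping))    # original text recovered locally
```

Only `safe_message` would be sent to the hosted model; `mapping` never leaves your side.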

    • @senxo.visuals
      @senxo.visuals 11 months ago +2

      If using hosted LLM like OpenAI's this would probably 1. require just a lot of manual work with clearing all the data or 2. first pushing the data through lighter local LLM with a task to clear any sensitive information (like they used one LLM to create training prompts for another LLM). Just a thought, tho

  • @user-ps3jj1ey5k
    @user-ps3jj1ey5k 10 months ago

    Explained very clearly.

  • @sameergamer4567
    @sameergamer4567 9 months ago

    Great video

  • @kiraakamaru
    @kiraakamaru 11 months ago +4

    This is exactly what I was looking for, I have a question Jason: How can we secure our company personal data?

  • @nealshah5874
    @nealshah5874 22 days ago

    This is the greatest video ever created

  • @fenderbender2096
    @fenderbender2096 9 months ago +1

    Very nice video.

  • @desiderata2745
    @desiderata2745 10 months ago +1

    Thanks!

  • @slimyelow
    @slimyelow 9 months ago

    Excellent.

  • @___Madara__
    @___Madara__ 5 months ago

    top tier content!!!!

  • @kalifalau1455
    @kalifalau1455 5 months ago

    very helpful!

  • @rverm1000
    @rverm1000 9 months ago +1

    great video! is that enough info to go out and start building a customer response ai for other people or businesses?

  • @Grumptr0nix
    @Grumptr0nix 7 months ago

    This is exactly what I was looking for... I have a tremendous amount of assets (Requirements docs, project plans, etc) that we've created over and over for all our engagements, and I'm trying to find a way for us to stop reinventing the wheel. All of which are in our Google Drive, but I'm having trouble conceptualizing how I'd be able to turn that into vectored data (you talk about text splitter, but I'm still a bit confused about its application). Anyways, I'll do more research but this is amazing content. Thank you.

    • @Grumptr0nix
      @Grumptr0nix 7 months ago +1

      And for sure, the legal issues with our business data and OpenAI that is discussed in other comments have been a blocker for us as well, but at least there's options.

  • @satyamgupta2182
    @satyamgupta2182 11 months ago +2

    Thank you so much for your video. It's very helpful.
    At the same time, is there a way to run this with Llama-2 or other open source LLMs?
    Edit: If security is my main concern, how do I go about embedding?

  • @vb7913
    @vb7913 8 months ago

    Hi Jason, fantastic video! So if i understand correctly, this whole concept works purely on the quality of your examples and more importantly how your prompt is structured, as the prompt contains instructions, input and examples ?

  • @ozfish17
    @ozfish17 10 months ago +1

    Great video Jason! In the sample you shared, does the llm get trained every time you have a new message? Or you train it once, then you can ask multiple questions?

    • @bobwilkinson8053
      @bobwilkinson8053 9 months ago

      I have the same question. Did you find the answer?