PD_CloudTech
Search your data using OpenAI Embeddings
In this video, we'll take a look at vector-embedding-based semantic search, a powerful technique for finding relevant information in large text datasets. We'll explore the fundamentals of natural language processing, explain how vector embeddings represent words and phrases in high-dimensional space, and demonstrate how semantic search can effectively identify related content by measuring the similarity between these embeddings.
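For readers who want a feel for the workflow before watching, here is a minimal sketch of the embed-then-compare idea. It assumes the openai Python client (v1+) with OPENAI_API_KEY set and the text-embedding-ada-002 model; the word list and helper names are illustrative and not the video's exact notebook code.

# Minimal sketch: embed a few texts, then rank them against a query
# by cosine similarity. Assumes openai>=1.0 and numpy are installed.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    # One API call can embed a whole list of strings at once.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in resp.data])

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: closer to 1.0 = more similar.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["the cat sat on the mat",
        "stock prices fell sharply today",
        "a kitten is sleeping on the sofa"]
doc_vecs = embed(docs)
query_vec = embed(["feline behaviour"])[0]

# Rank documents by similarity to the query: most related first.
ranked = sorted(zip(docs, doc_vecs),
                key=lambda pair: cosine_similarity(query_vec, pair[1]),
                reverse=True)
for text, vec in ranked:
    print(f"{cosine_similarity(query_vec, vec):.3f}  {text}")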
Links:
Blog - partee.io/2022/08/11/vector-embeddings/
OpenAI API - platform.openai.com/docs/guides/embeddings/limitations-risks
0:00 - Introduction
1:50 - OpenAI Documentation & common use cases
4:48 - VSCode, loading libraries, utils, basics of vectors
8:03 - Generate embeddings for list of words
9:56 - Cosine similarity function
11:49 - Vectors in 3D space explanation
14:07 - Cosine similarity applied to dataframe
14:35 - Working with longer, realistic documents/text
16:38 - OpenAI synthetic document/knowledge-base generation
17:15 - Semantic search on knowledge-base
Socials:
www.linkedin.com/in/pratheekdevaraj/
patdevaraj?igshid=NTc4MTIwNjQ2YQ==
Views: 12,756

Videos

Connect ChatGPT to your Enterprise Data using Cognitive Search
45K views · a year ago
Let's take a look at how we can overcome the challenge of not being able to fine-tune ChatGPT with your own data, and why that approach is not ideal even if it were possible. Using a search service like Azure Cognitive Search lets us connect and index our data for fast information retrieval; the retrieved results are then passed into the ChatGPT prompt so that query responses are based on your data. Link to demo-rep...
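As a rough sketch of the retrieve-then-prompt pattern described above (not the demo repo's exact code): the index name "company-docs", the "content" field, the model name, and the environment variables below are assumptions, and the snippet presumes a Cognitive Search index has already been populated with your documents. The demo uses Azure OpenAI; for brevity this sketch calls the plain OpenAI client instead.

import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import OpenAI

# Hypothetical names: adjust the index, field, and env vars to your setup.
search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="company-docs",
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
openai_client = OpenAI()  # reads OPENAI_API_KEY

def answer(question: str) -> str:
    # 1) Retrieve the top matching passages from the search index.
    hits = search_client.search(search_text=question, top=3)
    context = "\n\n".join(doc["content"] for doc in hits)

    # 2) Pass the retrieved context into the ChatGPT prompt.
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("What does our travel policy say about international flights?"))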
Azure OpenAI & ChatGPT overview with demos
5K views · a year ago
Azure OpenAI documentation - azure.microsoft.com/en-us/products/cognitive-services/openai-service Request Access - customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu&culture=en-us&country=us Request Access for GPT4 - customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxK...

COMMENTS

  • @user-rp9iis1en6h
    @user-rp9iis1en6h 19 days ago

    Great. I have a question. Can we retrieve data based on the meaning of the documents? Does vector search allow that?

  • @louisamakye8561
    @louisamakye8561 10 months ago

    Great video. Where can I find the notebook in the repo?

  • @matheahildre4278
    @matheahildre4278 a year ago

    Has anyone tried something like this with hundreds or thousands of documents? I wonder how well the querying for relevant documents works, especially within a specific domain. Does it manage to find the most relevant documents?

  • @ArunKumar-bp5lo
    @ArunKumar-bp5lo a year ago

    Where are the embeddings getting stored? Locally? In the same directory?

  • @VageeshPrasad
    @VageeshPrasad a year ago

    Great video. Can you clarify how we can filter the contents of the AI response based on specific users and specific documents? For example, user 1 is allowed to view Document 1, user 2 is allowed to view Document 3, and user 3 is allowed to view Documents 2 and 3. In this situation, each user should only see responses coming from documents that they are allowed to see.

  • @Idlepug
    @Idlepug a year ago

    Point of note for enterprise use: be sure to be clear on the Microsoft ToS for the Azure OpenAI service and how Microsoft employees can potentially end up reviewing your company data processed in the service if it is flagged by their abuse monitoring (with the potential for false positives). Microsoft does make this pretty clear in its Azure documentation. There is an exemption form that can be submitted to Microsoft which, if approved, will exclude the subscription's OpenAI service from abuse monitoring.

  • @AnnieCushing
    @AnnieCushing a year ago

    This was a fantastic overview. Thank you!

  • @rathnaprakash86
    @rathnaprakash86 a year ago

    Great Content :)🙂

  • @TheLeads
    @TheLeads a year ago

    If I add a new document to the storage, will the model identify it and include it in the search, or do I have to drop and retrain the model again?

  • @davidchoi1655
    @davidchoi1655 a year ago

    This really helps! Thanks.

  • @akashvarkala4040
    @akashvarkala4040 a year ago

    love your videos

  • @danielsasson3746
    @danielsasson3746 a year ago

    Amazing video, super clear. It's the first time I'm starting to understand these areas. Keep up the great work 💪

  • @amirbadawi3548
    @amirbadawi3548 a year ago

    Since there is no PowerShell on Mac, does that mean this code won't work on ARM-based chips?

  • @Pure_Science_and_Technology

    Seems expensive. All Microsoft….

  • @magicofjafo
    @magicofjafo a year ago

    Oh man, I have a project that includes searching a ton of small documents. And this is the exact information I need.

  • @z-ro
    @z-ro a year ago

    Fantastic video!! Your explanation is so clear and thorough. I keep coming back to this video to get a better understanding of Azure OpenAI and Cognitive Services. Just one question: when you ran the repo locally, did you have to configure the AZURE_PRINCIPAL_ID in your .env file? I was able to get the repo running locally, but the API requests are failing... and I'm wondering if it's because I'm missing this environment variable that I don't have access to.

  • @nipunwalia6990
    @nipunwalia6990 a year ago

    Hey there, I was experimenting with this and found that it cannot analyse the context of the data. For example, I trained it on Amazon's 10-K and asked which sector made the most profit, and it was unable to answer; but if I asked a targeted question, like the profit earned in the automation sector, it gave an accurate answer. Any idea how I can work this out?

  • @SQLKC
    @SQLKC a year ago

    I would like to see a demo loading some public database, like large-city traffic data, into ChatGPT. Something like a data set with millions of rows and hundreds of tables.

  • @AxelJimenezC
    @AxelJimenezC a year ago

    I tried this architecture, but it works terribly for low-volume data. To break it, you only have to change a few words and the architecture responds as if there were no information in the knowledge base. The architecture also forces us to split the PDF into pages, so there are more errors when the information is spread across two pages (the response loses context when the relevant content starts on the first page and the PDF has two pages). The only way I solved these problems was to pass the information to GPT to produce multiple versions of the document with different layouts. Or, as an easier option, you can use embeddings with llama-index and drop the Azure search engine, which is not the best way to feed data to GPT.
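For anyone curious about the llama-index route this commenter mentions, a minimal sketch might look like the following. It assumes llama-index (0.10+) is installed with OPENAI_API_KEY set; the "data/" folder and the question are placeholders, not anything from the video or comment.

# Minimal llama-index sketch: embed local documents and query them directly,
# without Azure Cognitive Search.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()   # read and chunk local files
index = VectorStoreIndex.from_documents(documents)      # embed and store the chunks
query_engine = index.as_query_engine()

response = query_engine.query("What does the contract say about termination?")
print(response)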

  • @doncristobal33
    @doncristobal33 a year ago

    Amazing, keep it up!

  • @mahandolatabadi2600
    @mahandolatabadi2600 a year ago

    Good. Keep making new content.

  • @VozDelEvangeliio
    @VozDelEvangeliio a year ago

    As someone else mentioned, having a comprehensive video tutorial on setting up this demo app would be incredibly beneficial. It would provide us with a deeper understanding of the concept and make the process much clearer. Thank you in advance for considering this request!

  • @dreboyle167
    @dreboyle167 a year ago

    Good content. How do you see this approach scaling in terms of performance? As your database of vectors increases, you'll need to do cosine comparisons with every entry, which will become increasingly memory- and processor-intensive, and slow. Any thoughts on how to avoid this would be good.
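On the scaling concern raised in this comment: a common first step is to keep all embeddings in one pre-normalized matrix so the comparison is a single matrix-vector product rather than a per-row Python loop; beyond that, approximate nearest-neighbour indexes (as used by dedicated vector databases) avoid scanning every entry. A small illustrative sketch with made-up dimensions:

# Vectorized brute-force search: one matrix-vector product instead of a loop.
# Corpus size and dimension below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100_000, 1536))             # pretend embedding matrix
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

query = rng.normal(size=1536)
query /= np.linalg.norm(query)

scores = embeddings @ query                               # cosine similarity for all rows at once
top_k = np.argsort(scores)[-5:][::-1]                     # indices of the 5 best matches
print(top_k, scores[top_k])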

  • @voravitmuensri7327
    @voravitmuensri7327 a year ago

    Good content, but the screen recording resolution needs some enhancement. Cheers!

  • @lets-talk-ai
    @lets-talk-ai a year ago

    Great tutorial, thanks for sharing.

  • @rensimmons
    @rensimmons a year ago

    Great video. Your screen recordings have too much wasted space while the text is almost unreadable. You should make the text MUCH larger

  • @77netwar
    @77netwar a year ago

    I like the video and the contents, but I would very much appreciate it if someone at Microsoft showed, from start to finish, how to set up an Azure environment, install OpenAI, Cognitive Search and the other things that are needed, how to periodically upload my company data from a file server or other location, and how to make a website/web interface that my company or someone from the outside can access. No offense, but I have now seen four Microsoft YouTube videos with the same content that showcase this demo, and none of them goes into depth. I do not understand this. At the moment everybody is racing towards the finish line to get such a service running (especially using open-source models like MPT-7B and Llama derivatives, interfaces like oobabooga, and pipelines like LangChain), and Microsoft claims the pipeline is already there, but there is no in-depth instructional video. If you make it a course or something, people will be happy to pay, including me.

  • @jimmynguyen3386
    @jimmynguyen3386 a year ago

    Hi! Great video. I'd like to incorporate the OpenAI service in my org, and I'm looking to hire a consultant to get us up and running. I didn't see any contact info on your channel, but if you're interested, I'd love to chat.

  • @CaioCosta-sf7hn
    @CaioCosta-sf7hn a year ago

    It would be very helpful if you could make a step-by-step video tutorial setting up this infrastructure in Azure and using your own PDF to customize the model.

  • @adithyaskolavi
    @adithyaskolavi a year ago

    A really informative video, it helped out a lot.

  • @ScottzPlaylists
    @ScottzPlaylists a year ago

    I have a great idea you could code. Here's my idea for an "Offline AI DVR for UA-cam": have an input folder called Input_Folder of files to classify by their filenames, which are the titles of the UA-cam videos. Have an Output_Folders directory, where the subfolder names under Output_Folders are the classes used to train a neural net that decides where each file should be copied. I already have this set up, with lots of UA-cam videos already manually moved into the subfolders; this could be used to train the classifier. Once trained, I'd like to run the classifier on each filename in Input_Folder; the classifier would tell me the folder name to copy it to (I would call it Folder_Destination), and Python would copy the file to \{Output_Folders}\{Folder_Destination}. This would be an awesome little AI to organize files when you have already started the organizing. I use a nice little tool called "WinX UA-cam Downloader" to get the files. I put them in a public playlist named something like "12" (the date I started a new download list): as I browse UA-cam, I "Save to Playlist" into "12" to put on my HD later, and WinX even puts them in a folder called "12" in this example. Then, when I'm ready to download and organize, I copy the URL of the playlist, paste it into WinX, and it gets all the video files and puts them in "12". Then there's the long process of organizing them into categories on my HD as described above, so I can view videos by category at my leisure. (I used to be offline most of the time; there was a method to my madness in getting lots of videos for watching later.) Look at all my playlist names and you can see I have a lot of interests; I keep the best videos by category in UA-cam playlists. It would be so nice to have the program I described, and it would be the main component in a larger program later that would be like an Offline AI DVR for UA-cam. Thanks for reading this long post; hope you whip up the code as a better programmer than me!

  • @Laughbankrip
    @Laughbankrip a year ago

    Amazing content. Please continue to create more content like this. You are helping accelerate my career.😊

  • @Laughbankrip
    @Laughbankrip a year ago

    I recently presented this at an AI hackathon within my company and it was well-received.

  • @Marco_Antonio_CV
    @Marco_Antonio_CV a year ago

    Excellent video, hope we have more like this soon

  • @derRittervonnebenan
    @derRittervonnebenan a year ago

    Your tutorials are awesome. Straight to the point and loaded with information

  • @darenbaker4569
    @darenbaker4569 a year ago

    That was absolutely brilliant, it just clicked. I now understand cosine similarity at last.

  • @aitou96
    @aitou96 a year ago

    Is it possible for ChatGPT / GPT-4 to give users answers about the KPI structure in a dashboard by giving it access to the data sources (raw data), like SAP etc.?

  • @TheWhalzz
    @TheWhalzz a year ago

    Awesome, thanks

  • @mohammedumarfarooq2372
    @mohammedumarfarooq2372 a year ago

    How can I create a chatbot using Azure OpenAI ChatGPT-3, please?

  • @JBuckk
    @JBuckk a year ago

    This looks interesting. Do you have experience with LangChain as well?

    • @PD_CloudTech
      @PD_CloudTech a year ago

      I'll try to make a video on langchain next time :)

    • @JBuckk
      @JBuckk a year ago

      @PD_CloudTech All good, I was just wondering ^^

    • @vigneshnagaraj7137
      @vigneshnagaraj7137 a year ago

      @PD_CloudTech Please make a video on LangChain.

  • @albertliu5760
    @albertliu5760 a year ago

    Thanks for the update.

  • @RaynardL
    @RaynardL a year ago

    Good to see you doing great things with your knowledge Pratheek!!

  • @albertliu5760
    @albertliu5760 a year ago

    Very nice introduction. Could you also demo how to build an app using Azure OpenAI?

    • @PD_CloudTech
      @PD_CloudTech a year ago

      Working on a new video that shows how to integrate OpenAI with Cognitive Search, stay tuned!

    • @PD_CloudTech
      @PD_CloudTech a year ago

      ua-cam.com/video/cTe3VaYqtBU/v-deo.html Here you go! A new video showing how to spin up your own ChatGPT app connected to your own data.

  • @saint_kendrick
    @saint_kendrick a year ago

    Excellent content! Would love to see a few timestamp chapters included as well for easy sharing.

    • @PD_CloudTech
      @PD_CloudTech a year ago

      Thank you for the suggestion! I have added the chapters into the video :)