Vector Embeddings Tutorial - Code Your Own AI Assistant with GPT-4 API + LangChain + NLP
- Published 10 Feb 2025
- Learn about vector embeddings and how to use them in your machine learning and artificial intelligence projects. Learn how to create an AI assistant with vector embeddings. You'll use OpenAI's GPT-4 API, LangChain, and Natural Language Processing techniques (NLP).
✏️ Course created by @aniakubow
⭐️ Contents ⭐️
⌨️ (00:27) Introduction
⌨️ (01:49) What are vector embeddings?
⌨️ (02:14) Text embeddings
⌨️ (07:58) What are vector embeddings used for?
⌨️ (11:05) How to generate our own text embedding with OpenAI
⌨️ (14:37) Vectors and databases
⌨️ (16:02) Getting our database set up
⌨️ (18:05) Langchain
⌨️ (19:24) Let’s build an AI Assistant
🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan
👾 Oscar Rahnama
--
Learn to code for free and get a developer job: www.freecodeca...
Read hundreds of articles on programming: freecodecamp.o...
❤️ Support for this channel comes from our friends at Scrimba - the coding platform that's reinvented interactive learning: scrimba.com/fr...
So much love for this incredible community! Hope you like this video
Thanks for the work, hope you gain 1 million subs🎉
seriously, you guys have no idea how much I've learned from this channel. The value is incalculable. Thank you so much!
You look like A.i generated model😅
Thank you for all your work 🙏 you are an inspiration 😊 I hope one day I can be as good as you
@@rameezalam1968 😂
This lecture on vector embedding is undoubtedly one of the best I've encountered! Huge thanks to Ania and FCC! Kudos to you all!
do you agree that this legendary channel is better than most paid courses for coding out there?
Having programmed since 1963, it looks like a return to COBOL. Programming today is 90% getting things hooked up. 10% getting things done. 100% being clueless as to what's going on under the covers. Follow the bouncing ball programming.
A few months following their courses and their YT channel. I believe it's as useful (if not more) than my university degree, which took a lot of money and years.
I just completed their certification course in responsive web design and actually learned more than from any other course, in only 40 days (300-350 hours), and all for free too
What, you mean there's such a thing as a paid course?? I'm already 100% clueless.
@@Jd-zd6bh I meant that it's free: all the courses on their website, certificates included, are free.
10:51 THIS is a 'golden nugget' right here: "The core advantage of vector embeddings..." Such a great summation of exactly what an ai model really is. Thanks for such a fantastic video. I love it !
Can we stop just for a moment and appreciate her!!! Learned loads of things from you!!! Hats off 🎓🎓 ❤❤ love you from Ethiopia
She's an actor. There's tonnes of people behind the scene to write this content.
@maciej2320 wow amazing me too from Ethiopia ❤
How do you know she's creating the content? You learned loads of stuff, but is it making you more money?
@@maciej2320 I made this myself :)
@@jay_wright_thats_right I made it myself :)
Ania, that was freaking amazing! You simplified all the concepts without going too high-level and dumbing it down altogether. You told us what happens and showed us HOW they happen. I found this very informative and you answered so many questions I have been pondering. I'm not a developer or an AI person, I'm a network engineer. So, thank you...
By the way, I used an embedding model to map your face, and the semantic engine returned the words "gorgeous," "lovely," "beautiful," etc... 🙂
Wow, the instructor for this video was actually amazing. I only clicked on it because it was 30 minutes long, with no real intention to learn; I just wanted to play it in the background while I read a textbook for fizz. The instructor was phenomenal: I understood everything she said, and every instruction was clear to follow, although I only really know some JavaScript and C++. I actually learned a few things. Like I said, before the video began I had no real intention of implementing this, but since I learned and understood pretty much all of it, I could see myself implementing it in some project and adding it to my resume. Would be cool. Thanks
ok simp
🎯 Key Takeaways for quick navigation:
00:00 📘 Ania Kubow's course covers vector embeddings using OpenAI's GPT-4.
01:49 🖥️ Vector embeddings transform various data types into numeric forms for algorithm processing.
06:12 📈 Numbers can represent complex data, and cosine similarity helps compare them.
08:04 🌐 Embeddings find applications in recommendation systems, NLP tasks, and more.
14:04 🛠️ LangChain, an open-source framework, enhances AI interactions, chaining models and data.
23:25 🛠️ The tutorial walks through setting up a Python environment and key scripting steps.
24:22 ⚙️ Essential packages and tools are installed for AI development.
33:17 🤖 The AI assistant, using vector-based search, fetches relevant documents from a database.
Made with HARPA AI
You're a lifesaver
Really great, she has great presentation skills , to the point which needed demos ...clarity in tone, content and body language...thanks a lot.
Nice introduction to vector embeddings with clear explanation. One of the best lecture/videos encountered on you tube related to this topic.
Simply love this presentation! That vector math (King - man + woman = Queen) just blew my mind!
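The King - man + woman = Queen arithmetic can be sketched in a few lines of Python. This is a toy illustration, assuming hand-picked 3-dimensional vectors whose dimensions (roughly "royalty", "maleness", "femaleness") are invented for the example; real embeddings have hundreds or thousands of learned dimensions with no such labels. The answer is the word whose vector has the highest cosine similarity to the result, excluding the three input words.

```python
import math

# Hypothetical toy "embeddings": hand-picked values, NOT output from any real model.
# Dimensions loosely mean [royalty, maleness, femaleness] for illustration only.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), skipping a, b and c."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, VECTORS[w]))


print(analogy("king", "man", "woman"))  # queen
```

Skipping the input words is a common convention in analogy tests, because the vector nearest to the result is often one of the inputs themselves.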
You have a talent to deliver complex information in a very interesting manner! Waiting for more videos!
Wow. Probably the best lecture to meaningfully explain what vector embeddings are, and how cosine similarity works. Thank you very much!
You're dazzling and wise, a true blend of grace and intellect.
dishing it out like piece of cake,
This girl is an ideal perfect educator!
This is a masterclass!!! Thanks so much! Really appreciated!
Excellent video, and Ania did a great job trying to make the topic accessible. Onion news 😢 😂
Fun fact: the largest GPT-3 model has an embedding size of 12,288, which is 100 times more than tiny models and 25 times more than normal NLP models.
Dumb question: are they interoperable? Like creating embeddings for a dataset with GPT-3 but then comparing them to a new embedding created by a different model
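@@definitive_solutions No. Each model learns its own embedding space, so embeddings from different models (GPT-3 and GPT-4, say) usually differ in size, and even when the sizes match, the dimensions don't correspond. Only compare embeddings produced by the same model.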
I must say this was the best, I mean the best, explanation of embedding vectors and the rest. How the similarity scores were defined and the basic idea behind them was brilliant. If you read this: I want this same content but without needing any cloud services. I can't have 10 different cloud providers to create solutions. Please provide an example of an open-source solution that runs locally.
Incroyable !
This channel is goldmine
I did this in my Numerical Analysis course using Maple and MATLAB. Then I did some analysis on images when I took Fourier analysis. Shame I never got much chance to use it professionally, as I tend to work with financial data.
Those skills are useful for grad and research
Wow, mashallah, you are amazing, freeCodeCamp. We love you from Ethiopia 🇪🇹
I took the responsive web development certificate; it was amazing ❤❤❤
Ania is the best.
I'm just studying this... Thanks
thank you. a TIMELY course for my projects!
So well done and well put together! Thanks for the value ❤
@AniaKubow you're great at explaining. The only thing lacking in this otherwise excellent video is Poetry ;)
Great video very detailed 🎉 0:35
Love it !
Great tutorial, thanks! Lol at those answers it was spitting out though...
This was a super helpful tutorial. I have a quick question if you have a sec: what is the role of Hugging Face in the tutorial above?
Thank you so much for this amazing video! I learned a lot from it!
Thank you for sharing the information and knowledge.
Thank you for so clearly and articulately presenting these lessons for us for free! Your eyes, your smile, and your beauty all together it is incredibly distracting to me though haha! I mean you are one of the most gorgeous women I've ever seen in my life and I appreciate your time. Sorry to come off like a creep but you're stunning.
Amazing explanation! How could I use an existing Access database for my data set? It actually contains text reports and keywords for each report.
Awesome video to get started in AI. Any reason why you used DataStax instead of a vector DB like Pinecone?
Thank you! this was really insightful.
Thanks a lot, amazing video
Brilliant!
Your technology class is very good
Wow. Lovely
Thanks for making this video
This is a very good video, but I would like to understand why we need DataStax and a DB if the intention is just to send a prompt and get an answer. We can get answers directly from OpenAI with a key and a prompt, without storing anything or doing anything with vector embeddings, and those would stay internal to OpenAI. I want to understand the use case for this approach.
Valid question
I believe this method is better than fine-tuning and significantly superior to using prompts, especially when you have a lot of information; the chat will provide much better answers.
I think this is more for having the AI answer questions about your own data, hence she downloaded data from Hugging Face. But this could also be your own data, vectorized, stored in a DB, and queried. I may be wrong, but this is what I infer.
I'm first woohoo tho I can only write "hello world" 🙂
This came just in time as I just discovered Flowise which is just a code-less LangChain and wanted to play around with long term memory for my models
Thank you for the tutorial. Do some LLMs and tools like ChatGPT handle all these tasks, from storing data in vector databases to querying the relevant data? Doesn't OpenAI provide a database for storing the embedded text, and is that why we used Cassandra for this purpose?
Why use OpenAI instead of some open-source code?
Damn. Jay took one for the team learning vector embeddings 💀
Any educational course should always first explain what prerequisites are necessary to understand and learn the course material.
Think its safe to assume if youre here that you know a bit of CS at least.
say you wahnt the computah to scan this for whads...
i appreciate the video, and your accent.
dam now I need the TIME x)
i like the course which is less than 1 hour
Exactly my thought
A Git repo with that would be awesome
The "download secure bundle" option is not showing, and where do I get the client secret ID? Please help
someone
8:50 This is where I’m stuck in my project. I’m looking for a way to “inject” rule-based text embedding, like an indicator variable, that shows the text belongs to some parent structure. Similarly, I want to create a custom embedding function that will be highly weighted for certain regex text patterns. I need to guarantee that cosine similarity is 100 percent if the regex matches. Are text embeddings the right approach?
This tutorial is very interesting, except that without a paid account on OpenAI we cannot really put it into practice. But I can't afford to pay 20 euros per month just to set up a tutorial.
You don't need to sign up for OpenAI Plus in order to create an API Key, they are billed separately. You also get a free 3-month API credit when you first create an account, the amount varies, I think they've decreased it now to about $5 (unfortunately I missed out on my credit, since I created my account last year and wasn't coding anything)
@@donaldoalmazan thx for info
Actually, the $20 for ChatGPT is separate from the API. For the API you can, for example, buy $10 of credit and use it for as much as you like; as long as it's used to fine-tune, it will last long
Can we do this for image search? Can we see embeddings of images? Can langchain do that? Thanks
I'm new to this. At 13:18 what terminal is she using to input the code?
That is the simple Terminal. Later she uses Visual Studio Code
It's a Linux-style (Unix) terminal; curl is a command-line tool. If you are on Windows, you can use WSL to get a Linux terminal.
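Going by the comments, the code typed into the terminal around 13:18 is a curl call that generates a text embedding with OpenAI. The sketch below builds an equivalent HTTP POST using only Python's standard library. It assumes the endpoint `https://api.openai.com/v1/embeddings` and the model name `text-embedding-ada-002`; the exact command and model used in the video may differ, and `build_embedding_request` / `get_embedding` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint and model name; check OpenAI's docs for current values.
EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"
DEFAULT_MODEL = "text-embedding-ada-002"


def build_embedding_request(text, api_key, model=DEFAULT_MODEL):
    """Build (but do not send) the POST request that a curl command would make."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def get_embedding(text, api_key, model=DEFAULT_MODEL):
    """Send the request and return the embedding (needs network access and a funded key)."""
    with urllib.request.urlopen(build_embedding_request(text, api_key, model)) as resp:
        payload = json.load(resp)
    # The response lists one embedding per input; only a single string was sent.
    return payload["data"][0]["embedding"]
```

With a valid key and network access, `get_embedding("food", key)` would return a list of floats. Keeping request construction separate from sending makes it possible to inspect the request without spending any credit.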
Hello @aniakubow...I am trying to understand how can I access this demo chat app in reactjs? any pointers?
9:50 "...we also use it for information retrieval...": How does it deal with misspellings, either in the query or in the training data?
With a lower score than expected
The model has seen misspellings before, and it knows they are related to the correct spelling more than other words.
Amazing. How could it know? @@technolus5742
Is there a tutorial on doing these using LLaMA?
When we do ask question using the vector database like "what are the biggest questions in science" does it consume token from open api also?
so fine
Do we need a GPT-4 premium subscription to follow this course?
What are the prerequisites for this course?
passion to learn and explore
When calculating cosine similarity, does a value closer to 1 mean more similar and a value closer to -1 mean less similar?
1 indicates an identical vector, or very close semantic meaning, or even identical text. (Note that it's the similarity of the vector's direction only, not scale.) 0 indicates an orthogonal relationship, ie, unrelated semantically. -1 in theory represents complete semantic opposition, but in practice, a perfect -1 is rare in natural language contexts.
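The reply above can be checked numerically. Here is a minimal cosine-similarity function in Python, using only the standard library. It covers the three reference cases: same direction gives 1, orthogonal gives 0, and opposite directions give -1. It also shows that only direction matters, not length. The example vectors are arbitrary and were chosen so the results come out exact.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([3, 4], [6, 8]))    # same direction, different length: 1.0
print(cosine_similarity([1, 0], [0, 3]))    # orthogonal: 0.0
print(cosine_similarity([3, 4], [-3, -4]))  # opposite direction: -1.0
```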
Is it possible to connect this to a custom GPT for the openai store?
Hey, I would love it if you could somehow make a video on Bun and ScyllaDB. I've been trying to learn them, but there are no sources 😥
How do ML models create embeddings for new, or novel, words? For example, what if I fed it "Hexamethylenetetramine" (an organic compound)? My brain is frying thinking about this...
Is Python a good language for learning DSA? On the internet, a lot of people say you should learn Java/C++
Is there a repo with the code?
It seems the secure bundle option (20:36) is no longer available. Can someone confirm?
Yeh it seems to have moved location, open your table and on the right side under 'Database Details', 'Region' click the three dots and select download SCB :)
Link to a gist file with the code would be helpful please.
perfect 👌
Is anybody else getting the error "You exceeded your current quota, please check your plan"?
Embeddings vs Fine tuning?
🎯 Key Takeaways for quick navigation:
00:14 📉 *Sam Altman was fired by OpenAI's board for not being consistently candid in his communications, leading to implications of lying by omission.*
01:11 🤯 *Various theories circulated, including speculation about dangerous AI developments, financial ties with Saudis, and a letter from former employees alleging dishonesty.*
01:53 🌐 *OpenAI employees expressed discontent, with over 500 threatening to quit, potentially joining Microsoft to dominate the AI space.*
02:21 🔄 *After negotiations with Microsoft failed, Altman and Brockman formed a new AI research team at Microsoft, but Altman eventually returned to OpenAI as CEO on November 21st.*
03:02 ❓ *Uncertainty remains about the true reasons behind Altman's firing, with speculation about conflicts of interest, AI commercialization, or a possible publicity stunt involving the board, Microsoft, and Altman.*
Made with HARPA AI
Thank you! Sadly, you don't go deep into the data needed... how big the documents are, etc... but still good, thanks!
Please, guys, make a video on Laravel and React.js
Great video, but @AniaKubow, if you do not mind, you could have had a very successful career in modelling. Refreshing to see that you chose computing and specifically AI.
4:40 "...Joe is 38 on the 0 to 100 scale... so -.4 on the -1 to 1 scale...": How is that? I get -.24. If it's -.4 on the -1 to 1 scale, that's 30 on the 0 to 100 scale. Please fix my math.
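If the video meant a plain linear rescaling from 0..100 to -1..1, the arithmetic in this question holds. 38 maps to 2 × 0.38 - 1 = -0.24, and -0.4 maps back to (-0.4 + 1) / 2 × 100 = 30. Below is a small Python sketch of both directions. The function names are hypothetical, and the linear mapping is an assumption about what the video intended.

```python
def to_signed_unit(score, lo=0.0, hi=100.0):
    """Linearly map a score in [lo, hi] onto [-1, 1]."""
    return 2 * (score - lo) / (hi - lo) - 1


def from_signed_unit(value, lo=0.0, hi=100.0):
    """Inverse of to_signed_unit: map a value in [-1, 1] back onto [lo, hi]."""
    return lo + (value + 1) * (hi - lo) / 2


print(round(to_signed_unit(38), 2))      # -0.24
print(round(from_signed_unit(-0.4), 2))  # 30.0
```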
Context has two flavors, near and "not near". Joe is 38% near. Maybe Alice is 40% "not near", which would equate to a negative value (-.4).
So context is more than "this one is like the other", it's also "but it's not like this other thing". If we just used a single dimension, then literally everything would be "like" everything else, which makes it a little difficult to differentiate.
"The school bus is yellow. A banana is yellow. A bus is NOT LIKE a banana." Dimensions in dimensions.
Id like for her to embed my vector
Can non-coder take this course?
help with this error on this part of code anyone?
llm = openai(openai_api_key=OPENAI_API_KEY)
TypeError: 'module' object is not callable
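`TypeError: 'module' object is not callable` means a module was called as if it were a function or a class. Lowercase `openai` is a module, whereas LangChain's LLM wrapper is a class named `OpenAI`. The sketch below reproduces the error with the standard `math` module. The import lines in the trailing comments are an assumption about the fix, and the right one depends on the LangChain version installed.

```python
import math

# Reproduce the error: a module object cannot be called like a function or class.
try:
    math()
except TypeError as err:
    module_call_error = str(err)

print(module_call_error)  # 'module' object is not callable

# Calling a callable attribute *inside* the module works.
print(math.floor(2.7))  # 2

# Likely equivalent fix for the tutorial code (an assumption; check your version):
#   from langchain.llms import OpenAI        # older LangChain releases
#   from langchain_openai import OpenAI      # newer, separate langchain-openai package
#   llm = OpenAI(openai_api_key=OPENAI_API_KEY)
```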
I've created 4 new API keys, but the same error persists. Do we have to pay for the GPT-4 model?
"error": {
"message": "You exceeded your current quota, please check your plan and billing details.",
"type": "insufficient_quota",
"param": null,
"code": "insufficient_quota"
}
In the early days after GPT's launch, OpenAI gave some free quota to developer accounts. That credit has probably expired for the email you're using.
Make a video about Godot for Unity users
Too bad OpenAI won't let you use an API key nowadays unless you are a paying customer
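The `insufficient_quota` error above concerns the account's credit, not any particular key. Billing and quota are tied to the account, so creating new keys on the same account won't clear it. Below is a minimal sketch that reads an OpenAI-style error body and distinguishes the two common cases. It assumes the full response wraps the fragment quoted above in an outer JSON object. `describe_openai_error` is a hypothetical helper, and the `invalid_api_key` code for a rejected key is an assumption.

```python
import json


def describe_openai_error(body):
    """Return a short hint for an OpenAI-style JSON error body (hypothetical helper)."""
    error = json.loads(body).get("error") or {}
    code = error.get("code")
    if code == "insufficient_quota":
        # Quota/billing belongs to the account, so a new key on it won't help.
        return "Out of quota: add credit or check billing for this account."
    if code == "invalid_api_key":  # assumed code for a rejected key
        return "The API key itself was rejected: check that it was copied correctly."
    return "Other error: " + str(error.get("message", "unknown"))


sample = """{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }
}"""
print(describe_openai_error(sample))
```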
I was going to say, I cannot run the initial vector embedding program because of billing issues.
I did not understand how the LLM (OpenAI) uses the embeddings stored in the DB.
LangChain: "chaining" resulted in two answers for each prompt: "I don't know" and the headlines. The first answer came from OpenAI's LLM; the second (the headlines) came from the vector DB (Astra/Cassandra) that she set up outside OpenAI. LangChain was the bridge between the two.
It's a simple little example without much relevance, but it shows the bones. There is a lot more work to make something useful.
For instance, you could use a pre-trained LLM to perform the organizational tasks and composition (the language parts) using current data from a real-time source. For instance, "what kind of activities would be good at Los Angeles beaches today?"
The LLM could contextualize the meaning of the question using pre-trained data (an understanding of what constitutes beach sports and the conditions necessary for each is something that won't change much over time), and then use an external source (weather channels, surf sites, diving data sites, sailing sites, etc.) to search for real-time conditions at LA beaches. The LLM, using the current real-time data, could then look for nearness matches based on how the conditions match up to certain sports.
So instead of a generic pre-trained answer like, "people like to sunbathe, swim, dive and surf at beaches" you can get a specific answer such as, "The conditions at Redondo Beach suggest it's a good surf day, but rip tide warnings suggest it is a bad day for people to be swimming. There is a water quality alert for bacteria in Santa Monica Bay."
The LLM used the external real-time data to give accurate point-in-time suggestions that it would otherwise never have using data from training months earlier. That's where LangChain can help - merging "new" or custom data into the pre-trained contextual LLM model.
Hope this helped.
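The flow described in this reply, using stored data to ground the LLM's answer, is usually called retrieval-augmented generation. Below is a toy sketch of the retrieval half. It ranks stored snippets by cosine similarity to the query's embedding, keeps the top k, and pastes them into the prompt that would be sent to the LLM. The snippets and 3-dimensional vectors are invented for illustration, and the LLM call itself is omitted. In the video, LangChain and Astra/Cassandra handle storage and search instead of hand-written code.

```python
import math

# Hypothetical vector store: (snippet, embedding) pairs with made-up 3-d vectors.
# A real store would hold model-generated embeddings with far more dimensions.
STORE = [
    ("Surf report: good waves at Redondo Beach today", [0.9, 0.1, 0.0]),
    ("Rip current warning issued for swimmers", [0.7, 0.6, 0.1]),
    ("Bake sale at the library on Saturday", [0.0, 0.1, 0.9]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, k=2):
    """Return the k stored snippets most similar to the query vector."""
    ranked = sorted(STORE, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, query_vec, k=2):
    """Paste the retrieved snippets into a prompt for the LLM (LLM call not shown)."""
    context = "\n".join(f"- {text}" for text in top_k(query_vec, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up stand-in for the embedding of the question below.
query_vec = [0.8, 0.4, 0.0]
print(build_prompt("What's it like at LA beaches today?", query_vec))
```

The unrelated "bake sale" snippet scores far lower and is left out of the prompt. That is how retrieval keeps the LLM's context focused on relevant, current data.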
Vectors of 2704 table matrix
Does anyone know cheaper alternatives to OpenAI's GPT API?
Google Bard API
It made me really laugh how she speaks about word and calls it "text". LOL
lol, I am not able to find the API keys and client secret keys for Astra
I don't code and don't know how to; I am here because of the thumbnail
How to use chat GPT API key
did anyone run into this error
llm = openai(openai_api_key=OPEN_API_KEY)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'module' object is not callable
She doesn't explain the basics. What is the terminal that she uses?
you can use a vscode terminal
@@chidiebere I'm using VS Code and I'm getting errors about the API key I created. Is there a way to validate a key?
👸🌟💝🌹💕🌹🌹🌟🌟