This is basically what I've wanted to do for years. (Originally GPT-2 & Neo4j) Haven't gotten far, but keep returning to it. So glad someone else had the same idea!
This is incredible! I see so many use cases opening up. Thank you for sharing this!
This concept & this video are truly amazing.
I have a specific idea of how to apply this - I think this might change my whole project & I will explore this graph-based approach!!
Great work - thank you.
This is a great video.
It clearly explained to me the difference between vector databases and graph databases, and the new features we can build using graph databases. Thank you.
Really neat demo! I think this works so well because graphs help LLMs approximate the sort of clear relationships humans have in their brain about the world.
Very nice demo. It showed why and how to use the graph database for RAG and answered questions that I came up with while watching.
I love it when data engineers make videos - it's so easy to understand. As a side note, even the description is structured 👍
Super nice food for thought. Thanks for sharing an alternative. Would love a deeper dive with some clear examples confirming the 3 advantages 😊 But might experiment myself for fun too!
Wow, that's great!
Well illustrated! Thanks
Great content and delivery - love your work.
Wow. This is amazing
I had to subscribe based on this idea alone! I'm trying to think of another way I could implement this with standard RAG for those that use LangChain/Flowise, and Mermaid code to hold the node information.
Great video! I wanted to explore the graph dbs exactly for this use case. Imagine also adding work pieces to this. Jiras, code reviews, comments, etc.
P.S. the music is great 😂
Excellent job Johannes! After watching the video "Knowledge Graph Construction Demo from raw text using an LLM" by Neo4j, I came across your video and found that you addressed the crucially important question some of us are thinking about: "How can we improve the way we do RAG?" I agree with your assessment that using KGs provides very significant benefits that would compel us to want to use this approach vs. using vector embeddings. However, am I correct in understanding that we need better workflows/pipelines to get all the kinds of data we need to work with into a KG to take more advantage of these benefits?
Sounds like you may have listened to Denny Vrandecic's talk "The Future of Knowledge Graphs in a World of Large Language Models".
Hey Steve, thank you!
You are correct, using a KG will almost certainly involve more pre-processing/workflows compared to just having an unstructured text/vector database. LLMs can be very useful in the process of extracting entities and relationships for your graph, but it's still a serious undertaking, with a lot of quality checks needed to make it production-ready. It's all still pretty experimental and niche, but I think this approach will become increasingly mainstream over the next 1-2 years.
I haven't checked out Denny's video, but I definitely will now! I can also recommend going through the content that the Neo4j team has been creating around LLMs.
@@johannesjolkkonen Here's my summary of the key points of Denny's presentation.
• LLMs are expensive to train
• LLMs are expensive to run at inference time.
• LLMs can’t be trusted to correctly output accurate facts.
◦ Answers are just guesses based on stochastic probability - it may even infer a different answer in a different language. I.e., it does not “know” what it “knows”, because it does not maintain a list of all the things it knows; it just generates outputs at inference time.
• Knowledge in ChatGPT seems not to be stored in a language-independent way, but within each individual language.
• They are not very good at math, and it would be economically inappropriate to use them for math computation.
• Autoregressive transformer models such as ChatGPT are supposed to be Turing complete, but they are a very expensive reiteration of the Turing tarpit: you could do everything with them, but it doesn't mean you should.
• It is economically inappropriate to attempt to improve LLMs' ability to internalize knowledge (know what they know), because it will always be cheaper, faster, and more accurate(?) to externalize it in a graph store and look it up when needed.
In a world where language models can generate infinite content, "knowledge" (vs content) becomes valuable.
• We don't want to machine learn Obama's place of birth every time we need it.
• We want to store it once and for all and that's what knowledge graphs are good for: to keep your valuable knowledge safe.
The knowledge graph provides you with the ground truth for your LLMs.
• LLMs are probably the best tool for knowledge extraction we have seen developed in a decade or two.
• They can be an amazing tool to speed up the creation of a knowledge graph.
• We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
• And this is why it makes so much sense to store the knowledge in a symbolic system that can be edited, audited, curated, and understood - where we can cover the long tail by simply adding new nodes to the knowledge graph that can simply be looked up, instead of systems that need to be trained to return knowledge with a certain probability and may make stuff up on the fly.
@@evetsnilrac9689, such a helpful summary! Thank you!
Thanks for sharing! Can you also share how you are dealing with the consolidation of output nodes? Some project descriptions might generate "Graph Neural Nets", another "Graph Neural Network" or "GNN".
Hey Djan! Consolidation/entity resolution is definitely one of the most interesting challenges with these kinds of applications, but in this demo there's nothing implemented for that yet.
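For anyone wanting to experiment, here's a minimal sketch of one possible approach - a hand-maintained alias table with a fuzzy-matching fallback. The aliases and threshold are made up for illustration, not something from the demo:

```python
from difflib import SequenceMatcher

# Hypothetical alias table - maintained by hand, or seeded with an LLM pass
ALIASES = {
    "graph neural nets": "Graph Neural Network",
    "gnn": "Graph Neural Network",
}

def canonical_name(raw: str, known_entities: list[str], threshold: float = 0.85) -> str:
    """Map a raw extracted entity name to a canonical node name before inserting it."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching against entities already in the graph
    best, best_score = None, 0.0
    for entity in known_entities:
        score = SequenceMatcher(None, key, entity.lower()).ratio()
        if score > best_score:
            best, best_score = entity, score
    return best if best_score >= threshold else raw

print(canonical_name("Graph Neural Nets", ["Graph Neural Network"]))  # -> "Graph Neural Network"
```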
Thank you
Hey everybody, thanks for the great comments!
Finally got around to making a more detailed tutorial for this demo, with code available on Github. You can check it out here: ua-cam.com/video/tcHIDCGu6Yw/v-deo.html
Awesome!! Thanks!! 👏👏👏
A really awesome video, Johannes - wondering if there is a GitHub repo for this? Thanks.
Great demo on learning Neo4j and LLMs. In a typical RAG setup, a vector database is created for the documents - how does that work with a Neo4j graph DB?
Thanks! As well as the multi-hop searches I talk about here, you can also use Neo4j for storing vector representations of the nodes and the text content, and search based on node similarity and such.
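To make that concrete, here's a rough sketch assuming a recent Neo4j 5.x where vector indexes are available; the connection details, index name, label, and dimensions are just example values:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
query_embedding = [0.0] * 1536  # stand-in: use a real embedding from your model

with driver.session() as session:
    # One-time setup: index the `embedding` property on Project nodes
    session.run("""
        CREATE VECTOR INDEX project_embeddings IF NOT EXISTS
        FOR (p:Project) ON (p.embedding)
        OPTIONS {indexConfig: {
            `vector.dimensions`: 1536,
            `vector.similarity_function`: 'cosine'
        }}
    """)
    # Query time: the 5 nodes most similar to the query embedding
    rows = session.run("""
        CALL db.index.vector.queryNodes('project_embeddings', 5, $embedding)
        YIELD node, score
        RETURN node.name AS name, score
    """, embedding=query_embedding).data()
```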
That's exactly what I am looking for! Apart from the tutorials, are you also considering starting a Discord channel where people can chat? I think there is growing interest in KG + LLMs but nowhere to discuss it.
Very interesting, thanks!
Around 5:45, how does the LLM combine the graph search with "normal" LLM generation? What happens behind the scenes?
Hey! I show that part in detail in my latest video, here: ua-cam.com/video/Kla1c_p5v0w/v-deo.html
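For anyone who can't watch right away, the rough pattern behind the scenes is: one LLM call turns the question into Cypher, the query results come back as context, and a second call phrases the answer. A minimal sketch, with a hypothetical llm() helper standing in for whatever model you use:

```python
from neo4j import GraphDatabase

def llm(prompt: str) -> str:
    """Hypothetical helper - wrap whatever chat model you're using."""
    ...

def answer(question: str, schema: str, driver) -> str:
    # 1. Translate the question into Cypher, with the graph schema as context
    cypher = llm(f"Graph schema:\n{schema}\n\nWrite a Cypher query that answers: {question}")
    # 2. Run the generated query against the graph
    with driver.session() as session:
        records = [r.data() for r in session.run(cypher)]
    # 3. Have the model phrase the raw records as a natural-language answer
    return llm(f"Question: {question}\nQuery results: {records}\nAnswer in plain English:")
```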
Good presentation. Thank you!
How can that generate useful relationship triples when you can only give small subsets of the data to the LLM at a time?
Hey, good question. Two points:
- We can add nodes and relationships to the graph incrementally, so we don't need to identify all the relationships at once.
- The subsets can also really be quite large; using the 16k-32k context window models, that would be ~15-30 pages of content at a time.
And so while there can be some relationships that only become apparent when looking at the "full picture" of all the data, I think most of the relationships can be identified within the subsets, in isolation. For example, if a paragraph mentions that some technologies were used for one project, that's all we need to know about those tech->project relationships. Then if we find more relationships or attributes for that project or those technologies later in the data, we can just add them to the graph.
This can be different case-by-case, of course 🙂
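To illustrate the incremental part: with Cypher's MERGE, each extracted triple can be upserted on its own, so triples from different chunks accumulate in one graph without creating duplicates. A minimal sketch - the connection details and the Technology/Project schema are just example values:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_triple(tech: str, project: str) -> None:
    """Upsert a technology, a project, and the edge between them."""
    with driver.session() as session:
        session.run(
            """
            MERGE (t:Technology {name: $tech})
            MERGE (p:Project {name: $project})
            MERGE (t)-[:USED_IN]->(p)
            """,
            tech=tech, project=project,
        )

# Triples extracted from different chunks, at different times, all land in one graph:
add_triple("Neo4j", "Customer 360")
add_triple("Python", "Customer 360")  # later chunk - the same Project node is reused
```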
I would be curious about your view on when vector search is better suited than graph search for RAG. Thanks for this great video! It helps a lot.
Thank you!
Vector search is still great for a lot of situations, when answers can be found directly in the unstructured text. Where graphs (or really any other more "structured" databases) start to shine is when you need to understand concepts and their relationships beyond what's explicitly said in the text. But this is a lot more demanding too, and often not necessary.
Also, the two aren't mutually exclusive, with Neo4j (and recently AWS Neptune, another graph DB) supporting vector search, so you can also search nodes by their similarity. This combination is super exciting!
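As a taste of that combination, a single query can use the vector index to find entry nodes and then hop along the graph from them. A sketch reusing the example index and credentials from the earlier snippet - all placeholder values:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
query_embedding = [0.0] * 1536  # stand-in for a real embedding

# Vector similarity finds the entry nodes; plain graph traversal pulls in
# their neighbourhood - something neither search style gives you on its own.
hybrid = """
CALL db.index.vector.queryNodes('project_embeddings', 3, $embedding)
YIELD node, score
MATCH (node)-[r]-(neighbour)
RETURN node.name AS hit, score, type(r) AS rel, neighbour.name AS related
"""
with driver.session() as session:
    rows = session.run(hybrid, embedding=query_embedding).data()
```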
Fabulous video, thanks! Would be even better with no music, or at least if it was very much lower volume :)
Why not use both a KG and vector embeddings?
Very nice and inspiring. QQ: if GPT-4 creates incorrect Cypher, do we try to detect and auto-fix/retry?
Thank you!
You can see the details in my latest video, but in this setup we aren't doing that. That's definitely one of the best and simplest ways this could be improved.
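One way such a detect-and-retry loop could look, sketched with a hypothetical generate_cypher() helper: run the generated query, and if Neo4j rejects it, feed the database's error message back to the model for another attempt:

```python
from neo4j.exceptions import CypherSyntaxError

def generate_cypher(prompt: str) -> str:
    """Hypothetical wrapper around your LLM of choice."""
    ...

def run_with_retry(driver, question: str, max_attempts: int = 3) -> list:
    prompt = f"Write a Cypher query that answers: {question}"
    for _ in range(max_attempts):
        cypher = generate_cypher(prompt)
        try:
            with driver.session() as session:
                return session.run(cypher).data()
        except CypherSyntaxError as err:
            # Append the error so the model can correct itself on the next attempt
            prompt += f"\nYour previous query failed with: {err.message}\nPlease fix it."
    raise RuntimeError("Could not produce a valid Cypher query")
```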
Thank you for the video.
Will you be able to share the prompts and code snippets?
The repo is still a work in progress, but I'm planning to make a video soon where I share and walk through the code in more detail!
Awesome - I'm thinking of applying something like this to a completely traditional hierarchical taxonomy. Looking forward to it.
@@johannesjolkkonen That's great to hear! I'm working on a project that needed to hear some of what you said.
A full video-walkthrough is now live here: ua-cam.com/video/tcHIDCGu6Yw/v-deo.html
Repository link included (:
More info please.
Did you use attributes to add more characteristics to the nodes and edges - for example, to score the strength of a relationship? I have tried asking LLMs to create graphs from their native knowledge using various prompts, and they do poorly, which is interesting. Does it indicate a lack of understanding of the relationships, or more of a fine-tuning issue? What do you think?
Hey, I haven't added such metadata but that's a great idea!
For your problem, I'd say the most important thing is to make sure you tell the LLM what kinds of entities and relationships you are looking for. In other words, you should have a pre-defined schema in mind for your graph. Some pre-processing might also be useful if your data is very messy.
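For example, a schema-constrained extraction prompt might look something like this - the entity and relationship types are just illustrative, swap in your own domain:

```python
# The entity and relationship types here are placeholders, not a fixed standard
EXTRACTION_PROMPT = """
Extract entities and relationships from the text below.
Use ONLY these entity types: Person, Project, Technology.
Use ONLY these relationship types:
  (Person)-[:WORKS_ON]->(Project)
  (Project)-[:USES]->(Technology)
Return JSON with two keys, "nodes" and "relationships".
Ignore anything that does not fit the schema.

Text:
{text}
"""

prompt = EXTRACTION_PROMPT.format(text="Anna built the Customer 360 project on Neo4j.")
```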
I like the music
Is it better in some way than using a SQL DB with relations based on, for example, SQL schemas, which can also be easily used for retrieval?
When the text-to-Cypher conversion happens, how does the LLM know how the nodes/edges are labeled, so that it can accurately write the query?
Hey Jeremy! If you are referring to the chat interaction, we pass the schema of the graph onto the LLM, alongside the user's query.
For other questions, I just released a detailed breakdown of how to generate the graph which you can find on my channel. All the code is available as well.
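For reference, the schema can be pulled straight from the database with built-in procedures and pasted into the prompt - a minimal sketch, with placeholder connection details:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    labels = [r["label"] for r in session.run("CALL db.labels()")]
    rel_types = [r["relationshipType"] for r in session.run("CALL db.relationshipTypes()")]

schema = f"Node labels: {labels}\nRelationship types: {rel_types}"
# `schema` is then prepended to the text-to-Cypher prompt, alongside the user's question
```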
@@johannesjolkkonen thank you! watching now!
How does the chat interface communicate with the database? Is it based on prompts that create Cypher queries?
Nice!
Hey, great video - do you have the code in a repo?
Thanks! Yes I do, you can find a more detailed tutorial on my channel which also has the link to the repo (:
I'm working on something similar, but you make it look easy! Would love to chat and see if we could collaborate on something to get in front of clients :)
I am very excited to see how your code works. Please share your solution.
Darn!! I have been working on something similar, with a slightly different approach.
The video is great, thank you, but the background music made it difficult for me to focus :(
I really don't see how this is any different from a typical database with more columns. For example:
Sort by company
Lookup Azure
Next sort by number of projects
Lookup employee
You can keep enhancing the graph with additional data at runtime and not have to change the schema (design time).
Gotta look at decentralized knowledge graphs. Those are the future of RAG databases.
I think there are learners who find music essential for concentration and understanding, and would go as far as advocating for music in classrooms. But there are others who find the background music to be noise, and therefore distracting and annoying. I am assuming you listened to the video after adding the music and found it better with the background music than without.
To cater for both groups of learners, perhaps you could upload two versions of your videos, one version without the addition of the music and the other with the music. You may include a label such as "without music" and "with music" respectively.
That's a huge ask. Essentially doubling his workload.
Please do not use music when creating future videos.
Hey, thanks for the feedback. I'll keep that in mind!
Agreed, that's really off-putting
I enjoyed the music 👍
Agree, but you could use music in the pauses between talking, not while you're talking.
Bro, what about a "thank you for the amazing info" before nitpicking? 😂
Presentation about nothing. A how-to on building it is what's required.
Hey, I also have a full tutorial on this here: ua-cam.com/video/tcHIDCGu6Yw/v-deo.html&lc=UgyOfLtgIOQyEu2zmMF4AaABAg 🙂
Yes the background music is distracting and annoying.
Excellent video - but the music... please, no.