This is basically what I've wanted to do for years. (Originally GPT-2 & Neo4j) Haven't gotten far, but keep returning to it. So glad someone else had the same idea!
This is incredible! I see so many use cases opening up. Thank you for sharing this!
This concept & this video are truly amazing.
I have a specific idea of how to apply this - I think this might change my whole project & I will explore this graph-based approach!!
Great work - thank you.
This is a great video.
It clearly explained to me the difference between vector databases and graph databases, and the new features we can build using graph databases. Thank you.
Really neat demo! I think this works so well because graphs help LLMs approximate the sort of clear relationships humans have in their brain about the world.
Very nice demo. It showed why and how to use the graph database for RAG and answered questions that I came up with while watching.
I love it when data engineers make videos - it's so easy to understand. As a side note, even the description is structured 👍
Super nice food for thought. Thanks for sharing an alternative. Would love a deeper dive with some clear examples confirming the 3 advantages 😊 But might experiment myself for fun too!
Wow, that's great!
Well illustrated! Thanks
Great content and delivery - love your work.
Wow. This is amazing
I had to subscribe based on this idea alone! I'm trying to think of another way I could implement this with standard RAG for those that use LangChain/Flowise, and Mermaid code to hold the node information.
Great video! I wanted to explore the graph dbs exactly for this use case. Imagine also adding work pieces to this. Jiras, code reviews, comments, etc.
P.S. the music is great 😂
Excellent job Johannes! After watching the video "Knowledge Graph Construction Demo from raw text using an LLM" by Neo4j, I came across your video and found that you addressed the crucially important question some of us are thinking about: "How can we improve the way we do RAG?" I agree with your assessment that using KGs provides very significant benefits that would compel us to want to use this approach vs. using vector embeddings. However, am I correct in understanding that we need better workflows/pipelines to get all the kinds of data we need to work with into a KG to take more advantage of these benefits?
Sounds like you may have listened to Denny Vrandecic's talk "The Future of Knowledge Graphs in a World of Large Language Models".
Hey Steve, thank you!
You are correct, using a KG will almost certainly involve more pre-processing/workflows compared to just having an unstructured text/vector database. LLMs can be very useful in the process of extracting entities and relationships for your graph, but it's still a serious undertaking, with a lot of quality checks needed to make it production-ready. It's all still pretty experimental and niche, but I think this approach will become increasingly mainstream over the next 1-2 years.
I haven't checked out Denny's video, but I definitely will now! I can also recommend going through the content that the Neo4j team has been creating around LLMs.
@@johannesjolkkonen Here's my summary of the key points of Denny's presentation.
• LLMs are expensive to train
• LLMs are expensive to run at inference time.
• LLMs can’t be trusted to correctly output accurate facts.
◦ Answers are just guesses based on stochastic probability - it may even infer a different answer in a different language. I.e., it does not “know” what it “knows”, because it does not maintain a list of all the things it knows; it just generates outputs at inference time.
• Knowledge in ChatGPT seems not to be stored in a language-independent way, but within each individual language.
• They are not very good at math, and it would be economically inappropriate to use them for math computation.
• Autoregressive transformer models such as ChatGPT are supposed to be Turing complete, but they are a very expensive reiteration of the Turing tarpit: you could do everything with them, but it doesn't mean you should.
• It is economically inappropriate to attempt to improve LLMs' ability to internalize knowledge (know what they know), because it will always be cheaper, faster, and more accurate(?) to externalize it in a graph store and look it up when needed.
In a world where language models can generate infinite content, "knowledge" (vs content) becomes valuable.
• We don't want to machine learn Obama's place of birth every time we need it.
• We want to store it once and for all and that's what knowledge graphs are good for: to keep your valuable knowledge safe.
The knowledge graph provides you with the ground truth for your LLMs.
• LLMs are probably the best tool for knowledge extraction we have seen developed in a decade or two.
• They can be an amazing tool to speed up the creation of a knowledge graph.
• We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
• And this is why it makes so much sense to store the knowledge in a symbolic system that can be edited, audited, curated, and understood - where we can cover the long tail by simply adding new nodes to the knowledge graph that can simply be looked up, instead of systems that need to be trained to return knowledge with a certain probability and may make stuff up on the fly.
@@evetsnilrac9689, such a helpful summary! Thank you!
Thanks for sharing! Can you also share how you are dealing with the consolidation of output nodes? Some project descriptions might generate "Graph Neural Nets", another "Graph Neural Network" or "GNN".
Hey Djan! Consolidation/entity resolution is definitely one of the most interesting challenges with these kinds of applications, but in this demo there's nothing implemented for that yet.
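For anyone wanting to experiment, here's a minimal sketch of one possible approach - a hand-maintained alias table with a fuzzy-matching fallback. The aliases and threshold are made up for illustration, not something from the demo:

```python
from difflib import SequenceMatcher

# Hypothetical alias table - maintained by hand, or seeded with an LLM pass
ALIASES = {
    "graph neural nets": "Graph Neural Network",
    "gnn": "Graph Neural Network",
}

def canonical_name(raw: str, known_entities: list[str], threshold: float = 0.85) -> str:
    """Map a raw extracted entity name to a canonical node name before inserting it."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to fuzzy matching against entities already in the graph
    best, best_score = None, 0.0
    for entity in known_entities:
        score = SequenceMatcher(None, key, entity.lower()).ratio()
        if score > best_score:
            best, best_score = entity, score
    return best if best_score >= threshold else raw

print(canonical_name("Graph Neural Nets", ["Graph Neural Network"]))  # -> "Graph Neural Network"
```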
Thank you
Hey everybody, thanks for the great comments!
Finally got around to making a more detailed tutorial for this demo, with code available on Github. You can check it out here: ua-cam.com/video/tcHIDCGu6Yw/v-deo.html
Awesome!! Thanks!! 👏👏👏
A really awesome video, Johannes - wondering if there is a GitHub repo for this? Thanks.
Great demo on learning Neo4j and LLMs. In a typical RAG setup, a vector database is created for the documents - how does that work with a Neo4j graph DB?
Thanks! As well as the multi-hop searches I talk about here, you can also use Neo4j for storing vector representations of the nodes and the text content, and search based on node similarity and such.
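To make that concrete, here's a rough sketch assuming a recent Neo4j 5.x where vector indexes are available; the connection details, index name, label, and dimensions are just example values:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
query_embedding = [0.0] * 1536  # stand-in: use a real embedding from your model

with driver.session() as session:
    # One-time setup: index the `embedding` property on Project nodes
    session.run("""
        CREATE VECTOR INDEX project_embeddings IF NOT EXISTS
        FOR (p:Project) ON (p.embedding)
        OPTIONS {indexConfig: {
            `vector.dimensions`: 1536,
            `vector.similarity_function`: 'cosine'
        }}
    """)
    # Query time: the 5 nodes most similar to the query embedding
    rows = session.run("""
        CALL db.index.vector.queryNodes('project_embeddings', 5, $embedding)
        YIELD node, score
        RETURN node.name AS name, score
    """, embedding=query_embedding).data()
```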
That's exactly what I am looking for! Apart from the tutorials, are you also considering starting a Discord channel where people can chat? I think there is growing interest in KG + LLMs but nowhere to discuss it.
Very interesting, thanks!
Around 5:45, how does the LLM combine the graph search with "normal" LLM generation? What happens behind the scenes?
Hey! I show that part in detail in my latest video, here: ua-cam.com/video/Kla1c_p5v0w/v-deo.html
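For anyone who can't watch right away, the rough pattern behind the scenes is: one LLM call turns the question into Cypher, the query results come back as context, and a second call phrases the answer. A minimal sketch, with a hypothetical llm() helper standing in for whatever model you use:

```python
from neo4j import GraphDatabase

def llm(prompt: str) -> str:
    """Hypothetical helper - wrap whatever chat model you're using."""
    ...

def answer(question: str, schema: str, driver) -> str:
    # 1. Translate the question into Cypher, with the graph schema as context
    cypher = llm(f"Graph schema:\n{schema}\n\nWrite a Cypher query that answers: {question}")
    # 2. Run the generated query against the graph
    with driver.session() as session:
        records = [r.data() for r in session.run(cypher)]
    # 3. Have the model phrase the raw records as a natural-language answer
    return llm(f"Question: {question}\nQuery results: {records}\nAnswer in plain English:")
```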
Good presentation. Thank you!
How can that generate useful relationship triples when you can only give small subsets of the data to the LLM at a time?
Hey, good question. Two points:
- We can add nodes and relationships to the graph incrementally, so we don't need to identify all the relationships at once.
- The subsets can also really be quite large; using the 16k-32k context window models, that would be ~15-30 pages of content at a time.
And so while there can be some relationships that only become apparent when looking at the "full picture" of all the data, I think most of the relationships can be identified within the subsets, in isolation. For example, if a paragraph mentions that some technologies were used for one project, that's all we need to know about those tech->project relationships. Then if we find more relationships or attributes for that project or those technologies later in the data, we can just add them to the graph.
This can be different case-by-case, of course 🙂
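To illustrate the incremental part: with Cypher's MERGE, each extracted triple can be upserted on its own, so triples from different chunks accumulate in one graph without creating duplicates. A minimal sketch - the connection details and the Technology/Project schema are just example values:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_triple(tech: str, project: str) -> None:
    """Upsert a technology, a project, and the edge between them."""
    with driver.session() as session:
        session.run(
            """
            MERGE (t:Technology {name: $tech})
            MERGE (p:Project {name: $project})
            MERGE (t)-[:USED_IN]->(p)
            """,
            tech=tech, project=project,
        )

# Triples extracted from different chunks, at different times, all land in one graph:
add_triple("Neo4j", "Customer 360")
add_triple("Python", "Customer 360")  # later chunk - the same Project node is reused
```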
I would be curious about your view on when vector search is better suited than graph search for RAG. Thanks for this great video! It helps a lot.
Thank you!
Vector search is still great for a lot of situations, when answers can be found directly in the unstructured text. Where graphs (or really any other more "structured" databases) start to shine is when you need to understand concepts and their relationships beyond what's explicitly said in the text. But this is a lot more demanding too, and often not necessary.
Also, the two aren't mutually exclusive, with Neo4j (and recently AWS Neptune, another graph DB) supporting vector search, so you can also search nodes by their similarity. This combination is super exciting!
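As a taste of that combination, a single query can use the vector index to find entry nodes and then hop along the graph from them. A sketch reusing the example index and credentials from the earlier snippet - all placeholder values:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
query_embedding = [0.0] * 1536  # stand-in for a real embedding

# Vector similarity finds the entry nodes; plain graph traversal pulls in
# their neighbourhood - something neither search style gives you on its own.
hybrid = """
CALL db.index.vector.queryNodes('project_embeddings', 3, $embedding)
YIELD node, score
MATCH (node)-[r]-(neighbour)
RETURN node.name AS hit, score, type(r) AS rel, neighbour.name AS related
"""
with driver.session() as session:
    rows = session.run(hybrid, embedding=query_embedding).data()
```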
Fabulous video, thanks! Would be even better with no music, or at least if it was very much lower volume :)
Why not use both a KG and vector embeddings?
Very nice and inspiring. QQ: if GPT-4 creates incorrect Cypher, do we try to detect and auto-fix/retry?
Thank you!
You can see the details in my latest video, but in this setup we aren't doing that. That's definitely one of the best and simplest ways this could be improved.
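One way such a detect-and-retry loop could look, sketched with a hypothetical generate_cypher() helper: run the generated query, and if Neo4j rejects it, feed the database's error message back to the model for another attempt:

```python
from neo4j.exceptions import CypherSyntaxError

def generate_cypher(prompt: str) -> str:
    """Hypothetical wrapper around your LLM of choice."""
    ...

def run_with_retry(driver, question: str, max_attempts: int = 3) -> list:
    prompt = f"Write a Cypher query that answers: {question}"
    for _ in range(max_attempts):
        cypher = generate_cypher(prompt)
        try:
            with driver.session() as session:
                return session.run(cypher).data()
        except CypherSyntaxError as err:
            # Append the error so the model can correct itself on the next attempt
            prompt += f"\nYour previous query failed with: {err.message}\nPlease fix it."
    raise RuntimeError("Could not produce a valid Cypher query")
```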
Thank you for the video.
Will you be able to share the prompts and code snippets?
The repo is still a work in progress, but I'm planning to make a video soon where I share and walk through the code in more detail!
Awesome - I'm thinking of applying something like this to a completely traditional hierarchical taxonomy. Looking forward to it.
@@johannesjolkkonen That's great to hear! I'm working on a project that needed to hear some of what you said.
A full video-walkthrough is now live here: ua-cam.com/video/tcHIDCGu6Yw/v-deo.html
Repository link included (:
More info please.
Did you use attributes to add more characteristics to the nodes and edges - for example, to score the strength of a relationship? I have tried asking LLMs to create graphs from their native knowledge using various prompts, and they do poorly, which is interesting. Does it indicate a lack of understanding of the relationships, or more of a fine-tuning issue? What do you think?
Hey, I haven't added such metadata but that's a great idea!
For your problem, I'd say the most important thing is to make sure you tell the LLM what kinds of entities and relationships you are looking for. In other words, you should have a pre-defined schema in mind for your graph. Some pre-processing might also be useful if your data is very messy.
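For example, a schema-constrained extraction prompt might look something like this - the entity and relationship types are just illustrative, swap in your own domain:

```python
# The entity and relationship types here are placeholders, not a fixed standard
EXTRACTION_PROMPT = """
Extract entities and relationships from the text below.
Use ONLY these entity types: Person, Project, Technology.
Use ONLY these relationship types:
  (Person)-[:WORKS_ON]->(Project)
  (Project)-[:USES]->(Technology)
Return JSON with two keys, "nodes" and "relationships".
Ignore anything that does not fit the schema.

Text:
{text}
"""

prompt = EXTRACTION_PROMPT.format(text="Anna built the Customer 360 project on Neo4j.")
```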
I like the music
Is it better in some way than using a SQL DB with relations based on, for example, SQL schemas, which can also be easily used for retrieval?
When the text-to-Cypher conversion happens, how does the LLM know how the nodes/edges are labeled, so that it can accurately write the query?
Hey Jeremy! If you are referring to the chat interaction, we pass the schema of the graph onto the LLM, alongside the user's query.
For other questions, I just released a detailed breakdown of how to generate the graph which you can find on my channel. All the code is available as well.
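For reference, the schema can be pulled straight from the database with built-in procedures and pasted into the prompt - a minimal sketch, with placeholder connection details:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    labels = [r["label"] for r in session.run("CALL db.labels()")]
    rel_types = [r["relationshipType"] for r in session.run("CALL db.relationshipTypes()")]

schema = f"Node labels: {labels}\nRelationship types: {rel_types}"
# `schema` is then prepended to the text-to-Cypher prompt, alongside the user's question
```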
@@johannesjolkkonen thank you! watching now!
How does the chat interface communicate with the database? Is it based on prompts that create Cypher queries?
Nice!
Hey, great video - do you have the code in a repo?
Thanks! Yes I do, you can find a more detailed tutorial on my channel which also has the link to the repo (:
I'm working on something similar, but you make it look easy! Would love to chat and see if we could collaborate on something to get in front of clients :)
I am very excited to see how your code works. Please share your solution.
Darn!! I have been working on something similar, with a slightly different approach.
The video is great, thank you, but the background music made it difficult for me to focus :(
I really don't see how this is any different from a typical database with more columns. For example:
Sort by company
Lookup Azure
Next sort by number of projects
Lookup employee
You can keep enhancing the graph with additional data at runtime and not have to change the schema (design time).
Gotta look at decentralized knowledge graphs. Those are the future of RAG databases.
I think there are learners who find music essential for concentration and understanding, and would go as far as advocating for music in classrooms. But there are others who find the background music to be noise, and therefore distracting and annoying. I am assuming you listened to the video after adding the music and found it better with the background music than without.
To cater for both groups of learners, perhaps you could upload two versions of your videos, one version without the addition of the music and the other with the music. You may include a label such as "without music" and "with music" respectively.
That's a huge ask. Essentially doubling his workload.
Please do not use music when creating future videos.
Hey, thanks for the feedback. I'll keep that in mind!
Agreed, that's really off-putting
I enjoyed the music 👍
Agree, but you could use music in the pauses between talking, not while you're talking.
Bro, what about a "thank you for the amazing info" before nitpicking? 😂
Presentation about nothing. A how-to on building it is what's required.
Hey, I also have a full tutorial on this here: ua-cam.com/video/tcHIDCGu6Yw/v-deo.html&lc=UgyOfLtgIOQyEu2zmMF4AaABAg 🙂
Yes the background music is distracting and annoying.
Excellent video - but the music... please, no.