Thank you. Subscribed. There are so many AI channels that just talk how you can build this and that with LLMs and other word soup techniques, but don't actually show the process.
Thank you so much for including also the price tag. Seeing that it is only a few cents that such proof of concepts accumulate to is really encouraging to go and try it out. Also everything else in this video was absolute gold! Really complete, really A-to-Z. Thank you so much.
Great video without any annoying music, thanks! Would be great to see a from-scratch video about how you actually use this in answering user questions, combining the graph data and LLM capabilities.
That was a great share on knowledge graphs and LLMs. Thanks for putting it together.
First of all, first class presentation! I've been considering building something quite similar to utilise knowledge graphs as a method of storing long-term memory for ChatGPT by proxy of function calling. The vague idea I have floating in my head is that the relationships could be automated using the LLM at inference time with some well-formatted prompts. The last part of the video where you showcase Cypher generation is probably the missing piece of the puzzle for connecting the storage (Neo4j), and this is great for updating the knowledge graph. I just hope you get a chance to showcase a bi-directional example of this in your part 2, as right now I'm not strong on knowledge graph ingestion in a way that makes sense for seamless LLM output when a knowledge graph is used to supplement it.
To develop AI you need to have HI... JJ has it! Thanks for the class! 👏👏👏👏👏👏👏👏👏
thank you for sharing this! this is going to help so many organizations who can't afford teams of data analysts. to have this much insight into their data.... 🤯
Excellent video, thank you for actually coding and showing the process. I was stuck on this for a long time.
On my to-do list. Was looking for it, many thanks!
Very good round up. I just started to follow you. This is as useful as papers.
Looking forward to part 2
That was incredibly interesting and inspiring! Thank you!
Thanks for sharing, great content!
amazing! thank you very much.
You absolute Chad! 🙏🏾
Great video! One thing I would like to add: I think that for larger datasets it is faster/more efficient to use the import tool that comes with Neo4j Aura, instead of executing a separate query for each node/relation.
Great video! Is there a reason why you didn't have the LLM generate the Cypher?
Thanks! Generating just the relationship-triplets is a simpler, less error-prone task for the LLM than generating complete Cypher with correct syntax. And because converting those triplets to Cypher is just a matter of some string-parsing, we might as well use Python for that.
It's always a good idea to do as much as possible with just plain old code, using LLMs just where necessary. A bit more work maybe, but a lot more reliable (:
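For illustration, here's roughly what that string-parsing can look like. Just a minimal sketch with made-up entity names and an "Entity" label; in production you'd also want to pass values as query parameters rather than formatting them into the string:

```python
# Minimal sketch: turning LLM-extracted triplets into Cypher MERGE statements.
# The triplets and the "Entity" label are made up for the example.

def triplet_to_cypher(head: str, rel: str, tail: str) -> str:
    # Relationship types can't contain spaces, so normalize them first
    rel_type = rel.upper().replace(" ", "_")
    return (
        f'MERGE (a:Entity {{name: "{head}"}}) '
        f'MERGE (b:Entity {{name: "{tail}"}}) '
        f"MERGE (a)-[:{rel_type}]->(b)"
    )

triplets = [("Alice", "works at", "AlphaCorp"), ("AlphaCorp", "uses", "AWS")]
statements = [triplet_to_cypher(*t) for t in triplets]
# Each statement can then be run with the Neo4j Python driver.
```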
@@johannesjolkkonen I see, makes sense. Appreciate the answer! :)
Great video! I really learned a lot and enjoyed it. Thanks!
This is awesome bro, just subscribed. Is this the same for open-source models? I wanted to host using LM Studio etc.
Sure, it's exactly the same in principle! Of course the quality of entity extraction might vary between models, with OpenAI's models being top level.
But this works fine with GPT-3.5, so you could most likely get similar results with Llama 2 (:
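If you want to try it, LM Studio's local server speaks the OpenAI-compatible API, so in principle you only swap the client's base URL. A rough sketch, assuming the server is running on its default port and you've loaded a model in LM Studio:

```python
# Sketch: pointing the same extraction code at a local model served by
# LM Studio instead of OpenAI. Assumes LM Studio's server is running on
# its default port (1234); the API key can be any non-empty string.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder, LM Studio uses whatever model is loaded
    messages=[{"role": "user", "content": "Extract entities from: ..."}],
)
print(response.choices[0].message.content)
```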
Johannes - thank you for the video! What are your thoughts on building a KG-native CRM?
Great video Johannes, thanks!
Just wondering whether you could do a retrieval example of this?
Would be great to see how it compares to a vector store. When you read online, there's a lot saying that retrieval is slower and less efficient, but I'm not sure what to think.
Would be great to get your insight with a video to explain
Hey Hassan, thank you very much!
I'm going to create a video of the retrieval component very soon, and that will be out within the next couple of days (:
@@johannesjolkkonen good stuff. Look forward to it brother :)
Do you offer consulting on stuff like this? Working on a startup where I think KG will play a role. Do you have an email I can send information to?
Keep up the good work :)
@@johannesjolkkonen let's gooooo
Great job!
Thank you Tomaz!
Great video, thanks for that!
I would also be interested in data quality here. I noticed a few inconsistencies in your input data. How did the LLM cope with that? How accurate is the output knowledge graph? Can you make a more detailed comparison or share the output file, please?
Can we do this with LlamaIndex?
Awesome man ❤❤❤🎉🎉 I love the way you present. If I had a button to subscribe more, I would hit it a million times.
Excellent presentation. Love the detail and depth. Have you had to perform this on email text? I'm processing a large number of emails, so the grammar, and hence the entities and relationships, are not so clearly delineated. Was wondering if you've seen anything on performing this kind of LLM extraction on email texts, or if you have any suggestions. I just started my journey into graphs and it's super cool, so I definitely enjoy this content. Cheers!
Thank you Emilio!
I haven't tried this myself, but it's a great idea. Extracting entities from the contents themselves might be hard for the reasons you said, but you could definitely get the sender-recipient relationships on a graph, and then use LLMs to add things like sentiment scores and email-thread summaries to those relationships. Maybe also segment email-conversations under some categories, to get a more high-level understanding of what themes people are discussing over emails.
This is not so different to how graphs are already being used in social networks and content recommenders, but the LLMs definitely add more possibilities to the picture. Keep me posted if you end up doing something like this!
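Something like this could be a starting point. Just a rough sketch, where the email dict and the sentiment helper are hypothetical placeholders:

```python
# Rough sketch: sender-recipient edges on a graph, with an LLM-derived
# sentiment score stored on the relationship. score_sentiment() is a
# placeholder for an actual LLM call.

EMAIL_QUERY = """
MERGE (s:Person {email: $sender})
MERGE (r:Person {email: $recipient})
MERGE (s)-[e:EMAILED]->(r)
SET e.sentiment = $sentiment
"""

def score_sentiment(body: str) -> float:
    # Placeholder: in practice, ask the LLM for a score between -1 and 1
    return 0.8

def ingest_email(session, email: dict) -> None:
    # session is an open Neo4j session, e.g. from GraphDatabase.driver(...).session()
    session.run(
        EMAIL_QUERY,
        sender=email["from"],
        recipient=email["to"],
        sentiment=score_sentiment(email["body"]),
    )
```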
Is there any idea on how we would scale this? If the documents are large, then entity duplication can happen in the graph, right? How would we solve that?
Make your text as big as possible when sharing your screen. Thank you for your video.
Noted!
How are the prompt templates written? Are there any guidelines for writing those?
Great Job Johannes !
I'm curious to discuss the benefits of going with Azure OpenAI instead of directly to OpenAI.
Thx, and once again, great job !
Hey Joffrey, thank you!
The main reason is that on Azure, you can run the model with a dedicated and isolated endpoint, and all data that's passed to that endpoint is covered by Azure's enterprise-grade data privacy guarantees. Another thing which is important for a lot of companies is that you can choose the region in which this endpoint is hosted 🌍🌎
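Code-wise the switch is small. A sketch of what the client setup looks like against an Azure deployment, where the endpoint, key, API version and deployment name are all placeholders for your own resource's values:

```python
# Sketch: using an Azure OpenAI deployment instead of openai.com.
# All values below are placeholders; the region is determined by where
# you created the Azure resource.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_key="your-azure-api-key",
    api_version="2023-07-01-preview",
)

response = client.chat.completions.create(
    model="your-deployment-name",  # the deployment name, not the raw model name
    messages=[{"role": "user", "content": "Hello"}],
)
```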
@@johannesjolkkonen Can you point me to where Azure offers privacy guarantees to OpenAI users on their platform? We were considering it for our clients, but I cannot find documentation that seems to include Azure OpenAI under their data privacy terms.
Do you think you can make a GitHub repo for this or the upcoming code?
Is it possible to use a free alternative for Neo4j?
Hey! As mentioned in the tutorial, Neo4j Aura offers your first instance for free, up to 1 million nodes.
I'm not aware of a graph database that would offer free and unlimited capacity
Cloud platforms are currently such an annoyance because of the "essential complexity" required for them to capture service charges and maintain security. It makes me think of the dial-up model of internet access back in the day or the clumsy process of installing printer drivers, instead of plugging the thing in and selecting print.
This is awesome, thanks for the work. Your setup reminds me of Windows XP :D
Can I integrate the same solution with Memgraph?
Hi. Thanks a lot for the helpful content. I have a question. When I run the ingestion_pipeline() function, I only get two entities in Neo4j. Those that have a space in their name are not covered. Could you please guide me to solve the issue?
Hey!
Yeah, one important thing to ensure is that the node & relationship types and property keys don't have spaces or special characters, as those aren't allowed and will cause the queries to fail. I recall I had some sanitization already in the Cypher generation to handle this, but in any case you should be able to fix it pretty easily by running some .replace(" ", "") in your code, getting rid of spaces in the names before generating & running the Cypher statements.
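For example, something along these lines (just a sketch, the exact rules depend on what your LLM outputs):

```python
import re

def sanitize_label(raw: str) -> str:
    """Make a string safe to use as a Cypher label / relationship type:
    drop special characters, replace spaces, e.g. "works at" -> "WORKS_AT"."""
    cleaned = re.sub(r"[^A-Za-z0-9_ ]", "", raw).strip()
    return cleaned.replace(" ", "_").upper()

print(sanitize_label("works at"))        # WORKS_AT
print(sanitize_label("Cloud-Provider"))  # CLOUDPROVIDER
```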
Hope this helps!
What if you have the neo4j desktop version? How do you access it from your code?
So what's the difference between drawing this on a whiteboard from statistical information and using a graph that does what you could draw on a whiteboard from statistical information?
😊
40 minutes of cooking, no linguistics, but that's the trend.
All in all, one can copy it all except:
- Azure subscription
- entity typing and some Neo4j setup
- data sources?
I'm a little confused. Why would you want to do it this way instead of using a hardcoded config of value types and their relationships?
You're calling it unstructured data, but it's anything but unstructured. It has clear fields and values.
So, I'm trying to understand the benefit here.
Hey Doug, that's a valid question. It's true that the markdown-files here are structured very neatly, with headings that work almost like fields. In this case, you could get something similar with just standard text-parsing, extracting values from the markdown based on the headers.
However, the approach of using LLMs can be generalized to more complex situations, working with longer and messier documents (like pdfs) where the entities and relationships are more implicit, and text-parsing won't get you there.
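For a concrete comparison, the text-parsing alternative could be as simple as this sketch, splitting on "## " headers (the header names here are made up):

```python
# Sketch: header-based parsing for neatly structured markdown, no LLM needed.
# Treats every "## " heading as a field name and the lines under it as the value.
def parse_markdown_fields(text: str) -> dict:
    fields, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}

doc = "## Client\nAlphaCorp\n## Technologies\nAWS, Python"
print(parse_markdown_fields(doc))
# {'Client': 'AlphaCorp', 'Technologies': 'AWS, Python'}
```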
Hope that makes sense!
@johannesjolkkonen It does make sense. I was weighing the cost:benefit ratio of using an LLM and token processing for such neatly structured data. It would be very expensive vs. parsing them with a config.
A few questions here:
1. Can I retrieve the source documents that a response was generated from when using the knowledge graph?
2. How can I avoid duplication of data if I am planning to ingest data from multiple sources (creating a data ingest pipeline)?
3. How will I update data in the database that was true last week but now isn't (like: until last week my device had 3 ports, but now it has 6)?
Thanks in advance!
Hey Mukesh, great questions!
1. Sure. You can store some metadata about the source documents (document title, link, etc.), in each node and relationship, and then include that metadata in the query results when querying the database. You could also have the documents as nodes of their own, with relationships to all nodes that originate from the document.
2. This is a key challenge in these applications. Sadly, I haven't done work on this myself yet, but here are a few good resources about it: ua-cam.com/video/dNGV4sLkOcA/v-deo.html and margin.re/2023/06/entity-resolution-in-reagent/
A lot of people have asked about this same thing, and it'll definitely be a topic for a video soon (:
3. Assuming you have a good way to ID the nodes, it's pretty easy to match the nodes by ID and update their attributes with SET statements. See here: neo4j.com/docs/getting-started/cypher-intro/updating/
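For point 3, a minimal sketch of such an update, where the Device label, id property and port_count are hypothetical, matching your ports example:

```python
# Sketch: matching a node by id and updating an attribute with SET.
from neo4j import GraphDatabase

UPDATE_QUERY = """
MATCH (d:Device {id: $device_id})
SET d.port_count = $port_count
"""

driver = GraphDatabase.driver(
    "neo4j+s://your-database-id.databases.neo4j.io",
    auth=("neo4j", "your-password"),
)
with driver.session() as session:
    session.run(UPDATE_QUERY, device_id="device-123", port_count=6)
driver.close()
```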
@@johannesjolkkonen Thank you for your time and help, highly appreciated 😊 I'll be looking forward to the data deduplication and ingest pipeline video.
*The notification bell icon is ON*
Well, thank you. That was a very interesting 40 minutes.
Are you aware that your mic picks up the noise your arms make on the desk? And I don't know if "python" in the title is very smart, it scares at least me ;) Nah, it's probably fine, just kidding.
you probably should not show your openai key like that
I re-generated all the credentials shown here before publishing, so these ones don't work anymore (:
But it's a great point that I probably should've mentioned in the video, always rotate your creds!
Why do you care 😂
So this is what happens with the data from the quarterly HR forms/questionnaires lol
...horrible injection vulnerability
It’s a simple concept video… in a Jupyter notebook… without tests or anything. Injection risks are really the far end of hurdles for public production code here.
That is just rude. Not saying that it is wrong, but criticizing it here is just so out of place. Shame on you for not respecting the quality product this guy offered.
That’s why we need to take data cleansing seriously in AI
😂
This guy must be a hardcore relational database guy, having no relationships outside his primary/foreign key. Don't put such constraints in your life bro... Remove duplicate elements from your life and traverse the nodes of real-world concepts. 😂
Finnish sisu!
Basically I am getting this error while executing:
Running pipeline for 11 files in project_briefs folder
Extracting entities and relationships for ./data/project_briefs\AlphaCorp AWS-Powered Sales Analytics Dashboard.md
Error processing ./data/project_briefs\AlphaCorp AWS-Powered Sales Analytics Dashboard.md: Connection error.
Extracting entities and relationships for ./data/project_briefs\AlphaCorp Customer Support Chatbot.md
Error processing ./data/project_briefs\AlphaCorp Customer Support Chatbot.md: Connection error.
Can you help me with fixing it?
Hey Aravind!
That seems like the entity extraction / LLM step is working, but there's an issue connecting to Neo4j. I would check that:
- Your neo4j-instance is running
- Your connection url is in the correct format (neo4j+s://{your-database-id}.databases.neo4j.io:7687)
- Your username and password are correct
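A quick way to test the Neo4j side in isolation, with placeholder credentials:

```python
# Sketch: verifying the Neo4j connection separately from the pipeline.
# The URI and credentials are placeholders for your Aura instance's values.
from neo4j import GraphDatabase

uri = "neo4j+s://your-database-id.databases.neo4j.io"
driver = GraphDatabase.driver(uri, auth=("neo4j", "your-password"))
try:
    driver.verify_connectivity()  # raises if the instance is unreachable
    print("Connected!")
finally:
    driver.close()
```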