How to Build Knowledge Graphs With LLMs (python tutorial)

  • Published 17 Jan 2025

COMMENTS • 75

  • @lhxperimental
    @lhxperimental 1 year ago +9

    Thank you. Subscribed. There are so many AI channels that just talk about how you can build this and that with LLMs and other word-soup techniques, but don't actually show the process.

  • @123unhooked
    @123unhooked 1 year ago +3

    Thank you so much for also including the price tag. Seeing that such proofs of concept add up to only a few cents is really encouraging and makes me want to go and try it out. Also, everything else in this video was absolute gold! Really complete, really A-to-Z. Thank you so much.

  • @Epistemophilos
    @Epistemophilos 1 year ago +3

    Great video without any annoying music, thanks! Would be great to see a from-scratch video about how you actually use this in answering user questions, combining the graph data and LLM capabilities.

  • @chrisogonas
    @chrisogonas 9 months ago +1

    That was a great share on knowledge graphs and LLMs. Thanks for putting it together.

  • @KCM25NJL
    @KCM25NJL 1 year ago +5

    First of all, first-class presentation! I've been considering building something quite similar to utilise knowledge graphs as a method of storing long-term memory for ChatGPT by proxy of function calling. The vague idea I have floating in my head is that the relationships could be automated using the LLM at inference time with some well-formatted prompts. The last part of the video where you showcase Cypher generation is probably the missing piece of the puzzle for connecting the storage (Neo4j), and this is great for updating the knowledge graph. I just hope you get a chance to showcase a bi-directional example of this in your part 2, as right now I'm not strong on knowledge graph ingestion in a way that makes sense for seamless LLM output when a knowledge graph is used to supplement it.

  • @ProfFranciscoAraujo
    @ProfFranciscoAraujo 27 days ago

    To develop AI you need to have HI... JJ has it! Thanks for the class! 👏👏👏👏👏👏👏👏👏

  • @w_chadly
    @w_chadly 1 year ago +1

    Thank you for sharing this! This is going to help so many organizations that can't afford teams of data analysts to have this much insight into their data... 🤯

  • @masked00000
    @masked00000 11 months ago

    Excellent video, thank you for actually coding and showing the process. I was stuck on this for a long time.

  • @kamiln8398
    @kamiln8398 1 year ago +1

    On my to-do list. Was looking for this, many thanks!

  • @kewpietonkatsu
    @kewpietonkatsu 1 year ago +1

    Very good round-up. I just started following you. This is as useful as papers.

  • @andydataguy
    @andydataguy 1 year ago +1

    Looking forward to part 2

  • @vivalancsweert9913
    @vivalancsweert9913 1 year ago +1

    That was incredibly interesting and inspiring! Thank you!

  • @JelckedeBoer
    @JelckedeBoer 1 year ago +1

    Thanks for sharing, great content!

  • @sgttomas
    @sgttomas 11 months ago

    amazing! thank you very much.

  • @andydataguy
    @andydataguy 1 year ago +1

    You absolute Chad! 🙏🏾

  • @Music4ever326
    @Music4ever326 1 year ago

    Great video! One thing I would like to add: I think that for larger datasets it is faster / more efficient to use the import tool that comes with Neo4j Aura, instead of executing a separate query for each node / relation.

  • @3ti65
    @3ti65 9 months ago +2

    Great video! Is there a reason why you didn't have the LLM generate the Cypher?

    • @johannesjolkkonen
      @johannesjolkkonen  9 months ago +3

      Thanks! Generating just the relationship triplets is a simpler, less error-prone task for the LLM than generating complete Cypher with correct syntax. And because converting those triplets to Cypher is just a matter of some string parsing, we might as well use Python for that.
      It's always a good idea to do as much as possible with plain old code, using LLMs just where necessary. A bit more work maybe, but a lot more reliable (:

    • @3ti65
      @3ti65 9 months ago

      @@johannesjolkkonen I see, makes sense. Appreciate the answer! :)
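
    The exchange above describes the approach in the video: the LLM returns (head, relationship, tail) triplets, and plain Python string handling turns them into Cypher. Below is a minimal sketch of what that conversion could look like; the function names, triplet shape, and sanitization rules are illustrative assumptions, not the video's actual code.

        # Sketch: turning LLM-extracted triplets into Cypher MERGE statements
        # with plain string handling (illustrative, not the code from the video).

        def sanitize(value: str) -> str:
            # Node labels and relationship types can't contain spaces or special characters
            value = value.strip().replace(" ", "_")
            return "".join(ch for ch in value if ch.isalnum() or ch == "_")

        def triplet_to_cypher(head: str, head_type: str, rel: str, tail: str, tail_type: str) -> str:
            # MERGE instead of CREATE, so re-running the pipeline doesn't duplicate nodes
            return (
                f'MERGE (h:{sanitize(head_type)} {{name: "{head}"}}) '
                f'MERGE (t:{sanitize(tail_type)} {{name: "{tail}"}}) '
                f'MERGE (h)-[:{sanitize(rel).upper()}]->(t)'
            )

        print(triplet_to_cypher("Alice", "Person", "works at", "AlphaCorp", "Organization"))
        # MERGE (h:Person {name: "Alice"}) MERGE (t:Organization {name: "AlphaCorp"}) MERGE (h)-[:WORKS_AT]->(t)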

  • @satri101
    @satri101 1 year ago +1

    Great video! I really learned a lot and enjoyed the video. Thanks!

  • @hy3na-xyz
    @hy3na-xyz 1 year ago +3

    This is awesome bro, just subscribed. Is this the same for open-source models? Wanted to host using LM Studio etc.

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago

      Sure, it's exactly the same in principle! Of course, the quality of entity extraction might vary between models, with OpenAI's models being top-level.
      But this works fine with GPT-3.5, so you could most likely get similar results with Llama 2 (:
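
    For readers who want to try this with a locally hosted model: LM Studio (and similar tools) can expose an OpenAI-compatible server, so the extraction code can typically be pointed at it just by changing the client's base URL. A rough sketch assuming the openai Python SDK v1 and LM Studio's usual local endpoint; the URL, model name, and prompt are placeholders.

        from openai import OpenAI

        # LM Studio's local server speaks the OpenAI chat API; the API key is ignored
        # by the server but still required by the client.
        client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

        response = client.chat.completions.create(
            model="local-model",  # whichever model is loaded in LM Studio
            messages=[
                {"role": "system", "content": "Extract (head, relationship, tail) triplets from the text."},
                {"role": "user", "content": "Alice works at AlphaCorp as a data engineer."},
            ],
        )
        print(response.choices[0].message.content)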

  • @chriseun0503
    @chriseun0503 3 months ago

    Johannes - thank you for the video! What are your thoughts on building a KG-native CRM?

  • @hassanullah1997
    @hassanullah1997 1 year ago +3

    Great video Johannes, thanks!
    Just wondering whether you could do a retrieval example of this?
    Would be great to see how it compares to a vector store. When you read online, there's a lot saying that retrieval is slower and less efficient, but I'm not sure what to think.
    Would be great to get your insight with a video explaining it.

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago +4

      Hey Hassan, thank you very much!
      I'm going to create a video of the retrieval component very soon, and that will be out within the next couple of days (:

    • @hassanullah1997
      @hassanullah1997 1 year ago

      @@johannesjolkkonen good stuff. Look forward to it brother :)
      Do you offer consulting on stuff like this? Working on a startup where I think KG will play a role. Do you have an email I can send information to?
      Keep up the good work :)

    • @andydataguy
      @andydataguy 1 year ago

      @@johannesjolkkonen let's gooooo

  • @jebisetitut
    @jebisetitut 1 year ago

    Great job!

  • @michalstun5187
    @michalstun5187 11 months ago

    Great video, thanks for that!
    I would also be interested in data quality here. I noticed a few inconsistencies in your input data. How did the LLM cope with that? How accurate is the output knowledge graph? Could you make a more detailed comparison or share the output file, please?

  • @theuser810
    @theuser810 7 months ago +2

    Can we do this with LlamaIndex?

  • @nikhilshingadiya7798
    @nikhilshingadiya7798 1 year ago +2

    Awesome man ❤❤❤🎉🎉 I love the way you present. If I had a button to subscribe more, I would hit it a million times.

  • @EmilioGagliardi
    @EmilioGagliardi 1 year ago +1

    Excellent presentation. Love the detail and depth. Have you had to perform this on email text? I'm processing a large number of emails, so the grammar, and hence the entities and relationships, are not so clearly delineated. Was wondering if you've seen anything on performing this kind of LLM extraction on email texts, or if you have any suggestions. I just started my journey into graphs and it's super cool, so I definitely enjoy this content. Cheers,

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago +1

      Thank you Emilio!
      I haven't tried this myself, but it's a great idea. Extracting entities from the contents themselves might be hard for the reasons you said, but you could definitely get the sender-recipient relationships on a graph, and then use LLMs to add things like sentiment scores and email-thread summaries to those relationships. Maybe also segment email-conversations under some categories, to get a more high-level understanding of what themes people are discussing over emails.
      This is not so different to how graphs are already being used in social networks and content recommenders, but the LLMs definitely add more possibilities to the picture. Keep me posted if you end up doing something like this!

  • @Manu-m8w6m
    @Manu-m8w6m 10 months ago +1

    Is there any idea on how we would scale this? If the documents are large, entity duplication can happen in the graph, right? How would we solve that?

  • @ryanslab302
    @ryanslab302 1 year ago +1

    Make your text as big as possible when sharing your screen. Thank you for your video.

  • @krishnakandula6587
    @krishnakandula6587 8 months ago

    How are the prompt templates written? Are there any guidelines for writing those?

  • @joffreylemery6414
    @joffreylemery6414 1 year ago +1

    Great job Johannes!
    I'm curious to discuss the benefits of going with Azure OpenAI instead of directly to OpenAI.
    Thanks, and once again, great job!

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago +5

      Hey Joffrey, thank you!
      The main reason is that on Azure, you can run the model with a dedicated and isolated endpoint, and all data that's passed to that endpoint is covered by Azure's enterprise-grade data privacy guarantees. Another thing which is important for a lot of companies is that you can choose the region in which this endpoint is hosted 🌍🌎

    • @mikelewis1166
      @mikelewis1166 1 year ago

      @@johannesjolkkonen Can you point me to where Azure offers privacy guarantees to OpenAI users on their platform? We were considering it for our clients, but I cannot find documentation that seems to include Azure OpenAI under their data privacy terms.
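
    For context on the Azure discussion above: switching the extraction calls from OpenAI to an Azure OpenAI deployment is mostly a client-configuration change. A minimal sketch assuming the openai Python SDK v1; the endpoint, deployment name, and API version are placeholders you would take from your own Azure resource.

        import os
        from openai import AzureOpenAI

        # The endpoint is tied to the Azure region the resource was created in
        client = AzureOpenAI(
            azure_endpoint="https://<your-resource>.openai.azure.com",
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            api_version="2024-02-01",
        )

        response = client.chat.completions.create(
            model="<your-deployment-name>",  # the deployment name, not the raw model name
            messages=[{"role": "user", "content": "Extract entities and relationships from: ..."}],
        )
        print(response.choices[0].message.content)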

  • @johny1n
    @johny1n 9 months ago

    Do you think you could put this in a GitHub repo or share the code?

  • @AshWickramasinghe
    @AshWickramasinghe 1 year ago +1

    Is it possible to use a free alternative to Neo4j?

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago

      Hey! As mentioned in the tutorial, Neo4j Aura offers your first instance for free, up to 1 million nodes.
      I'm not aware of a graph database that would offer free and unlimited capacity

  • @u2b83
    @u2b83 1 year ago

    Cloud platforms are currently such an annoyance because of the "essential complexity" required for them to capture service charges and maintain security. It makes me think of the dial-up model of internet access back in the day or the clumsy process of installing printer drivers, instead of plugging the thing in and selecting print.

  • @Doggy_Styles_Coding
    @Doggy_Styles_Coding 1 year ago +1

    This is awesome, thanks for the work. Your setup reminds me of Windows XP :D

  • @Milind-eu4fc
    @Milind-eu4fc 7 months ago

    Can I integrate the same solution with Memgraph?

  • @HaniehKh-v9i
    @HaniehKh-v9i 9 months ago +1

    Hi. Thanks a lot for the helpful content. I have a question. When I run the ingestion_pipeline() function, I only get two entities in Neo4j. Those that have a space in their name are not covered. Could you please guide me to solve the issue?

    • @johannesjolkkonen
      @johannesjolkkonen  9 months ago

      Hey!
      Yeah, an important thing to ensure is that the node & relationship types and property keys don't have spaces or special characters, as those aren't allowed and will cause the queries to fail. I recall I already had some sanitization in the Cypher generation to ensure this, but in any case you should be able to fix it pretty easily by running some .replace(" ", "") in your code, getting rid of spaces in the names before generating & running the Cypher statements.
      Hope this helps!

  • @moviecules1697
    @moviecules1697 8 months ago

    What if you have the Neo4j Desktop version? How do you access it from your code?

  • @pichirisu
    @pichirisu 1 year ago

    So what's the difference between drawing this on a whiteboard from statistical information and using a graph that does what you could draw on a whiteboard from statistical information?

  • @MegaNightdude
    @MegaNightdude 11 months ago

    😊

  • @PijanitsaVode
    @PijanitsaVode 9 months ago

    40 minutes of cooking, no linguistics, but that's the trend.
    All in all, one can copy it all except
    - the Azure subscription
    - entity typing and some Neo4j setup
    - data sources
    ?

  • @dougclendening5896
    @dougclendening5896 1 year ago

    I'm a little confused. Why would you want to do it this way instead of using a hardcoded config of value types and their relationships?
    You're calling it unstructured data, but it's anything but unstructured. It has clear fields and values.
    So, I'm trying to understand the benefit here.

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago

      Hey Doug, that's a valid question. It's true that the markdown files here are structured very neatly, with headings that work almost like fields. In this case, you could get something similar with just standard text parsing, extracting values from the markdown based on the headers.
      However, the approach of using LLMs can be generalized to more complex situations, working with longer and messier documents (like PDFs) where the entities and relationships are more implicit, and text parsing won't get you there.
      Hope that makes sense!

    • @dougclendening5896
      @dougclendening5896 1 year ago

      @johannesjolkkonen It does make sense. I was weighing the cost:benefit ratio of using an LLM and token processing for such neatly structured data. It would be very expensive vs parsing them with a config.

  • @Yogic-ignition
    @Yogic-ignition 1 year ago +2

    A few questions here:
    1. Can I retrieve the source documents from which the response was generated using graph knowledge?
    2. How can I avoid duplication of data if I am planning to ingest data from multiple sources (creating a data ingest pipeline)?
    3. How would I update data present in the database which was true last week but now is not (like: until last week my device had 3 ports, but now it has 6 ports)?
    Thanks in advance!

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago +1

      Hey Mukesh, great questions!
      1. Sure. You can store some metadata about the source documents (document title, link, etc.) in each node and relationship, and then include that metadata in the query results when querying the database. You could also have the documents as nodes of their own, with relationships to all nodes that originate from the document.
      2. This is a key challenge in these applications. Sadly I haven't done work on this myself yet, but here are a few good resources about it: ua-cam.com/video/dNGV4sLkOcA/v-deo.html and margin.re/2023/06/entity-resolution-in-reagent/
      A lot of people have asked about this same thing, and it'll definitely be a topic for a video soon (:
      3. Assuming you have a good way to ID the nodes, it's pretty easy to match the nodes by ID and update their attributes with SET statements. See here: neo4j.com/docs/getting-started/cypher-intro/updating/

    • @Yogic-ignition
      @Yogic-ignition 1 year ago

      @@johannesjolkkonen Thank you for your time and help, highly appreciated 😊 Looking forward to the data deduplication and ingest pipeline video.
      *The notification bell icon is ON*
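
    To make points 1 and 3 from the reply above concrete, here is a rough sketch using the official neo4j Python driver: storing document provenance as its own node linked to an entity, and later updating a changed fact with SET by matching the node on an id. The labels, property names, URI, and credentials are assumptions for illustration, not the video's code.

        from neo4j import GraphDatabase

        driver = GraphDatabase.driver(
            "neo4j+s://<your-database-id>.databases.neo4j.io",
            auth=("neo4j", "<your-password>"),
        )

        with driver.session() as session:
            # 1. Provenance: keep the source document as a node and link entities to it
            session.run(
                """
                MERGE (d:Document {title: $title, url: $url})
                MERGE (e:Device {id: $device_id})
                MERGE (e)-[:MENTIONED_IN]->(d)
                """,
                title="Device spec, week 42", url="https://example.com/spec", device_id="device-1",
            )
            # 3. Updating a fact that changed: match the node by id and overwrite the property
            session.run(
                "MATCH (e:Device {id: $device_id}) SET e.ports = $ports",
                device_id="device-1", ports=6,
            )

        driver.close()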

  • @karlarsch7068
    @karlarsch7068 1 year ago

    Well, thank you. That was a very interesting 40 minutes.
    Are you aware that your mic picks up the noise your arms make on the desk? And I don't know if "python" in a title is very smart, it scares at least me ;) Nah, it's probably fine, just kidding.

  • @gatuhcreations
    @gatuhcreations 1 year ago +5

    You probably should not show your OpenAI key like that.

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago +11

      I re-generated all the credentials shown here before publishing, so these ones don't work anymore (:
      But it's a great point that I probably should've mentioned in the video, always rotate your creds!

    • @asads30
      @asads30 6 months ago

      Why do you care 😂

  • @u2b83
    @u2b83 1 year ago +1

    So this is what happens with the data from the quarterly HR forms/questionnaires lol

  • @luisdanielmesa
    @luisdanielmesa 1 year ago +4

    ...horrible injection vulnerability

    • @dinoscheidt
      @dinoscheidt 1 year ago +12

      It’s a simple concept video… in a Jupyter notebook… without tests or anything. Injection risks are really the far end of hurdles for public production code here.

    • @123unhooked
      @123unhooked 1 year ago +7

      That is just rude. Not saying that it is wrong, but criticizing it here is just so out of place. Shame on you for not respecting the quality product this guy offered.

    • @jgcornell
      @jgcornell 11 months ago

      That’s why we need to take data cleansing seriously in AI

    • @JordyBackes
      @JordyBackes 10 months ago

      😂

    • @crusnic_corp
      @crusnic_corp 10 months ago +1

      This guy must be a hardcore relational database guy with no relationships outside his primary/foreign key. Don't put such constraints in your life bro... Remove duplicate elements from your life and traverse the nodes of real-world concepts. 😂

  • @thedudephx
    @thedudephx 1 year ago

    Finnish sisu!

  • @aravindarjun4814
    @aravindarjun4814 1 year ago +1

    Basically I am getting this error while executing:
    Running pipeline for 11 files in project_briefs folder
    Extracting entities and relationships for ./data/project_briefs\AlphaCorp AWS-Powered Sales Analytics Dashboard.md
    Error processing ./data/project_briefs\AlphaCorp AWS-Powered Sales Analytics Dashboard.md: Connection error.
    Extracting entities and relationships for ./data/project_briefs\AlphaCorp Customer Support Chatbot.md
    Error processing ./data/project_briefs\AlphaCorp Customer Support Chatbot.md: Connection error.
    Can you help me with fixing it?

    • @johannesjolkkonen
      @johannesjolkkonen  1 year ago

      Hey Aravind!
      That seems like the entity extraction / LLM step is working, but there's an issue connecting to Neo4j. I would check that:
      - Your Neo4j instance is running
      - Your connection URL is in the correct format (neo4j+s://{your-database-id}.databases.neo4j.io:7687)
      - Your username and password are correct
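
    A quick way to test the Neo4j side of that checklist in isolation, as a sketch assuming the official neo4j Python driver (the URI and credentials are placeholders):

        from neo4j import GraphDatabase

        uri = "neo4j+s://<your-database-id>.databases.neo4j.io:7687"

        driver = GraphDatabase.driver(uri, auth=("neo4j", "<your-password>"))
        try:
            # Raises an exception if the instance isn't running, the URI is wrong,
            # or the credentials are rejected
            driver.verify_connectivity()
            print("Successfully connected to Neo4j")
        finally:
            driver.close()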