Leann Chen
You Need Better Knowledge Graphs for Your RAG
RAG (Retrieval-Augmented Generation) has become one of the most hyped approaches in Generative AI applications, and so have knowledge graphs. You see lots of graph-based LLM apps out there, and you're probably building one too. However, how you construct knowledge graphs determines the quality of your LLM-based application. Solely relying on GPT-4 to extract entities and relationships without thorough evaluation will give you a garbage-in, garbage-out effect.
To prepare for Data Day Texas 2024, I built a Graph RAG AI assistant using the Diffbot API for both web scraping and knowledge graph construction. You'll see how I built it and how I monitored the results throughout the video; Diffbot offers transparency into the information retrieval process and benchmarks for evaluating the accuracy of the information retrieved. (A rough code sketch of the core steps appears below the chapter list.)
Diffbot's APIs are free to use, including the Natural Language API that was used in the video:
app.diffbot.com
Note: This video is independently produced and is not sponsored by Diffbot, Neo4j, or Streamlit.
Here's the link to my Github repo for this project:
github.com/leannchen86/graph-rag-ai-assistant
0:00 Intro
0:53 Step 1. Web Scraping with Diffbot API
1:37 Step 2. Construct knowledge graph with Diffbot Graph Transformer (LangChain)
3:31 Step 3. Customize Diffbot Graph Transformer
3:41 Step 4. Import Diffbot Knowledge Graph into Neo4j Database
5:03 Step 5. What Entity/Relationship Extraction Looks Like with GPT-4
5:41 Step 6. Meet My Graph RAG AI Assistant
7:08 Outro
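For reference, below is a rough sketch of Steps 2 and 4 (knowledge graph construction with the Diffbot Graph Transformer, then import into Neo4j). It assumes the current LangChain package layout and placeholder environment variable names; the actual notebook code lives in the repo above.

    import os

    from langchain_core.documents import Document
    from langchain_community.graphs import Neo4jGraph
    from langchain_experimental.graph_transformers.diffbot import DiffbotGraphTransformer

    # Scraped text from Step 1 (placeholder content here)
    docs = [Document(page_content="Amy Hodler is presenting at Data Day Texas 2024 ...")]

    # Step 2: extract entities and relationships via Diffbot's Natural Language API
    diffbot_transformer = DiffbotGraphTransformer(diffbot_api_key=os.environ["DIFFBOT_API_KEY"])
    graph_documents = diffbot_transformer.convert_to_graph_documents(docs)

    # Step 4: import the extracted nodes and relationships into a Neo4j database
    graph = Neo4jGraph(
        url=os.environ["NEO4J_URI"],
        username=os.environ["NEO4J_USERNAME"],
        password=os.environ["NEO4J_PASSWORD"],
    )
    graph.add_graph_documents(graph_documents)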
#knowledgegraph #generativeai #llm #aichatbot
Music: Background Motivating Corporate by WinnieTheMoog
Free download: filmmusic.io/song/6611-background-motivating-corporate
Licensed under CC BY 4.0: filmmusic.io/standard-license
Views: 31,159

Videos

Build an Advanced RAG Chatbot with Neo4j Knowledge Graph
12K views · 6 months ago
Advanced RAG (Retrieval-Augmented Generation) and knowledge graphs make AI chatbots more powerful and context-aware. Your chatbot can digest more data sources than just one document. We feed the chatbot different text data about Sam Altman's surprising exit from and return to OpenAI. This video walks you through how to build the system with LLM tools. 0:00 Intro 0:42 Load wiki ...
Vector Search (RAG) Decodes Inside Out
765 views · 7 months ago
🎬 Inside Out Data Adventure! In this video, we take a fun look at Pixar's 'Inside Out' and see what data can tell us about the movie's 5 emotions: 😄😢😠😱😒. We're using cool AI to understand the characters better and build a simple app to show you the story in a whole new way. What's Inside: - Cool ways to turn the 'Inside Out' script into pictures and graphs. - A chatbot that helps us dig into th...
Meet the Inside Out AI Chatbot! 🤖🎬
471 views · 7 months ago
Curious about the emotions from the movie Inside Out? I've built an AI chatbot to explore them! This video is a quick peek at how we decode the movie's feelings with AI technologies. Check out the app: inside-out-character-explorer-wdyr3tx8maxqwgjrh2bnz3.streamlit.app What to Expect: A sneak peek at using AI to chat about 'Inside Out's characters. A teaser of our data science journey with Joy, ...

COMMENTS

  • @riyashah1161
    @riyashah1161 2 days ago

    you're so pretty

  • @mohammedshuaibiqbal5469
    @mohammedshuaibiqbal5469 12 days ago

    Can you make a video on how to generate knowledge graphs for PDF books like the DSM-5?

  • @mandraketupi5
    @mandraketupi5 16 days ago

    Hi Leann, thank you! Your videos are an amazing source of inspiration! Keep up the good work!

  • @enkhbatenkhjargal9447
    @enkhbatenkhjargal9447 29 days ago

    Thanks for this video, subscribed! :)

  • @senthilkumarpalanisamy365
    @senthilkumarpalanisamy365 1 month ago

    Excellent video, clear explanation. Please do post more in the GenAI and knowledge graph space.

  • @markquinsland8385
    @markquinsland8385 1 month ago

    go to the presentation by Amy Hodler (and tell her I said hello)

  • @AP-fu3bj
    @AP-fu3bj 1 month ago

    Are you creating embeddings on top of the knowledge graph for RAG??

  • @AP-fu3bj
    @AP-fu3bj 1 month ago

    Can you please share the code for the application you built to visualize the knowledge graph?

  • @idk-kv9hg
    @idk-kv9hg 1 month ago

    Hey Leann, first of all, great explanation with some insights (especially on Diffbot). You got a new subscriber 👍 I'm going to work on a RAG-based project which will use Neo4j as a graph database. I've gone through other comments and your answers to them, but I still wanted to know a few things: 1. Here you took the example of speakers and what they have spoken about (and their interests/expertise etc.), which works fine. But what if I have some PDF docs of roughly 50-70 pages with rules and regulations and want to use them as a custom knowledge base for my RAG project? Is a knowledge graph database a good choice? Why not a simple vector DB (such as Milvus)? 2. Assuming I must use a graph database, how do I efficiently chunk the PDFs and store them as graph nodes and relationships, so that if a user asks any query they get the correct answer? 3. If the docs are related to rules and regulations, then what will be the nodes and the relationships between them? Because here in your example, the nodes were speakers, their expertise, etc. I understand that you might not have a perfect answer for all of the above, but I'd like to have your point of view. Hope you find my comment and reply once you get time. Thanks for reading and for your time.😊

    • @Abstract.x
      @Abstract.x 13 hours ago

      Hey, great questions, I'm also considering working on something like this. Did you figure out answers to your questions? They would help me better understand the usability of this as well.

  • @linlinlau6785
    @linlinlau6785 1 month ago

    Your English pronunciation is clear and the pacing is just right. If the on-screen text were a bit larger, it would be even clearer.

  • @linlinlau6785
    @linlinlau6785 1 month ago

    Awesome. Subscribed and following!

  • @cristhiamtovar9003
    @cristhiamtovar9003 1 month ago

    Thanks for this video

  • @BenjaminKing1
    @BenjaminKing1 1 month ago

    It's nice to see my old co-worker Michelle randomly popping up in a video. I hope you were able to meet her. She is great!

    • @lckgllm
      @lckgllm 1 month ago

      I did meet her and was able to talk to her personally! It was definitely great. Joining the session she presented with Amy Hodler will make you realize you don't want to miss another one! 😊

  • @stanTrX
    @stanTrX 1 month ago

    I have tested standard RAG on a few critical documents to get some answers and, to be honest, didn't enjoy the performance much.

    • @lckgllm
      @lckgllm 1 month ago

      Thanks for sharing your experience! Are you referring to standard RAG (purely vector-based) or graph-based? Purely vector-based RAG is not great, while graph-based RAG can return more reliable results. But I also have to be honest that prototypes are generally cute and very far from production-ready. That's why we need a lot of testing/evaluation, and I'm currently gearing up to make videos with production-oriented testing :) Here's a video where I did some testing: ua-cam.com/video/mHREErgLmi0/v-deo.html

  • @vbywrde
    @vbywrde 2 months ago

    Yes, to physically present yourself in multiple locations at the same time is quite challenging. My understanding is that it requires you to achieve presence on the fourth dimension. Once there, you can then enter multiple three-dimensional spaces at the same time. I wish I could do that, though I suspect it would be really disorienting at first! Best wishes! Also, I learn something new with every one of your videos! Thank you! I really like your approach!

    • @lckgllm
      @lckgllm 2 months ago

      I'm surprised and also thrilled that finally someone takes my not-so-funny joke in the video seriously! 😂 Love your concise and scientific explanation of multi-dimensional space, which makes me dream even more about having that superpower. 😉 Thanks for the encouragement once again. I'm learning a lot from you guys too and have been enjoying the journey with you all!

    • @vbywrde
      @vbywrde 2 months ago

      @@lckgllm Oh good. When you make it to the fourth dimension, please give a holler! :) It would be fun to see you in two places at the same time. Three even! XD. In the meantime, please keep us posted as to your coding progress. I find your videos really helpful. Thanks!

  • @darkhydrastar
    @darkhydrastar 2 months ago

    You are an excellent presenter. Thank you. We do, however, need to find you better background music; it gives off pharmaceutical-commercial vibes, and the levels are a little too high over your voice. Still great though. You have excellent stage presence and a clear voice.

    • @lckgllm
      @lckgllm 2 months ago

      Totally agree with you :) I have since upgraded to Epidemic Sound for music and am more mindful that the music volume should not distract the viewer while I'm speaking. I'm trying to learn and become better after every video, so I really appreciate seeing feedback like this for improvement!

  • @ramdeeproy6853
    @ramdeeproy6853 2 months ago

    Can we implement the Azure OpenAI creds (API key, model name, endpoint, type, and version) in the ipynb file and run it? Also, please mention the dependency libraries of the functions.py file, as visual, Node, Edge, and Cypher_graph are not getting initialised in VS Code while running the file...

  • @cemery50
    @cemery50 2 months ago

    I'm interested in creating a Little Logical Model based upon the command structure of an application and then using agents to take voice to text and text to command, maybe with a corresponding graph view updated with the current information available in another window on another display screen.

  • @MrBekimpilo
    @MrBekimpilo 2 months ago

    This is very insightful, Leann. Cheers from South Africa!

    • @lckgllm
      @lckgllm 2 months ago

      Thanks for the encouragement! 😊

  • @alexandreturlier5464
    @alexandreturlier5464 3 months ago

    Great content! I am a complete beginner: I have a Neo4j DB already populated, and I want to "only" do the chatbot portion connected to GPT-4. Would you mind guiding me on which .py I should use in this use case? In the meantime, I am getting an "UnboundLocalError: local variable 'nodes' referenced before assignment". Not sure what to do... Thanks!!

  • @krisograbek
    @krisograbek 3 months ago

    This is so good, Leann! I'm just jumping into the field of Knowledge Graphs! This will be huge for RAG applications! Why did you stop publishing videos?

    • @lckgllm
      @lckgllm 3 months ago

      Hi Kris! Thanks for the encouragement :) I didn't stop posting. Instead, I'm posting videos on another channel I work for (e.g. ua-cam.com/users/shortsoJVRWGfqfjQ). I will try hard to post more videos about knowledge graphs and LLMs.

    • @krisograbek
      @krisograbek 3 months ago

      @@lckgllm I didn't know about the other channels. Thanks for letting me know!

  • @kingmouli
    @kingmouli 3 months ago

    Thank you for the amazing short video. I am eagerly waiting for you to make a video on how to convert CSV data into knowledge graphs and answer questions about the CSV files.

  • @kingmouli
    @kingmouli 3 months ago

    Thank you for the crisp view of KG + RAG. Can we create a KG from multiple CSV files? Currently, CSV agents lag behind in answering questions based on content; they only search for a column matching the question rather than the content passed.

    • @lckgllm
      @lckgllm 3 months ago

      Love your idea! It's totally possible to create a KG from multiple CSV files, but can you say more about what "content" means in your case?

    • @kingmouli
      @kingmouli 1 month ago

      @@lckgllm What I mean by content is csv data here

  • @souzajvp
    @souzajvp 3 months ago

    Thanks for the awesome video! I was trying to reproduce your code but got an error because the "text_for_kg()" function was not defined. Any chance you can help me understand where this function comes from? Great content and great editing! Thank you

    • @ajeeshsunny4592
      @ajeeshsunny4592 3 months ago

      Same problem for me. Trying to implement text_for_kg.

    • @lckgllm
      @lckgllm 3 months ago

      Hello! Sorry for the late reply, I've been busy with work. I just realized that text_for_kg() somehow got deleted from the notebook, but it should be the same thing as diffbot_nlp.nlp_request(). I just updated the notebook in the GitHub repo. Let me know if it doesn't work. I'll do my best to fix it. Thanks for pointing this issue out! @souzajvp

  • @mohammedmahinuralam2796
    @mohammedmahinuralam2796 3 months ago

    Great! Waiting for more of your videos!

  • @deniowork7084
    @deniowork7084 3 months ago

    Love this!

  • @SunMai93
    @SunMai93 3 months ago

    Diffbot sets a pretty high bar for entering this project; any thoughts/plans to utilise an open-source project instead? Thanks!

    • @lckgllm
      @lckgllm 3 months ago

      Yes, I previously used spacy-llm in my last video: ua-cam.com/video/mVNMrgexxoM/v-deo.html However, from the results generated by spacy-llm in my GitHub, you can see that there are still errors in the output, and I needed to further pass the results to GPT-4 for refining: github.com/leannchen86/openai-knowledge-graph-streamlit-app/blob/main/openaiKG.ipynb I hope future LLMs (closed source or open source) will let us see a confidence score for the output, as I experienced with Diffbot's APIs.

    • @SunMai93
      @SunMai93 3 months ago

      Thank you @@lckgllm! I will have a look at the video and the notebook. Might come back for discussion again. Have a good one!

  • @SunMai93
    @SunMai93 3 months ago

    Useful content, not a word wasted!

  • @SunMai93
    @SunMai93 3 months ago

    Very nice content! support support 🇹🇼

  • @AdityaSharma24091994
    @AdityaSharma24091994 4 months ago

    Can RAGs become efficient enough to do data analysis over text tables and CSVs? I'm planning to build one, so I wanted to know if this is possible.

    • @lckgllm
      @lckgllm 3 months ago

      Yeah I think so! That's a great idea for a new video :)

    • @AdityaSharma24091994
      @AdityaSharma24091994 3 months ago

      @@lckgllm yes. I would be glad to collaborate on such project.

  • @MehdiAllahyari
    @MehdiAllahyari 4 months ago

    Great video! However, I would completely replace Diffbot with an open-source solution. There are many NER models (SpanMarkerNER, to name one), and since most of the entities you showed in the video are Person, Location, and Org, libraries like spaCy and SetFit are pretty good for them. Using an LLM with few-shot learning would be another option. Overall, a very nice video.

    • @lckgllm
      @lckgllm 3 months ago

      Thanks for the feedback! I previously used spacy-llm in my last video: ua-cam.com/video/mVNMrgexxoM/v-deo.html However, from the results generated by spacy-llm in my GitHub, you can see that there are still errors in the output even when examples are included in the prompts, and I needed to further pass the results to GPT-4 for refinement: github.com/leannchen86/openai-knowledge-graph-streamlit-app/blob/main/openaiKG.ipynb I hope future LLMs (closed source or open source) will let us see a confidence score for the output, as I experienced with Diffbot's APIs.

    • @MehdiAllahyari
      @MehdiAllahyari 3 months ago

      @@lckgllm If you'd like to have confidence scores using LLMs, a simple hack is to add that to the prompt, so the LLM returns the results with scores. :)
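      (A minimal sketch of that prompt hack, for illustration only — the model name and JSON schema below are assumptions, not something from this thread. Note that self-reported scores are not calibrated the way Diffbot's confidence scores are.)

      import json
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      prompt = (
          "Extract (head, relation, tail) triples from the text below. "
          "Return only JSON: a list of objects with keys head, relation, tail, "
          "and confidence (a float between 0 and 1).\n\n"
          "Text: Amy Hodler is speaking about graph analytics at Data Day Texas."
      )

      response = client.chat.completions.create(
          model="gpt-4o",  # any capable chat model works; just an example
          messages=[{"role": "user", "content": prompt}],
      )
      # May need extra parsing if the model wraps the JSON in prose
      triples = json.loads(response.choices[0].message.content)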

  • @pouet4608
    @pouet4608 4 months ago

    It can be done by hand, but the automation of this human skill is impressive. Good video!

  • @IdeationGeek
    @IdeationGeek 4 months ago

    It's cool! Considering that current LLMs with long context windows pass the "Needle In A Haystack" (NIAH) test with flying colors, creating Neo4j graphs becomes more of a human domain-learning and verified knowledge-collection activity, useful for science and formal domain exploration rather than ad-hoc knowledge exploration. One thing: Neo4j is not enough to represent the complex hierarchical relationships that need hypergraphs. Check out Dgraph (open-source software).

    • @lckgllm
      @lckgllm 3 months ago

      Just checked DGraph out! Didn't know about that before, thanks for sharing the info :)

  • @andydataguy
    @andydataguy 4 months ago

    Great video!

  • @CK.23.
    @CK.23. 4 months ago

    I just had to be the 1,000th.. Congrats..

    • @lckgllm
      @lckgllm 4 months ago

      Thanks! 🥳

  • @goofballbiscuits3647
    @goofballbiscuits3647 4 months ago

    Cool! New sub from me 😊

  • @BobaQueenPanda
    @BobaQueenPanda 4 months ago

    Content was good but found face filters visually distracting.

    • @lckgllm
      @lckgllm 4 months ago

      What face filters? I literally talked in front of my MacBook Pro 14. I did have makeup on, which I admit.

    • @Armoredcody
      @Armoredcody 4 months ago

      @@lckgllm You're fine, don't worry about it. However, the audio sounds like it has a low bitrate.

    • @lckgllm
      @lckgllm 4 months ago

      Definitely going to get a mic for better voice quality. Thanks for the feedback, folks! @@Armoredcody

  • @vanderstraetenmarc
    @vanderstraetenmarc 4 months ago

    I'm a newbie on the matters discussed here, but I really do appreciate the way your Graph RAG AI Assistant works, responding with text AND a graph. Can you tell me a bit more about how you accomplished this? (I'm especially interested in the graph that gets generated!) Hope that's not a stupid question.

    • @lckgllm
      @lckgllm 4 months ago

      Definitely a great question! I didn't include the process in this video and plan to make another video about this, but let me show you the details via email :)

    • @vanderstraetenmarc
      @vanderstraetenmarc 4 months ago

      @lm Would be highly appreciated! 🙏 Didn't get it yet though...

    • @justindehorty
      @justindehorty 17 days ago

      ​@@lckgllm Hi Leann, I had the same question. Isn't this just an implementation of streamlit-agraph? Is there any reason why you left this out of the GitHub repo you shared? It would be incredibly helpful/instructive to see the implementation.

  • @manfyegoh
    @manfyegoh 4 months ago

    nice sharing

  • @thomaskaminski2187
    @thomaskaminski2187 4 months ago

    KGs are key for providing context to RAG. Still, I see the OWL/RDF path outperforming LPG, as it enables the user to explicitly define semantics and infer knowledge.

  • @Jeganbaskaran
    @Jeganbaskaran 4 months ago

    Really awesome, thank you for this video!

  • @malipetek
    @malipetek 4 months ago

    Thanks.

  • @AndrewNeeson-vp7ti
    @AndrewNeeson-vp7ti 4 months ago

    6:26 It's not a great answer. 🙁 The graph DB has effectively acted as a bottleneck for the data, i.e. the answer is based purely on nodes + edges. I'd be curious whether the graph DB could essentially act as an index for the original content, i.e. still use a graph query to return the relevant nodes/edges, but pass the source text corresponding to them as the RAG response.

    • @lckgllm
      @lckgllm 4 months ago

      Good question, although I'd argue that the answer at 6:26 is good enough for my use case 😂, as my purpose was converting unstructured text data into structured knowledge graphs, which served as the ground truth for the LLM to find the answer. 6:26 shows exactly the context from my knowledge graph. I think what you're asking, "whether the graph DB could essentially act as an index for the original content", is a different use case, where the documents themselves are classified as nodes and edges are appended to the nodes based on their semantic similarities. I'd probably make another video particularly for that use case, which is different from what you saw in this current video.

    • @AndrewNeeson-vp7ti
      @AndrewNeeson-vp7ti 4 months ago

      ​@@lckgllm not the entire document, but rather the section of it that corresponds to the creation of that node/edge. E.g. *Graph response:* [Amy] interested_in [science history] *Source text:* "Amy has a love for science history and a fascination for complexity studies" If it was possible to store the source text as an attribute of each relationship and return that rather than the edge names then you'd probably get a higher quality answer.

    • @lckgllm
      @lckgllm 4 months ago

      Ohh!! I like this idea, yes, it would be more concrete and reliable. Thanks for sharing! Let me try to improve this feature and maybe make a video about it ;) Really appreciate your feedback, thanks so much ❤ @@AndrewNeeson-vp7ti
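      (A hypothetical sketch of @AndrewNeeson-vp7ti's suggestion above — storing the originating sentence as a property on each relationship and returning it at query time. Labels, property names, and credentials here are made up, not taken from the project.)

      from langchain_community.graphs import Neo4jGraph

      graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

      # Write: attach the source sentence to the relationship itself
      graph.query(
          """
          MERGE (p:Person {name: $person})
          MERGE (t:Topic {name: $topic})
          MERGE (p)-[r:INTERESTED_IN]->(t)
          SET r.source_text = $sentence
          """,
          params={
              "person": "Amy",
              "topic": "science history",
              "sentence": "Amy has a love for science history and a fascination for complexity studies",
          },
      )

      # Read: return the stored sentences (not just edge names) as RAG context for the LLM
      rows = graph.query(
          "MATCH (p:Person {name: $person})-[r]->(t) "
          "RETURN type(r) AS rel, t.name AS tail, r.source_text AS source",
          params={"person": "Amy"},
      )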

  • @allthingsai8166
    @allthingsai8166 4 months ago

    Where do I get the zeroshot.cfg file? And I want to replace @llm_models = spacy.GPT-4.v2 with Mistral. How do I do that?

    • @lckgllm
      @lckgllm 4 months ago

      Hi! Thanks for pointing that out. I just added the zeroshot.cfg to github.com/leannchen86/openai-knowledge-graph-streamlit-app You can also find the documentation here: spacy.io/usage/large-language-models#zero-shot-prompts

  • @SlykeThePhoxenix
    @SlykeThePhoxenix 4 months ago

    Have you tried using a local LLM such as Mistral? It'll take a bit longer, but it's considerably cheaper.

    • @lckgllm
      @lckgllm 4 months ago

      I think I didn't get my point across clearly, and sorry about that. In terms of constructing a knowledge graph, from my experience, Diffbot's Natural Language API currently has the best performance on Named Entity Recognition (NER) and Relationship Extraction (RE) compared to GPT-4 (so far the best LLM) or spacy-llm; I tried them both. Frankly speaking, large language models are not inherently optimized for tasks such as entity/relationship extraction, and we should think again about whether LLMs are the best option for every single task.

    • @SlykeThePhoxenix
      @SlykeThePhoxenix 4 months ago

      @@lckgllm Everything's a nail if you only wield an LLM =D

    • @lckgllm
      @lckgllm 4 months ago

      🤣@@SlykeThePhoxenix

  • @AerialWaviator
    @AerialWaviator 4 months ago

    This was an interesting video. I was more focused on the process, and the thinking behind using this process to organize and visualize data.

  • @danielneu9136
    @danielneu9136 4 months ago

    Great video, I like it, but please stop using large language models for NER; there are way better-performing and cheaper alternatives :)

    • @ppcalpha1042
      @ppcalpha1042 4 months ago

      Hey Daniel, what are the better-performing and cheaper alternatives for NER?

    • @lckgllm
      @lckgllm 4 months ago

      Good question! I want to know too :) @danielneu9136

    • @danielneu9136
      @danielneu9136 3 months ago

      @@lckgllm Use small open-source encoder models for NER, e.g. fine-tuned versions of ALBERT. Simply import the right model from the Hugging Face transformers library and let it label your dataset. They often perform better than LLMs on the tasks they are trained for, and most of them even work on the Colab free tier :)
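      (A small sketch of that suggestion — the model name below is just one public example from the Hugging Face hub, not one Daniel specifically recommended.)

      from transformers import pipeline

      ner = pipeline(
          "token-classification",
          model="dslim/bert-base-NER",    # small fine-tuned encoder NER model (example)
          aggregation_strategy="simple",  # merge sub-word tokens into entity spans
      )

      text = "Amy Hodler is presenting on graph analytics at Data Day Texas in Austin."
      for entity in ner(text):
          # Each span comes with a confidence score, unlike plain LLM extraction
          print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))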

  • @robertputneydrake
    @robertputneydrake 4 months ago

    Diffbot is quite expensive. I didn't understand what it's actually doing here. It takes all videos, extracts transcriptions, then summarizes them and puts them into a table? I mean, I'm thinking of implementing that process right now using yt-dlp and a local LLM.

    • @lckgllm
      @lckgllm 4 months ago

      It's true that it's expensive, and they're currently more focused on enterprise clients. The Natural Language API I use is primarily for constructing a knowledge graph in this project, as the knowledge graph serves as the foundation for my Graph RAG AI assistant (where GPT-4 is the LLM used to query the knowledge graph). I tried Diffbot's API because GPT-4 is not yet perfect at extracting entities and relationships, which are crucial for building a knowledge graph; see 5:37. GPT-4's cost can also escalate significantly with large document sizes. While I don't mind paying for accurate results, expensive and erroneous outcomes are not justifiable expenses for me. Diffbot also shows you how the API extracts and evaluates information at 2:55. Do you need to build a knowledge graph for your LLM-based app? If not, you might not really need Diffbot's API as I did.

  • @shingyanyuen3420
    @shingyanyuen3420 4 months ago

    I don't understand how knowledge graphs are being used in RAG. What's the difference between KG-RAG and normal RAG?

    • @lckgllm
      @lckgllm 4 months ago

      Good question @shingyanyuen3420! Sorry for not making it clear in the video; I'll improve my explanation next time. Typical RAG applications chunk documents into smaller parts and convert them into embeddings, which are lists of numeric values. The LLM then retrieves information based on semantic similarity to the question. However, the information retrieval process can become challenging as document sizes increase, potentially causing the model to lose the overall context. This is where knowledge graphs can be useful. Knowledge graphs explicitly define the relationships between entities, offering a more straightforward path for the LLM to find the answer while staying context-aware, improving the accuracy of the retrieval process. Hopefully this article is helpful: ai.plainenglish.io/knowledge-graphs-achieve-superior-reasoning-versus-vector-search-for-retrieval-augmentation-ec0b37b12c49
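      (A rough sketch of the graph-RAG side, assuming current LangChain class names — a plain RAG pipeline would instead chunk and embed the text and do similarity search. Connection details are placeholders, and newer LangChain versions may also require an allow_dangerous_requests=True opt-in.)

      from langchain.chains import GraphCypherQAChain
      from langchain_community.graphs import Neo4jGraph
      from langchain_openai import ChatOpenAI

      graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

      # The LLM writes a Cypher query over explicit entities/relationships,
      # instead of ranking embedded text chunks by similarity
      chain = GraphCypherQAChain.from_llm(
          llm=ChatOpenAI(model="gpt-4", temperature=0),
          graph=graph,
          verbose=True,
      )
      result = chain.invoke({"query": "What is Amy Hodler interested in?"})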

  • @sachinreddy2836
    @sachinreddy2836 4 months ago

    Ur cute