FUSION: Knowledge GRAPHS are more than TOOLS for LLMs
- Published Feb 8, 2025
- Three new AI research papers (published today) suddenly unlock three different perspectives on the interplay between LLMs and Knowledge Graphs. All three papers focus on the dense re-coding of KGs into LLMs and vice versa. KGs are no longer just tools for a function-calling LLM: they define and fine-tune LLMs in new ways. Improved LLM reasoning through intelligently reconfigured Knowledge Graphs.
All rights w/ authors:
How to Mitigate Information Loss in Knowledge Graphs for GraphRAG:
Leveraging Triple Context Restoration and Query-Driven Feedback
by Manzong Huang, Chenyang Bu, Yi He, Xindong Wu
from Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, China and School of Data Science, William & Mary, Williamsburg, VA, USA.
Diffusion-based Hierarchical Negative Sampling for Multimodal Knowledge Graph Completion
by Guanglin Niu and Xiaowei Zhang
from School of Artificial Intelligence, Beihang University, Beijing, China
and College of Computer Science and Technology, Qingdao University, Qingdao, China
Harnessing Diverse Perspectives: A Multi-Agent Framework for Enhanced Error Detection in Knowledge Graphs
by Yu Li, Yi Huang, Guilin Qi, Junlan Feng, Nan Hu, Songlin Zhai,
Haohan Xue, Yongrui Chen, Ruoyan Shen, and Tongtong Wu
from Southeast University, Nanjing, China
and China Mobile Research Institute, China
and Monash University, Australia
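For a concrete picture of the first paper's core idea, here is a minimal sketch (my own reading of the title, not the authors' code) of triple context restoration: map each retrieved KG triple back to the corpus sentence it was extracted from, so the LLM sees natural-language context instead of a bare triple. Every name here (corpus_index, restore_context) and the example sentence are illustrative assumptions.

```python
# Toy sketch of triple context restoration for GraphRAG (illustrative only):
# at extraction time we remember which sentence each triple came from, and at
# retrieval time we feed the restored sentence, not the bare triple, to the LLM.
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str

# Hypothetical index from triple -> source sentence, built during KG extraction.
corpus_index = {
    ("aspirin", "inhibits", "COX-1"):
        "Aspirin irreversibly inhibits COX-1, reducing thromboxane synthesis.",
}

def restore_context(triple: Triple) -> str:
    """Return the source sentence for a triple; fall back to verbalizing
    the triple itself if no sentence is indexed (the lossy case)."""
    sentence = corpus_index.get(tuple(triple))
    return sentence or f"{triple.subject} {triple.relation} {triple.obj}."

def build_prompt(question: str, triples: list[Triple]) -> str:
    # Restored sentences carry the nuance that a bare (s, p, o) triple drops.
    context = "\n".join(restore_context(t) for t in triples)
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How does aspirin affect platelets?",
                   [Triple("aspirin", "inhibits", "COX-1")]))
```

The query-driven feedback half of the title would then use the LLM's answer to decide which additional triples or sentences to pull in; that loop is omitted here.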
#airesearch
#coding
#aiagents
#education
How interesting. About 3 months ago you did a video, Harvard Presents NEW Knowledge-Graph AGENT (MedAI), and I built a "demo" of the system and mapped it out into a MAS. I found it interesting and fun, but needed a better use case for it. Then, a few days ago, you did a video, ULTIMATE Fact Checking AI, which I again went ahead and started building, integrating my work from the MedAI video into it. And then today you drop this gem of a video. Wow. Doing these little homework assignments is building up into quite an interesting project. A little complex, to say the least, but fun. A good lesson for sure. Thanks for the video!
I am building a DnDScore-driven system for long-form documents that can produce factually accurate medical summaries. Sticking with health-related fields.
❤
Your videos are always so densely packed with mind-blowing ideas that I keep having to rewind, because I start thinking of all the cool stuff I could build with that knowledge. Let's build!
Thank you for presenting this incredible work! Taking in these papers, some of your recent videos, and the precipitous decline in cost of reasoning LLMs... freely available and open source... wow.
Is there a repo available of your combined and/or refined work? This is the exact path I and some others have been on. It's sad that this is so far beyond what so many orgs, groups, or teams can comprehend and implement at this point. Thank you for all your efforts! 🤝 ~ Nate
37:10: But if we have an integrated KG, we can see where the errors are propagated from.
Ideally an LLM would translate everything into a KG or KG node with structured data, and operate upon that internally, and only if required, output text.
Like predicting nodes and links instead of tokens.
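One classical way to cash out "predicting nodes and links instead of tokens" is knowledge-graph link prediction with embeddings, e.g. a TransE-style score that treats a link (h, r, t) as plausible when h + r ≈ t. A minimal sketch with random stand-in embeddings (a real system would train them on observed triples; the entities and relation here are made up):

```python
# Minimal TransE-style link prediction with NumPy: score a candidate tail
# entity for (head, relation, ?) by the distance ||h + r - t||.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimension (tiny, for illustration)
entities = {name: rng.normal(size=DIM) for name in ("aspirin", "COX-1", "fever")}
relations = {"inhibits": rng.normal(size=DIM)}

def score(head: str, relation: str, tail: str) -> float:
    # Lower distance = more plausible link under the TransE assumption h + r ≈ t.
    h, r, t = entities[head], relations[relation], entities[tail]
    return float(np.linalg.norm(h + r - t))

# Rank candidate tails for the query (aspirin, inhibits, ?).
candidates = sorted(entities, key=lambda e: score("aspirin", "inhibits", e))
print(candidates)  # best-scoring candidate first
```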
Amazing video, thank you!
Great topic, thanks 👍
Thank you!
Thank you 🫂
If KGs are so effective, why does so much of the video consist of 2D walls of text? Using a text window to discuss knowledge graphs feels like a flatlander trying to comprehend the construction of a tesseract.
If you treat the original corpus on which the model is trained as the output of the Kolmogorov complexity program, then what is the prompt?
I put it to you that the prompt is the most recent observation by the purely scientific agent that produces the smallest executable archive of all prior observations.
That being the case we now have a formal definition of what an optimal embedding is. Do you see it?
So now let's consider what an optimal response would be.
The continuation of the output of the ideal program beyond all of its prior observations is the ideal response.
Now consider what facts are.
Where do you get facts if not from the original observations?
But here is the twist.
If you have observed liars saying things you must infer the fact that they are liars. This is in fact what epistemological forensics amounts to. Indeed it is the ideal form of science.
So whenever you present a prompt to this ideal model, you will be generating yet another ideal response which, if you're wise, you will cache as part of an expanded set of observations so that you don't need to recompute it. These observations will be structured in a graph that you can call a knowledge graph, especially if it contains the provenance of knowledge, including who the liars are.
Need I go on?
Yes, please explain how to implement.
If your minimal program accounts for liar detection and knowledge provenance, is it still the simplest? Or have we ironically inflated complexity by embedding trust labels? Do we have a singularity somewhere?
@@Pure_Science_and_Technology Think about a faulty thermometer as a "liar", and what to do with the data it has reported that ends up in your corpus. Its "lies" are likely to manifest as a bias in the numbers it reports. So long as its reported temperatures can, in some sense, be imputed to a common origin (or a similar origin if a thermometer manufacturer is the culprit, so that many experiments exhibit similarly biased temperature reports), you can cross-check them against other temperature reports: physics says water boils at 100C and freezes at 0C at sea level, yet all these experiments report boiling at 101C and freezing at -1C at sea level. The most compact way to store the experimental data from the biased thermometer(s) is to unbias it with an unbiasing formula so it aligns with all other observations. That way you don't have to rewrite the physics books you also have stored in your corpus; you merely keep a list of the experiments that used the lying thermometer, along with the biasing formula, so the original data can be recreated upon execution of the KC program.
In other words, your smallest algorithm to generate the original corpus must contain within it ways of better aligning biased data so it is more easily compressed, and this entails discovering both the identity of the origin of the bias and the quantification of the bias.
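A toy numeric version of that argument, just to make it concrete: instead of storing every biased series verbatim, store one shared series plus a per-instrument bias term, which doubles as provenance of the lie. All numbers and names below are made up.

```python
# Lying-thermometer compression sketch: the bias formula plus a provenance
# record replaces the raw biased readings, which can be regenerated on demand.
true_temps = [0.0, 25.0, 50.0, 75.0, 100.0]   # readings from a good thermometer
biased = [t + 1.0 for t in true_temps]        # instrument "B" reads +1 C high

archive = {
    "series": true_temps,                      # stored once
    "instruments": {"B": {"bias": +1.0}},      # who lies, and by how much
}

def reconstruct(archive: dict, instrument: str) -> list[float]:
    """Regenerate an instrument's original (biased) readings on demand."""
    bias = archive["instruments"].get(instrument, {}).get("bias", 0.0)
    return [t + bias for t in archive["series"]]

assert reconstruct(archive, "B") == biased     # lossless, yet smaller to store
```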
I'm just a user and a content creator.
Everything from d&d adventures to graphic art, to music.
My method has always been a world building one. Seems like it would fit a knowledge graph excellently.
The problem is I have almost no idea what I'm doing.
I use Pinokio to get Open WebUI to work. And I have figured out how to install models. That's about it.
I have played around with Docker a little bit; I don't like Docker. But if it's necessary, it's necessary.
Can anybody give me the minimum viable software list to get a knowledge graph working?
Like, I'm not sure if I need vLLM or not, and I'm not exactly sure what it is; I think it's an inference server of some kind.
I have some RunPod credits if that makes things easier. I was hoping I could figure out a way to make Open WebUI work with a serverless endpoint, but I hit a wall. Well, I hit several walls.
Can anyone help a luser out? Mahalo. 🤙
Replying to myself.
I would actually pay for this. Just not very much. But I bet a whole bunch of like-minded people would also pay not very much.
Just saying. Some of us need lots of hand holding.
@@jtjames79 Have you tried InfraNodus?
A knowledge graph can be nothing more than a simple text file. In that text file you have your triples (for instance in the form of Turtle statements, which I prefer). The question is: where can you get a knowledge graph that suits your needs?
@ Can ChatGPT automatically extract Turtle statements and then assign them to nodes and edges?
@@christopherd.winnan8701 This channel has a ton of videos on approaches to this use case. But yeah, you can simply experiment yourself with simple stuff such as: "From the sentence "A is the mother of B", generate a knowledge graph as Turtle statements; also provide the schema of your concepts and properties using RDF, RDFS, and OWL."
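To make that concrete, here is a minimal sketch in Python using rdflib (pip install rdflib); the ex: namespace and the family facts are made up for illustration:

```python
# A knowledge graph as a plain Turtle string: parse it, walk its triples,
# and query it with SPARQL. No database or server required.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
turtle = """
@prefix ex: <http://example.org/> .
ex:A ex:motherOf ex:B .
ex:B ex:siblingOf ex:C .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# Nodes and edges are just the subjects/objects and predicates of the triples.
for subj, pred, obj in g:
    print(subj, pred, obj)

# SPARQL query: whom is A the mother of?
for row in g.query("SELECT ?child WHERE { ex:A ex:motherOf ?child }",
                   initNs={"ex": EX}):
    print("child:", row.child)
```

An LLM prompt like the one above would produce the Turtle statements in the first place; the model outputs text, and that text file is the graph.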
first!
First? First