Thanks for the informative video. Other than LLMs, could you suggest some approaches or models to try for relation extraction?
There are traditional NER methods. We will share more in new videos!
Thanks for your video! I am also looking forward to a new video on relation extraction using traditional methods!
Thanks! Will you publish the code on GitHub?
Here's the code: github.com/mallahyari/twosetai
@MehdiAllahyari thanks!!!
Yes.
Mehdi, if you say that LLMs aren't that excellent at making KGs and you prefer other libraries that are more practical for making KGs, could you say which libraries you mean?
Yes, we will share more.
There are many, depending on the domain, but here are some that tend to work well across many domains:
- github.com/urchade/GLiNER
- github.com/universal-ner/universal-ner
- github.com/kamalkraj/BERT-NER
@MehdiAllahyari Thanks, it looks awesome. I will test it for sure…
What was wrong with the previous video? As always, thank you!
Because the subtitles were distracting, we had to re-upload a new one. Unfortunately, the comments on the last video cannot be displayed on this one!
I removed the subtitles. Hopefully this is easier to watch! Thanks!
Can you please share the link to the notebook that you went through in the video?
Sure. Here's the code: github.com/mallahyari/twosetai/blob/main/02_kg_construction.ipynb
Very interesting review. Any chance you could share the code so I can try it myself?
Thanks in advance.
BTW, I'm reading your RAG book.
Awesome! Here's the code: github.com/mallahyari/twosetai/blob/main/02_kg_construction.ipynb
Would you recommend using the SLIM local models you introduced earlier in this series for NER, intent classification, etc. to construct knowledge graphs? It looks like it could be a cost-saver, and it offers structured, consistent inputs for graph construction, although I'm not sure whether any of the existing SLIMs are well trained enough for this purpose?
@myfolder4561 That's potentially a good idea. We haven't tried it ourselves. Let us know if you try this approach!
@myfolder4561 It's indeed possible; you will need to train your own SLIM model for this.
What if we have no information about the entities? Suppose it is an application that takes arbitrary documents as input; in that case we have no way of knowing in advance what the entities will be. How will it work then?
Then you can use a general NER model to parse, depending on your use case. Or, if you do know the domain, e.g. PII data, a finance dataset, etc., you can run it through a pretrained NER model for that particular domain.
@karthickdurai2157 The type of application I'm developing is intended to work on all types of documents, irrespective of the domain.
How do we handle documents that contain images, like some product manual PDF files? How can we use GraphRAG for this problem?
Thanks for the advice on using the right tools for the purpose. I am looking at a tabular dataset that I want to use as the material for an LLM to generate synthetic sample graphs from, so instead of extracting the graph from a Wikipedia page, it has to write the page given the base knowledge graph. And I believe an LLM is very useful for that.
Yes, for your use case an LLM is actually the best tool, since you want to convert structured data into natural language form.
The challenge for me is that LLMs are not consistent within or between documents. In the example, you see "us" and "u.s.". I'm also concerned that Fiat is an Organization but Chrysler is a Company. And in the LLM example of triples, many of the objects are just, well, sentence fragments. The killer feature of KGs is that you can make connections...but the overspecificity would seem to prevent this. For example, I cannot connect Tom Hanks to any other "fourth highest grossing actor"...he's the only one! There seems to be no good way to create a prompt where the LLM generates entities and relationships at a consistent and appropriate level of hyper/hypo-nymy. This is perhaps not surprising given that LLMs don't think, reason, whatever. And therein lies the trap in getting LLMs to lift themselves up by their own bootstraps.
That's exactly my point in the video too. Many people are hyped/overexcited about using LLMs for extracting named entities and relations, especially when you don't define your schema first. However, there is no guarantee that you get consistent results. Plus, the cost is prohibitive!
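One way to enforce a schema after the fact is to coerce whatever entity types the LLM emits onto a fixed set and flag anything that doesn't fit; a minimal sketch (the schema, synonym map, and sample extractions are all hypothetical):

```python
# Fixed target schema: the only entity types allowed in the graph.
SCHEMA_TYPES = {"Person", "Organization", "Location"}

# Hypothetical synonym map for type names LLMs commonly emit instead.
TYPE_SYNONYMS = {
    "Company": "Organization",
    "Corp": "Organization",
    "City": "Location",
    "Country": "Location",
}

def coerce_type(raw_type):
    """Map an LLM-produced type onto the schema; return None if it doesn't fit."""
    canonical = TYPE_SYNONYMS.get(raw_type, raw_type)
    return canonical if canonical in SCHEMA_TYPES else None

extracted = [("Fiat", "Organization"), ("Chrysler", "Company"), ("Tom Hanks", "Actor")]
cleaned = [(name, coerce_type(t)) for name, t in extracted]
# "Company" collapses to "Organization"; "Actor" comes back as None for review.
```

This doesn't make the LLM consistent, but it catches the Fiat-is-an-Organization vs Chrysler-is-a-Company drift before it pollutes the graph.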
I think adding an extra layer of metadata (e.g., parent documents) could solve this issue. For example, you can have an LLM with semantic embedding ability go over each chunk and add metadata related to that chunk, so that the LLM can understand the context of each word, e.g. "u.s" -> "united states, country".
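The surface-form side of that normalization can start as a simple alias table consulted before graph insertion; a minimal sketch with hypothetical entries:

```python
# Hypothetical alias table: surface forms -> (canonical name, entity type).
ALIASES = {
    "us": ("United States", "Country"),
    "u.s.": ("United States", "Country"),
    "u.s": ("United States", "Country"),
}

def canonicalize(mention):
    """Resolve a raw mention to one canonical node before inserting it into the graph."""
    key = mention.strip().lower()
    # Fall back to the cleaned mention itself when no alias is known.
    return ALIASES.get(key, (mention.strip(), "Unknown"))

print(canonicalize("U.S."))  # -> ('United States', 'Country')
```

In practice the table would be built or extended by the metadata pass described above, but even a static one collapses the "us"/"u.s." variants into a single node.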
spacy-llm can help you do few-shot NER, and the performance is almost 99% of the traditional approach.
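For reference, spacy-llm is configured declaratively; a config along these lines sets up LLM-backed NER (registry names and versions vary across spacy-llm releases, so treat this as a sketch and check the current docs):

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = PERSON,ORGANISATION,LOCATION

[components.llm.model]
@llm_models = "spacy.GPT-3-5.v2"
```

You would then build the pipeline with `spacy_llm.util.assemble("config.cfg")` and run it like any spaCy pipeline; few-shot examples are attached to the task via an examples reader in the same config.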
I think spacy-llm also uses an LLM behind the scenes, so it may not be as fast as this.