Thank you
Nice solution. Thanks for sharing.
A tutorial for this real-world use case is absolutely necessary. It’s highly relevant and applicable to many real-world problems.
A very good video showing that following the main trends isn't always profitable. Thanks.
I accidentally asked Claude AI through the MCP function to whip up a script for data chunking. Told it to extract specific data and format it into a certain output. Next thing I know, Claude goes ahead and writes a script that pulls metadata first and sends it off to a vector database. A few hours later, your video pops up on my homepage
Awesome video! Thank you for sharing your techniques. Extracting use case-specific info and storing it in the metadata before indexing is a very interesting approach. This might actually be better than regular contextualization of chunks, where you add the info to the content of your chunk instead of the metadata. Will definitely try that out. Thanks!
Would love to see you talk about agent frameworks in the future! Especially how you could try to make something as good as the Composer Agent from Cursor.
Thanks! Yeah, this kind of metadata-enriching with LLMs can definitely be applied in all kinds of ways. Very versatile.
Might make a video about LangGraph at some point, it's honestly the only agent framework I've found that could be useful - I tend to look for deterministic workflows as much as possible. Don't know about making something that could touch Composer though, the engineering team at Cursor is pretty nuts😄
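(For anyone curious what that enrichment step could look like in practice, here's a minimal sketch. It assumes the OpenAI Python SDK, and the field names "services" and "cities" are just illustrative, not the actual schema from the video.)

```python
# Minimal sketch of LLM-based metadata enrichment before indexing.
# Assumed setup: OpenAI Python SDK; field names are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

def enrich_chunk(chunk_text: str) -> dict:
    """Extract use-case-specific fields from a chunk, to be stored as metadata
    (rather than appended to the chunk's content)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract the services and cities mentioned in the text. "
                    'Return JSON like {"services": [...], "cities": [...]}, '
                    "with every value in its base (nominative) form."
                ),
            },
            {"role": "user", "content": chunk_text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# At indexing time the result goes into the vector-store metadata, e.g.:
# index.upsert(vectors=[{"id": chunk_id, "values": embedding, "metadata": enrich_chunk(text)}])
```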
Excellent findings. Keep up the good work!
Thank you (:
Interesting topic, of course. I would start by using an LLM for the UX part (NLP etc.) and then generate SQL against a database; the content is structured anyway. The result could then be polished with a fine-tuned model. Complete or partial results could be cached too, since we're inside a specific domain, and outliers could be caught and managed. That would be the benchmark to beat in that case, running cost included...
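(A rough sketch of that pipeline, just to make it concrete: the LLM handles the natural-language side and produces SQL against the structured database. The schema, table and column names here are made up for illustration.)

```python
# Rough sketch of the LLM -> SQL idea above (hypothetical schema and names).
import sqlite3
from openai import OpenAI

client = OpenAI()

SCHEMA = "CREATE TABLE services (service_name TEXT, city TEXT, description TEXT);"

def answer_with_sql(question: str, db_path: str = "services.db") -> list:
    # 1. The LLM turns the user's question into a SQL query over the known schema.
    raw = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Write a single SQLite SELECT query answering the user's question "
                    f"against this schema:\n{SCHEMA}\nReturn only the SQL, no explanation."
                ),
            },
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Strip any markdown fencing the model may add around the SQL.
    sql = raw.strip().strip("`").removeprefix("sql").strip()

    # 2. Run it against the structured database; results could also be cached
    #    per query, since the domain is fixed.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()
```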
I have found NotebookLM’s retrieval to be pretty accurate. How would you benchmark your method against it? And how is NotebookLM’s method different?
As far as I know, Google hasn't talked about their exact retrieval methods for NotebookLM anywhere, so it's hard to say. I mean, that's usually the case for any off-the-shelf RAG apps, they want to keep their secret sauce.
NbLM does cite the retrieved sources though, so in theory you could manually run a test set of questions, and see how well it fetches the correct documents for each.
Of course it's quite tedious to do in practice, as you can't automate any of it, and you'd need to first manually upload all the documents (if I wanted to benchmark this scenario, for example, that would mean over 15 000 documents) and then run the tests.
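(If someone did run that manual test, scoring it afterwards is at least straightforward. A tiny sketch, assuming you've recorded by hand which documents NotebookLM cited for each question:)

```python
# Sketch of scoring a manually-run retrieval benchmark: for each test question,
# compare the documents the tool actually cited against the ones it should have.
def retrieval_hit_rate(results: list[dict]) -> float:
    """results: e.g. [{"expected": {"doc_a"}, "cited": {"doc_a", "doc_c"}}, ...],
    recorded by hand while running the questions."""
    hits = sum(1 for r in results if r["expected"] & r["cited"])
    return hits / len(results)

# retrieval_hit_rate([{"expected": {"doc_a"}, "cited": {"doc_a"}},
#                     {"expected": {"doc_b"}, "cited": {"doc_c"}}])  # -> 0.5
```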
That's really interesting. So, the LLM got rid of the conjugation problem by mapping all the forms to specific services? Did you also run any tests to find out how large the LLM needs to be for that functionality?
Yep, that's correct. Both in extracting the services and in structuring the filters, the LLM is instructed to return the un-conjugated / nominative form of the services and cities. So we killed two birds with one stone, getting the filtering as well as eliminating the conjugation issues (:
We tested a couple of OpenAI models of varying sizes. They all did pretty well, but the smaller ones occasionally missed some services in the extraction. So we ended up going with a larger model (4o), which performed very well.
But for the query rewriting/structuring, I'm pretty sure we concluded that 4o-mini was good enough for that. So as far as conjugation goes, I think smaller models should be able to do it. It was more the service extraction where we saw issues. This was of course specific to Finnish, so your mileage may vary (:
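(To make that query-side step concrete, here's a minimal sketch of what the rewriting/structuring could look like with 4o-mini. The exact prompt and filter keys are my assumptions, and the Finnish example just illustrates the conjugation point.)

```python
# Sketch of query rewriting/structuring with a smaller model: the LLM returns
# services and cities in their nominative (un-conjugated) form, ready to use
# as metadata filters. Prompt wording and filter keys are assumptions.
import json
from openai import OpenAI

client = OpenAI()

def structure_query(user_query: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the user's query into JSON filters: "
                    '{"services": [...], "cities": [...]}. '
                    "Return every value in its nominative, un-conjugated form."
                ),
            },
            {"role": "user", "content": user_query},
        ],
    )
    return json.loads(response.choices[0].message.content)

# e.g. structure_query("lastenhoitoa Tampereella") would ideally return
# {"services": ["lastenhoito"], "cities": ["Tampere"]}
```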
thanks:)
Can you share any notebook?
As an early adopter I was expecting to just drop in the documents and have things sorted out by the LLM.
Now, after 2 weeks of researching and populating 3 test accounts on Pinecone with test data, it looks more like a filter search in Airtable.
That's just the complete opposite of the "promise", and an absolute disappointment.
I second your approach, though you come down a bit strong on agentic orchestration. It might not fit your use case, but it still has plenty of other good happy endings ;)
Yeah, definitely. There's just quite a lot of over-enthusiasm about taking an agentic approach wherever possible, so I want to push back on that (:
After a day, why only 22 likes???
Is the LLM bert?
Nah, GPT-4o and -mini
@johannesjolkkonen Now that ModernBERT is out, I'm trying to figure out at what stage of RAG it's applied.
@mtprovasti Ah, right. BERT is an encoder, meaning it would be used to create the embeddings for vector search. Here we used OpenAI's ada-002 encoder for the same purpose. Until we gave up on vector search, that is.
BERT is a popular choice when you want a fine-tuned embedding model though, to better capture the semantic similarities/dissimilarities in your specific content (and thus get better retrieval results)
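(For context, a tiny sketch of where an encoder like BERT sits in the pipeline: it just turns text into the embeddings used for vector search. The checkpoint below is a generic sentence-transformers one, not what was used in this project.)

```python
# Sketch: a BERT-family encoder producing embeddings for vector search.
# The checkpoint is a common sentence-transformers model, used here as a placeholder.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

doc_embeddings = encoder.encode(["Childcare services in Tampere",
                                 "Home care services in Helsinki"])
query_embedding = encoder.encode("Where can I find childcare?")
# Retrieval = nearest-neighbour (e.g. cosine) search of query_embedding over doc_embeddings.
```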
Why not just a KG with triples?
Not sure what the benefit would be. What do you think?
@johannesjolkkonen For versatility, specifically being able to get a response on global context. If you don't plan to get sophisticated responses then it'd be wasteful, since it's more computationally expensive. So it really depends on the use case.
Disagree about agentic RAG. It's becoming a common feature. It's not just some grad paper. I don't know why you would say this after presenting a use case.
Sure, not saying it doesn't have its place. Just that in my experience, people are too quick to jump on flashy solutions instead of simpler ones that get the job done fine, and in a more robust way.
Could you achieve similar results with an agentic approach? Maybe, but they typically come with serious trade-offs in latency, cost and unpredictability.
Appreciate the comment though. One reason why I often speak against agents is also just how poorly the term is defined and over-used. Fine for marketing, but imo not useful when it comes to actually understanding how all this stuff works.