Hello Nick,
I noticed that they use a dataset with question-and-answer pairs for fine-tuning the BART generator model (correct me if I am wrong here).
If I only have data with questions and their corresponding passages, how can I fine-tune the generator model to improve answers for my custom data?
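One possible workaround (a sketch of one approach, not something covered in the video): if you only have (question, passage) pairs, the passage itself can serve as the generation target, so the generator learns to produce passage-style answers for your domain. The data preparation might look roughly like:

```python
# Sketch: turning (question, passage) pairs into seq2seq fine-tuning records
# for a BART-style generator. Assumption (mine, not from the thread): with no
# gold answers available, the passage is used as the generation target, so the
# model learns to produce passage-style responses.

def build_training_records(question_passage_pairs):
    """Convert (question, passage) tuples into source/target dicts."""
    records = []
    for question, passage in question_passage_pairs:
        records.append({
            "source": question,   # input fed to the generator
            "target": passage,    # text the generator learns to produce
        })
    return records

pairs = [
    ("Who is Arya Stark's father?",
     "Arya Stark is the daughter of Lord Eddard Stark of Winterfell."),
]
records = build_training_records(pairs)
print(records[0]["target"])
```

These records could then be fed to any standard seq2seq fine-tuning loop; whether passage-as-target gives you the answer style you want is something to verify on your own data.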
Hello Nick! Thanks for the amazing conversation. I'm deciding between the RAG and LFQA techniques and would appreciate some clarity. From what I've found, both work the same way (retrieving the relevant documents and generating answers), except that LFQA can generate detailed, long-form responses.
My question is: is it possible to use some method or model that can absorb all of my domain-specific data, so that when I ask it a question it generates the answer from its own knowledge, without retrieving anything?
Thanks! Please enlighten me.
Can we use BERT as the retriever and BLOOM as the generator in a RAG model on custom data?
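For reference, the retrieve-then-generate flow that question describes can be sketched as follows. This is a toy illustration (my own, not from the video): the dense encoder is stubbed with bag-of-words vectors so it runs stand-alone; in a real pipeline you would swap in a BERT sentence encoder for `embed` and pass the assembled prompt to a generator such as BLOOM.

```python
# Minimal retrieve-then-generate sketch. The "BERT retriever" is stubbed with
# toy bag-of-words vectors; the generator step is left as a prompt string.
from collections import Counter
import math

def embed(text):
    """Stand-in for a BERT sentence encoder: bag-of-words token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, passages, k=1):
    """Rank passages by similarity to the question; return the top k."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

passages = [
    "Lord Eddard Stark is the father of Arya Stark.",
    "Daenerys Targaryen is the Mother of Dragons.",
]
context = retrieve("Who is Arya Stark's father?", passages)[0]
prompt = f"Context: {context}\nQuestion: Who is Arya Stark's father?\nAnswer:"
# In a real pipeline, `prompt` would be fed to the generator (e.g. BLOOM).
print(context)
```

The key design point is that the retriever and generator are independent components, so mixing a BERT-family retriever with a BLOOM-family generator is architecturally straightforward; the work is in encoding your custom corpus and formatting the retrieved context into the generator's prompt.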
Thank you for this great explanation. I wonder if you know of any model that uses RAG for language generation rather than QA. (By language generation, I am thinking of GPT-3, where you feed it a few words and it can generate a whole passage.)
Hi River, glad you enjoyed the explanation! Augmenting text generation tasks with retrieved documents is becoming more and more common, so there are a good number of options for any task. "Retrieval-augmented text generation" is probably the keyword to search by. WebGPT is a version of GPT-3 that pulls in documents following a web search. DeepMind's RETRO looks like another option (though I haven't reviewed it in detail). "A Survey on Retrieval-Augmented Text Generation" also looks like a great resource. Good luck :)
Thank you Nick and Chris for your nice research on RAG. Would it also be interesting to evaluate the new BlenderBot 2.0 from FB research?
Apparently BlenderBot 2.0 is RAG-based too.
Hi Alejandro, thank you! Yes, we are actually going to look at BlenderBot in our new series "Chatbots and Conversational AI".
Really helpful... Thanks for this!
Glad you found it valuable, thanks!
Why does your model generate one-line answers, sometimes even a single word? Does that make sense?
That's a good question Abhishek, I was curious about the same thing.
For example, if you ask "Who is Arya Stark's father?" it answers "Lord Eddard Stark" instead of providing a complete sentence, like "Arya's father is Lord Eddard Stark."
I know that RAG was fine-tuned on Google's Natural Questions. You can see examples from that dataset here: ai.google.com/research/NaturalQuestions/visualization
The questions all appear to have both short answers and long answers, and I suspect RAG was fine-tuned on the short versions!
Thanks,
Chris
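To illustrate Chris's point above with a small sketch (the field names here are illustrative, not the actual Natural Questions schema): each NQ example can carry both a short span answer and a long passage answer, and whichever one is chosen as the fine-tuning target sets the answer style the model learns.

```python
# Sketch: why answers come out short. A Natural Questions-style example can
# carry both a long answer (a passage) and a short answer (a span). The field
# names below are illustrative only.
example = {
    "question": "Who is Arya Stark's father?",
    "long_answer": "Arya Stark is the daughter of Lord Eddard Stark.",
    "short_answer": "Lord Eddard Stark",
}

def target_for(example, style="short"):
    """Pick the fine-tuning target; this choice sets the learned answer style."""
    return example["short_answer"] if style == "short" else example["long_answer"]

print(target_for(example))           # span-style target (RAG-like behavior)
print(target_for(example, "long"))   # passage-style target (LFQA-like behavior)
```

If complete-sentence answers are wanted, the long-answer annotations (or an LFQA-style dataset) would be the target to fine-tune on instead.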