Dear GPT, I really appreciated his video. But I'm a lazy human so can you read thru this transcript and come up with a nice short compliment for me?
Hey, I just watched your video on using LangChain with the Flan 20B model, and I'm impressed with how you demonstrated its conversational capabilities. It's great to see how easy it is to compare different models using this method, all for free! Keep up the fantastic work, and I'm looking forward to more insightful content from you.
As an OpenAI language model I don't have feelings, but this made me laugh :D
Great job again, thanks for sharing. One question: have you ever seen a comparison of models just for the embedding search step? What I mean is using a free model with a vector database search to keep, for example, the 5 most relevant chunks, and then using a model like OpenAI's afterwards to summarize those chunks. It would be good if the choice of chunks were more or less identical, no?
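If it helps make the idea concrete, here is a minimal sketch of that pipeline, assuming the classic LangChain API with a local sentence-transformers embedding model and FAISS; the chunk texts, query, and model choices are illustrative, not from the video:

    # Free, local embedding model + FAISS keeps the most relevant chunks;
    # only those chunks are then sent to an OpenAI model for the summary.
    # Assumes: pip install langchain faiss-cpu sentence-transformers openai
    # and that OPENAI_API_KEY is set in the environment.
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.llms import OpenAI

    chunks = [
        "LangChain lets you chain LLM calls together.",
        "Flan-UL2 is a 20B parameter encoder-decoder model.",
        "Vector stores index text chunks by their embeddings.",
        # ... your real pre-split document chunks go here
    ]

    embeddings = HuggingFaceEmbeddings()        # free, runs locally
    db = FAISS.from_texts(chunks, embeddings)

    # k=5 with real documents; 3 here only because the toy list has 3 chunks
    docs = db.similarity_search("What is Flan-UL2?", k=3)

    llm = OpenAI(temperature=0)
    summary = llm("Summarize the following passages:\n\n"
                  + "\n\n".join(d.page_content for d in docs))
    print(summary)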
I appreciate your efforts, really awesome video!!!!!!!
Once again, very good content, but a quick question. In this line, "formatted_prompt = conversation.prompt.format(input='the next input', history=memory.buffer)", we are already providing the memory to the prompt. So why do we also pass "memory=ConversationBufferMemory()" to ConversationChain?
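For anyone else wondering about this, here is a rough sketch of how those two pieces relate in the (older) LangChain API, purely as an illustration and not the video's exact code: the .format() call only renders the prompt template so you can inspect it, while the memory passed to ConversationChain is what the chain actually reads from and writes to on every predict() call.

    # Rough sketch, not the video's exact code. Any LLM works here; OpenAI is a stand-in.
    from langchain.llms import OpenAI
    from langchain.chains import ConversationChain
    from langchain.memory import ConversationBufferMemory

    memory = ConversationBufferMemory()
    conversation = ConversationChain(llm=OpenAI(temperature=0), memory=memory)

    # This line only renders the prompt template so you can look at it;
    # it does not call the model and it does not change the memory.
    formatted_prompt = conversation.prompt.format(input='the next input',
                                                  history=memory.buffer)
    print(formatted_prompt)

    # The chain is what actually uses the memory: on each predict() it reads the
    # buffer to fill the {history} slot and then writes the new turn back into it.
    conversation.predict(input="Hi there!")
    print(memory.buffer)   # now contains the Human/AI exchange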
Thank you, Sam!
Question: when using the summary library, aren't we paying for additional tokens just to ask for (and receive) the summary on every step?
not sure what you mean by the summary library?
Great videos, Sam. I started to go through all of your videos after seeing one of them. Unfortunately, I am unable to access any of your Colab notebooks.
Why can't you access the Colabs? I am planning to put these all up on GitHub later this week too, if that helps.
Thanks for the video! Is the free Google Colab tier enough to run the model?
No, this model is unfortunately too big for the free Colab tier.
Thank you for the content!! Can you help me understand the input("Human ") line in the chat_to_llm() function please? What does it do?
Well, he built this for a chat model. To make it chat like a normal bot, he wrote a while loop, and as a secondary advantage of that loop he can also track the token count.
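As a rough sketch of the kind of loop being described (illustrative, not the video's exact code): input("Human: ") simply reads the next user message from the terminal, the while loop keeps the chat going until the user types "exit", and a crude token count is tracked on the side.

    # Rough sketch of such a chat loop; pass in any ConversationChain.
    def chat_to_llm(conversation):
        while True:
            user_input = input("Human: ")              # read the next user message
            if user_input.lower() in ("exit", "quit"):
                break
            reply = conversation.predict(input=user_input)
            print("AI:", reply)
            # very rough token tracking: count words currently held in memory
            print("(history is roughly", len(conversation.memory.buffer.split()), "tokens)")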
Awesome video, helped a lot. One thing I am confused about: the Flan models are seq2seq, and these models are heavier than decoder-only ones, e.g. GPT. So why not just use the decoder-type models?
Good question. The encoder-decoder models are closer to the original Transformer, and traditionally they were used more for text-to-text tasks (like translation) rather than the simple next-word prediction of decoder-only models. These lines are more and more blurred these days, though Google has still done a lot of research on encoder-decoder models compared with OpenAI's focus on GPT-style decoder-only models. When you have a very deep decoder model, you could argue that the lower layers are doing the job of an encoder. Hope this helps.
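As a purely illustrative side-by-side, this is what the two families look like when loaded with Hugging Face transformers (small checkpoints chosen only so it runs quickly; these are not the models from the video):

    # Illustrative only; small checkpoints so it runs on CPU.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

    # Encoder-decoder (seq2seq), e.g. Flan-T5: the input is encoded once,
    # then a separate decoder generates the output text.
    t5_tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
    t5 = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
    ids = t5_tok("Translate to German: How are you?", return_tensors="pt")
    print(t5_tok.decode(t5.generate(**ids, max_new_tokens=20)[0], skip_special_tokens=True))

    # Decoder-only (causal LM), e.g. GPT-2: everything is next-token prediction
    # over a single running sequence.
    gpt_tok = AutoTokenizer.from_pretrained("gpt2")
    gpt = AutoModelForCausalLM.from_pretrained("gpt2")
    ids = gpt_tok("The weather today is", return_tensors="pt")
    print(gpt_tok.decode(gpt.generate(**ids, max_new_tokens=20)[0], skip_special_tokens=True))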
Thanks, and interesting that a new Flan model has been released. How about using embeddings to query, for example, a PDF with LangChain? We have many examples with ChatGPT.
Btw, it would be nice if you could make a video about voice cloning in combination with chat, for example using Microsoft's VALL-E model, Whisper, or...
Hello Sam, wonderful! Thanks for the good work. I am trying to build a chatbot with one of these models [without using GPT-3.5]. In your opinion, which of these models comes closest to GPT-3.5?
Honestly, none of these are as good as 3.5-turbo etc., but with each new model coming out they seem to get a lot closer. The key is to look for a model that has been pretrained on at least 1T tokens.
@samwitteveenai Thanks for the reply. I am assuming that when a model pretrained on 1T tokens comes out, you will surely cover it. 😀
you bet!! :D
Great
Thanks for the video!!! I think the LLMs from the Google team like T5, mT5, and Flan-T5 are amazing. However, compared to GPT-3 / GPT-3.5 Turbo (without any fine-tuning), they are not able to handle or understand the given NLP task... 🤔
For example, I have tried the same prompt on GPT vs Flan-T5: GPT gives very accurate results on NER, while Flan-T5 doesn't know what to do. 🤔
Do you have any suggestions on using the Flan models, like working on the prompt or using few-shot examples?
The GPT models are doing a lot better for two reasons: 1. they are much bigger, and 2. they are fine-tuned on non-academic datasets. The T5 models etc. really need to be fine-tuned for your task. GPT-3/3.5 etc. are much more akin to the LaMDA models at Google, which currently aren't available publicly.
Why is it not returning anything on my Colab instance? It keeps scrolling but to no avail.
Make sure the VM has a GPU that can run it.
How do I use LangChain with a locally downloaded model? I'm interested in Vicuna or the new 7B king.
Basically the same as in this video, just changing out the models.
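As a minimal sketch of what "changing out the models" could look like with a locally downloaded checkpoint (the model path and settings below are placeholders and assumptions, not from the video):

    # Placeholder path and settings; adjust for your own download.
    from transformers import pipeline
    from langchain.llms import HuggingFacePipeline
    from langchain.chains import ConversationChain

    pipe = pipeline(
        "text-generation",
        model="path/to/your/local/vicuna-checkpoint",   # hypothetical local path or hub id
        max_new_tokens=256,
    )
    llm = HuggingFacePipeline(pipeline=pipe)

    # Everything else from the notebook stays the same; only the llm is swapped out.
    conversation = ConversationChain(llm=llm)
    print(conversation.predict(input="Hi there!"))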
@@samwitteveenai Kindly make that tutorial for newbies. Thank you.
As I was working my way through the notebook, I got down to the first conversation.predict(), and when I executed it, it showed the prompt and the current conversation with the Human: input but then just hung and timed out. Has anyone run into this issue? Do I need any additional API key other than the HF API key to make the notebook work?
You shouldn't need any other API key. It could be that HF is no longer supporting the Inference API for this model. You can still load it locally if you have access to a good GPU.
Did you get any solution? I am facing the same problem. I don't have a powerful GPU so I can't load it locally either.
@@sahilmandavkar8230 I did not have a chance to try locally. I think I'll try using Colab Pro.
Hi Sam, at the 1:3 mark, in cell 9, conversation.predict(input="Hi there! I am Sam"), upon execution I am getting an error: "Error raised by inference API: Input validation error: `temperature` must be strictly positive." Any way to resolve this issue?
This is a change in the HF API, I think. Change the temperature from 0 to 0.1 and it should work.
@@samwitteveenai Thank you, that did it, appreciated.
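For anyone hitting the same error, the fix looks roughly like this (the repo id is an assumption; point it at whichever Flan model the notebook is using, and make sure HUGGINGFACEHUB_API_TOKEN is set):

    # Only the temperature change matters here; 0 now fails validation, any value > 0 works.
    from langchain.llms import HuggingFaceHub

    llm = HuggingFaceHub(
        repo_id="google/flan-ul2",              # the 20B Flan model, as an example
        model_kwargs={"temperature": 0.1},
    )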
Can you do the same with LLaMA?
I have done it with LLaMA but have been reluctant to release it because of the Meta issues. Anything in particular you want me to cover about LLaMA?
Just use Ollama to serve local models.
Ollama is cool; I have made a few vids about it. This video was made long before Ollama existed.