I love that you focused on the paper and the methodology rather than the model.
Thanks. For me the model is almost the least interesting bit of this; it's also nice to see a proper paper with details for a model/project, not just a blog post or "tech report".
I tend to test these by asking about an earthquake that happened here in 1931, and this one gave a pretty good answer. Some models have hallucinated an answer and gone off on a tangent; one said that over 25 people died in the 'quake, and since it was actually 256 people I guess that was technically correct. This one gave an explanation and talked a little about the rebuild. For such a "small" model, it's done pretty well on everything I asked it.
Love your videos; you are now my go-to guy for videos on advances in the field. I'd suggest that for your tests you add a couple of examples of instruction following. Can it sort words alphabetically? Can it write a Python program that finds the prime numbers from 1 to 1M? That sort of thing. Particularly for something like WizardLM, which is supposed to be trained on instructions, that would be very interesting.
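Something like this minimal sieve sketch could serve as a reference answer for the primes test (the function name and the limit are just what I'd pick, nothing official):

```python
# Minimal sketch of a reference solution for the "primes from 1 to 1M" test task.
# A simple sieve of Eratosthenes; the names and the limit are just illustrative.
def primes_up_to(limit: int) -> list[int]:
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Mark every multiple of n starting from n*n as composite.
            sieve[n * n :: n] = [False] * len(sieve[n * n :: n])
    return [n for n, is_prime in enumerate(sieve) if is_prime]

if __name__ == "__main__":
    primes = primes_up_to(1_000_000)
    print(len(primes))  # 78498 primes below one million
```

Comparing the model's program against a known-good version like this makes the instruction-following check easy to score.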
I really like the idea of having a good standard test set for benchmarking all these models that is not just academic. I know there are academic benchmarks, but it would be good to have something generated by people for real-world benchmarking. I am thinking of setting up a system for people here to contribute prompts of different types which could be used for benchmarking. I would love to hear your thoughts on this.
@@samwitteveenai Consistency would help. I approach my tests with default settings, a fixed number of tokens (based on an average human conversation), and the same seed.
Great work, really good channel for staying up to date in the field. Thanks 😁
Another great video. I am working through your back catalog and it's taking me some time :-) I am trying to build an agent for myself that can retain my conversations and build up a memory. I think I need a vector store for this and not just prompt-based memory, as that is so limited. Would love to see you do some more on the vector database side of things...
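To give an idea of what I mean, here's a rough sketch of the kind of embedding-backed memory I have in mind (assuming the sentence-transformers package; the model name and top_k are arbitrary choices, not recommendations):

```python
# Rough sketch of a vector-store-style conversation memory.
# Assumes the sentence-transformers package; the model name and top_k are arbitrary choices.
import numpy as np
from sentence_transformers import SentenceTransformer

class ConversationMemory:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(model_name)
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        # Store the raw conversation turn and its normalized embedding.
        self.texts.append(text)
        self.vectors.append(self.encoder.encode(text, normalize_embeddings=True))

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        # Return the most similar past turns by cosine similarity.
        if not self.texts:
            return []
        q = self.encoder.encode(query, normalize_embeddings=True)
        scores = np.array(self.vectors) @ q
        best = np.argsort(scores)[::-1][:top_k]
        return [self.texts[i] for i in best]

memory = ConversationMemory()
memory.add("User asked about fine-tuning LLaMA on a single GPU.")
print(memory.recall("How do I fine-tune on limited hardware?"))
```

The recalled turns would then be stuffed back into the prompt, which scales much further than keeping the whole history in context.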
I see, so we are asking the LLM to come up with a more complicated instruction. That's really interesting, taking that angle to 'evolve' from simple instructions.
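As a toy sketch of that evolution loop (the prompt wording and the `complete` helper here are placeholders, not the paper's exact prompts):

```python
# Toy sketch of the "evolve an instruction" loop behind the Evol-Instruct idea.
# `complete` stands in for any chat/completion API call; the prompt wording is
# illustrative, not the exact prompts used in the WizardLM paper.
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM call here")

EVOLVE_PROMPT = (
    "Rewrite the following instruction into a more complex version that requires "
    "deeper reasoning, while keeping it answerable:\n\n{instruction}"
)

def evolve(instruction: str, rounds: int = 3) -> list[str]:
    # Each round asks the model for a harder version of the previous instruction.
    history = [instruction]
    for _ in range(rounds):
        harder = complete(EVOLVE_PROMPT.format(instruction=history[-1]))
        history.append(harder.strip())
    return history

# e.g. evolve("List three uses of a paperclip.") would return the original plus
# three progressively more complex instructions to pair with model responses.
```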
Hi Sam W.!
As always, a fantastic tutorial.
Thank you very much for this
I've been doing something similar for over a month, though this approach is more refined. We're almost at an inflection point where future models can be trained on datasets in which the majority of the data is synthetic. Almost there. I think in the near future LLMs will start leaning heavily on synthetic data, since everyone is up in arms over data privacy. This approach is especially useful when you need flexible datasets with longer sequences, e.g. to fine-tune models for longer context windows.
Given its academic tone, I'm really curious how well it'll work with LangChain agents. It feels like it may not be overly prone to hallucinating answers and might be more open to pulling in data from other sources.
I haven't seen any of these models handle the ReAct stuff well yet. For general chat a lot of it comes down to preference. It seems most people like the Vicuna models for chat, but from my tests they can't do ReAct/PAL etc. I am curious: do you have a specific use case for the LangChain + open model scenario?
@@samwitteveenai The problem with using LangChain with OpenAI is that you can't really use it for most company data. You'd be bleeding out private customer info or violating regulations like HIPAA. But a properly licensed RedPajama/StableLM on a self-hosted encrypted server could be used to parse Jira tickets, browse SQL, hit local APIs, or do internal "pair programming" on code without issue, so long as access to the LLM itself was restricted. For example, using agents to generate the reports customers want from SQL sources; stuff like that is a big time sink. It wouldn't replace a manager needing to compile and send the report (you can't give the customer access to the LLM), but that manager having access to an AI agent that can parse/assemble the data from SQL and create a draft report would be a high-demand use case.
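As a rough sketch of that SQL-report idea with a self-hosted model behind LangChain (the import paths follow the 2023-era LangChain layout and may have moved in newer releases; the model id and database URI are placeholders):

```python
# Rough sketch of "draft a report from SQL" with a self-hosted model, so no
# customer data leaves the server. Import paths follow the 2023-era LangChain
# layout and may differ in newer releases; the model id and URI are placeholders.
from langchain.llms import HuggingFacePipeline
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain  # older releases: langchain.chains

# A locally hosted instruction-tuned model (placeholder id; any local chat model works).
llm = HuggingFacePipeline.from_model_id(
    model_id="WizardLM/WizardLM-7B-V1.0",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 512},
)

db = SQLDatabase.from_uri("sqlite:///tickets.db")  # placeholder connection string
chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

# The manager still reviews and sends the result; this only produces a draft.
draft = chain.run("Summarise the number of open tickets per customer this month.")
print(draft)
```

Whether a 7B/13B open model is reliable enough at the SQL-generation step is exactly the kind of thing the ReAct/PAL tests above would show.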
You are a wizard. Love your videos 🐐
The WizardVicuna 13B is much better than Vicuna 13B. Tested both for my work. Really surprised by WizardVicuna!
Hey, can you share the notebook you used?
Amazing! Loved this paper and see two really good insights in it. One is on the prompt engineering side: just creating this dataset, and perhaps even elevating the complexity of your own prompts with this kind of iterative-complexity prompt engineering. The other is the obvious one, the dataset itself and the fine-tuning that will come along with it. So good, Sam, and thank you once again! 🥳🦾😎
Thanks, it's good to see you and other commenters recognise that it is not always just about the model.
Thanks a lot for your informative video :)
May I ask how I can train this model for another language? Would just training it with a dataset in that language be enough, or are there other steps I have to take?
This depends on the language. If it uses normal Roman characters you should be able to get some decent results. If it uses totally different characters, then the original LLaMA model doesn't have a great tokenizer for it and hasn't seen much of that language in pre-training.
@@samwitteveenai Thank you so much for your response.
So, if I understand correctly, the original LLaMA model may not perform well for languages that do not use normal Roman characters. In that case, what steps should I take to train the model for a language like Korean?
@@SoroorMalekmohamadi So I am not sure about Korean, but, for example, the tokenization for Thai is not ideal and would basically turn the model into a character-level model.
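A quick sanity check is to count how many tokens the LLaMA tokenizer produces per character of sample text in the target language; a rough sketch (the model id is just one public mirror of the tokenizer, and the sample sentences are rough translations):

```python
# Quick check of how a tokenizer handles a given language: a ratio close to
# 1 token per character means it is effectively behaving like a character-level model.
# The model id is only an example mirror of the LLaMA tokenizer on the Hugging Face Hub.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

# Rough translations of the same pangram-style sentence, just as sample text.
samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Korean": "빠른 갈색 여우가 게으른 개를 뛰어넘습니다.",
    "Thai": "สุนัขจิ้งจอกสีน้ำตาลกระโดดข้ามสุนัขขี้เกียจ",
}

for lang, text in samples.items():
    n_tokens = len(tokenizer(text)["input_ids"])
    print(f"{lang}: {n_tokens} tokens for {len(text)} characters "
          f"({n_tokens / len(text):.2f} tokens per character)")
```

If the ratio for your language is far worse than for English, you'd likely need tokenizer/vocabulary extension and further pre-training, not just instruction fine-tuning.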
Which version of Colab are you using, Pro or Pro+? Because for me on the free tier it is crashing.
Normally Pro+, or a custom setup for the videos with multiple GPUs.
Could you make a video on how to use an open-source model with instruction-seeds and create a custom dataset with that?
Thanks bro
Hm, is there any reason a person couldn't use the WizardLM strategy with RLHF? Like, could one follow the AI's progress on some of the earlier challenges, comment on / rate the AI's improvement, and then "check in" throughout its process of coming to grips with the various WizardLM problems? It seems like it would be a fairly efficient use of the strategy to produce a semi-supervised model with lots of experience in problem solving.
Ironically, the model is IDENTICAL TO the GPT-J BASE model. It's garbage beyond basic conversation. It's 7B, and models tend to hallucinate under ~12B. Vicuna released v1.1 q5_0 and q5_1, both in censored and uncensored versions at 13B, and they are probably the most coherent models available. I haven't had a single hallucination 🤫 Adding LoRAs also helps if you're looking for specific output styles or context.
Wizard 7B in GPT4All (((OFFLINE))) is the best AI in the lightweight AI world! BRAZIL LOVES AIs!