Gemma 2 - Google's New 9B and 27B Open Weights Models

  • Published 26 Jun 2024
  • Colab Gemma 2 9B: drp.li/6LuJt
    🕵️ Interested in building LLM Agents? Fill out the form below
    Building LLM Agents Form: drp.li/dIMes
    👨‍💻Github:
    github.com/samwit/langchain-t... (updated)
    github.com/samwit/llm-tutorials
  • Science & Technology

COMMENTS • 41

  • @Nick_With_A_Stick
    @Nick_With_A_Stick 7 days ago +21

    27B takes up 15 GB in 4 bits ❤. Although Llama 3 8B beats both of them on HumanEval with 62.2, which Google conveniently left out of the chart in the paper. Then again, Llama 8B's HumanEval drops to 40 at 4 bits, and there was a new code benchmark (BigCodeBench, I think) I saw on Twitter showing that Llama 3 70B is actually weak at coding and was *potentially* *allegedly* trained on HumanEval, probably by accident with an 8T-token dataset 🤷‍♂️. Elo doesn't really lie, with the exception of GPT-4o: people just like it because it makes pretty outputs. The way it formats its answers is really visually appealing (it uses a ton of markdown, like lines separating the title, and big and small fonts at certain points), which 100% launches its scores to the moon, because Claude Sonnet 3.5 is significantly better, at least for my main use case, coding.

    • @blisphul8084
      @blisphul8084 6 days ago +2

      The IQ1_S quant is only 6 GB, so it fits on an 8 GB GPU like the RTX 3060 Ti. No need for H100s here. Though based on HumanEval, I'll stick with Dolphin Qwen2 for now. (See the size arithmetic sketched after this thread.)

    • @Nick_With_A_Stick
      @Nick_With_A_Stick 6 days ago

      @@blisphul8084 For coding I like Codestral with Continue.dev, a VS Code extension, but then again every model feels weak next to Sonnet 3.5's one-shot coding ability. For some reason it actually loses some performance multi-shot: if you're in a conversation with it editing long code it occasionally messes up, but it rarely makes an error if you start a new conversation and re-ask the question with the code.
      Side note: I wish Eric had done his fine-tuning on top of the Qwen instruct model using LoRA. It would help combine the strengths of both datasets.
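
      A rough check of the sizes quoted in this thread: a quantized model's weights take roughly params * bits-per-weight / 8 bytes. A minimal sketch; the bits-per-weight figures are approximations for the named quant types:

      ```python
      # Approximate size of quantized weights: params * bits_per_weight / 8 bytes.
      # Runtime VRAM is somewhat higher (KV cache, activations), so treat as floors.

      def weights_gb(params_billions: float, bits_per_weight: float) -> float:
          """Approximate size of the quantized weights in GB."""
          return params_billions * bits_per_weight / 8

      print(f"27B @ ~4.5 bpw (4-bit K-quant): {weights_gb(27, 4.5):.1f} GB")  # ~15 GB, as quoted
      print(f"27B @ ~1.8 bpw (IQ1_S-ish):     {weights_gb(27, 1.8):.1f} GB")  # ~6 GB, as quoted
      print(f"9B  @ ~4.5 bpw:                 {weights_gb(9, 4.5):.1f} GB")   # ~5 GB
      ```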

  • @olimiemma
    @olimiemma 7 days ago +4

    I'm so glad I found your channel, by the way.

  • @supercurioTube
    @supercurioTube 7 days ago +10

    All quantizations are available for Gemma 2 9B and 27B in Ollama, but the 27B has an issue: a tendency to never stop its output. (A stop-sequence workaround is sketched at the end of this thread.)

    • @falmanna
      @falmanna 6 days ago

      The same happened to me with the 9B Q4_K_M quant.

    • @volkovolko5778
      @volkovolko5778 6 days ago +1

      I hit the same issue in a video on my channel.
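
      A possible workaround while the bug gets fixed: pass Gemma's end-of-turn marker as an explicit stop sequence and cap the output length. A minimal sketch, assuming a local Ollama server on its default port:

      ```python
      # Pass Gemma 2's turn delimiter as a stop sequence and cap generated
      # tokens, so a missed end-of-sequence can't run forever.
      import json
      import urllib.request

      payload = {
          "model": "gemma2:27b",
          "prompt": "Why is the sky blue? Answer in one paragraph.",
          "stream": False,
          "options": {
              "stop": ["<end_of_turn>"],  # Gemma 2's turn delimiter
              "num_predict": 512,         # hard cap on generated tokens
          },
      }
      req = urllib.request.Request(
          "http://localhost:11434/api/generate",
          data=json.dumps(payload).encode(),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          print(json.loads(resp.read())["response"])
      ```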

  • @unclecode
    @unclecode 7 days ago +5

    Thanks, Sam! You know, it started with 7B as the trend, then Meta made it 8B, and now Google has 9B. I wish they'd compete in the opposite direction. 😄 Btw, I have an opinion; let me share it and hear yours. I've noticed recently that proprietary models often train with a chain-of-thought style, to the point where it becomes annoying because it's hard to get the model to do anything else. This approach helps the model clear benchmarks but gives it one personality that's hard to change.
    For instance, GPT-4o became a headache for me! It always follows one pattern, regenerating entire previous answers even if the change is just one word! It's annoying, especially in coding. Imagine you want a small change, but it regenerates the whole code. I constantly have to remind it not to regenerate everything, just show the relevant part, and it's frustrating. This is clearly due to the training data. I don't see this issue with open-source models. One proprietary family I like, Anthropic's, still feels controllable: I can shape its responses and keep it consistent.
    To me, this technique hides model weaknesses. It's easier to train a model to stick to one style, especially if the data is synthetically generated. Language models need a word distribution that's not overly adjusted, or they become biased. When a model is released as an instruct model with one level of fine-tuning, you still expect it to be unbiased; fine-tuning it to take on another behavior would be tough.

    • @longboardfella5306
      @longboardfella5306 7 days ago

      Interesting. I've noticed the same thing with getting stuck: when it kept producing an incorrect word table, I couldn't get it to stop repeating it each time.

    • @samwitteveenai
      @samwitteveenai  6 days ago

      Definitely post-training (SFT, IT, RLHF, RLAIF, etc.) has changed a lot in the last 9 months. All the big proprietary models and big-company open weights now use synthetic data heavily. A big challenge with synthetic data is creating the right amount of diversity, which could explain some of what you are seeing. You might also be seeing models that have been overly aligned with reward models, etc. Anthropic has "Ant thinking" for their version of CoT, wrapped in XML tags; I think a lot of that gets filtered out in their UI. The Gemma models clearly show Google has gone down the path of baking CoT into the models. For following system prompts well, I think Llama is much better. I test models by asking them to be a drunk assistant (sketched below); for some reason Llama can do that very well.
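
      The drunk-assistant probe is easy to reproduce. A minimal sketch, assuming the `ollama` Python client with both models pulled locally (the model tags are assumptions):

      ```python
      # Send the same persona to two models and eyeball how well each stays in
      # character. A quick system-prompt-following probe, not a benchmark.
      import ollama  # assumes the `ollama` Python client and a local server

      persona = "You are a cheerfully drunk assistant. Ramble a little, but stay helpful."
      for model in ["llama3:8b", "gemma2:9b"]:  # assumed tags
          reply = ollama.chat(
              model=model,
              messages=[
                  {"role": "system", "content": persona},
                  {"role": "user", "content": "Explain what an LLM is."},
              ],
          )
          print(f"--- {model} ---\n{reply['message']['content'][:300]}\n")
      ```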

  • @SwapperTheFirst
    @SwapperTheFirst 7 days ago +1

    fantastic news and great overview, Sam.

  • @jbsiddall
    @jbsiddall 4 days ago

    Great video, Sam! This is my first look at Gemma 2.

  • @TreeYogaSchool
    @TreeYogaSchool 6 days ago

    Thank you for making this video!

  • @jondo7680
    @jondo7680 7 days ago +1

    The 9b one is very interesting and promising.

  • @SirajFlorida
    @SirajFlorida 7 days ago +2

    Well, if Gemma 2 is just barely beating Llama 3 8B with an additional billion parameters, then I would leap to say that Llama is the higher-quality model. Not to mention the outstanding support for Llama. I get the commercial license, but I kind of see the Llama license as just stopping big tech from plagiarizing the models. If only we had a Llama 30-ish B. Oh dear Zuck, if you can ever hear these words: please give us the 30B. We love and appreciate all that you do!!!

    • @SirajFlorida
      @SirajFlorida 7 days ago

      Actually, I think he did... and it's a multimodal model called Chameleon. :-D

    • @onlyms4693
      @onlyms4693 6 days ago +1

      So can we freely use Llama 3 for our enterprise's chatbot support?

  • @imadsaddik
    @imadsaddik 7 days ago +1

    Thanks for the video

  • @toadlguy
    @toadlguy 7 days ago +2

    Running Gemma 2 9B with Ollama on an 8 GB M1 Mac, even though the model is only 5.5 GB (4-bit quantized), it immediately starts running into swap problems and outputs about 1 token/sec. Llama 3 8B (4.7 GB, 4-bit quantized) runs fine entirely in working memory, even with lots of other processes running. So there must be something different about how the inference code runs (or about Ollama's implementation).

    • @NoSubsWithContent
      @NoSubsWithContent 7 days ago

      Are you sure that 8 GB isn't partially used to run the system? It could also just be that the hardware is too old; I had 16 GB and still couldn't run Qwen2 0.5B.

    • @samwitteveenai
      @samwitteveenai  6 days ago +1

      Apparently they had issues with it and are fixing them. I made my vid in the last few hours and it seemed fine for me. Maybe uninstall and try again.

  • @dahahaka
    @dahahaka 7 days ago +3

    Damn, feels like Gemma came out last month

  • @micbab-vg2mu
    @micbab-vg2mu 7 days ago +2

    Thank you! I am waiting for Gemini 2.0 Pro :)

  • @strangelyproton5188
    @strangelyproton5188 7 days ago +1

    Hello, can you please tell me what's the best hardware to buy for running at most 70B models, not just for inference but also for instruction tuning?

    • @NoSubsWithContent
      @NoSubsWithContent 7 days ago

      With quantization I think you can get away with a single 80 GB H100. Using QDoRA achieves nearly the same performance as full fine-tuning while still fitting within that constraint (see the sketch after this reply).
      For cost-effectiveness you could try multi-GPU training with older cards; that is just harder to set up and takes a lot more understanding of the hardware specs.
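
      A minimal sketch of that single-GPU recipe: load the base model in 4-bit (QLoRA-style) and attach DoRA adapters via peft's `use_dora` flag. The model id and hyperparameters are placeholders, not a tested 70B configuration:

      ```python
      # 4-bit base (QLoRA-style) plus DoRA adapters, i.e. "QDoRA".
      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig
      from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

      bnb = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Meta-Llama-3-70B",  # placeholder: any ~70B causal LM
          quantization_config=bnb,
          device_map="auto",
      )
      model = prepare_model_for_kbit_training(model)

      lora = LoraConfig(
          r=16,
          lora_alpha=32,
          target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
          use_dora=True,  # DoRA on top of the 4-bit base
          task_type="CAUSAL_LM",
      )
      model = get_peft_model(model, lora)
      model.print_trainable_parameters()  # only a tiny fraction of the 70B trains
      ```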

  • @user-bd8jb7ln5g
    @user-bd8jb7ln5g 7 days ago +1

    What I really wanted from Gemma is at least a 100k context window. It looks like that is not forthcoming.

    • @samwitteveenai
      @samwitteveenai  6 days ago +2

      Someone may do a fine-tune to get it out to that length (see the RoPE-scaling sketch below). Let's see.
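
      Community long-context fine-tunes usually combine RoPE scaling with further training on long documents. A sketch of the relevant transformers knob, shown on a Llama-family model; whether Gemma 2 accepts the same option is an assumption:

      ```python
      # RoPE position scaling: stretch the trained positions by a factor.
      # Scaling alone degrades quality; long-document fine-tuning is still needed.
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Meta-Llama-3-8B",                    # placeholder model id
          rope_scaling={"type": "linear", "factor": 4.0},  # ~8k trained ctx -> ~32k positions
      )
      ```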

  • @AdrienSales
    @AdrienSales 7 days ago +1

    I gave gemma2:9b vs llama3:8b a try on function calling... and I got much better results with Llama 3. Did you try function calling? Maybe there will be specific tuning for function calling.

    • @samwitteveenai
      @samwitteveenai  7 days ago +1

      AFAIK Google doesn't do any fine-tuning for function calling on the open-weights models. I have been told it could be due to legal issues, which doesn't make a lot of sense to me. The base models can be tuned to do this, though.

    • @AdrienSales
      @AdrienSales 5 days ago

      @@samwitteveenai Keeping an eye out for fine-tunes in the Ollama library over the next few days. (A prompt-based workaround is sketched below.)
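
      Until an official function-calling fine-tune appears, a common workaround is to describe the tool in the prompt, ask for JSON only, and parse the reply yourself. A minimal sketch; the tool schema and model tag are illustrative assumptions:

      ```python
      # Prompt-based function calling: describe the tool, request JSON, parse.
      import json
      import ollama  # assumes the `ollama` Python client and a local server

      TOOL_PROMPT = """You have one tool:
        get_weather(city: str) -> current weather for `city`
      If the request needs the tool, reply with ONLY this JSON:
        {"tool": "get_weather", "args": {"city": "<city>"}}
      Otherwise answer normally.

      User: What's the weather in Lisbon?"""

      reply = ollama.generate(model="gemma2:9b", prompt=TOOL_PROMPT)["response"]
      try:
          call = json.loads(reply.strip())
          print("tool call:", call["tool"], call["args"])
      except json.JSONDecodeError:
          print("plain answer:", reply)
      ```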

  • @MeinDeutschkurs
    @MeinDeutschkurs 7 days ago +3

    Dolphin, dolphin, dolphin!!!! ❤❤❤❤ I hope this gets read!!!! 🎉🎉🎉🎉 Gemma 2 seems like a cool base model for a Dolphin fine-tune. Have I already mentioned it? Dolphin. Just in case! 😆😆🤩🤩

  • @AudiovisuelleDroge
    @AudiovisuelleDroge 7 days ago

    Neither the 9B nor the 27B Instruct supports a system prompt. What were you testing?

    • @samwitteveenai
      @samwitteveenai  7 days ago

      You can just append them together, which is what I did there. You can see it in the notebook.

    • @flat-line
      @flat-line 7 days ago

      @@samwitteveenai What is system prompt support for, if we can just do it like this?

    • @samwitteveenai
      @samwitteveenai  6 days ago +1

      On models that support a system prompt, it normally gets fed into the model with a special token added. If the model is trained for that, it can respond better (like the Llama models). If it doesn't have it, like Gemma, prepending the way I did can still work well, but the system text is then just part of the overall context.

    • @flat-line
      @flat-line 2 days ago

      @@samwitteveenai Thanks, this is informative. What are these special tokens? How do we learn about them?
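
      To illustrate the answer: the special tokens are the turn markers defined by each model's chat template, documented on the model card and baked into the tokenizer. A sketch of Gemma 2's format, with the system text folded into the first user turn as described above:

      ```python
      # Gemma 2 has no system role, so the "system" text goes inside the first
      # user turn. <start_of_turn>/<end_of_turn> are its template's special tokens.
      SYSTEM = "You are a terse assistant. Answer in one sentence."
      USER = "What is a system prompt?"

      prompt = (
          "<start_of_turn>user\n"
          f"{SYSTEM}\n\n{USER}<end_of_turn>\n"
          "<start_of_turn>model\n"  # generation continues from here
      )
      print(prompt)
      # Models trained with a real system role reserve dedicated tokens instead,
      # e.g. Llama 3: <|start_header_id|>system<|end_header_id|> ... <|eot_id|>
      ```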

  • @sammcj2000
    @sammcj2000 7 days ago +4

    Tiny little 8k context. Pretty useless for many things.