WizardLM 2 - First Open Model Outperforming GPT-4
- Published May 31, 2024
- In this video, we test the first Open LLM that outperforms GPT-4 on MT-Bench. Open LLMs are catching up really fast.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼 Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become a Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Sign up for Advanced RAG:
tally.so/r/3y9bb0
LINKS:
How was it trained? / 1
How does it perform?: / 1779899325868589372
GitHub Repo: wizardlm.github.io/
TIMESTAMPS:
[00:00] Groundbreaking Open LLM
[00:58] Deep Dive into Model Training and Performance
[01:58] Testing it with LM Studio
[05:08] Exploring the Model's Reasoning and Writing Skills
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu... - Science & Technology
Command R+ is the first open model to outperform GPT-4-0314 according to the LMSYS Chatbot Arena Leaderboard.
Agree, everyone is using different benchmarks, ones that suit the model creators :)
We aren't running out of human-generated data. We are just running out of easily accessible internet data.
True. Humans generate insane amounts of data every day outside of the internet. Companies have to look for ingenious ways to try and capture some of that data.
Remove censorship and it will be good.
Fr
Thank you for the informative video! By the way, how did you record this video? The zooms and the cursor look super smooth!
Thanks, I use screen.studio/
Isn't it crazy that you uploaded the video 13 hrs ago, and about 5 hrs later, Llama 3 came out with an impressive claimed benchmark and a 400B version in training? Just 9 days ago Mixtral 8x22B, then 3 days ago WizardLM, and now Llama 3! I think the tables are turning; now open-source models are pushing proprietary models to improve themselves. Tbh, I think the only thing left for OpenAI to impress the market is to drop AGI :D:D
I agree, the pace is just crazy. Hard to keep up. It's OpenAI's turn now :D
btw any plans on adding function calling to Llama 3? That would be great.
@@engineerprompt haha you read my mind 😎 Working on it since morning, stay tuned, will update you soon
@@unclecode Awesome, will be waiting for it.
The thing is how to package all those tools so we can sell custom GPTs and embed them into websites 😂
Now that Llama 3 is here, we can ignore all these models for a few days. Until the next best thing is released! The pace is breathtaking.
I agree, I wonder if people are actually using every new model or just sticking to their old stack.
Once I start downloading these, I will run everything of quality that comes out, looking mostly for storytelling abilities and some general-knowledge AI.
First.. you know.. I miss 2015 😢
I miss 2005 :/
@@PazLeBon I miss 500 BC
Censorship bad
Unfortunately, I missed it and couldn't find the part in the video that said which version was being tested.
Maybe someone can clarify: did the author manage to download a version that the maker later removed, or will he get access to a new, improved version?
He didn't explain it well. What happened was that the weights for the 7B and 8x22B were uploaded and then deleted. However, the license used was Apache 2.0, which allows for copying and re-uploading. So people who managed to download the weights before they were deleted re-uploaded them, fully legally. Just search on Hugging Face. Only the 70B is missing, which they never uploaded.
MaziyarPanahi/WizardLM-2-8x22B-GGUF
WizardLM-2-8x22B.IQ3_XS-00003-of-00005.gguf
Thank you for the clarification!
We still managed to download and post it! :)
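If you want to grab those re-uploaded weights yourself, here's a minimal sketch using the huggingface_hub Python package. The repo id is the one quoted above; the filename pattern and local directory are my own illustrative choices, not from the original comment:
```python
from huggingface_hub import snapshot_download

# Minimal sketch: fetch the community re-upload of the WizardLM-2-8x22B GGUF.
# allow_patterns and local_dir are illustrative assumptions; pick the quant
# that fits your hardware.
snapshot_download(
    repo_id="MaziyarPanahi/WizardLM-2-8x22B-GGUF",
    allow_patterns=["*IQ3_XS*"],  # only the 3-bit XS shards mentioned above
    local_dir="WizardLM-2-8x22B-GGUF",
)
```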
Why not use open Ollama instead of closed LM Studio?
Because LM Studio has a wicked user interface and Ollama barely functions on Windows; that's my reason anyway.
I tested it on Ollama but the model is generating gibberish. Still figuring out what the issue is.
@@kylequinn1963 Well, it might also be wicked in the real sense. How can we know without access to the source?
@@engineerprompt Maybe report the issue on GitHub or Discord. That's why it is open-source, after all.
@@engineerprompt The GitHub repository accepts bug reports
If I don't want to/can't use this model locally: Does anyone know if it's already hosted somewhere online and available per API?
Not this one, but the instruct fine-tuned version by Mistral AI is available on their platform.
You could try checking Infermatic, not sure how their API runs though.
Does anyone know where I can test Mixtral 8x22B online, as I don't have a system that supports local models?
Check out labs.perplexity.ai/ (it's the base version, not the instruct version)
There is a version of it in Ollama. Is it different?
I have tried the latest version of Ollama (1.32) and have issues running the 4-bit version. The 8-bit works but is too 🐌
@@engineerprompt I have an NVIDIA 3090 with 24GB VRAM, so I might be able to load it. Need to try it with Ollama.
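If you'd rather script it than use the CLI, here's a rough sketch with the ollama Python client. The "wizardlm2" model tag is an assumption; check the Ollama model library for the actual name and quant before pulling:
```python
import ollama  # pip install ollama; assumes a local Ollama server is running

# Rough sketch: chat with a WizardLM 2 build through Ollama.
# The "wizardlm2" tag is an assumption; verify it exists in the Ollama
# library before relying on it.
response = ollama.chat(
    model="wizardlm2",
    messages=[{"role": "user", "content": "Sally has 3 brothers. How many sisters does each brother have?"}],
)
print(response["message"]["content"])
```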
Regarding the trick question about whether Sally is John's sister, and it figuring out its mistake once you pointed it out:
You should do another test where you do specify that Sally is John's sister and then gaslight it saying the initial prompt didn't say that. I'm curious how it would respond.
Interesting, will try that for sure with this and Llama 3.
Would the price of this, hosted, be lower than GPT-4?
Self-hosting will be cheaper in the long run, but in the short term it will be more expensive.
@@engineerprompt What kind of hardware are you running this on?
Edit: Never mind, I saw it further down.
What are the VRAM requirements to run these models?
Massive. I'm running the Q3 variant on my machine with a 4090 and 128GB of RAM, and the model itself is around 65GB (referring to the 8x22B model specifically).
I am running this on an M2 Max with 96GB RAM. Can run the Q3 only.
I am running the Q8 model on a desktop with a 3060 12GB. It takes about 4 seconds to start writing. That's fine with me.
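As a rough rule of thumb (my own back-of-envelope estimate, not from the video): a GGUF file needs about total parameters × bits per weight / 8 bytes, plus a few GB for the KV cache. That lines up with the ~65GB Q3 figure above:
```python
# Back-of-envelope GGUF size estimate (my own rough numbers, not official):
# the 8x22B MoE has ~141B total parameters; a Q3-class quant averages
# roughly 3.5 bits per weight.
params = 141e9
bits_per_weight = 3.5
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB")  # ~62 GB, close to the ~65 GB reported above
```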
Why not just fine-tune on benchmark questions? :)
I'm just genuinely curious: are you being sarcastic? :)
We would need new benchmark questions then. But in my opinion, we need new benchmarks regularly anyway, to prevent false advertising of new models.
Is it possible to run it using 2 GPUs? Any tutorial with LangChain?
Yup, depending on the VRAM you have in each GPU. You will need about 48GB. See the sketch below.
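Not a full LangChain tutorial, but here's a minimal multi-GPU sketch with Hugging Face transformers; device_map="auto" shards the layers across however many GPUs are visible. The repo id is a placeholder since the official weights were pulled:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: shard one model across two GPUs.
# The repo id below is a placeholder; substitute whichever re-upload you use.
model_id = "some-user/WizardLM-2-8x22B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # accelerate splits layers over cuda:0 and cuda:1
)
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```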
I had these naughty poems in my notes, converted from my teens; somehow they got mixed into one of my daily notes, no wonder I could never find them. Using Ollama + Obsidian Copilot with the Dolphin model, I got those old notes back, and then I was calling all my buddies from the '90s and we all had a great time; they even remembered those silly naughty poems... ah, the beauty of uncensored LLMs.
Without censorship, all kinds of information can be used in all kinds of different ways, whether for good or for bad. Censorship in my country is already misused in all kinds of creatively corrupt ways to gain monopolies for the profit of a few:
~ they censor, yet they access
~ they censor, yet they gain strategically
~ they censor in favor of their ideology
~ they censor in favor of their politics (this is a fact)
Uncensored = the good will gain, but the bad will gain too. Let us humans thrive in information and tech.
What hardware spec does it need to run?
I am running this on an M2 Max with 96GB, and it takes about 50GB.
The current model is not 1106... there is an April-updated GPT-4 Turbo version.
Oh, yes, you are right
This didn't age well 😂
That's so true 😂😂😂
@@engineerprompt I'm glad you made this though. With the news cycle I would have completely missed it!
Then there was llama3 🎉😂
😂😂😂