I was just thinking about VHDL, and you did it. Thanks
I've been using DeepSeek-Coder for building a framework in the LLM field and it has been very impressive so far. I can't imagine what V2 will bring.
It doesn't need to cost $4M for the chip. You could implement it in an FPGA, on a project chip like Tiny Tapeout, or even in a metallized gate array, depending on the number of units you need.
I like jailbreaking models and wouldn't mind suffering through Rumble's jank to watch how others do it. Especially since any time I talk about it on YouTube my comments disappear.
Interesting
Great video. But why aren't you running the generated code and showing whether it works on the first try?
Can't wait for state-sponsored LLMs to start introducing code vulnerabilities when the user's location is in a rival country 🤣🤣
Yes, I really enjoy a no-brain knee-jerk reaction on hearing the word China 😂
Knowing what MooCode is makes me a dinosaur now? They even did it at Stanford!
That snake one is obviously wrong.
Generally speaking, these models use the same architecture and training methods published by OpenAI. That's why their outputs and capabilities are quite similar.
I'm a new PHP developer. Which LLM should I use? DeepSeek, or is there another one you recommend?
What could I include in a dedicated coding LLM video that would help you make this decision?
Tried it for coding and it gives the explanation in Chinese 😂😂😂😂. It also outputs gibberish sometimes, but that could be because of my limited hardware, an RTX 3070.
Okay, that sounds like more of a headache than I can stomach right now. Insisting on 'Explanations in English only' in the system prompt is a thing, but to be honest I'll only give it a try sometime whenever, because... Sonnet 3.5 reasons.
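For anyone who wants to try that system-prompt trick, a minimal sketch of what it could look like, assuming you're serving the model locally through Ollama; the model tag "deepseek-coder-v2" is an assumption, so substitute whatever you actually pulled:

```python
# Minimal sketch: pin the explanation language via the system prompt.
# Assumes a local Ollama server; "deepseek-coder-v2" is an assumed tag.
import ollama

response = ollama.chat(
    model="deepseek-coder-v2",
    messages=[
        {"role": "system",
         "content": "You are a coding assistant. Write all explanations "
                    "and code comments in English only."},
        {"role": "user", "content": "Explain Python generators with an example."},
    ],
)
print(response["message"]["content"])
```

No guarantees this fully stops language drift on heavily quantized builds, but it's a cheap first thing to try.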
Did the generated snake game run?
Yes, when executed on my local machine.
Codestral 22B and Llama 3 70B are tiny compared to GPT-4 Turbo; it would be pathetic if they were better than it.
This is a great point - a quantized build of the 16B DeepSeek Coder V2 actually fits entirely within 24GB of VRAM. I managed to fit it all on a single 4090.
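The arithmetic checks out as a back-of-envelope estimate. A rough sketch, counting weights only and ignoring KV cache and runtime overhead, which add a few more GB:

```python
# Back-of-envelope VRAM needed for the weights of a 16B-parameter model
# at different quantization levels. Ignores KV cache / activation overhead.
PARAMS = 16e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{PARAMS * bits / 8 / 1e9:.0f} GB")

# 16-bit: ~32 GB  -> does NOT fit a 24 GB 4090
# 8-bit:  ~16 GB  -> fits
# 4-bit:  ~8 GB   -> fits with headroom for long contexts
```

So the full-precision 16B doesn't fit; it has to be an 8-bit or lower quantization, which matches the claim.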
x.com/teortaxesTex/status/1802875700703068256
Which size model did you run and what is its context length?
On the online chat you only get the big models. And both have 128K context.
The wise EEs are all over this stuff. Especially those of us who design both HW and SW.
Much more impressive than Codestral. Any specific reason you use this for your work over other models? Great video! 😊
I agree! I chose DeepSeek-Coder initially because of its speed and relative ability on first-shot prompts. GPT-4 generally misses unless you're asking for something it's already known to be good at, like drafting complex bash scripts or FFmpeg filters. What do you use?
@@aifluxchannel I generally just use Mixtral 8x7B, and GPT-4 for everything else, but mostly for repetitive tasks and embedded C stuff. That's mostly using it to find things in docs.
@@GerryPrompt unless you have internet issue, why would you use open source models. As claude ai and gpt4 free is still the best for coding
@@tomgirl366 check the price of GPT-4o or GPT-4 against DeepSeek Coder V2.
It costs $0.14 and $0.28 per 1M tokens (input/output)... vs $5 and $15 for GPT-4o... So I use it for code-specific tasks in agent frameworks, because its knowledge and code generation are very good. But I use GPT-4o as the supervisor, because it's much better in that role.
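At those prices the gap compounds quickly in agent loops. A quick illustrative calculation, using the per-million-token prices quoted above and hypothetical token volumes:

```python
# Cost comparison at the quoted per-million-token prices (USD, input/output).
PRICES = {
    "DeepSeek Coder V2": (0.14, 0.28),
    "GPT-4o": (5.00, 15.00),
}

in_tok, out_tok = 2_000_000, 500_000  # hypothetical monthly agent traffic

for model, (p_in, p_out) in PRICES.items():
    cost = in_tok / 1e6 * p_in + out_tok / 1e6 * p_out
    print(f"{model}: ${cost:.2f}/month")

# DeepSeek Coder V2: $0.42/month
# GPT-4o: $17.50/month
```

Roughly a 40x difference, which is why routing the bulk of code generation to the cheap model and reserving GPT-4o for supervision makes sense.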
It has great, up-to-date knowledge. BUT... it's not great at following instructions and calling functions. It quite often responds in the wrong format, which makes it unusable in production as an agent supervisor or for code interpreting, like with Open Interpreter.
Can't wait to see a fine-tuned version with better instruction following that responds in the specific formats agent frameworks need :)
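Until such a fine-tune exists, one common workaround is to validate the reply and re-prompt on failure. A minimal sketch; `call_model` is a hypothetical stand-in for whatever client you use:

```python
# Sketch: enforce a JSON reply format with validate-and-retry.
# `call_model(messages) -> str` is hypothetical; plug in your own client.
import json

def get_structured_reply(call_model, messages,
                         required_keys=("tool", "arguments"), retries=2):
    for _ in range(retries + 1):
        raw = call_model(messages)
        try:
            reply = json.loads(raw)
            if all(k in reply for k in required_keys):
                return reply
        except json.JSONDecodeError:
            pass
        # Feed the bad output back so the model can correct itself.
        messages = messages + [
            {"role": "assistant", "content": raw},
            {"role": "user",
             "content": "Invalid response. Reply with JSON only, containing "
                        "the keys: " + ", ".join(required_keys)},
        ]
    raise ValueError("Model never produced valid JSON")
```

Crude, but it turns occasional format failures into a retry cost instead of a production outage.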
Do you think this model is recommended for VHDL and SystemVerilog coding?
It's definitely smarter than the previous models, but still fails my simple tests. Although, it did technically get one of the Verilog questions correct if you are purely thinking about inference code (i.e. relying on the compiler to infer a more complex mapping) - perhaps that's due to most of the training set being for FPGAs?
On that same problem, I was able to get it closer to the correct solution when applying the Socratic method, and it came up with a configuration I hadn't considered (which would be more efficient on FPGAs but not ASICs), although it failed to execute on the idea.
For your VHDL test, LLaMA2-7B would get the syntax correct most of the time, so that's not really an effective test. You would have to see how it performs implementing the logic itself, which would probably fail - I noticed it was unable to reason about the parallel nature, thinking that the code executed sequentially.
Regardless, the improvement of LLaMA3-70B could make it feasible for local deployment for data privacy reasons when using 4-bit (or 2-bit if it doesn't exhibit a huge loss of quality). Otherwise, I don't see much of a benefit over GPT4.
I'm pretty sure Verilog is a pretty niche application for typical LLMs. I'm not sure if this is just a test for you or an actual use case, but had you considered "black box fine-tuning", as in taking the outputs of common models (i.e. DeepSeek, Qwen, Llama 3) and then fine-tuning a dedicated small language model to adapt those answers into correct responses for your use case?
@@novantha1 I agree, although DeepSeek was trained on it (so were LLaMA3 and GPT-3/4) - this is obvious from the fact that it gets the syntax right most of the time (one of them would often confuse complex process statements for some Lisp-style language).
None of the models are good enough to be used in the practical application space of HDL (let alone debugging), but they can help with some boilerplate and sketching. The ideal application would be to use them in some agentic way for chip design (with black boxes), but that's not something any are currently capable of. So I was only testing simpler problems (which you would expect most first-year DLD students to get correct on a homework assignment).
Regarding a small language model to adapt the black boxes, that's very unlikely to work given the reasoning failure of the larger models. The issue being that HDLs are not sequential languages, which could confuse the causal assumption made by language models (HDL code is locally causal but globally more like a graph). That said, perhaps Verilog generation and debugging should be the benchmark that replaces ARC for reasoning (once ARC is solved).
Initially I was impressed, but today I asked a very simple question about T-SQL and it replied with a hallucination about a subject I'm working on but had not mentioned. Spooky... It also writes answers in Japanese or Chinese (sorry, I don't read either, so I may be wrong). Even more worrying, it did the same after a reboot...
Wow, you must have just filmed this... Turkey vs. Georgia just finished 2 hours ago!
I do my best to create videos as soon as I validate a topic! Let me know what you want to see next!
Excellent! I'm keen to see the Lite model in Aider.
Coming soon!
I tried this on the DeepSeek website and it sucked; it seemed like it had a low context length.
How does it fare vs Claude Sonnet?
Can you run the code in future videos or generate tests?
I think I might start creating secondary videos with more testing, these would go more in-depth and also include demos on actual local GPUs. Does that sound interesting / like something you'd like to watch?
@@aifluxchannel Well, it is pointless generating code if you don't test that it works. Other YouTubers give an LLM one chance to fix an error if one occurs. The process doesn't take that long.
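For reference, that "one chance to fix" loop is only a few lines. A rough sketch for Python-only snippets; `generate_code` is a hypothetical wrapper around whichever model is being tested:

```python
# Sketch: run generated Python, give the model one shot at fixing errors.
import subprocess
import sys
import tempfile

def run_snippet(code: str) -> subprocess.CompletedProcess:
    # Write the generated code to a temp file and execute it in a subprocess.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    return subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=30)

def generate_and_test(prompt: str, generate_code):
    code = generate_code(prompt)
    result = run_snippet(code)
    if result.returncode != 0:  # exactly one repair attempt
        code = generate_code(f"{prompt}\n\nYour code failed with:\n"
                             f"{result.stderr}\nReturn a fixed version.")
        result = run_snippet(code)
    return code, result.returncode == 0
```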
Is this better than Qwen2 for coding?
I prefer this model to Qwen 2, specifically the 16B version, for its speed.
Is the free endpoint running the 236B or the 16B?
Not actually sure, but I think it's the 16B variant. Can confirm this later tonight.
@@aifluxchannel alright, thank you
But you tested Coder V2 instead?!?!
What did I miss?
Are there VS Code plugins to integrate this for local dev? Or how do you use it to code?
Code your own simple wrapper in an afternoon, using AI to assist you 😅
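For the curious, the "afternoon wrapper" really is short, since DeepSeek exposes an OpenAI-compatible API. A minimal sketch; the base URL and model name are assumptions to verify against DeepSeek's current docs, and the key is a placeholder:

```python
# Minimal DeepSeek wrapper via the OpenAI-compatible endpoint.
# base_url and model name are assumptions -- verify against current docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # placeholder
    base_url="https://api.deepseek.com",
)

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-coder",        # assumed model tag
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(ask("Write a Python function that merges two sorted lists."))
```

For actual editor integration, extensions like Continue can reportedly point at any OpenAI-compatible endpoint, so the same base URL should work there too.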
Let's get the jailbreak video going... 🎉🎉🎉
Qwen and DeepSeek Coder are both Chinese. It's amazing how they're slowly dominating everything.
Everything?
My GPT can even do real IMO math problems from just an image upload of the problem as well.
I'm ready to start using Rumble. I made an account and followed you.
This can do pretty crazy coding as well.
Anyone want 10x vision capabilities for ChatGPT? I figured out something OpenAI forgot about. It's called Smart Vision image/text analysis (paste your own custom instructions for a superior smart chat).
It is 10x better at analysing images, and especially at reasoning about relationships; read the examples below.
I uploaded an image of a cloud that looks like multiple things and can be interpreted different ways. The one I gave it was a personal photo I took, and it recognised it as a rabbit, which not even a random human guessed; it gets it on the first shot every time now. So it knows when something is unusual about an image even if you don't say anything. It can also do IQ-test image-reasoning pattern questions relatively well.
Another example: upload an image of a model wearing clothes that fit or don't fit, or that you're not sure about, and it can analyse that and tell you in great detail why they fit or don't fit.
It even sort of understands real logic games when given good instructions, but there it's limited by the model, not the instructions.
*IMPORTANT* You just have to follow the instructions given to get the right seed; it's about a 1-in-2 chance, and I have absolutely no idea why it needs that. Just paste the conversation starter and you'll understand what to do.
I don't understand what you wrote. What did you create? How is it used?
@@BienestarMutuo It's pretty complicated, but basically it's like 10x smarter than GPT-4o, especially for analysing any uploaded images. It's just better accuracy overall; you can use it for anything, and give it your own custom prompts as well.
@@xd-qi6ry How can I test that?
@@BienestarMutuo All you need to do is search for Smart Vision in GPT or on the website, in the custom GPT section, and it should show up with that orange eye image. It has 50 people using it, but it can be used for pretty much any use case; it's actually GPT-6 level.
Just search 'Smart Vision image/text analysis' in the GPT store; it's an orange eye-looking logo.
I would imagine that most of these datasets are compromised in some way or another. I know for a fact that parts of HumanEvalJava are included in large training data sources like The Stack. IMO only new datasets or real usage can accurately evaluate these models. If a model comes out of nowhere and consistently outperforms by that much, that's a red flag for me.
Yep, this was something I was highly suspicious of with DeepSeek Coder V2. However, I do think there are practical limits to how fast these benchmarks can be fooled / reverse engineered.
@@aifluxchannel They are not "fooled" per se; it's (I'm totally guessing) simply that the models could be overfitting. An LLM can start overfitting when the problems and solutions for the benchmarks have been included in the large datasets used for training it. Normally you HAVE to separate training and evaluation data, but there are practical limitations when dealing with LLMs, which need vast data sources that continuously scrape huge amounts of the internet.
The worst case, in a scientific-ethics context, is if they intentionally selected older benchmarks for their results because of better relative performance, since older benchmarks are more likely to suffer from data leakage; but that is a more serious accusation.
I will try it myself nonetheless; the code quality of your example was really good, at least compared to GPT :)
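The standard decontamination heuristic for the leakage concern above is an n-gram overlap check between benchmark items and training documents (several model reports use 13-grams). A toy sketch of the idea; real pipelines hash shingles at corpus scale rather than comparing documents pairwise:

```python
# Toy contamination check: flag benchmark problems sharing long n-grams
# with training documents. Illustrative only; not a production dedup pipeline.
def ngrams(text: str, n: int = 13) -> set:
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(problem: str, training_docs, n: int = 13) -> bool:
    probe = ngrams(problem, n)
    return any(probe & ngrams(doc, n) for doc in training_docs)
```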