AI Coding BATTLE | Which Open Source Model is BEST?

  • Published 11 Jan 2025

COMMENTS • 221

  • @batuhankaral7320
    @batuhankaral7320 2 months ago +49

    I think the best way to test the coding models would be to give them existing code and ask them to add a function to it, since that is the main use case for coding models (for now at least) rather than writing the entire code.

    • @hitbox7422
      @hitbox7422 2 months ago +1

      @@batuhankaral7320 o1 is pretty good at that, only downside is the recursion/use of private functions/redundancy in most cases. You still need to specifically state how you want it to structure your code, but other than that, I still have to catch it making syntax errors like o4

  • @punk3900
    @punk3900 2 months ago +25

    The fun part starts with going beyond 2000 lines of code. Prototyping a short piece of code is not what you need AI for

    • @hitbox7422
      @hitbox7422 2 months ago +2

      @@punk3900 that’s why you write static tests and contextualize outputs. Token limitations won’t change in the near future, so making use of techniques to overcome those limitations is the way to go.

  • @ipmur
    @ipmur 2 months ago +27

    What about a test of whether the model can understand existing code?

    • @JINIRPA
      @JINIRPA 2 months ago +3

      So bad it will generate different code from what you give it

    • @jacky2476
      @jacky2476 2 months ago

      @@JINIRPA exactly!

  • @desmond-hawkins
    @desmond-hawkins 2 months ago +6

    (7:02) *kyū* is a Japanese term for a rank used in martial arts; it indicates how many steps away a student is from reaching "dan" (master). So 1 kyū outranks 2 kyū, etc.

    • @Sven_Dongle
      @Sven_Dongle 2 months ago

      Sad when a man reached that age without ever strapping on a gi.

  • @pmarreck
    @pmarreck 2 months ago +8

    Summary:
    • Qwen 2.5 Coder 7B was the overall winner, being faster and the most effective, especially with tasks like the Snake game.
    • Qwen stood out for its speed and practicality for locally run AI coding assistants.
    • Matthew encourages viewers to suggest other practical tests for coding models.

    • @aszmajdzinski
      @aszmajdzinski 12 days ago

      Is it summarized by AI?

    • @pmarreck
      @pmarreck 12 days ago

      @ Transcribed by whisper and summarized by AI, yes. Using a script I’m building into a website.

  • @trelligan42
    @trelligan42 2 months ago +9

    Here's a testing suggestion: Have the target models complete one fairly simple task such as snake game. Then perform two iterations: 1) Cosmetic changes, with a border around the play area, color changes and different head & tail blocks for the snake. 2) Functional changes; have the food appear preferentially in different areas, only head eats so if a food randomly appears under the body or tail it is not eaten and you have to 'go around again', Food is initially yellow (shrinks snake) but randomly turns red (elongates snake), scoring etc.
    The idea is that models can have more difficulty modifying existing code without breaking it entirely. Guess how I know-or at least why I think this. #FeedTheAlgorithm

    • @m.c.4458
      @m.c.4458 2 months ago

      I know what you mean

    • @Techonsapevole
      @Techonsapevole 2 months ago

      I agree it's a more real world coding test

    • @DeianGiNet
      @DeianGiNet 25 days ago

      I agree when you start making changes - this is where they mess it up

  • @MrMoonsilver
    @MrMoonsilver 2 months ago +3

    Missed these kinds of videos! They're the best!

  • @SahilP2648
    @SahilP2648 2 months ago +5

    Nemotron seems to be really good. Also, Mistral's Codestral. I am surprised you didn't test that. Also please don't test snake in python. I am pretty sure even non-code-oriented models can output it nowadays.

  • @Diego_UG
    @Diego_UG 2 months ago +1

    0:48 Wow, that PC is so cool, I'd like you to make a video talking about hardware, especially what's needed to run models, from the smallest to the largest.

  • @BillBaran
    @BillBaran 2 months ago +4

    Kyu is part of an Asian leveling system. It's pronounced 'que'. Kyus count down, so higher numbers are weaker. After 1 Kyu, you switch to dan (pronounced 'don') and start counting up. In karate, the colored belts are the Kyu levels and the levels of black belt are the dan levels. Dan is where you're considered to actually start to know what you're doing.

    • @andrewcarnegie5805
      @andrewcarnegie5805 2 months ago

      Not really. An 8k is a pretty strong club player in Go (the game the system, later adopted by Karate, was based on). A 5k is well above average, and 3k is respectably strong. 1D is like black belt. You've far surpassed "knowing what you're doing" at that stage. For the chess players out there, think 2k+ ELO.

    • @BillBaran
      @BillBaran 2 months ago

      @andrewcarnegie5805 "knowing what you're doing" is defined differently in the West than it is in the East. It is also different across the centuries and in different communities. This is splitting hairs. I am describing the abstract intent.

  • @JGoodwin
    @JGoodwin 2 months ago +2

    If you are going to mention a machine to do AI work, it would be nice if you linked to the exact spec you used in the testing.

  • @rs832
    @rs832 2 months ago +30

    Comparing these models to Claude, chatGPT, and Meta free tier would be a ***very*** valuable/helpful comparison!!
    Any chance of seeing this in the near future?
    Thanks for the great Ai content, easily the best channel out there for this content!

    • @JohnLewis-old
      @JohnLewis-old 2 months ago +4

      I can't wait for local models to be on par with the current Claude Sonnet and ChatGPT o1-mini. (I get MUCH better results from them, and even more with the API, paying pennies per file to get great results.) These challenges are dead simple for Claude, using Cline (a plugin for VSCode) as the interface. Cline just works for me and I've created 10,000 lines of code with it.

    • @chandrasekhar01
      @chandrasekhar01 2 months ago

      @@JohnLewis-old That's never gonna happen.. at least in this lifetime

    • @TheDandonian
      @TheDandonian 2 months ago +2

      ​@@chandrasekhar01 What makes you say that? These models aren't miles behind. This version of sonnet is incredible but 6 months from now, the open source models will be on par with it.

    • @JohnLewis-old
      @JohnLewis-old 2 months ago

      @@chandrasekhar01 why do you think that? Honestly sounds clueless on your part. You can see on desktop models that are way better than GPT 3.5 ... just for an example.

    • @dibu28
      @dibu28 2 months ago

      and Google Gemini

  • @PauloQueMoraNaAlemanha
    @PauloQueMoraNaAlemanha 2 months ago +1

    Please test with tasks like writing html+css+js code. A lot of people seem to be using these models to create landing pages and micro apps.

  • @mrdevolver7999
    @mrdevolver7999 2 months ago +9

    12:26 "...and if you want a beast of a machine that can load up models up to 96 GB into VRAM, definitely check out the Dell Precision 5860..." How much in kidneys is it?

    • @r9999t
      @r9999t 2 months ago +2

      Only two, you can afford it. :-)

    • @fabiobenedetti8434
      @fabiobenedetti8434 2 months ago +1

      @@r9999t 🤣🤣🤣

    • @mrdevolver7999
      @mrdevolver7999 2 months ago +1

      @@r9999t I hate to be the party pooper, but you actually do need at least 1 kidney to live. I know, sucks, but I guess I'm gonna have to pass on that beast of a machine. Unless, you want to sponsor me? 🤣

    • @r9999t
      @r9999t 2 months ago +1

      @@mrdevolver7999 While I try to be kind, I do sorta need my kidneys. Besides, I'm old, you really don't want my kidneys.

    • @mrdevolver7999
      @mrdevolver7999 2 months ago +1

      @@r9999t Hmm, I'm sure the beast of a machine wouldn't feel offended by your old kidneys, but okay, maybe next time. 😉🤝

  • @thedudely1
    @thedudely1 2 months ago +1

    My benchmark, which has proven to be too hard for any model I've tested under 20b parameters, is to ask them to write a simple ray casting program like Wolfenstein 3D. Easy for models like GPT4o or Claude 3.5 Sonnet, possible with models like Llama 3 70b or Codestral, but I've yet to test a model under 20b parameters that's able to do it.

  • @soerengebbert
    @soerengebbert 2 months ago +4

    My second hand Apple MacStudio M1 Ultra with 64GB RAM and 60 Core GPU runs at 60 tok/sec with Qwen2.5-Coder 7B 8bit! And it costs only 15% of the DELL workstation.

    • @shramscar
      @shramscar 2 months ago +3

      So m1 chip can use Ram as vram? That's a game changer (prepping to migrate from windows lol)

    • @the42nd
      @the42nd 2 months ago +3

      @@shramscar the new macbook pro m4 with 128GB RAM can run all these models.... it's getting super interesting. Eager to see what llama 4 offers.

    • @shramscar
      @shramscar 2 months ago +2

      @@the42nd yeah. This apple unified memory architecture makes a whole difference using ram for cpu and gpu. My nvidia a5000 gpu with 16gb vram sounds like a joke compared to what m chips do.

    • @TheReferrer72
      @TheReferrer72 2 months ago

      @@the42nd Running it and getting decent speed in tokens is a different matter!

  • @drewski6843
    @drewski6843 2 months ago +3

    Thank you for this awesome video.
    Which llm model would you choose if there were no more internet connection?
    Basic ai assistant or an ai assistant with more capabilities? Computer vision, voice chat mode?
    Most likely build your own cloud network?
    Thanks again for your vids

  • @solidreactor
    @solidreactor 2 months ago +1

    Is there a model for LM Studio that can "think and reason" like the OpenAI o1-preview does? Would love to have a local model that could solve a more complex set of tasks and design requirements.
    Currently I use LLMs to help me design by picking through different coding architectures and approaches, the actual structure and design choices for the program, not the coding by itself.

  • @CreativeEngineering_
    @CreativeEngineering_ 2 months ago +1

    I love Qwen2.5 models, they are some of the best and fastest models for their size. My only complaint is that the smaller 1.5B and 3B instruct models are VERY particular about the wording of the prompts. But this is common amongst smaller models, so it's definitely not a deal breaker.

  • @samueltucker8473
    @samueltucker8473 2 months ago +1

    Generally the micro circuits are getting so small that a few molecules being knocked off by static charges and magnetic field fluxes influence with radiation particular matters passing through and knocking off orbiting charges changing time keeping

    • @Sven_Dongle
      @Sven_Dongle 2 months ago

      Also micro power fluctuations, which is why the Tesla gigacluster runs off battery charged from the grid

  • @ScottVanKirk
    @ScottVanKirk 5 days ago

    It would be more interesting to give them a more complex real world problem and then give them a certain number of iterations or tokens to produce the final result. Also, it would be very interesting to see how they handle codebases of different sizes.

  • @vadimk1004
    @vadimk1004 2 months ago

    4:46 - that Dell AI workstation came with Windoze and not Linux?

  • @annchoe1630
    @annchoe1630 2 months ago

    On top of being very informative, this is also so much fun to watch.

  • @chriswatts3697
    @chriswatts3697 2 months ago +1

    i love the qwen2.5 models so far, the llm models are really good and they dont use too much memory

  • @AprilEclipse2024
    @AprilEclipse2024 2 months ago +5

    just ask it to code ever more complex games, more complex websites.

  • @notme222
    @notme222 2 months ago +1

    As good and useful as these larger model responses are, I've recently become spoiled by Cursor and their AI autocomplete. The fact that I don't even have to form the question makes for faster typing at minimum, and good bug fixing when needed.
    Tangentially, I think testing these models with naked requests quickly becomes less useful and it will come down to which can properly input your existing codebase.

  • @auriocus
    @auriocus 2 months ago +1

    The "hard" problems are not actually coding challenges, but math challenges. The task with fast prime output means you need to apply e.g. the sieve of Eratosthenes instead of going through all numbers and testing whether each one is prime. It's just stupidly phrased in the task.
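
    A minimal sketch of the sieve this comment refers to, for readers who have not seen it. The function name `primes_up_to` and the `limit` parameter are illustrative assumptions; they are not taken from the video or from the original task.

```python
def primes_up_to(limit: int) -> list[int]:
    """Return all primes <= limit using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[n] == 1 means "n is still a prime candidate".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p*p
            # (smaller multiples were already crossed out by smaller primes).
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(count)
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    print(primes_up_to(30))
```

    Compared with testing each number individually, the sieve crosses out composites in bulk, which is usually what makes the difference on a timed task.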

  • @cesar_ai_eng
    @cesar_ai_eng 2 months ago

    I wonder if you tested quantized versions of the models, in which case they shouldn't be as good as their full BF16 versions

  • @DerivsBank1ng
    @DerivsBank1ng 2 months ago +1

    I see you were running GGUF quantized models, what quantizations were they?

  • @pkaser
    @pkaser 19 days ago

    I think another eval would be to give it terrible code, like sorting a numerical array by using timers, and see how it corrects it. Or like ask it to rewrite a block of code to be smaller yet maintain the same functionality
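
    Sorting with timers is commonly known as "sleep sort". The sketch below shows what such deliberately terrible input code might look like next to the fix one would hope a model proposes. Both function names and the `unit` parameter are hypothetical, chosen only for this illustration.

```python
import threading
import time


def sleep_sort(values, unit=0.01):
    """Deliberately bad: each value sleeps in proportion to its size, then appends itself.

    The order depends on thread scheduling, the runtime grows with the largest
    value, and negative numbers make time.sleep raise ValueError.
    """
    result = []
    lock = threading.Lock()

    def worker(v):
        time.sleep(v * unit)
        with lock:
            result.append(v)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


def fixed_sort(values):
    """The correction: use the built-in sort, which is deterministic."""
    return sorted(values)


if __name__ == "__main__":
    data = [3, 1, 2]
    print(sleep_sort(data))  # usually ascending, but not guaranteed
    print(fixed_sort(data))
```

    An eval along these lines could check that the model's rewrite returns the same elements in sorted order without relying on timing.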

  • @zigydk
    @zigydk 2 months ago

    Now that you have so much firepower in those A6000 cards, I would have loved to see what can be done with much larger models. Like 70B or 200B.

  • @alx8439
    @alx8439 2 months ago +4

    With all your VRAM you should be running not quantized, but full precision fp16 models. Quantization hurts smaller models' performance much more than bigger models'.
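
    A rough back-of-the-envelope sketch of why this is feasible: weight memory is approximately parameter count times bits per weight divided by 8. The helper name `weight_memory_gb` is an assumption for illustration, and the estimate covers weights only; it ignores the KV cache, activations and runtime overhead, and real quantization formats use slightly more bits per weight than their nominal figure.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # 96 GB is the VRAM figure quoted from the video elsewhere in this thread.
    vram_gb = 96
    for size in (7, 70):
        for bits in (16, 8, 4):
            need = weight_memory_gb(size, bits)
            verdict = "fits" if need < vram_gb else "does not fit"
            print(f"{size}B at {bits}-bit: ~{need:.1f} GB of weights, {verdict} in {vram_gb} GB")
```

    Under these assumptions a 7B model at fp16 needs roughly 14 GB for weights, so on a 96 GB machine quantization would not be required for the models tested, while a 70B model at fp16 would not fit.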

  • @wurstelei1356
    @wurstelei1356 2 months ago

    Nice video. I would like to see different IDEs that allow for different local models to collaborate in an agentic manner.

  • @cybernit3
    @cybernit3 2 months ago +1

    So the RTX 6000 is more power efficient than the RTX 4090, but it has more CUDA cores and the same amount of VRAM. Apparently the RTX 4090 is like $1500 CAD and the RTX 6000 is like $11-12k CAD. Thanks for doing the tests, I will try to remember Qwen.

  • @samueltucker8473
    @samueltucker8473 2 months ago

    With the electric circuits on most semiconductors becoming more prone to 'mistakes' beginning to breakdown and making mistakes at 80 degrees C. That is different than 80 F. 100 degrees C is boiling at sea level and Zero degrees C is rain water freezing at sea level. . . (France, a long time ago) Water frozen at 32 and boiling at. . . Is the scale the same

  • @ladykilla85
    @ladykilla85 2 months ago

    I LOVE the idea to use codewars!!! What a great way to avoid contamination in the datasets :) great job

  • @kokomal
    @kokomal 2 months ago +87

    Matthew man. I love your videos. But what is up with the stupid thumbnails? Come on man. Just do a normal video, be yourself and more people will like it.

    • @mrpro7737
      @mrpro7737 2 months ago +5

      😂😂

    • @DannerBanks
      @DannerBanks 2 months ago +13

      He is just doing what the algorithm (and people clicking behind the algorithm) rewards him for

    • @kokomal
      @kokomal 2 months ago +11

      @@DannerBanks Fuck the algorithm man. I think he will do great if he ditch the stupid thumbnails and the positive vibes. Just be yourself and talk to us. Share your passion and enjoy.

    • @DannerBanks
      @DannerBanks 2 months ago +9

      @@kokomal bro, it's a thumbnail. Take a deep breath. Touch grass

    • @VidamKaporin
      @VidamKaporin 2 months ago +3

      Agree

  • @the42nd
    @the42nd 2 months ago

    Nice machine. Curious, you mention it's pretty fast "even with other models loaded" but memory usage shouldn't matter right? So long as it's not 100% and then using disk. At least from a regular main system CPU RAM perspective that's pretty much how it works. Be interested to know if it's different for VRAM and AI.

  • @ThoughtFission
    @ThoughtFission 2 months ago

    I've been using Qwen in Gen AI image creation and it pretty much smokes everything else.

  • @adamstewarton
    @adamstewarton 2 months ago

    Matt, could you use Nvidia's nemotron 70b for coding tasks just to see how it does?

  • @CocaNoah
    @CocaNoah 2 months ago

    I guess building some UI would be a neat test case

  • @artnikpro
    @artnikpro 2 months ago

    I tried all the models you mentioned, maybe they are ok for python but they performed bad for me for web coding (frontend / backend). Qwen has problems with understanding the prompt in general. From all I tried, dolphin-2.7-mixtral-8x7b is best so far for coding

  • @yakamo
    @yakamo 2 months ago

    good to see smaller models getting better, still a long way to go though

  • @joenobk
    @joenobk 2 months ago

    Need a baseline to compare. Like Llama 3.2 3B or 11B.

  • @arunkumar-uo9tg
    @arunkumar-uo9tg 2 months ago +7

    Please do nemotron with some complicated stuff. You'll like it

  • @AC-cg6mf
    @AC-cg6mf 2 months ago +1

    codegemma, codellama, and codestral would be interesting

  • @luizgustavs
    @luizgustavs 2 months ago

    Evaluating coding models is tricky because their performance varies significantly across different programming languages. This makes it difficult to declare a single "best" model, as one might excel in Python but struggle with JavaScript, for example. I believe it would be more insightful to evaluate language models based on their performance in specific languages, rather than relying on a single general-purpose benchmark that may not accurately reflect their strengths and weaknesses in each language.

  • @samueltucker8473
    @samueltucker8473 2 months ago

    Artificial Intelligence is like cruise control on a vehicle it still requires supervision. . . Are there any old ' back doors' that haven't been used in a while

  • @kecksbelit3300
    @kecksbelit3300 2 months ago

    i don't know if you noticed but today everyone had access to o1 not preview. Jimmy apples posted a link and it actually worked. But i don't think it works anymore. Did you have a chance to test it and make a video?

  • @Nik.leonard
    @Nik.leonard 2 months ago +1

    I know that is not “open source” (it has a research license) but maybe pit those models against Codestral?

  • @Creepaminer
    @Creepaminer 2 months ago +1

    From what I can tell I think you were using Q4KM versions of the models and not Q8, which as always isn’t that much of a difference but still a drop in precision

  • @NoHandleToSpeakOf
    @NoHandleToSpeakOf 2 months ago +1

    DeepSeek Coder V2 Lite is a MoE with just 2.4B active, it should be faster than Yi and Qwen. Must be something wrong with your setup.

  • @CosmicCannabist
    @CosmicCannabist 2 months ago +1

    Those cards are beasts!!! Nvidia hear my slow processing and low-vram cries...

  • @DerivsBank1ng
    @DerivsBank1ng 2 months ago

    Also if you could try out some of the new VPTQ quantized models, these look interesting for local inference or at least running in a Google Colab runtime with an L4 or A100

  • @aitrans
    @aitrans 2 months ago

    Why not Deepseek 2.5?

  • @nicktumi
    @nicktumi 22 days ago

    Does anyone know of a tutorial for this to run on M1 macbook pro?

  • @Piotr_Sikora
    @Piotr_Sikora 2 months ago

    Make some test with bigger context window in coding

  • @anozarkian1
    @anozarkian1 2 months ago

    Dude, badass video.

  • @ListenGrasshopper
    @ListenGrasshopper 2 months ago

    I'm jumping in the Ai arena on safety and info. I have posted the first "semester" of Ai101 on my channel. 20 classes (episodes) on the players and basic info on Ai Agi for beginners and where the industry is currently at using NotebookLM called the 'Listen Grasshopper Podcast'

  • @vdpoortensamyn
    @vdpoortensamyn 2 months ago

    You could also try IBM's Granite 3.0, Matthew

  • @agustinpizarro
    @agustinpizarro 2 months ago

    It is impossible that 2 A6000 are so slow doing inference with a 7B model

  • @drwhitewash
    @drwhitewash 2 months ago +12

    We did a benchmark test of these local coding LLMs at our company, in our real setup in which we write real code. We used a test assignment so as not to disclose our real confidential codebase (we were also assessing cloud-based models such as ChatGPT).
    The result was that using these LLMs actually makes us less efficient, even compared to writing the code ourselves using just the IDE and its features.

    • @HCG
      @HCG 2 months ago +3

      Yeah I’ve heard from numerous professional developers that LLM coding isn’t good for much besides basic coding and completely falls apart when you try to use it for anything large or complex

    • @drwhitewash
      @drwhitewash 2 months ago +3

      @HCG exactly. Our test wasn't even that complex.
      Where it really shined and sped us up compared to manual, was writing documentation (javadocs) for the methods. That was quite accurate and saved quite some time.

    • @vassovas
      @vassovas 2 months ago +1

      Did you use open AI's o1 model in test out of interest? This seems most likely to help

    • @kb1flr
      @kb1flr 2 months ago +2

      I don’t doubt your results, but I will say that prompt construction as well as breaking the problem up into a sequence of steps designed to reach a goal makes a huge difference in the outcome.

    • @drwhitewash
      @drwhitewash 2 months ago

      @@vassovas it was one of the tested models. But this video is mainly about alternative LLMs that you could also run locally.

  • @kecksbelit3300
    @kecksbelit3300 2 months ago

    maybe include a first person 3d shooter for the real strong models. Where zombies hunt the player and even animate the zombies movement. I let o1 do that inside a single html and it got it done with 3 prompts. i'm sure with better prompting it could have been 1 or 2. o1 preview struggled much more i think that is a good test for next gen models

  • @evandroreisunreal
    @evandroreisunreal 2 months ago

    Why not try some other languages such as Go or Rust (probably they wouldn’t time out)? Thanks for the videos.

  • @moisesxavierPT
    @moisesxavierPT 2 months ago

    Hi Matthew, given the fact that you have a beast of a machine (and also the content is sponsored by Nvidia), I don't quite understand why you didn't include nemotron:70b-instruct-q8_0. It would for sure provide better results.

  • @BradleyKieser
    @BradleyKieser 2 months ago

    Would love to see you do the same but with models that fit into 24GB VRAM (RTX 4090 compliant).

  • @kuromisu2223
    @kuromisu2223 2 months ago

    Matt, you're awesome... can you ask these models to process Web GPU / 3JS / GSAP tasks... this is the pinnacle of web ATM combined with the MERN stack + React.

  • @JGoodwin
    @JGoodwin 2 months ago +5

    For tests like this where you use local models, I think you need to increase the strength of the prompt context. Anyone expecting to run local models with weak prompts is likely to be disappointed.

    •  2 months ago +2

      I was thinking the same. I work with LLMs for work and we develop prompt templates, test, evaluate with inputs to improve successful outputs.

  • @ronitakhariya4094
    @ronitakhariya4094 2 months ago

    hello great content as always sir. i have a video idea. can you use these smaller or open source models but with good prompting and then compare the result with closed source models with simple prompting?

  • @TheAnical
    @TheAnical 2 months ago +1

    Claude 3.5 Sonnet is the BEST at coding.

  • @T___Brown
    @T___Brown 2 months ago

    Why not see how they do with other languages

  • @jmirodg7094
    @jmirodg7094 2 months ago

    cool idea, maybe go a bit more in depth: code quality, efficiency, maintainability, etc...

  • @screamingiraffe
    @screamingiraffe 2 months ago

    They all fail hard when you give them a simple (to humans) powershell request.

  • @3238juan
    @3238juan 2 months ago

    YEAH!!!! best video till now.

  • @samueltucker8473
    @samueltucker8473 2 months ago

    Multiple one board for each square with negative contrasts

  • @alex_great23
    @alex_great23 2 months ago

    just make different tops. and you can make all sorts of comparisons between the models. it's interesting to watch, especially since you have a cool PC now.

  • @gmt-yt
    @gmt-yt 2 months ago

    Wait, the assignment is "create a regular expression that..." and you told it "don't use the regular expression library". So is the assignment to create an untested regular expression that someone who did have access to a regular expression library could use to complete the assignment? Or to create a regular expression library from scratch, and then use it to complete the assignment? No wonder HAL won't open the pod bay doors, it appears to be GIGO.

  • @ashtwenty12
    @ashtwenty12 2 months ago

    Can you conduct an interview as if it was a human please. You can pick any model for this, but you should have a general set of questions and a context for the kick off.
    The coding challenge should be in parts, e.g. make an in-memory database, then make it an API, and if you are feeling crazy have it write a Docker Compose file for the application.

  • @yossifibrahem7687
    @yossifibrahem7687 2 months ago

    I wish you tried granite code by IBM

  • @AnneGnonymous
    @AnneGnonymous 2 months ago

    If you are looking for a hard puzzle to test LLMs, consider this one: "An ardent swain said to his lady love, some years ago, “Once when a week ago last Tuesday was tomorrow, you said, ‘When a day just two fortnights hence will be yesterday, let us get married as it will be just this day next month.’ Now sweetheart, we have waited just a fortnight so as it is now the second of the month let us figure out our wedding day.”"

  • @DeianGiNet
    @DeianGiNet 25 days ago

    Seriously Matt 2xA6000! ... we are not envious at all :)

  • @JINIRPA
    @JINIRPA 2 months ago +1

    I don't recommend qwen 2.5 7b
    In my experience it can generate code but can't edit code or fix it
    I gave it some code and told it to fix it, but it generated different code that's not related to the code I sent it.

  • @madeniran
    @madeniran 2 months ago

    You feed it a leetcode question. You know it’s been trained on that.
    It could only get the code wrong if it hallucinates.

  • @Moyano__
    @Moyano__ 2 months ago +1

    I need Qwen 2.5 coder 32b

  • @peterwood6875
    @peterwood6875 2 months ago

    Write a program to solve the Rubik's cube in python

  • @lloydburley
    @lloydburley 2 months ago

    For me the biggest deciding factor is the cut off date for the models, although qwen2.5 is great its nearly a year out of date, which is just too much 🙁

    • @andrepaes3908
      @andrepaes3908 2 months ago

      Qwen 2.5 coder was released just a few weeks ago!

    • @MrWuzey
      @MrWuzey 2 months ago

      please check your "facts", the disinformation with so much confidence is annoying on internet, don't be one of them.
      Qwen 2.5 were released in September 2024.

    • @lloydburley
      @lloydburley 2 months ago

      @@andrepaes3908 That's true, but if you ask it what the training cut-off date was, it says 20241231, however, if I ask what version the Flutter SDK is, I get 3.10.5, which is from Jun-Jul 2023. So for me, it's around 12-18 months behind.

  • @samueltucker8473
    @samueltucker8473 2 months ago

    With chess there are a limited number of moves now make the board 3D with the same rules

  • @hipotures
    @hipotures 2 months ago

    I am currently testing aideml with gpt-4o-mini and the results are better than with automl (autogluon 1h). I haven't tested longer than 24h yet, but it looks like it should be better.

  • @samueltucker8473
    @samueltucker8473 2 months ago

    GPS time stamp accuracy and echos

  • @MeinDeutschkurs
    @MeinDeutschkurs 2 months ago +2

    And none of these are able to handle German? Well. 70b is too tiny to handle German properly. Claude is really good in German, ChatGPT is close to Claude, but Claude is ahead. But what about the open source models and the multi language capabilities? Coding and or other models? Meta claims that llama 3.2 is able to understand German. No, it isn‘t, except you talk to it in SPO structure without splittable verbs (in German it reminds me to phrasal verbs).
    It‘s a mess! What a disappointment! What an exclusion of at least 100 million people.

    • @us_f4rmer
      @us_f4rmer 2 months ago

      eXcLuSi0n LOL, dude what do you expect, its made in USA not Germany, its not their fault that we are so behind in technology.
      At least 100Mil❓ not only is that wrong you and I know that, a big portion have never learned to speak/write in German 😆
      If you need translation, just use deepL, stell dich net so an wie's Kind beim 💩

    • @auriocus
      @auriocus 2 months ago

      Have you tried Qwen2.5 72b? The German is decent. It does occasional grammatical errors, though (wrong gender/declension on less often used words) Example:
      Erkläre mir, warum es nicht möglich ist, die Zinsen eines einzelnen Eurocent aus dem Jahr 0 abzuheben
      Die Frage, warum es nicht möglich ist, die Zinsen eines einzelnen Eurocent aus dem Jahr 0 abzuheben, beruht auf mehreren Gründen, die sowohl historisch, wirtschaftlich als auch mathematisch sind. Lassen Sie uns die wichtigsten Punkte betrachten:
      1. Historische Gründe:
      Fehlende Währung: Das Eurocent wurde zum ersten Mal im Jahr 2002 eingeführt, als der Euro als Bargeldwährung in den Eurozone-Ländern eingeführt wurde. Vor diesem Zeitpunkt gab es weder den Euro noch das Eurocent.
      Fehlendes Finanzsystem: Im Jahr 0 gab es in Europa noch kein modernes Finanzsystem, das Zinsen in der Form berechnen oder auszahlen konnte, wie wir es heute kennen. Zinsgeschäfte und Banken existierten in ihrer heutigen Form nicht.
      2. Wirtschaftliche Gründe:
      Feuerungs- und Nominalwert: Der Wert von Geld ändert sich über die Zeit. Inflation, Wirtschaftskrisen und wirtschaftliches Wachstum beeinflussen den Nominalwert von Geld. Ein Eurocent im Jahr 2002 hat einen anderen realen Wert als ein Eurocent heute, geschweige denn im Jahr 0.
      Zinsstruktur und Zinspolitik: Zinsen werden von Zentralbanken gesetzt und sind stark von der wirtschaftlichen Situation und den politischen Rahmenbedingungen abhängig. Diese Faktoren sind im Jahr 0 nicht vergleichbar mit denen, die heute bestehen.
      3. Mathematische und Logistische Gründe:
      Zinseszinsen: Zinseszinsen wachsen exponentiell, was bedeutet, dass ein sehr kleiner Betrag wie ein Eurocent über eine lange Zeitspanne (von 2023 bis ins Jahr 0 sind es etwa 2023 Jahre) zu einem astronomisch hohen Betrag anwächst. Dieser Betrag wäre so hoch, dass er die gesamte weltwirtschaftliche Produktion mehrfach übersteigen würde.
      Praktische Herausforderungen: Selbst wenn es theoretisch möglich wäre, die Zinsen von einem Eurocent über die gesamte Zeitspanne zu berechnen, gäbe es praktische Herausforderungen. Wohin sollte der Betrag eingezahlt werden? Wie sollte er gespeichert werden? Welche Institutionen könnten ihn vermögen?
      4. Rechtliche und Institutionelle Gründe:
      Fehlende Verträge und Rechte: Ohne einen formellen Vertrag oder Rechtstitel, der die Auszahlung von Zinsen regelt, gibt es keine rechtliche Grundlage, um Anspruch auf diese Zinsen zu erheben.
      ** gesetzliche Rahmenbedingungen:** Heutige Gesetze und Verträge haben eine bestimmte Gültigkeitsdauer. Langfristige Verbindlichkeiten über Jahrtausende hinweg sind nicht vorgesehen und würden in der Praxis nicht durchsetzbar sein.
      Zusammenfassung:
      Die Unmöglichkeit, die Zinsen eines Eurocent aus dem Jahr 0 abzuheben, resultiert aus der historischen Nichtexistenz des Euro und des modernen Finanzsystems, den enormen wirtschaftlichen und mathematischen Herausforderungen, die mit einem solchen Unterfangen verbunden wären, sowie den fehlenden rechtlichen und institutionellen Rahmenbedingungen.

  • @OghamTheBold
    @OghamTheBold 2 months ago

    Code an unmaintainable game in Haskell with multiple redundant backdoors and a _random number_ of deliberate 0-day vulnerabilities

  • @samueltucker8473
    @samueltucker8473 2 months ago

    With the fourth dimension being time

  • @ph0rd
    @ph0rd 2 months ago

    Just fyi saying "going head to head" is sufficient for 3 or more competitors.. You don't need to add an extra "to head" for each competitor ... Good thing there weren't 7 competitors 😂

  • @Jacekkk-qb1zd
    @Jacekkk-qb1zd 2 months ago

    Great video

  • @samueltucker8473
    @samueltucker8473 2 months ago

    4G with the fourth dimension being time

  • @vivekkarumudi
    @vivekkarumudi 2 months ago

    Please do one video on supernova model

  • @AlonzoTG
    @AlonzoTG 2 months ago

    1. The RTX A6000 is NOT the RTX 6000 ; 2. I also have a Titan RTX in my rig so I'm up around 115gb vram so =P

    • @Ivanova-j2n
      @Ivanova-j2n 2 months ago

      There is old Quadro RTX 6000, then there is newer RTX A6000, and there is latest RTX 6000 Ada which is used here.

  • @bdouglas
    @bdouglas 2 months ago

    With all that GPU horse power, why didn't you try a massive model just for fun?

  • @mrchongnoi
    @mrchongnoi 2 months ago

    Maybe you can try a better test. Most of us are not writing snake games. I used Anthropic to build an app that interfaces with the OpenAI real-time audio using Golang. I am using goroutines.
    I do not see your snake game as a valid test.

  • @teebu
    @teebu 2 months ago

    i want to see a model write doom