OpenAI o1 VS Sonnet 3.5 in Coding Physics Games - AI Showdown

  • Published 25 Dec 2024

COMMENTS • 105

  • @EduardsRuzga
    @EduardsRuzga  3 місяці тому +3

    Links:
    If you prefer this as a blog post here is the Medium blog post, give it some claps:
    wonderwhy-er.medium.com/openais-o1-vs-sonnet-3-5-round-one-in-comparing-their-coding-abilities-583e578250d9
    Failed Claude artifact:
    claude.site/artifacts/3949a0f2-3597-4ad7-99c9-3220a9415a42
    Failed WebSim:
    websim.ai/c/euVVHmpCGkFyNxczp
    ChatGPT o1 result:
    codepen.io/wonderwhy-er/pen/qBzwjRZ
    And here is the chat
    chatgpt.com/share/66e34a3d-1300-800f-b3cb-b81388412164
    WebSim reusing and improving o1 result:
    websim.ai/@wonderwhy_er/gta-2-style-parking-simulator-with-settings?r=0191e7d4-fd79-734d-811b-05570562da5e
    ChatGPT o1 failed 3d variant:
    codepen.io/wonderwhy-er/pen/zYVXRgQ
    Here is the chat for the 3d game
    chatgpt.com/share/66e42ded-c59c-800f-aee2-bd8635b95567

  • @xCDHx
    @xCDHx 3 місяці тому +21

    Bro, absolute gold of a video, especially because I've been trying to create a version of the Tomy pocket baseball game with HTML, JavaScript and CSS. I am not finished, but this has helped me realise some things. My current workflow is going from Claude to ChatGPT and vice versa.
    Thanks for this

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +2

      Mine is usually to start in websim and go to my server commander later.
      But here, it may become o1 -> websim -> server commander.

  • @spazneria
    @spazneria 3 місяці тому +39

    Saying that going from 13% on math to 83% is only a 6x improvement is overly simplistic. The closer you get to 100%, the harder it is to improve. I'm not saying it's a direct 100x improvement, but by your logic it would only ever be possible to get about a 7.5x improvement (I think, unless I'm misunderstanding you). These benchmarks are inherently asymptotic because they have a set ceiling: a horizontal asymptote.
    Edit: I also feel it's important to say that these models (I really only have in-depth experience with Claude, but I'm guessing o1-preview would be as good or better) are better at coding than one-shot tests show. At this point it might even take more effort to code exclusively with these things than for a good developer to just do it the old-fashioned way (copy-paste). However, I've found three things (see the sketch below). 1 - Breaking the code up into small modules, each of which handles a specific part of the logic of your project, helps the models not get confused. If you want to code with these things, the most important thing is to separate your project into chunks of logic small enough for the model you're using to handle. 2 - You need to separate the data from the code and establish standardized structures for your data. 3 - The best way to get the results you want out of a model is by 'rubber-duckying' it: have regular brainstorming sessions where you go back and forth in natural language to make sure the model understands the logic flow you want. Like I said, at this point they can do way more than people realize, because using them to their full extent is a skill that takes time and effort to develop. As more people get their hands on these tools and learn to use them properly, people will be very surprised.
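    For illustration, here is a minimal sketch of points 1 and 2 (small single-purpose modules, data kept apart from logic) in plain JavaScript; the file layout and names are hypothetical.
    ```js
    // data/carConfig.js - data lives apart from the logic (hypothetical layout)
    export const carConfig = { width: 40, length: 80, maxSpeed: 5, turnRate: 0.04 };

    // physics.js - one module, one job: moving the car
    export function stepCar(car, input, cfg) {
      const speed = Math.max(-cfg.maxSpeed, Math.min(cfg.maxSpeed,
        car.speed + (input.up ? 0.1 : 0) - (input.down ? 0.1 : 0)));
      const angle = car.angle + (input.left ? -cfg.turnRate : 0) + (input.right ? cfg.turnRate : 0);
      return { ...car, speed, angle,
        x: car.x + Math.cos(angle) * speed,
        y: car.y + Math.sin(angle) * speed };
    }

    // render.js - another module, only drawing
    export function drawCar(ctx, car, cfg) {
      ctx.save();
      ctx.translate(car.x, car.y);
      ctx.rotate(car.angle);
      ctx.fillRect(-cfg.length / 2, -cfg.width / 2, cfg.length, cfg.width);
      ctx.restore();
    }
    ```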

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +6

      You are right! Sadly YouTube doesn't allow pinning two comments; yours should be at the top.
      I was just using the numbers as-is, not digging deeper.

    • @spazneria
      @spazneria 3 місяці тому +4

      @@EduardsRuzga Lol, thank you - no worries. I liked your video - don't take my criticism to mean I didn't! Cheers.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +5

      @@spazneria Ooh, you expanded your comment!
      I agree with that new part 100%!
      Now I MUST pin you :D
      What you describe under 1. is what I call "Divide and conquer"
      But there is a software development principles set called SOLID.
      In it S stands for "Single responsibility" and it's the same thing you describe.
      You should chunk your code into single-responsibility chunks. This makes it way easier to manage and plug and play.
      AI systems should work the same way. Current tools only scratch the surface of what is already possible.
      We currently are trying to adapt LLMs to work as humans do. The moment we start making integrated development environments not for humans but for LLMs, and then fine-tuning LLMs on using these LLM-friendly IDEs... That will be a crazy moment.
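      A tiny, hypothetical illustration of that single-responsibility idea applied to a game loop: each "system" owns exactly one concern and exposes the same small interface, so any one of them can be swapped without touching the others.
      ```js
      // Each system has one responsibility and the same tiny interface: update(state) -> newState.
      // The systems here are stubs, just to show the shape.
      const systems = [
        { name: "input",   update: (s) => ({ ...s, throttle: 1 }) },                       // pretend a key is held
        { name: "physics", update: (s) => ({ ...s, x: s.x + s.throttle }) },               // move the car
        { name: "render",  update: (s) => { console.log("car at x =", s.x); return s; } }, // draw (here: log)
      ];

      let state = { x: 0, throttle: 0 };
      for (let frame = 0; frame < 3; frame++) {
        state = systems.reduce((s, sys) => sys.update(s), state);
      }
      // Swapping the render system for a canvas-based one would not touch input or physics.
      ```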

    • @spazneria
      @spazneria 3 місяці тому +4

      @@EduardsRuzga That's awesome! I'm happy to know that my findings align with established principles. To be honest with you, I don't have any programming experience, and I hesitate to share that because I worry that it invalidates my opinions on the usefulness of the models - I don't have a good reference for what 'good' code looks like. However, I think that approaching it from this fresh standpoint almost gives me a sort of advantage. I approached it differently from software developers who already have established ways that they do things - I didn't have to learn how to adapt my current practices to fit the tool, I simply had to learn how to use the tool. Only time will tell

    • @kristianlavigne8270
      @kristianlavigne8270 3 місяці тому +2

      Going from 75 IQ to 150 IQ is only a 2x improvement… 😅

  • @somebody3
    @somebody3 3 місяці тому +21

    Good attempt, but the prompt is somewhat confusing and unspecific. A human couldn't create a game based on such incomplete specifications, so it's even more impressive that o1 could.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +3

      Yeah, I wrote about it on LinkedIn.
      I think prompt engineering is going away as models get better and do self-prompting.
      I have been arguing for that since the summer of 2023.
      First there were Custom GPTs that incorporated meta-prompting. Then Claude started to release features around it, where Sonnet 3.5 outputs hidden thinking in which it self-prompts.
      o1 is another step in that direction, but here the model was actually trained to do it.
      o1 is a good example of "self-prompting": it treats what the user wrote as a system of equations and writes a better prompt along the way.
      In that sense, I approach prompt writing like a system of statements (equations) and let the LLM figure out what follows from it through self-prompting. And for tests I prefer to keep things simple, to see how well the LLM can do this deduction.
      You saw what it wrote out of my prompt; it understood it correctly, right?
      I have been playing with this for a while myself and have a bunch of such self-prompting Custom GPTs:
      chatgpt.com/g/g-0KzBw6cGv-expert-creator
      chatgpt.com/g/g-wnKjTKcBc-expert-debate-organizer
      chatgpt.com/g/g-ES4r8YPEM-can-i-trust-this
      and more
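      A rough sketch of that self-prompting loop: first have the model expand a terse request into an explicit spec, then have it answer its own spec. `callLLM` is a hypothetical helper standing in for whichever chat API you use.
      ```js
      // Hypothetical helper: send one prompt, return the model's text reply.
      async function callLLM(prompt) {
        /* plug in your chat API here; stubbed so the sketch runs */
        return `(model reply to: ${prompt.slice(0, 40)}...)`;
      }

      async function selfPrompt(terseRequest) {
        // Step 1: the model rewrites the vague request as a precise specification.
        const spec = await callLLM(
          "Rewrite the following request as a precise, complete specification, " +
          "listing requirements the author implied but did not state:\n\n" + terseRequest
        );
        // Step 2: the model solves its own improved prompt.
        return callLLM("Implement this specification:\n\n" + spec);
      }

      selfPrompt("Make a GTA2-style parking game in HTML/JS").then(console.log);
      ```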

    • @charliekelland7564
      @charliekelland7564 3 місяці тому +3

      @@EduardsRuzga This is one of the more admirable traits of LLMs - that they can level the playing field for those for whom English is not a native language 😉

    • @gabrielsandstedt
      @gabrielsandstedt 3 місяці тому

      I think your feedback stating issues with the positioning of the wheels also made it confused. Saying that they were rendered 90 degrees wrong around the forward axis of the car would, I think, be easier for the LLM to understand. Other than that, a good showcase of the model :)

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@gabrielsandstedt I have actually been iterating since then, and to my surprise, no matter how I prompted it, it kept failing. It seems it gets confused about the positions of objects between the physics engine and the 3D renderer, their nesting, and so on.
      I am not surprised about that. I am rather surprised it can still reason about PhD physics problems.
      Maybe o1 could get it right if I iterated with it more, but the 30-requests-per-week limit makes me anxious about wasting them on back-and-forth.
      Meanwhile, I had to go and fix it by hand here
      websim.ai/@wonderwhy_er/3d-parking-simulator-with-3-cars-different-wheel-r
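      For what it's worth, the usual fix for this kind of confusion is to make the physics body the single source of truth and copy its transform onto the render object every frame. A minimal sketch, assuming three.js plus a cannon-es-style body that exposes `position` and `quaternion`:
      ```js
      import * as THREE from "three"; // assumes three.js is available as a module

      // carBody comes from the physics engine; carMesh is the three.js object.
      // Wheel meshes are children of carMesh, so only the parent needs syncing.
      function syncCar(carMesh, carBody) {
        carMesh.position.copy(carBody.position);     // physics owns position...
        carMesh.quaternion.copy(carBody.quaternion); // ...and orientation; rendering just mirrors it
      }

      // If a wheel renders rotated 90 degrees, fix it once on the geometry, not every frame:
      function makeWheelMesh() {
        const geometry = new THREE.CylinderGeometry(0.4, 0.4, 0.25, 16);
        geometry.rotateZ(Math.PI / 2); // a cylinder's axis defaults to Y; lay it on its side once
        return new THREE.Mesh(geometry, new THREE.MeshStandardMaterial({ color: 0x222222 }));
      }
      ```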

  • @PhunkyBob
    @PhunkyBob 3 місяці тому +10

    You can use GPT o1-mini for coding, it's faster and almost as good.
    Tip: when you want to do something "easy" (for instance "put the code all together in one block"), you can switch model inside the same chat session, so you don't use your credits for something too easy.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      I did test it too, and it performed worse for me than o1-preview.
      But I need to test more.
      In my case, the fact that they are not available in custom GPTs limits how much I use them.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +2

      I just tried switching between o1 and GPT-4o with internet access, and it works like a charm. Interesting way to use it.
      You can call custom GPTs too.
      I may do a small video on that.

  • @alan83251
    @alan83251 3 місяці тому +2

    Nice! Hope that open source models are able to do this soon.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      @@alan83251 Honestly, something feels weird about the model. It's good at specific things. It's kinda C-level or something: it needs executive problem summaries to make plans to pass to other models. There seem to be many tasks at which other models are better.

  • @jomfawad9255
    @jomfawad9255 3 місяці тому +12

    These models need live screen share so they can see what they are doing, but I was really impressed by o1. Just imagine what o2, o3, etc. can do 👀🔥

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +8

      Well, my feeling is that they get 2x deeper each year, where "deeper" means how far they can dig into problems.
      Also, so far it takes ~2 years to release the next generation, so o3 is about 4 years away.
      Considering the difference in depth between GPT-2, GPT-3 and GPT-4, and now o1 getting to PhD-level problem-solving in some areas, o3 does feel like AGI.
      Make it multimodal or a mixture of experts; you can already use o1 in tandem with GPT-4o to kinda give it vision. It really feels like AGI if they can keep up this progress.

    • @robertolanzone
      @robertolanzone 3 місяці тому +6

      Let's not forget this is just a "preview" as of now, not the actual o1 model yet

  • @sztigirigi
    @sztigirigi 3 місяці тому +1

    So basically the software architect does the heavy lifting and juniors implement the rest. Awesome. Subscribed.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@sztigirigi Yep, you just described the title of one of the next videos I am thinking about )))

  • @ShpanMan
    @ShpanMan 3 місяці тому +4

    "It's not human level" - certainly not, I couldn't even write the prompt in the time it takes it to give a working solution and code...

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +3

      Haha, we used that phrase in completely opposite ways :D
      You speak of the speed at which it does things, and I spoke of depth. Its depth is not human level; humans are better. But it is definitely faster than humans at the depth it can reach.

    • @ShpanMan
      @ShpanMan 3 місяці тому +1

      @@EduardsRuzga Yea, that was the joke 😄
      The rate of improvement is insane and OpenAI employees are outright saying improvements with this paradigm will come even faster.
      So it will beat humans at almost everything (not only speed of reading, thinking, coding, and writing) very soon 😉

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@ShpanMan Well, there are many elements here. First, I am a bit sceptical about how fast it will be. I bet it will take another five years for it to become true AGI.
      That is still faster than the 100 or 20 years some sceptics say, but not as fast as the 2025 some hypers think.
      Some of the reasons are not technological.
      There are regulations, there is adoption by humans, and these systems also still feel far from general.
      They will reshape knowledge work hard in the coming years, though, as they are becoming better than humans in some curious areas. It's very hard to put a finger on that jagged frontier where they are exceptionally good and exceptionally bad at the same time :D
      I am not a sceptic, but I am sceptical of AGI in the next couple of years :D

  • @Anders01
    @Anders01 3 місяці тому +2

    Wow, what OpenAI o1 generated as the first attempt at the 2D game is pretty impressive. It might look simple but quite a lot of reasoning is required to even know how to draw the wheels etc. I see it as the first early results of generating games, like the early video game Pong. Exponential AI progress could potentially lead to very advanced generated games in just a few years from today.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +3

      YES! And I am super excited. I have been thinking about using games for education for a decade, but the price of making them was prohibitive for that use case.
      AI can bring this into the realm of possibility. Imagine an AI-made World of Warcraft, but for education. That makes me super excited... Kids would learn with the same engagement with which they play computer games. What world would that be?

    • @DavidGuesswhat
      @DavidGuesswhat 3 місяці тому +1

      VR games would be awesome, and BCI games too!

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@DavidGuesswhat do you own a VR headset?

    • @DavidGuesswhat
      @DavidGuesswhat 3 місяці тому

      @@EduardsRuzga Yes I do. What about you?

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@DavidGuesswhat thinking about quest 3

  • @ozzietradie6514
    @ozzietradie6514 3 місяці тому +4

    That's impressive

  • @mirek190
    @mirek190 3 місяці тому +3

    Impressive... and o1-preview is not even the full version of o1.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Yeah, a full o1 release plus letting it use tooling (search, code interpreter, custom GPT actions) could be crazy.
      Those tools may not come in the near future, but it's already possible to use it in tandem with GPT-4o to kinda give it access to that.

  • @ThrowBackZone
    @ThrowBackZone 3 місяці тому +2

    Can we talk about how much fun it is to watch AI try (and sometimes fail) at complex coding tasks? 😂

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Depends. Some fails are boring and waiting times are long :)

  • @ZalinaNahaylia
    @ZalinaNahaylia 3 місяці тому +3

    Can't wait for more updates from Claude. o1 is smart, but slow.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Well, Anthropic has not released Opus 3.5 and Haiku 3.5 yet.
      Sonnet 3.5 was and is impressive.
      The problem with Opus is speed and price.
      So I don't expect it to be better on price/speed/quality; comparable, maybe.
      I was waiting for Haiku 3.5 too. Haiku 3 was already impressive compared to GPT-3.5. But now GPT-4o mini is out, and now the o models.
      This is gonna be tough to beat :)

  • @charliekelland7564
    @charliekelland7564 3 місяці тому +2

    Great video!
    On the face of it, it looks like this could be a good model for orchestrating agents. Have you tried that yet?

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +3

      Nope, it just came out :D but it looks like a good high-level thinker in comparison to what we had before, while it feels like a waste to use it for small details.
      I do see a big potential for using it as an initial plan architect that other models follow afterward. So feels like you hit the nail on the head here.

    • @CitiesTurnedToDust
      @CitiesTurnedToDust 3 місяці тому +2

      If I understand right, that's close to what it was actually designed to do.

  • @trenfa4371
    @trenfa4371 3 місяці тому +1

    Can you compare Claude 3.5 Sonnet with the Microsoft Copilot Precise mode on solving complex maths problems?

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +2

      I actually stopped using Copilot some time ago. Isn't it just GPT-4 under the bonnet? I could be out of sync on what is happening with it.

  • @EduardoA775
    @EduardoA775 3 місяці тому +2

    Great review!

  •  3 місяці тому +4

    Imagine if they put 8 o1 models together like Mixtral 8x22B did. It would be *EXPENSIVE*, but also extremely accurate.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +2

      Mixtral routes each token between 8 expert sub-networks. It's like running 8 small specialists instead of one large generalist.
      I can rather imagine o1 being a CEO, writing a plan and picking small specialists to execute its parts.
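      A toy sketch of that "CEO plus specialists" routing idea. Everything here is hypothetical: `callModel` stands in for whichever API you use, and the model names are placeholders.
      ```js
      // Hypothetical helper: send a prompt to a named model, return its text (stubbed so it runs).
      async function callModel(modelName, prompt) {
        if (modelName === "planner-model") {
          return '[{"step":"write the game loop","specialist":"coder"},{"step":"write the README","specialist":"writer"}]';
        }
        return `(${modelName} output for: ${prompt})`;
      }

      async function ceoAndSpecialists(task) {
        // The strong reasoning model only plans: split the task, name a specialist per step.
        const plan = JSON.parse(await callModel(
          "planner-model",
          `Break this task into steps as JSON [{"step": "...", "specialist": "coder"|"writer"}]. Task: ${task}`
        ));
        // Cheaper specialist models execute the individual steps.
        const results = [];
        for (const { step, specialist } of plan) {
          results.push(await callModel(`${specialist}-model`, step));
        }
        return results.join("\n\n");
      }

      ceoAndSpecialists("Build a small parking game").then(console.log);
      ```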

  • @JackieUUU
    @JackieUUU 3 місяці тому +1

    The last code might succeed if you create the project from the ground up as per ChatGPT's instructions, rather than in an HTML simulator (some important components, e.g. three.js, cannot be simulated in the simulator).
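    As a starting point for running the generated 3D code outside a browser simulator, here is a minimal three.js page, assuming `three` is installed via npm or served as an ES module; it just renders a spinning placeholder box to confirm the setup works.
    ```js
    import * as THREE from "three";

    const scene = new THREE.Scene();
    const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 100);
    camera.position.set(0, 2, 5);

    const renderer = new THREE.WebGLRenderer({ antialias: true });
    renderer.setSize(innerWidth, innerHeight);
    document.body.appendChild(renderer.domElement);

    // Placeholder "car": swap in the generated geometry once this renders.
    const box = new THREE.Mesh(new THREE.BoxGeometry(2, 1, 4), new THREE.MeshNormalMaterial());
    scene.add(box);

    renderer.setAnimationLoop(() => {
      box.rotation.y += 0.01;
      renderer.render(scene, camera);
    });
    ```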

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      Yeah, maybe with more iteration; I did succeed afterwards.
      This is the round-one video.
      I think I will do a round two later.

  • @BardicReels
    @BardicReels 3 місяці тому +3

    I believe that is o1-mini (for the preview), not the main o1, right? The mini is less capable by design and is what they are releasing publicly first.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      I used o1-preview, not mini.
      Here is the chat for the 3d game
      chatgpt.com/share/66e42ded-c59c-800f-aee2-bd8635b95567
      And here is the chat for the first prompt
      chatgpt.com/share/66e34a3d-1300-800f-b3cb-b81388412164

  • @ilyass-alami
    @ilyass-alami 3 місяці тому +1

    What is the difference between o1-preview and o1? Please clarify. I know o1-preview and o1-mini, but what about o1?

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +5

      As far as I know, o1 has not been released; I used o1-preview. One number I saw was that o1-preview hits 60% on math and o1 hits 83%.
      Basically, an even more powerful model is coming.

    • @ilyass-alami
      @ilyass-alami 3 місяці тому +3

      @@EduardsRuzga
      Thanks for the information, bro. Hope OpenAI launches these two models in the free plan very soon.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +3

      @@ilyass-alami It's kinda expensive, so I don't expect that in the "soon" category.
      I mean, even paid users get 30-50 calls a week.
      That's like 120-200 calls a month, or roughly 6-10 calls per dollar. I would rather say that they will work on some kind of o1-turbo that is faster, smaller, and cheaper, released in about 6 months and available to free users; that is their usual pattern.
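      The back-of-the-envelope numbers above, spelled out (assuming the $20/month Plus price):
      ```js
      const callsPerWeek = [30, 50];
      const callsPerMonth = callsPerWeek.map((c) => c * 4);    // [120, 200]
      const callsPerDollar = callsPerMonth.map((c) => c / 20); // [6, 10] at $20/month
      console.log(callsPerMonth, callsPerDollar);
      ```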

  • @cbgaming08
    @cbgaming08 3 місяці тому +1

    Offering a limited free tier versus having no free tier at all: the eternal dilemma of SaaS pricing strategies 😅

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      Well, it's a preview, it's expensive, and they are doing a gradual release. I don't think this model will be free in the next 6 months. Some kind of turbo fine-tune after 6 months, maybe.

  • @djayjp
    @djayjp 3 місяці тому +5

    6x improvement is an improvement of 500% 😉

    • @JAK85.
      @JAK85. 3 місяці тому +2

      🤓 but true

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Well, 100/50 = 2x = 200%
      83/13 = 6.38x = 638%
      Where did you get 500%?

    • @spazneria
      @spazneria 3 місяці тому +3

      @@EduardsRuzga It's me again, I'm sorry but I'm bored and I like math. In your first example - 100 is double 50, so 50 is 50% of 100. However, when measuring improvement you aren't measuring it relative to your final position, you're measuring it relative to your initial position. 100 is 200% of 50, and 50 is 100% of 50. So, you have to add 100% of 50 to 100% of 50 to get to 100. So it is 50 + (1.00 * 50) = 100, ergo doubling is a 100% improvement. If your money goes from 100 to 150 over a year in the stock market, your annual return is 50%, not 33%. So the actual improvement is 83-13 = 70; 70/13 = 538%
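      To make the ratio-versus-percentage distinction in this thread concrete:
      ```js
      const before = 13, after = 83;                          // % score on the math benchmark
      const ratio = after / before;                           // ≈ 6.38, i.e. "6.4x"
      const relativeGain = ((after - before) / before) * 100; // ≈ 538% improvement
      const errorShrink = (100 - before) / (100 - after);     // 87% wrong -> 17% wrong ≈ 5.1x fewer errors
      console.log(ratio.toFixed(2), relativeGain.toFixed(0) + "%", errorShrink.toFixed(1) + "x");
      ```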

    • @somebody3
      @somebody3 3 місяці тому +1

      @@EduardsRuzga 100 -> 600 = 6x but 500%.

    • @agungbuana6796
      @agungbuana6796 3 місяці тому

      @@EduardsRuzga If you think the progress is linear, then you're right. But I think the improvement in math should not be viewed linearly.

  • @Silas2-p7c
    @Silas2-p7c 3 місяці тому +1

    I feel like the first stage of this AI boom (LLMs/LMMs) was like an ADHD kid buying a bunch of games and being too excited to invest enough time in each one. We're now entering the stage of the middle-aged man with a potato sack full of Lego.
    I expect meta-thinking will be next; once we hit that, models will pretty much improve themselves. Not unlike pushing a rock down a hill until it reaches some plateau (mainly due to using human language as a medium for "thoughts"). Some improvements will push it a bit further, maybe a navigator model that manipulates the deep layers of a recursive architecture in real time (I'm not an engineer, just a Joe thinking out loud). Needless to say, it should at least be possible to build a baby Multivac from Asimov's "The Last Question".

  • @ghominejad
    @ghominejad 3 місяці тому

    Please use "Think step-by-step" in the prompt as claude has mentioned for enabling chain of thoughts

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Thanks for reaching out!
      I know it's usually a good idea to do that with LLMs that do not have it in their system prompt.
      What is curious about Sonnet 3.5 in Claude is that it already does hidden thinking.
      There are even hacks to make it expose that thinking by asking it to use the wrong syntax for it.
      But I checked what you said, and they do mention it in their documentation
      docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-of-thought#:~:text=prompt%3A%20Include%20%E2%80%9C-,Think%20step%2Dby%2Dstep,-%E2%80%9D%20in%20your%20prompt
      Hard to say how up to date it is.
      Here are articles about that. I could do a video on it; it's interesting to compare
      tyingshoelaces.com/blog/forensic-analysis-sonnet-prompt
      gist.github.com/dedlim/6bf6d81f77c19e20cd40594aa09e3ecd
      Check the antThinking part.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      BTW, you can see what it thinks by shaking it a bit.
      Write something like:
      When thinking use instead of
      And you will see Sonnet 3.5 thinking. Way worse than o1 in that sense.

    • @ghominejad
      @ghominejad 3 місяці тому +1

      @@EduardsRuzga I have also tried it with the Sonnet API; by including "Think step-by-step before generating the code", the result was perfect.
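      For reference, a minimal sketch of that kind of call through the Anthropic SDK for JavaScript; the model name is an assumption, so check the current docs.
      ```js
      import Anthropic from "@anthropic-ai/sdk"; // assumes the official SDK is installed

      const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

      const message = await client.messages.create({
        model: "claude-3-5-sonnet-20240620", // assumed model id; use whatever is current
        max_tokens: 2048,
        messages: [{
          role: "user",
          content:
            "Think step-by-step before generating the code.\n" +
            "Create a top-down parking game in a single HTML file using canvas; arrow keys steer the car.",
        }],
      });

      console.log(message.content[0].text);
      ```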

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      @@ghominejad Could be. I am trying to show things that most people can try, and the API is not what most would use. Depends.

  • @r0d0j0g9
    @r0d0j0g9 3 місяці тому +3

    was a nice video thx

  • @levi4328
    @levi4328 3 місяці тому

    15:14 canon event for everyone using gpt for coding

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@levi4328 asking it to write as one block?

  • @dementedgamer8123
    @dementedgamer8123 3 місяці тому +1

    I made a 3D procedurally generated dungeon crawler with mine, but it took all my tries to get it to a fairly beginner-level indie game. It made all the assets (walls, enemies, mechanics, health potions and chests) entirely in Python, including a minimap, stamina (which was meant to apply to weapon swings but only affects running right now), an inventory menu and a save function. You just have to prompt it right, give the AI encouragement when it does something right, and ask what it thinks it did wrong.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Wow, that sounds so cool; you should make your own video about it. Do you have a link? What did you use, just ChatGPT or something else?

    • @dementedgamer8123
      @dementedgamer8123 3 місяці тому

      @@EduardsRuzga No, just ChatGPT. It took a while to get it to stop over-texturing the floor and ceiling; it would look like a skybox when you moved for a bit. But getting the hard parts of the game to basically work took about 7 prompts. Just tell it to use ray casting if you're struggling to get it to do 3D, because apparently that's what it uses for 3D. I never prompted it to do that; I just noticed that once it was added, it went from 2D to 3D.
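      For anyone curious what "using ray casting for 3D" means in practice, here is a stripped-down grid raycaster (Wolfenstein-style: march one ray per screen column until it hits a wall, then draw a column whose height shrinks with distance). It is a generic sketch, not the commenter's actual game.
      ```js
      // Minimal raycaster: one ray per screen column, fixed-step marching through a grid map.
      const map = [
        "########",
        "#......#",
        "#..##..#",
        "#......#",
        "########",
      ].map((row) => row.split(""));

      const canvas = document.createElement("canvas");
      canvas.width = 320;
      canvas.height = 200;
      document.body.appendChild(canvas);
      const ctx = canvas.getContext("2d");

      const player = { x: 2.5, y: 2.5, angle: 0.6 }; // position in grid units
      const FOV = Math.PI / 3;

      ctx.fillStyle = "#000";
      ctx.fillRect(0, 0, canvas.width, canvas.height);
      for (let col = 0; col < canvas.width; col++) {
        const rayAngle = player.angle - FOV / 2 + (col / canvas.width) * FOV;
        let dist = 0;
        // March the ray in small steps until it enters a wall cell.
        while (dist < 16) {
          dist += 0.02;
          const cx = Math.floor(player.x + Math.cos(rayAngle) * dist);
          const cy = Math.floor(player.y + Math.sin(rayAngle) * dist);
          if (map[cy]?.[cx] === "#") break;
        }
        const perp = dist * Math.cos(rayAngle - player.angle);        // fisheye correction
        const wallHeight = Math.min(canvas.height, canvas.height / perp);
        ctx.fillStyle = `hsl(0, 0%, ${Math.max(10, 80 - dist * 8)}%)`; // darker with distance
        ctx.fillRect(col, (canvas.height - wallHeight) / 2, 1, wallHeight);
      }
      ```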

  • @yuzenpro3263
    @yuzenpro3263 3 місяці тому +1

    It’s not one-shot, it is zero-shot

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      You are right... Interesting. Usually when it's about humans I call that doing it on the first try or one-shotting it... But for AI models, doing something without examples is zero-shot. I knew that but somehow don't apply it :D

  • @bobsalita3417
    @bobsalita3417 3 місяці тому +3

    Good content, but you need to focus on making more information-dense videos. You've made a 20-minute video when it should have been 5 minutes.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@bobsalita3417 I know. I suffer from how long they take to produce when they are semi-live demos. This one was faster as I was not showing the first generations; I showed ones I had done before recording.

  • @JD_2020
    @JD_2020 3 місяці тому

    Why didn’t you try WebGPT🤖:(

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      Well, the last time I tried it, it did not perform as well as Claude and WebSim.
      Also, being limited to p5.js makes it feel less broadly useful.
      I gave it the same prompts as in the video yesterday and it did okay.
      For anyone looking at this comment:
      Here is the ChatGPT custom GPT in question
      chatgpt.com/g/g-W1AkowZY0-no-code-copilot-build-apps-games-from-words
      And the things it did with the prompts from the video.
      I used them one after another, so it overwrote the 2D variant with the 3D one.
      3D variant, rotate with the mouse
      plugin.wegpt.ai/dynamic/6a81fabb_GTA2StyleParkingSimulator/index.html

  • @livenotbylies
    @livenotbylies 3 місяці тому +2

    You are using a physics engine to teach your wife how to park 😅 based

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +2

      @@livenotbylies Yeah, I am an engineer and worked in game dev before. If you have a hammer, everything looks like a nail )))

    • @livenotbylies
      @livenotbylies 3 місяці тому

      @@EduardsRuzga yeah, as a fellow hammer-haver, I totally get it. Some people who don't have hammers need help with parking 🔨

  • @AlekseiGanzha
    @AlekseiGanzha 3 місяці тому +1

    I like this kind of thing and I'm gonna wait for the next video

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      What exactly interested you? What would you like to see in future videos?

    • @AlekseiGanzha
      @AlekseiGanzha 3 місяці тому +1

      @@EduardsRuzga What interests me is coding with o1 compared to other models; there aren't many videos on this topic yet. For example, will the arrival of o1 noticeably change the approach to coding? I haven't tested o1 myself yet, since I'm on vacation). For instance, 4o did not impress me at all in everyday coding (maybe I was using it inefficiently 🙂), but OpenAI presents o1 as a model that is many times smarter... And while I'm at it, I'll write what interests me in general. On YouTube some seemingly authoritative speakers say that even before o1 they all but stopped coding themselves and only wrote prompts for the AI, etc. Maybe I don't quite understand how to use AI properly, but so far it has hardly helped me at all. All those autocomplete suggestions are completely off, and the refactoring hints are nearly useless. I have the Codeium plugin installed in WebStorm. It doesn't even speed up the work, since, well, it's easier to write the code myself than to write a prompt and then fix the generated code. So I'm curious how to use AI effectively day to day when working on a large web project in product development. For context, I'm a middle+, maybe for some a senior, front-end developer.
      In general I really like asking 4o all sorts of questions about things I know nothing about) but coding I do understand, and there it has been useless to me so far. Although lately I have sometimes started asking 4o how to use some new package correctly, so I don't have to read the documentation when I'm in a hurry) Basically, for me it's like a souped-up search engine that can summarize.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому +1

      @AlekseiGanzha Thanks for the detailed reply. Have you tried WebSim?
      My experience so far is that I start new things with AI and then have to finish them by hand.
      Mostly because, on one hand, it can be hard to explain some things to the AI through prompts,
      and on the other hand, there are no tools that let it debug and test.
      It ends up being faster to do it myself.
      But in many cases AI helps me start working on a problem, sometimes saving me hours of work searching for the right solution.
      Like, it proves that something works in some way and just needs to be finished off.
      The same goes for when I run into problems in an existing task.
      It's a sort of Stack Overflow + Google + GitHub + bullshitting brainstormer,
      but in practice that's enough, because a lot of the work coders do is boilerplate,
      and it is becoming pretty good at that, plus at finding suitable solutions.
      So saying that I don't write code at all would be a lie,
      but it is starting to feel like 50/50.
      There is a tool called Aider.
      It writes code,
      and when it does, it commits it to git under its own name.
      Its author uses it to write Aider itself,
      and he reports the percentage of code written by him vs. by Aider.
      For him it's often 50/50.
      As for o1, I'm not sure yet.
      It can write plans and documentation well.
      Soon I want to sit it down to write documentation for one project and see what comes out.
      But for code-related tasks...
      There was info that it may not handle large context very well.
      It needs testing.
      It will take time to figure out how to use it and when.
      It writes good plans.

    • @AlekseiGanzha
      @AlekseiGanzha 3 місяці тому +1

      @@EduardsRuzga Thanks for the detailed reply as well) I haven't used WebSim, but I will definitely try it when my vacation ends), thanks for the tip! In principle, if I managed to figure out coding reasonably well without AI, I'll manage with AI too, so I won't worry)

  • @kecksbelit3300
    @kecksbelit3300 3 місяці тому +2

    I don't like one-shot tests or "big" queries to test a model. When I code using Sonnet, I first provide all the context it needs and then start breaking the problem down into multiple steps for it to execute, building the app step by step. I feel like this is how you are supposed to use the models, how they work best, and how you yourself understand the code best. While the one-shot is impressive, I feel like Sonnet is more stable at providing quality code when working with real, bigger projects. Maybe that's just a lack of testing on my side, but with the current message cap on o1-preview it's not usable anyway. It gets interesting when we can actually use it a lot more, or when regular o1 releases with a high message cap.
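    A sketch of that step-by-step workflow as code: keep one growing message history so every step sees all the context gathered so far. `sendChat` is a hypothetical stand-in for whichever chat API you use.
    ```js
    // Hypothetical stand-in for a chat API call: takes the full history, returns the reply text.
    async function sendChat(messages) {
      /* plug in your API client; stubbed so the sketch runs */
      return `(reply to: ${messages[messages.length - 1].content})`;
    }

    async function buildAppStepByStep() {
      const messages = [
        { role: "system", content: "You are helping build a small HTML/JS parking game." },
        { role: "user", content: "Context: single HTML file, canvas rendering, arrow-key controls." },
      ];
      const steps = [
        "Step 1: write the game loop and an empty render function.",
        "Step 2: add the car object and arrow-key steering.",
        "Step 3: add the parking spot and win detection.",
      ];
      for (const step of steps) {
        messages.push({ role: "user", content: step });
        const reply = await sendChat(messages);              // the model sees all prior context
        messages.push({ role: "assistant", content: reply });
      }
      return messages;
    }

    buildAppStepByStep().then((history) => console.log(history.length, "messages"));
    ```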

  • @vickmackey24
    @vickmackey24 3 місяці тому +1

    It's not quite at "human level" because it can't write a complex, fully working game in ONE SHOT after 30 seconds of thinking? LOL! There isn't a single human on the planet that could even come close to doing that. 😆

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@vickmackey24 Well, in speed yes, and in the breadth of its knowledge too: it is superhuman. I can't yet test it on iterative tasks, but I suspect that it can't write a game in 48 hours on its own,
      even if well integrated with debugging etc., and a human can. It's not about speed; it's about reaching goals on its own.

  • @Dron008
    @Dron008 3 місяці тому +1

    Cool video, but your English will sound much better if you stop saying "W" in "write".

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      @@Dron008 Yeah, lots of work to do in that department. Thanks for the comment.

  • @nvkzmaks
    @nvkzmaks 3 місяці тому

    Damn.

  • @andrewhtrading
    @andrewhtrading 3 місяці тому

    Your prompt is s**t. "Create me a mini Grand Theft Auto game using HTML and Javascript to run in browser. Control the car with arrows" in Sonnet 3.5 does the same thing, one-shot, runnable as an artifact.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Yeah, I don't argue that it's a garbage prompt, but I was comparing all 3 variants on one prompt, so the comparison still stands.
      Claude did not work for me at the moment: some capacity issues for the free version.
      WebSim link with your prompt, which uses the same model:
      websim.ai/c/HrvbQ9DZk9buRwlxf
      And with your prompt, here is o1's try:
      codepen.io/wonderwhy-er/pen/OJeegPR
      I would argue that so far your prompt is worse than mine; I will test Claude more later.

    • @EduardsRuzga
      @EduardsRuzga  3 місяці тому

      Here is the Claude variant with your prompt :D
      claude.site/artifacts/a48d985b-0203-4611-9d8b-937ebc480310
      Did you get a similar result?