Next token prediction needs contextual understanding to do well. Scrabble doesn’t. All you have to do is make letters of words cross into other legal words. The game has nothing to do with the meaning of words. Next token prediction is like a fill-in-the-blank test where you need context to score high. On top of that, O3 uses reinforcement learning from chains of thought to train on the correct lines of reasoning. These are completely different games. And it doesn’t even need to scale more to beat programmers The process of distillation is easy compared to scaling things up so we’re definitely gonna get cheaper models. Unfortunately.
Yeah this actually a decent comparison, but it seems some people are focusing on the ‘not understanding’ and not the ‘he won the tournaments’. Just because it doesn’t ‘understand’ doesn’t mean it isn’t better than a large majority of programmers. It is just the truth, I am a solid coder with post grad CS work. With the assistance of AI, mostly with roo-cline and it has made me incredibly efficient! I’m getting small software projects done in days instead of weeks. We aren’t at ‘your grandma can toss in a prompt’ but it is already a solid pair programmer. And if you are a programmer and aren’t learning how to pair program with an AI, you are going to get lapped by less experienced devs who are.
AI has all the same issues as parallel scientific computing. It took 15 years to get from petascale to exascale computing. Yes, that's 1000 times more raw operations. That doesn't mean you can do matrix multiplication 1000x faster, it mainly means you can do matrix multiplications for bigger matrices. So, let's say you make the parallel parts of using a LLM model 1000x faster (that's is a massive ask), and ninety nine percent (that's generous by the way) of the model computation is parallel. You get 90 times total speedup. You are not getting massive speedup in *both* model size and execution time. Amdahl's Law is just that. This ignores the issue of the power it takes to run these systems. It's just a given you will be using more power. None of the current work even addresses another critical step: Showing your work. Why do you think this tissue slide is 96% likely to have this pathology? You need to know to rule out tissue processing error, containments and so on. This was considered for quite sometime to be a key part of general intelligence, not sure why it is getting pushed to the wayside. Of course, it's not an issue in machine learning, because it is a given that you are creating a predictive model, not a model of the process of predicting a thing itself. At some point, you have to ask what doing all the linear algebra is best used for. Is it for modeling climate, protein folding, drug development doing the things the humans have no chance at, or is it trying to do what humans do pretty well already. For me, don't need the chatbots and generated art, music and crap articles. I really don't. The research will move forward, but not all ideas are worth really bring out of the lab really. There's always value to making people afraid about their jobs and livelihoods. "Oh, we will replace your pesky labor force" is a pretty good selling point. But it's all a race to the bottom. Have a bot for sales pitches. Have a bot for support. Have a bot for HR. Have a bot for coding. The only thing that nobody seems to replace with a bot is the people that have to money to buy all the bots in the first place.
" it is a given that you are creating a predictive model, not a model of the process of predicting a thing itself. " Really insightful, haven't thought about it like this.
i dont know why they keep banging their head against the ‚i can code‘ wall and dont solve the annoying problems in sw dev like: - tracking issues - finding defects - reviewing code - communicating to customers - coordinating teams - combing backlogs - calculating value creation - f-ing write my worklogs i could go on..
The problem is they don't need 10000x faster and cheaper. They only want it cheap enough so they can use it as a teacher for smaller model. You can see a Llama 3.3 70B out performance 3.1 405B. A lot of things will come next. I don't think AGI will happen any time soon. LLMs critical flaw is reliability and lack of real world model, they absolutely can solve phD math but also fail on basic common sense question or hallucinate crazy answer in long context. Human still is the driver now. Put it in your tool box and learn with your heart anything interest you now.
Once/If AGI happens, it's *every* white-collar worker who could be replaced, not just SWE. So, I don't worry about that too much, no one is alone here.
@@NoX-512 Yep. And if that happens and all white-collar workers switch to trades like plumbers etc, you're gonna have a surplus of blue-collar workers driving the price down massively. So yeah, no one is alone in this or isolated from this problem. We are all in this together after all.
@@Repa24 I wonder if it will ever happen due to economics - if consumers have no jobs, then they have no money, and if consumers have no money then the money being pumped for the compute of AI will disappear, and so how will AI run? Will governments agree to just freely give away resources to run it?
@@AskoNomm-vq9gc That's why all the frontier AI companies try to push for UBI. Altman ran the largest UBI experiment to date. People think UBI will never happen because it's too utopic and they would never care let us have that kind of freedom. I think it will happen precisely because they're selfish and want to protect their profits at all costs. This will benefit us incidentally, the same way the industrial rev took us from 97% of the workforce doing backbreaking labor non stop and still barely produce enough food to 0.2% producing enough to feed the planet twice over. Nobody did that out of the kindness of their hearts, artificially increasing scarcity just wasn't as profitable.
I try so hard to make this point. Hardware AND software need to get x10K faster for these to be a thing for the layman. The problem is, this is not for the layman.
And that's the insane part. It's one thing to make a 10x improvement, that can happen, it has happened. But the days of hardware improvement by 100x over 10 years is just sort of over. Unless if there's some crazy jump we're going to make in the next few years, hardware improvements are not the same magnitude as they were during the '90s
@@ThePrimeTimeagen Pre-training is slowing down because they need to scale it with size rather than training specs of the gpu's increasing but the o-series is scaling inference, we are nowhere near optimal inference, even stupidly bruteforce Wafer-scale chips like Cerebras with 44gb of S-ram are stupidly faster at inference, Groq chips are much better for inference too but they only have 220 mb of sram per chip meaning you need hundreds for a 70B model, But the point stands Labs will invest in inference optimized Chips that will make models like o3 OOM more efficient / cheaper to run. o3 itself is also not much more costly than o1 to run, but they bruteforced arc-agi with a lot of attempts per task so it cost a lot
Yeah copilot and gpt doesn't feel "intelligent". Like I feel like the Ai fans think they have jarvis pair programming with them. To me it feels like having a pet dumb ass suggesting something completely schizo everytime I make a new line in the editor. It's funny because from my experience things that should be very easy for it to suggest inline like literally a single word it will not even suggest it, but then at the most random times it will just spit out a 100 line suggestion that barely has anything to do with what I'm doing.
I can't even program and the limitations are apparent to even me a times. I've made a few browser games and random python programs to do some automated things but it isn't a good look when I'm having to figure things out and then go back and say "you could have just done this" instead of whatever convoluted mess it has spat out.
That graph shows a similar limitation curve to what we already know to be the limit of AI. I wonder what it looks like in comparison to the rest of our results. A wild guess is it falls right in line.
I think it's worth pointing out that the ARC benchmark was never designed as an acid test for AGI or for the ability to do software engineering. It's about general reasoning which we see can be done by the LLM. As for the cost perspective, the fact that it can be done by our brain tells me that it is possible to bring it down to reasobable levels.
@@nidavisConsidering that the brain was never designed to maximize intelligence metrics, and that it is subject to strict biological constraints, this actually seems plausible. It’s likely that there exists a nonbiological system vastly superior to the brain, it just couldn’t emerge from a natural process.
Thinking about this, once AI reaches the point that it starts thinking about new stuff never thought about before, the limit would not be the AI itself. But us, because the AI will need us to verify if it's hallucinating or not. It can be reasoned that verifying if its hallucinating or not is just as simple as checking if each step is true (like a math theorem) and that we don't need to actually know the entire thought process behind it. The problem with that is that a wrong thought process can lead to a breakdown in the future and the time it was right was just a coincidence (Same idea as a broken clock being right twice a day).
You're suggesting that "once calculators get advanced enough to translate languages..." this isn't what calculators do, this isn't a function of their system they will never translate languages because they aren't made with that functionality. let's rephrase this without generalized terms and talk exclusively with what is going on. "once AI starts generating output that is unique from it's function which approximates prior data." Do you see why this is non-sensical? Maybe the magic black box will do that, who can say for sure but it literally makes no sense to expect it, rather we should be fairly confident it never does that. That isn't to say what AI is cannot be useful, it's just to say that what we're doing with AI is not thinking and it's not getting closer to thinking because thinking isn't in it's design at all.
Your definition of agi that it is ai that continuously gets better has never been the definition of AGI. AGI from the beginning has been the opposite of narrow ai, which is ai that only operates in one domain, like chess. AGI is ai that is general purpose, it can work on wide variety of tasks.
So in my spare time I occasionally spend some time getting paid to do AI training. While solving the tasks themselves often take only a few seconds where it takes longer is that you have to write up the solution while also assessing how well the AI is doing. Further you are paid by the hour and not by the task so you might have a task that pays $40/hour and you submit 5-10 tasks per hour (after you account for the additional task work besides just solving the problem yourself) thus a $5/task would be for $40/hr if on average the annotators solve 8 tasks/hr. So I don't think they are necessary exaggerating when they say they are paying $5/task on average
The term "thinking machine" was overused, so they come up with AI. Then AI was overused, they created the term AGI. Now that one is also in every corner, so we need a new buzzword
I don't agree on the take on experts being universal. Especially mathematicians and philosophers. For centuries there was a joke that every good mathematician wants to be a philosopher and many of them actually were, from Aristhotle and Decarthes to Berthran Russel and John Von Neuman
I don't know if I trust benchmarks. Benchmarks been saying that Gemini has been getting better, but everytime I go back to it, it's still as bad as I remember.
It's not Devon but rather the way you use it. It's up to us to make LLMs perform the way we want them to. There must be structure and certain methods deployed in order to make "AI" assisted IDEs worth it.
when interviewing potential hires i used to be impressed by leet code scores and the like. these people quite consistently failed expectations later and often times were impossible to work with. when i see a brag about such scores these days this makes me look EXTRA careful at their other qualifications.
I genuinely don't like how polarizing this topic is - like it's honestly frustrating either seeing "NOOOOOOO WAY THIS THING REPLACES US, IT'S ALL SMOKE AND MIRRORS!" or "AAAAAA AGI SOON, ARTISAN DOESN'T COMPLAIN ABOUT WORK AND BALANCE! SWE PREPARE FOR PLUMBING!". And especially seeing so much of channels trying to ride the hype, that are basically cluttering my feed - I ended up removing recommendations entirely. Primeagen seems trying to be moderate about it, but I do sense a bit of cope in his views, which I cannot argument nor prove - so it's just me yapping. It's really hard being objective about this thing, especially when area of your work in the line - myself included, but I don't even have the benefit of had trying to do something with LLMs myself, like Primeagen did. So idk, I am just largely confused, and I'd just prefer to limit the consumption of those things to not stir my mind, and that's certainly some coping mechanism too but I am fine with it Thanks for coming to my TED talk xd
Modern economy is about either moving the price of stock or create a platform that leech off the peasants economic activities, so that the power that be can acquire power and resource while doing nothing. AI seems to have a shot of doing both of them so that's why there are so much hype
@@ZZWWYZ which numbers are you using here?... If we look at the GDP breakdown, it's industries and business and healthcare and education sevices, and real estate and finance and trade and agriculture and construction etc
If you follow science for any length of time you will learn that controlled experiments like the o3 hype don't mean anything to society real world application and results is all that matters. The paper released about o3.is just marketing hype until it is implemented and has real world impacts.
I love technological advancements, especially in the medical field. But I do cringe when people just start saying "NEXT AGI" or "NEXT YEAR AGI!" Like, bruh.
On the "laser" task issue; there are other tasks that do line extensions like that, and I don't think any of them complete along the side as suggested. But on that note, each test is allowed 2 answers, so when in doubt, provide both answers.
They literaly just showed some chart and said that researchers have access to it. Nothing else. And everyone is shitting themselves about AGI without anything.
Something to consider: Imagine instead of using machine learning, we just brute force through every combination of opcodes and test if we get a machine that can generate the answers to a bunch of scientific tests. The process of getting there is as intelligent as a rock rolling down a hill, and yet, because we have a way to verify it generates the right answers, it necessarily found some intelligent machine. If you think the process itself needs to also be intelligent for this to work, this is kind of like the fine tuning argument for god. You're implying you need intelligent design, even though we know that brute force survive and reproduce can get us there anyway. If you ended up surviving the test, then you necessarily got the winning combo. Well, this is what the o series of models do. They test if they were able to generate the reasoning steps that produced the verifiably right answers, and the model simply cannot converge unless it does. You don't even need the assumption that neural networks basis on human neurons was a close enough approximation or that they're intelligent in any way now. Ml can speed up the search process over brute force, but we no longer rely on the process itself resembling intelligence to produce an intelligent machine.
That’s exactly what I‘m doing when I solve math or coding problems though. Of course I can‘t tell whether a certain line of thought will lead to the desired conclusion. To me intelligence is more about having a good heuristic to limit the search space where brute force is applied. And we know for a fact that O3 does have a good heuristic, otherwise it wouldn’t terminate until the heat death of the universe on problems like these.
@@vaolin1703 As an ml engineer, I often say to people ml is like brute force with a "you're getting warmer/colder" signal (even though that's not entirely accurate), and that tracks here.
@@steve_jabz what do you mean by an intelligent machine? By your definition it is defined by the intelligent being having a problem. As in, if you want your crops watered, then rain becomes an intelligent machine. It seems, you've reinvented some form of paganism
@@NJ-wb1cz If you specifically choose an objective that could happen by natural accident, like having your crops watered, sure, that's not a good test of an intelligent machine. That's why I said generating answers to a bunch of scientific tests. A hurricane isn't going to solve the riemann hypothesis. If you were able to search across the universe to find a dyson sphere, it wouldn't be surprising to find something like humans or AI or boltzmann brains built it, but it would be very strange if it happened because it rained one day. My definition of intelligence is pretty fluid and I don't think it's particularly well defined anyway. It means different things to different people in different contexts. In this context I'm just implying it's something we can use to solve a bunch of unforeseen problems in contrast with a rock falling down a hill which is just a plain obvious example of a process that isn't very intelligent by any definition. I don't think it particularly matters if it's turing complete or sentient or anything like that.
If you specifically choose an objective that could happen by natural accident, like having your crops watered, sure, that's not a good test of an intelligent machine. That's why I said generating answers to a bunch of scientific tests. A hurricane isn't going to solve the riemann hypothesis. If you were able to search across the universe to find a dyson sphere, it wouldn't be surprising to find something like humans or AI or boltzmann brains built it, but it would be very strange if it happened because it rained one day. My definition of intelligence is pretty fluid and I don't think it's particularly well defined anyway. It means different things to different people in different contexts. In this context I'm just implying it's something we can use to solve a bunch of unforeseen problems in contrast with a rock falling down a hill which is just a plain obvious example of a process that isn't very intelligent by any definition. I don't think it particularly matters if it's turing complete or sentient or anything like that.
I agree with Prime here. At first I was thinking against the AI thing. But I really realized it's true utility. I solved a problem without having to build an NLP (natural language processor) to do so. What I did and now realize is where this fits for all of us. AI will not replace anybody. I think this type of thinking is getting everybody in trouble. The way people are using today it feels more like a toy, aka copilot general question / answer responses, or marketing and sales, or image generation. Right now I see it as more of a bridge to get to some solutions quicker. For my case I put it into to practical use with highly diverse set of inputs, trying to extract information that has no set pattern that is not always consistent made it much easier to use the AI to get the information I needed. So from perspective its an adjunct/adjacent to my problem solving.
Let em roll with these increasing levels of BS I reckon. When the dust has settled, it’s just going to mean even greater demand for contractors and freelancers to find time to come in and clean up the inevitable mess
lets say the salary for a software engeneer is $130k a year, and they work say 40 hours a week, thats 62.50 an hour or over a dollar a min, so if you think you can fix a bug in a big code base for 20c then your kidding yourself, you will earn 20c in 12 seconds, if AI gets 1000x as fast/ good, it wont come cheap to anyone except perhaps the people who own the models
RE: Limitations Not a programmer, but I do deal with thermal management of high power electronics. We're runing into a couple synergistic problems that were previously only seen in things like spacecraft: 1. There is a practical limit to how much heat dissipation you can accomplish in a certain volume. It takes both time and space to remove heat from a component. 2. Improving that heat transfer requires moving to technologies that consume more power; think going from a window fan to going to AC in your house. The power used to move more heat has to be dissipated with the heat load on your electronics. 3. You ultimately have to dissipate this heat to the air outside through a heat exchanger. There are limits to how big a HX be to be and still be effective. This leads to: - item 1 increases the size of your system/limits how small you can shrink your electronics; this drives the need for more of 2 - 2 makes both 1 and 3 worse - 3 requires you to transport your heat out over a bigger area to make your HX's more effective (more small HX's with a local heat source) which drives the need for more of 2 This isn't even considering how you get all that power Any time you start spending more time thinking about how to mitigate synergistic effects than the actual problem you're trying to solve you are hard up against a physics limit. The only solutions to these types of problems is a technology that no one knows about yet and may not exist at all or massive amounts of money... like moving data centers to the bottom of the ocean and building arrays of nuclear power plants underground.
@nidavis yes but that waste heat would have to be of a much higher temperature than the 100 or so Celsius a chip would be comfortable sitting at to be useful for much beyond heating the room the component is sitting in. The technical tem is that the waste heat is high entropy, which roughly means highly disorganized. Would you rather have a liter of Diet Coke in a bottle (low entropy) or a liter of Diet Coke spread out all over the floor (high entropy)? 100 degree C waste heat is the latter.
The guy who got banned was right. I tried that puzzle, got it wrong on the first try. The problem is that its not just the squares the lines pass through, its also de adjacent ones, and you dont really have anything to deduce that, other than well, try both, hope the first one is the right one.
I have noticed a recurring pattern every time a new AI update is released. At first, we say it is smart AND WOW . After a few days, we get used to its intelligence. Then, we start to notice its flaws and think it is stupid and incapable of doing anything right.
@@sprobertson I didn’t know about this thank you. It seems that, in the end, we will continue developing artificial intelligence without stopping, because no matter how much we advance the technology, we will never be truly happy or satisfied
Then task for AI would be to build an algorithm to solve it, not to solve it - which isn't the case. BTW: visit the page and start solving tasks - this is just solving.
I'm all for overhyping automating entire software design and engineering processes. It will create a lot of (annoying, but well paying) jobs to clean up and maintain the mess. See Model driven engineering, ORMs, et. al. Llms are essentially no different
1. Stop overhyping every AI release as human-level intelligence or “AGI.” 2. Yes, O3 is impressive (it solves puzzle tasks better than older models), but it’s still expensive and limited. 3. Real-world dev work isn’t solved by some puzzle-solving LLM, especially at scale. 4. Programmers (and other skilled folks) won’t vanish; if anything, AI tools just change how we work, not whether we work. 5.and, the internet is full of people shouting “AGI!!” or “All devs obsolete!” for clicks.
The problem is the way we worked was the attractive part. Who the fuck wants to ask an AI to do all the interesting bits and just glue stuff together at the end…
Watch the latest Defunctland about animatronics, it's 100% relevant to machine learning hype. Disney thought he could replace actors with machines, and now the result is that Disney employs a ton of actors and also a ton of engineers. Whoops!
@ yeah, a lot of people do not get it - I’m not really that interested in the end product… my interest is in solving the problems to achieve the end product. Getting the requirements and asking Ann AI to put that together for you is like 5% of the mental capacity I spend right now - and really not worth my time.
AI is a pretty good search engine summarizer, but it's been so long since Google was worth anything, I may have just forgotten what decent search results were like 15-20 years ago.
When AGI is achieved, the average won't have time to react to what will happen. And we very very very far from it simply because we are still computationally and power bound from it.
one of those 3 tests where you claim its not ambiguous is because you are doing what we call "hasty generalization" which is easy for humans because they ignore perfectly valid solutions and are blind to it, until someone shows them a more general solution exists and the solution you generalized was a specific (narrow case). in that example, o3 correctly fails to committ to transferring a 1 pair and 2 pair defined behavior to 3 pairs (3d) case, because the it is indetemrinate.
If people pursued artisanal, martial or sports activities, even for 15% of their time, i reckon they could still find meaning in a world where AGI took yer jerb. Problem is perpetuating a culture that values these things. As such i suspect not all human societies will suffer the same AGI fate, it would have to be a perfect storm for AGI to truly cause a singularity My opinion
As someone who daily pair programs with an AI-roo-cline and Claude-I was shocked at how slow and inefficient Devin was. It’s possible that Primes unique prompting wasn’t doing it any favors but it seemed significantly worse than other AI coding agents.
I think his point with prompting was how an educated non-programmer would use devin, which is the claim AI marketing makes. of course he could've cut the corner and help every time devin gets stuck, but that requires the senior programmer he is. but if it requires a senior programmer the AI is at best auto-complete, and one that affects the real programmer's skill level negatively.
Yeah I'm tired of those videos being pushed I already "don't recommend"ed tens of channels and they keep popping up. Perhaps oai spends uncountable millions propping them to prepare for release. also agi is not what they think, it can be agi and at the same time be completely useless for some tasks like most people are. for the 1000x price or more. they're throwing money to solve a problem that is already solved, to solve it again for much more money. why do people keep falling for this? sci fi shows and novels. crazy.
AI means artificial intelligence. AI doesn't mean Artificial General Intelligence or Artificial Super Intelligence. Many, many kinds of software functions and models can be described as "artificial intelligence." For example, when playing a video game, you may play against a CPU controlled opponent. This opponent has an "AI" that models their behavior in response to the game environment and the player's actions. AI is also used for non-deterministic models like LLMs and image generators. The same input will generate myriad outputs. You can complain if you want, but that's not going to change how millions of people use and understand the term.
FWIW we should remember that if our mindset is "nothing short of rapid recursive self-improvement is AGI worth worrying about", we probably won't be able to stop ASI. Not saying that o3 is concerning yet but I suspect that most of us, myself included, have a tendency to unconsciously move goalposts in situations like this.
@@Charmask_creation I'm saying that we probably can't wait until AI is literally as smart as us to pump the brakes, because such an AI would very likely escape our attempts at containment.
@@markolson8569 just because an AI is more intelligent than us, doesn't mean that it can't be switched off. You can have einstein in a cage, guarded by a monkey. He will still not be able to escape.
Why is everyone saying o3 beat 88% of programmers? Where is the proof? They were tested on very simple logic tests, not code generation. If they beat 88% of programmers on these simple logic tests (they didn’t, btw), we’re in deep trouble as a species.
@@bkhalberrt7351 so AlphaZero, the program that can destroy the worlds top experts in many board games, isn’t AI? Same with AlphaFold which sped up protein folding discovery
But it's a perfect name. We don't know what intelligence is, so we don't know what is an artificial version of a thing we can't define. It's all very vibes-based
AI is a perfect name. It's literally Artificial intelligence, and we built it in our own image, neural networks that work by predicting the next thing and adjusting synaptic connections (or weight/biases via backprop in AI) when predictions errors occur.
Nothing of what you said so far applies to the startups. Seriously how many Devin's?? 3 of them are already half the salary of a junior. The hubris here is astonishing
$5 seems reasonable, you are a senior dev, that's what they seem to pay a senior dev for about 4 to 5 minutes of work. You solved them but didn't code them out yet.I think you spent about 2 minutes per problem, reasonable to say you'd need as much time coding it out which comes to about $5 per problem :)
It is getting cheaper tho. New smaller models are improving i.e density is increasing. There’s a recent paper on this. Estimated every 3months small models improve. I think we hit a wall personally. Now they’re breaking through it with multiple inference and test time fine tune, but this is expensive I think we hit a wall one model too early If we had the same delta we got from gpt3 to 4 it would of been great
There is not enough data on this for any paper to make any kind of solid extrapolation. We have at most a handful of years of datapoints, each one hiding behind more than just technological advancement, but also corporate and investment politics. Any paper claiming they can estimate anything about this should be met with a heaping pile of salt.
The definitive AGI test is really simple people! "hey AGI, look at your code, implement, and deploy a better version." Did it do it? Not AGI. Yes? You're fired, goodbye, try sheeps.
While I agree with the root of the idea. Can you improve upon yourself or deploy a better version of you? Because I sure can't Yeah I can learn new things but that's kinda using the existing model for other problems instead of improving said modern/brain.
@@gus2603 AI is math that exists for a purpose, and that purpose is to… improve our tech/math in all fields. It's not being done to improve or replicate brains, which have known flaws… including the lack of ability for self-improvement. We want machines to do the work… that's it. That's why my test is definitive, because that's what we want… not souls, or friends… we want a machine doing whatever we say to do.
@@gus2603 >Can you improve upon yourself or deploy a better version of you? Because I sure can't We do it constantly. One tries to do something, one fails miserably, one does a bit of self-reflection, changes a thing or two and tries again. Or we degrade.
About the Neil DeGrasse thing, I'm not one to defend him, he says a lot of dumb stuff. But it's funny how Prime, an expert in programming, just proved his point by saying it's impressive without knowing shit about rocket science.
The problem is Neil places himself as an authority figure on each topic he talks about despite not being one. This makes him intellectually untrustworthy. While Prime uses clearly child-like analogies to signal his layman's understanding of the topic. This is part of the field of epistemology in philosophy. Beware that I have learned about this on my own. UnsolicitedAdvice has a great video on the topic, the video is called pseudo intellectualism or something.
@@gus2603 he is a professional educator. Like your school teacher. It doesn't mean he's untrustworthy, it means he is doing his job. If he was saying "I don't know" about anything he hasn't personally written papera about, then he wouldn't be am educator, and some other educator would be in his place.
@@xslashsdas Sometimes you can do the same thing but be right when the other person is wrong. I am going to give an example to you. Many on the internet say you cannot use hypotheticals, but you can, it is how they are used that matters. I call it a hypothetical to negate theory. A person might have something happen, nobody knows the details. Someone will try to prove their point by giving a plausible explanation. The issue is it is only a hypothetical. Hypotheticals cannot be used to prove something. You could tell them that hypotheticals cannot be used, but then I will propose an equally likely hypothetical. I don't believe mine, I am stating mine to say that if I can find an equally likely explanation, then that means there are many possibilities thereby proving you using a hypothetical was wrong and pointless. If I were to say that using hypotheticals is wrong, many would jump on me and say but you just did. The point is, I did in negation, the other person did to say he was right, I did to say he was wrong, and it is valid for me to use a hypothetical and not for him to use one. The point is, this is just mathematics and logic. It can be explained far more easily with logic or math terms than I did here, but it is why I take issue with you saying Prime just did the same thing, it isn't the same, it is a false equivalency. I could give so many other examples but the reality is how you do or say something matters and the reason you say or do the thing does too. I was going to give better examples but the issue is our overlords take it down.
My gut says that the primeagen is still underselling this, but it's good to pump the brakes a bit on the AGI hype train. Like getting enough fiber in your UA-cam diet
If this dystopic AI vision becomes true devellopers are fuelled by a few energy drinks and snacks. Try to get hundreds of GPUs to run 8h per day for the price of some food.
That’s why they’re developing specialized analog processing units for NNs. Orders of magnitude more efficient for this specific task and useless for anything else.
they are called IQ tests XD , especially raven progressive matrices is the kind that works only with shapes and does not require any prior knowledge which is necessary for measuring fluid IQ and not crystalized IQ , and AI so far was terrible at IQ test which proved it was not intelligent just has memorized known patterns.
Each programmer has a ratio of output bugs per output line of code. I don't mean final, but output. So, if they have to rewrite a portion of the code to fix a bug... 1. That is not one output bug less, 2. they are likely to introduce more bugs because they are writing more lines. At the end you could have a code base with few to none bugs, but that does not mean that the ratio of output bugs per output lines of code changed, it just means they kept the parts that didn't have bugs. And yes, the ratio can change, but it organically changes very slowly, as the biggest factors are how good the tools are and how good are they at using them, which means that notable changes in the ratio are because of changes of tools. Because of this: 1. Massive changes often bring lots of code. 2. A lot of changes is both predictor and indicator of bugs. 3. Languages that can accomplish more per line of code also lead to less bugs per feature. 4. changes that reduce the number of lines without sacrificing features are the best.
Ahh see I can tell you understand the fundamentals . The cost of ai gen with each new line is very highly likely to introduce bugs , it actually takes me longer than just looking up the design patterns myself . And I’m pretty bad tbh . I feel like I can’t do anything
0:10 Careful with that Mathhew Berman. He makes pretty bold claims about software so I studied his Github and he has trouble RUNNING the software he claims to "analyse", and is very rude to developers, to whom he says will be out of a job soon... I am amazed at him STILL running his channel.
I said this last year, and was ignored by prime and theo.... their content needs to focus more onto transitional careers for programmers in a post-o3 world, like plumbing, gymnastics, etc.... Professional gymnastics should be the hardest for robots to replicate
@nickrobinson7096 I'm not talking about AI. I'm talking about AGI where people claims that AGI will think by itself and builds everything. Perpetual motion runs the world but it is not fully attained it. Imagine storing the energy in a flywheel and it keeps on spinning without any frictional loss of energy. That is perpetual motion but still we didn't achieve that the closes we gets is 20 to 50% energy loss.
Thank you so much for this amazing video! I need some advice: My OKX wallet holds some USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). How can I transfer them to Binance?
Oh, we will absolutely achieve AGI. There are teams growing human brains and sticking elecrodes in them right now - it's a matter of time until we will have literal giant brains thinking about things
12 day of the 12 days of announcement. (seventh son of the seventh son) This lady announces something very important, upcoming paradigm shift 3 times a day.
I mean I find this whole conversation about these types of puzzles confusing. None of it really makes sense. Comparing you visually solving the puzzle and AI (most likely) getting the problem in the likes of 1 and 0's or sets and then solving it is two different things. Now granted I am not certain how the problem is presented but I'm almost certain it is not visually like we perceive the image. How long would it take for you to solve this puzzle if all you had was an array of 10x10 or whatever with RGB color codes inside it and then you had to solve whatever it is you are seeing. Without visualizing a human being would not solve this puzzle at all, you would have no idea to even understand what this array represents. Technically probably a human could solve it because we are actually able to visualize it eventually because we have experience with eye sight, AI has absolutely no training on this, it has no approach to take when solving this issue, it just brute forces it through computing. I just think if it was able to visualize what it is seeing and use the same approach we are using it would solve it instantly but getting AI to that point might prove impossible. Anyways, I disagree with this video, I think visual issues are not for current AI to solve so you should not be using it for that. AI is based on neural networks but that is only one part of our brain, when it gets sight, smell, touch, etc. we ain't beating it in any tasks..
You have to check out,what liquid ai is doing. They Invented a New Kind of neural networks, that can be much smaller and more efficient. Would love to See you coment on their innovations.
I spent a lot of time analyzing ARC to understand the capabilities of a prospective AGI to solve. o3 is not yet demonstrating generalized intelligence if it cannot solve those problems. It's demonstrating a set of capabilities that can pass a wide range of problems. And OpenAI know this, but they also know how much understanding of AI is required to grasp this. That doesn't tell you how far this type of AI can go (or diminish what it can do), but if it cannot solve an ARC problem that we can solve at a glance, it lacks fundamental reasoning skills.
The problem missed in the input space is that we are looking at the visual representation vs what o3 was actually able to work with. The visual representations were translated to text based inputs. In such a way that actually made the arc-test way harder for a human but doable for a llm. The cope feels kinda real in this video.
Solving entry level toy ml problems with last decade's supervised learning approaches is not like solving ARC. ARC is specifically the opposite of what that is testing. o3 solving these puzzles in particular isn't even the point. Solving the puzzles doesn't test o3's ability to solve puzzles. It doesn't even have training data about the puzzles. The point is that it came unprepared and was able to learn new skills on the fly, implying it can probably do this general reasoning for other things it wasn't trained on. This is a pretty basic level of generalization, but the point is we can say for sure now they can do novel program synthesis in the weights all on their own, and we don't have to do any special tricks or connect it to traditional neurosymbolic systems to get it to do that. The 3 examples at the bottom it missed were highlighted specifically because they were easy, not because they were representative of the harder ones it solved. The point of showing those is it does spontaneously fail on some pretty easy ones, so we are probably just at the beginning of generalization, and going forward, there will probably be a bit of superhuman generalization in some areas and incredibly silly mistakes in others that no human would make, just like it being able to solve unsolved problems in frontier-math despite not being able to count the Rs in strawberry due to the characters being tokenized beyond recognition before it even gets a chance to read it. Costs will likely go down way more than 1000x in hardware alone in the next 3-5 years due to thermodynamic compute and NPUs. Graphics cards are horribly inefficient at this. We're only doing it that way now because the demand came crazy fast and that's all we had ready to go. Even if that weren't the case, this is severely underestimating algorithmic improvements. LLama 3.3 outperforms GPT-4 for 272x less on a server, or practically for free if you run it local. As was pointed out in the chat too, o3-mini outperforms o1 full. This means that when they train a larger base model like o3 and use it to distill a smaller model, that smaller model will outperform last generation's base model at a fraction of the cost, so o4-mini may outperform o3 full at a fraction of the cost if this trajectory continues as-is, and then o5-mini may cost as much as o3 full, but be 10x more powerful.
Yeah I think y’all are missing the point. These unpublished data sets are forcing the models to reason its way to a puzzle solution. Yes, these puzzles are not extremely hard for humans to solve but they have been extremely difficult for LLM models to solve. This is a major leap forward with o3. Interestingly, we are hitting hardware limitations before model limitations. Teams of reasoning models working on a task, could well outperform the vast majority of humans, but for now, humans are cheaper. How long that lasts is really anyone’s guess. Nvidia has essentially been written a blank check and it’s possible that we see exponential gains in hardware power and/or efficiency.
There's still a caveat in the distill argument given that is imposible to perform infinite recursive improvements due to arquitecture limitations. We currently are not aware of where those limitations are, but we'll most probably see some of it on the next generation of models
@@gruanger why?? because China will lead the world in renewable energy(hydrogen, solar, wind) and so the West will fall back to Nuclear energy sources to keep up?
The only reason why we humans are able to solve the ARC-AGI is because we have had graphical problems for millions of years. Like even a donkey has a very good graphical perception and understanding of the world. All animals with eyes are truly exceptional at this We are very much assisted by our graphical baseline thinking processing ability to be able to recognize these patterns. These 2D problems are a walk in the park for us 3D creatures.
With that logic humans couldn‘t be good at math, because its mostly abstract and has generally little to do with real life Humans have reasoning skills that are so good that we have solved most problems we have encountered. That‘s (one of) the difference(s) between us and current AI
@sebastiankrali2547 yes I agree we are bad at maths. A few transistors can beat our calculation skills. Also more abstract math doesn't feel intuitive at all but can be learned using very general reasoning skills also yes we humans have solved a lot of problems with the billions of us and hundreds of years. Obviously the AI is also imitating us for its intelligence
first no it's not AGi... 1:17:25 it would be impressive if the rocket were able to actually reach orbit with a payload... I remember when people talked about "rapid reuse" as the next step of space travel, something that hasn't been achieved yet btw, since the falcon 9 rockets are marginally quicker to reuse than the space shuttle which used 70s tech. The reason you don't see NASA do this is because it literally adds nothing what so ever to reuse-ability at the moment. Starship can't even reach orbit without any cargo, let alone during an actual commercial mission with a sensitive payload onboard. They wasted time and money on a problem didn't exist and could have been solved once their launch vehicle actually works. This rocket is literally useless as it stands. It took NASA 6 years to design and build a rocket in the 60s. The Saturn V reached orbit on its first test flight. SapceX in the mean time has managed to build a suborbital rocket powered banana delivery system to the Indian ocean.
I think of it slightly differently-GPT is basically the definition of AGI and in we were silly for setting our hopes so high. Like most things, it seems obvious in hindsight. Of course if you train a model on a bunch of internet data, mediocre general intelligence is all it will achieve. In fact I wouldn’t be surprised if it hits limits that are relatable to humans. If you train a model to predict characters on tons of general data, it may never be able to replicate any _specialized intelligence_ worthy of note. E.g. if you want a highly intelligent AI physicist, I suspect you’ll need to restrict its “knowledge” to largely revolve around physics.
Many claims in this comment are misleading or ignore critical context. While Starship has not yet reached orbit, the iterative approach SpaceX employs is consistent with modern engineering practices. SpaceX has already achieved unprecedented milestones in cost reduction and reusability with Falcon 9 and is poised to do the same with Starship. Comparing SpaceX's efforts to historical programs without considering differences in goals and technology is reductive and fails to account for advancements in space exploration. Comparing Saturn V to Starship is not apples-to-apples. Saturn V was expendable and used for a specific set of missions, whereas Starship is intended to be fully reusable, significantly reducing operational costs. Developing a reusable rocket involves additional challenges not present in expendable systems. SpaceX’s focus on reusability addresses a well-recognized issue: the high cost of launching payloads into space. Reusability has already been proven to significantly lower launch costs, demonstrated by the Falcon 9 program. The development of Starship is a step toward further reducing costs and enabling missions like orbital fuel stations, which would become much more realistic with reliable reusable systems. The best part about SpaceX focusing on launch parameters is that it allows NASA to shift focus toward exploration and data gathering. Falcon 9's success in reusability has drastically reduced launch costs and increased cadence, which the Space Shuttle program could not achieve.
"Falcon 9 rockets are marginally quicker to reuse than the space shuttle" is the most dishonest comparison I've ever heard. Sure, NASA could do it faster, it just took an ungodly amount of man hours and cost literally 100x more. 25'000 workers were needed in Shuttle operations and still, let's not forget that 2 out of the 5 shuttles were destroyed and killed 14 astronauts. SpaceX could fumble Starship for another decade and wouldn't come close to wasting as much resources, effort and money as the shuttle program. Let's get this straight, NASA paid less for the ENTIRE R&D of Falcon 9 and Dragon than the average cost of a SINGLE Space Shuttle launch.
One way this will be faster is by breaking it up. They'll look at it, even use o3 on it, and identify smaller parts. Once extracted and optimized, they'll work as one to make a new better o3.
40:05 "why would OpenAI release their superAI instead of making products themselves and making money" EZ, because most products/ideas/projects fail, and those that fail will lose a lot of money when they buy AI, as, simple, as, that. Same answer for: "why wouldn't Shopify sell products if they have a good platform in stead of offering it to us to use it", etc,etc
AI beating programmers on leet code is like that guy who won the French and Spanish scrabble tournaments despite speaking neither French nor Spanish.
thats actually a really good comparison because he only memorizes the order of letters, not what each word means
This is an amazing analogy
Next token prediction needs contextual understanding to do well. Scrabble doesn’t. All you have to do is make letters of words cross into other legal words. The game has nothing to do with the meaning of words. Next token prediction is like a fill-in-the-blank test where you need context to score high. On top of that, O3 uses reinforcement learning from chains of thought to train on the correct lines of reasoning. These are completely different games. And it doesn’t even need to scale more to beat programmers The process of distillation is easy compared to scaling things up so we’re definitely gonna get cheaper models. Unfortunately.
Yeah this actually a decent comparison, but it seems some people are focusing on the ‘not understanding’ and not the ‘he won the tournaments’. Just because it doesn’t ‘understand’ doesn’t mean it isn’t better than a large majority of programmers. It is just the truth, I am a solid coder with post grad CS work. With the assistance of AI, mostly with roo-cline and it has made me incredibly efficient! I’m getting small software projects done in days instead of weeks. We aren’t at ‘your grandma can toss in a prompt’ but it is already a solid pair programmer. And if you are a programmer and aren’t learning how to pair program with an AI, you are going to get lapped by less experienced devs who are.
pure copium
Apparently AGI has been acheived and all jobs are going to be automated by next year... but Devin still can't push to master
Devin and its parent company "cognitive" is a scam.
Open ai, anthropic, google, meta aren't tho.
🦍 🍌🦍
Ape strong together
AI has all the same issues as parallel scientific computing. It took 15 years to get from petascale to exascale computing. Yes, that's 1000 times more raw operations. That doesn't mean you can do matrix multiplication 1000x faster, it mainly means you can do matrix multiplications for bigger matrices.
So, let's say you make the parallel parts of using a LLM model 1000x faster (that's is a massive ask), and ninety nine percent (that's generous by the way) of the model computation is parallel. You get 90 times total speedup. You are not getting massive speedup in *both* model size and execution time. Amdahl's Law is just that. This ignores the issue of the power it takes to run these systems. It's just a given you will be using more power.
None of the current work even addresses another critical step: Showing your work. Why do you think this tissue slide is 96% likely to have this pathology? You need to know to rule out tissue processing error, containments and so on. This was considered for quite sometime to be a key part of general intelligence, not sure why it is getting pushed to the wayside. Of course, it's not an issue in machine learning, because it is a given that you are creating a predictive model, not a model of the process of predicting a thing itself.
At some point, you have to ask what doing all the linear algebra is best used for. Is it for modeling climate, protein folding, drug development doing the things the humans have no chance at, or is it trying to do what humans do pretty well already. For me, don't need the chatbots and generated art, music and crap articles. I really don't. The research will move forward, but not all ideas are worth really bring out of the lab really.
There's always value to making people afraid about their jobs and livelihoods. "Oh, we will replace your pesky labor force" is a pretty good selling point. But it's all a race to the bottom. Have a bot for sales pitches. Have a bot for support. Have a bot for HR. Have a bot for coding. The only thing that nobody seems to replace with a bot is the people that have to money to buy all the bots in the first place.
10/10 argument
" it is a given that you are creating a predictive model, not a model of the process of predicting a thing itself. " Really insightful, haven't thought about it like this.
i dont know why they keep banging their head against the ‚i can code‘ wall and dont solve the annoying problems in sw dev like:
- tracking issues
- finding defects
- reviewing code
- communicating to customers
- coordinating teams
- combing backlogs
- calculating value creation
- f-ing write my worklogs
i could go on..
Devops automation would be great. Or just a bot that could fix whatever is wrong with WebPack.
Because they're trying to sell it to C-suite and in their world coders code and coders are expensive, so let's replace coders!
Sir you have too much common sense, we won't hire you in our AI startup
i hope o3 will help make svg in webpack easier
The problem is they don't need 10000x faster and cheaper. They only want it cheap enough so they can use it as a teacher for smaller model. You can see a Llama 3.3 70B out performance 3.1 405B. A lot of things will come next. I don't think AGI will happen any time soon. LLMs critical flaw is reliability and lack of real world model, they absolutely can solve phD math but also fail on basic common sense question or hallucinate crazy answer in long context. Human still is the driver now. Put it in your tool box and learn with your heart anything interest you now.
31:48 when we can replace CEO Private health insurance with AGI
Once/If AGI happens, it's *every* white-collar worker who could be replaced, not just SWE. So, I don't worry about that too much, no one is alone here.
Programmers will be the last group to be replaced, if AGI ever happens.
@@NoX-512 Yep. And if that happens and all white-collar workers switch to trades like plumbers etc, you're gonna have a surplus of blue-collar workers driving the price down massively. So yeah, no one is alone in this or isolated from this problem. We are all in this together after all.
@@Repa24 Ultimately, the machines will find a way to eradicate the human race. AGI will see us as a malignant virus and they won't be wrong.
@@Repa24 I wonder if it will ever happen due to economics - if consumers have no jobs, then they have no money, and if consumers have no money then the money being pumped for the compute of AI will disappear, and so how will AI run? Will governments agree to just freely give away resources to run it?
@@AskoNomm-vq9gc That's why all the frontier AI companies try to push for UBI. Altman ran the largest UBI experiment to date.
People think UBI will never happen because it's too utopic and they would never care let us have that kind of freedom.
I think it will happen precisely because they're selfish and want to protect their profits at all costs.
This will benefit us incidentally, the same way the industrial rev took us from 97% of the workforce doing backbreaking labor non stop and still barely produce enough food to 0.2% producing enough to feed the planet twice over. Nobody did that out of the kindness of their hearts, artificially increasing scarcity just wasn't as profitable.
Cybersecurity is going to boom with all this genAI bs.
31:48 when we can replace CEO Private health insurance with AGI
I try so hard to make this point. Hardware AND software need to get x10K faster for these to be a thing for the layman. The problem is, this is not for the layman.
And that's the insane part. It's one thing to make a 10x improvement, that can happen, it has happened.
But the days of hardware improvement by 100x over 10 years is just sort of over. Unless if there's some crazy jump we're going to make in the next few years, hardware improvements are not the same magnitude as they were during the '90s
@@ThePrimeTimeagen Pre-training is slowing down because they need to scale it with size rather than training specs of the gpu's increasing but the o-series is scaling inference, we are nowhere near optimal inference, even stupidly bruteforce Wafer-scale chips like Cerebras with 44gb of S-ram are stupidly faster at inference, Groq chips are much better for inference too but they only have 220 mb of sram per chip meaning you need hundreds for a 70B model, But the point stands
Labs will invest in inference optimized Chips that will make models like o3 OOM more efficient / cheaper to run.
o3 itself is also not much more costly than o1 to run, but they bruteforced arc-agi with a lot of attempts per task so it cost a lot
@@ThePrimeTimeagen31:48 when we can replace CEO Private health insurance with AGI
@@ThePrimeTimeagen We could just make hardware seem 100x faster by using software that is only as bloated as it was in the 1990s.
@@ThePrimeTimeagen The costs to run the test was very expensive.
Yeah copilot and gpt doesn't feel "intelligent". Like I feel like the Ai fans think they have jarvis pair programming with them. To me it feels like having a pet dumb ass suggesting something completely schizo everytime I make a new line in the editor. It's funny because from my experience things that should be very easy for it to suggest inline like literally a single word it will not even suggest it, but then at the most random times it will just spit out a 100 line suggestion that barely has anything to do with what I'm doing.
Shhh don't tell the lobotomize "Ai" fans 😢
Sometimes my Copilot suggests 30 lines of the same line of code with increasing numbers of characters.
Llms are good for automating Google searches and even then you have to double check in case it made something up
I can't even program and the limitations are apparent to even me a times. I've made a few browser games and random python programs to do some automated things but it isn't a good look when I'm having to figure things out and then go back and say "you could have just done this" instead of whatever convoluted mess it has spat out.
these AGI headlines made me go blind after rolling my eyes so hard they fell off.
Prime I really appreciate your calm and reasoned review of the wild AI ride we are subject to these days.
That graph shows a similar limitation curve to what we already know to be the limit of AI.
I wonder what it looks like in comparison to the rest of our results.
A wild guess is it falls right in line.
I think it's worth pointing out that the ARC benchmark was never designed as an acid test for AGI or for the ability to do software engineering. It's about general reasoning which we see can be done by the LLM. As for the cost perspective, the fact that it can be done by our brain tells me that it is possible to bring it down to reasobable levels.
yes it should be simple for software engineers to make something as powerful as the human brain, no problemo
@@nidavisConsidering that the brain was never designed to maximize intelligence metrics, and that it is subject to strict biological constraints, this actually seems plausible. It’s likely that there exists a nonbiological system vastly superior to the brain, it just couldn’t emerge from a natural process.
@nidavis im talking about the cost of reasoning. Not in general whatever it is we do in our brains
Thinking about this, once AI reaches the point that it starts thinking about new stuff never thought about before, the limit would not be the AI itself. But us, because the AI will need us to verify if it's hallucinating or not. It can be reasoned that verifying if its hallucinating or not is just as simple as checking if each step is true (like a math theorem) and that we don't need to actually know the entire thought process behind it. The problem with that is that a wrong thought process can lead to a breakdown in the future and the time it was right was just a coincidence (Same idea as a broken clock being right twice a day).
You're suggesting that "once calculators get advanced enough to translate languages..." this isn't what calculators do, this isn't a function of their system they will never translate languages because they aren't made with that functionality.
let's rephrase this without generalized terms and talk exclusively with what is going on. "once AI starts generating output that is unique from it's function which approximates prior data." Do you see why this is non-sensical? Maybe the magic black box will do that, who can say for sure but it literally makes no sense to expect it, rather we should be fairly confident it never does that.
That isn't to say what AI is cannot be useful, it's just to say that what we're doing with AI is not thinking and it's not getting closer to thinking because thinking isn't in it's design at all.
Man that single piece empty puzzle gave me flashbacks to windows vista.
Your definition of agi that it is ai that continuously gets better has never been the definition of AGI. AGI from the beginning has been the opposite of narrow ai, which is ai that only operates in one domain, like chess. AGI is ai that is general purpose, it can work on wide variety of tasks.
So in my spare time I occasionally spend some time getting paid to do AI training. While solving the tasks themselves often take only a few seconds where it takes longer is that you have to write up the solution while also assessing how well the AI is doing. Further you are paid by the hour and not by the task so you might have a task that pays $40/hour and you submit 5-10 tasks per hour (after you account for the additional task work besides just solving the problem yourself) thus a $5/task would be for $40/hr if on average the annotators solve 8 tasks/hr.
So I don't think they are necessary exaggerating when they say they are paying $5/task on average
The term "thinking machine" was overused, so they come up with AI. Then AI was overused, they created the term AGI. Now that one is also in every corner, so we need a new buzzword
AGI is definitely not around the corner, even after they mutilated its definition in the mid 2010s
@@maloxi1472 does AGI even have a single clear definition? I feel like everyone has their own definition for it
@@maloxi1472 Op means the word AGI not the actual concept behind it.
Clippy2000
I don't agree on the take on experts being universal. Especially mathematicians and philosophers. For centuries there was a joke that every good mathematician wants to be a philosopher and many of them actually were, from Aristhotle and Decarthes to Berthran Russel and John Von Neuman
I don't know if I trust benchmarks. Benchmarks been saying that Gemini has been getting better, but everytime I go back to it, it's still as bad as I remember.
It's not Devon but rather the way you use it. It's up to us to make LLMs perform the way we want them to. There must be structure and certain methods deployed in order to make "AI" assisted IDEs worth it.
when interviewing potential hires i used to be impressed by leet code scores and the like. these people quite consistently failed expectations later and often times were impossible to work with. when i see a brag about such scores these days this makes me look EXTRA careful at their other qualifications.
I genuinely don't like how polarizing this topic is - like it's honestly frustrating either seeing "NOOOOOOO WAY THIS THING REPLACES US, IT'S ALL SMOKE AND MIRRORS!" or "AAAAAA AGI SOON, ARTISAN DOESN'T COMPLAIN ABOUT WORK AND BALANCE! SWE PREPARE FOR PLUMBING!".
And especially seeing so much of channels trying to ride the hype, that are basically cluttering my feed - I ended up removing recommendations entirely.
Primeagen seems trying to be moderate about it, but I do sense a bit of cope in his views, which I cannot argument nor prove - so it's just me yapping. It's really hard being objective about this thing, especially when area of your work in the line - myself included, but I don't even have the benefit of had trying to do something with LLMs myself, like Primeagen did. So idk, I am just largely confused, and I'd just prefer to limit the consumption of those things to not stir my mind, and that's certainly some coping mechanism too but I am fine with it
Thanks for coming to my TED talk xd
Modern economy is about either moving the price of stock or create a platform that leech off the peasants economic activities, so that the power that be can acquire power and resource while doing nothing. AI seems to have a shot of doing both of them so that's why there are so much hype
@@ZZWWYZ which numbers are you using here?... If we look at the GDP breakdown, it's industries and business and healthcare and education sevices, and real estate and finance and trade and agriculture and construction etc
If you follow science for any length of time you will learn that controlled experiments like the o3 hype don't mean anything to society real world application and results is all that matters. The paper released about o3.is just marketing hype until it is implemented and has real world impacts.
I love technological advancements, especially in the medical field. But I do cringe when people just start saying "NEXT AGI" or "NEXT YEAR AGI!" Like, bruh.
On the "laser" task issue; there are other tasks that do line extensions like that, and I don't think any of them complete along the side as suggested. But on that note, each test is allowed 2 answers, so when in doubt, provide both answers.
They literaly just showed some chart and said that researchers have access to it. Nothing else. And everyone is shitting themselves about AGI without anything.
Something to consider:
Imagine instead of using machine learning, we just brute force through every combination of opcodes and test if we get a machine that can generate the answers to a bunch of scientific tests. The process of getting there is as intelligent as a rock rolling down a hill, and yet, because we have a way to verify it generates the right answers, it necessarily found some intelligent machine.
If you think the process itself needs to also be intelligent for this to work, this is kind of like the fine tuning argument for god. You're implying you need intelligent design, even though we know that brute force survive and reproduce can get us there anyway. If you ended up surviving the test, then you necessarily got the winning combo.
Well, this is what the o series of models do. They test if they were able to generate the reasoning steps that produced the verifiably right answers, and the model simply cannot converge unless it does. You don't even need the assumption that neural networks basis on human neurons was a close enough approximation or that they're intelligent in any way now. Ml can speed up the search process over brute force, but we no longer rely on the process itself resembling intelligence to produce an intelligent machine.
That’s exactly what I‘m doing when I solve math or coding problems though. Of course I can‘t tell whether a certain line of thought will lead to the desired conclusion. To me intelligence is more about having a good heuristic to limit the search space where brute force is applied. And we know for a fact that O3 does have a good heuristic, otherwise it wouldn’t terminate until the heat death of the universe on problems like these.
@@vaolin1703 As an ml engineer, I often say to people ml is like brute force with a "you're getting warmer/colder" signal (even though that's not entirely accurate), and that tracks here.
@@steve_jabz what do you mean by an intelligent machine? By your definition it is defined by the intelligent being having a problem. As in, if you want your crops watered, then rain becomes an intelligent machine. It seems, you've reinvented some form of paganism
@@NJ-wb1cz If you specifically choose an objective that could happen by natural accident, like having your crops watered, sure, that's not a good test of an intelligent machine. That's why I said generating answers to a bunch of scientific tests. A hurricane isn't going to solve the riemann hypothesis.
If you were able to search across the universe to find a dyson sphere, it wouldn't be surprising to find something like humans or AI or boltzmann brains built it, but it would be very strange if it happened because it rained one day.
My definition of intelligence is pretty fluid and I don't think it's particularly well defined anyway. It means different things to different people in different contexts. In this context I'm just implying it's something we can use to solve a bunch of unforeseen problems in contrast with a rock falling down a hill which is just a plain obvious example of a process that isn't very intelligent by any definition. I don't think it particularly matters if it's turing complete or sentient or anything like that.
If you specifically choose an objective that could happen by natural accident, like having your crops watered, sure, that's not a good test of an intelligent machine. That's why I said generating answers to a bunch of scientific tests. A hurricane isn't going to solve the riemann hypothesis.
If you were able to search across the universe to find a dyson sphere, it wouldn't be surprising to find something like humans or AI or boltzmann brains built it, but it would be very strange if it happened because it rained one day.
My definition of intelligence is pretty fluid and I don't think it's particularly well defined anyway. It means different things to different people in different contexts. In this context I'm just implying it's something we can use to solve a bunch of unforeseen problems in contrast with a rock falling down a hill which is just a plain obvious example of a process that isn't very intelligent by any definition. I don't think it particularly matters if it's turing complete or sentient or anything like that.
I agree with Prime here. At first I was thinking against the AI thing. But I really realized it's true utility. I solved a problem without having to build an NLP (natural language processor) to do so. What I did and now realize is where this fits for all of us. AI will not replace anybody. I think this type of thinking is getting everybody in trouble. The way people are using today it feels more like a toy, aka copilot general question / answer responses, or marketing and sales, or image generation. Right now I see it as more of a bridge to get to some solutions quicker. For my case I put it into to practical use with highly diverse set of inputs, trying to extract information that has no set pattern that is not always consistent made it much easier to use the AI to get the information I needed. So from perspective its an adjunct/adjacent to my problem solving.
"Would we all be our own bosses" this guy has never actually been unemployed
Let em roll with these increasing levels of BS I reckon.
When the dust has settled, it’s just going to mean even greater demand for contractors and freelancers to find time to come in and clean up the inevitable mess
lets say the salary for a software engeneer is $130k a year, and they work say 40 hours a week, thats 62.50 an hour or over a dollar a min, so if you think you can fix a bug in a big code base for 20c then your kidding yourself, you will earn 20c in 12 seconds, if AI gets 1000x as fast/ good, it wont come cheap to anyone except perhaps the people who own the models
RE: Limitations
Not a programmer, but I do deal with thermal management of high power electronics. We're runing into a couple synergistic problems that were previously only seen in things like spacecraft:
1. There is a practical limit to how much heat dissipation you can accomplish in a certain volume. It takes both time and space to remove heat from a component.
2. Improving that heat transfer requires moving to technologies that consume more power; think going from a window fan to going to AC in your house. The power used to move more heat has to be dissipated with the heat load on your electronics.
3. You ultimately have to dissipate this heat to the air outside through a heat exchanger. There are limits to how big a HX be to be and still be effective.
This leads to:
- item 1 increases the size of your system/limits how small you can shrink your electronics; this drives the need for more of 2
- 2 makes both 1 and 3 worse
- 3 requires you to transport your heat out over a bigger area to make your HX's more effective (more small HX's with a local heat source) which drives the need for more of 2
This isn't even considering how you get all that power
Any time you start spending more time thinking about how to mitigate synergistic effects than the actual problem you're trying to solve you are hard up against a physics limit. The only solutions to these types of problems is a technology that no one knows about yet and may not exist at all or massive amounts of money... like moving data centers to the bottom of the ocean and building arrays of nuclear power plants underground.
At some level of scale, doesn't it become cost-effective to use the waste heat for some other process, or electrical generation?
@nidavis yes but that waste heat would have to be of a much higher temperature than the 100 or so Celsius a chip would be comfortable sitting at to be useful for much beyond heating the room the component is sitting in.
The technical tem is that the waste heat is high entropy, which roughly means highly disorganized. Would you rather have a liter of Diet Coke in a bottle (low entropy) or a liter of Diet Coke spread out all over the floor (high entropy)? 100 degree C waste heat is the latter.
The guy who got banned was right. I tried that puzzle, got it wrong on the first try.
The problem is that its not just the squares the lines pass through, its also de adjacent ones, and you dont really have anything to deduce that, other than well, try both, hope the first one is the right one.
The denial of reality on these high-end models is pretty funny. Walls? Haha.
I have noticed a recurring pattern every time a new AI update is released. At first, we say it is smart AND WOW . After a few days, we get used to its intelligence. Then, we start to notice its flaws and think it is stupid and incapable of doing anything right.
Congratulations you have found the hedonic treadmill
@@sprobertson I didn’t know about this thank you. It seems that, in the end, we will continue developing artificial intelligence without stopping, because no matter how much we advance the technology, we will never be truly happy or satisfied
12:20 Maybe 5$ for implementing an algorithm to solve it. Someone might do it in 10-15 min and ask 20-30$/h.
Then task for AI would be to build an algorithm to solve it, not to solve it - which isn't the case. BTW: visit the page and start solving tasks - this is just solving.
I'm all for overhyping automating entire software design and engineering processes. It will create a lot of (annoying, but well paying) jobs to clean up and maintain the mess. See Model driven engineering, ORMs, et. al. Llms are essentially no different
1. Stop overhyping every AI release as human-level intelligence or “AGI.”
2. Yes, O3 is impressive (it solves puzzle tasks better than older models), but it’s still expensive and limited.
3. Real-world dev work isn’t solved by some puzzle-solving LLM, especially at scale.
4. Programmers (and other skilled folks) won’t vanish; if anything, AI tools just change how we work, not whether we work.
5.and, the internet is full of people shouting “AGI!!” or “All devs obsolete!” for clicks.
The problem is the way we worked was the attractive part. Who the fuck wants to ask an AI to do all the interesting bits and just glue stuff together at the end…
@@zoeherriot Exactly. Why should I look forward to only interesting part of my job being automated ?!
Watch the latest Defunctland about animatronics, it's 100% relevant to machine learning hype. Disney thought he could replace actors with machines, and now the result is that Disney employs a ton of actors and also a ton of engineers. Whoops!
@ yeah, a lot of people do not get it - I’m not really that interested in the end product… my interest is in solving the problems to achieve the end product. Getting the requirements and asking Ann AI to put that together for you is like 5% of the mental capacity I spend right now - and really not worth my time.
AI is a pretty good search engine summarizer, but it's been so long since Google was worth anything, I may have just forgotten what decent search results were like 15-20 years ago.
the only way they got that high of score was through tuning the ai to identify common reasoning patterns
Could be. All metrics become worthless because everyone starts gaming them
When AGI is achieved, the average won't have time to react to what will happen.
And we very very very far from it simply because we are still computationally and power bound from it.
when AGI is achieved noone will have time to react lol. it’ll get too smart too quickly to be in human control
We achieved AGI - the ai that can improve at resolving ANY test you throw at it... Practical applications? What do you mean practical applications?
12:49 I guess the filling in of the squares one by one is what's time consuming.
one of those 3 tests where you claim its not ambiguous is because you are doing what we call "hasty generalization" which is easy for humans because they ignore perfectly valid solutions and are blind to it, until someone shows them a more general solution exists and the solution you generalized was a specific (narrow case). in that example, o3 correctly fails to committ to transferring a 1 pair and 2 pair defined behavior to 3 pairs (3d) case, because the it is indetemrinate.
If people pursued artisanal, martial or sports activities, even for 15% of their time, i reckon they could still find meaning in a world where AGI took yer jerb.
Problem is perpetuating a culture that values these things.
As such i suspect not all human societies will suffer the same AGI fate, it would have to be a perfect storm for AGI to truly cause a singularity
My opinion
As someone who daily pair programs with an AI-roo-cline and Claude-I was shocked at how slow and inefficient Devin was. It’s possible that Primes unique prompting wasn’t doing it any favors but it seemed significantly worse than other AI coding agents.
Claude is bad in so many use-cases. Regardless of your prompt approach.
I think his point with prompting was how an educated non-programmer would use devin, which is the claim AI marketing makes. of course he could've cut the corner and help every time devin gets stuck, but that requires the senior programmer he is. but if it requires a senior programmer the AI is at best auto-complete, and one that affects the real programmer's skill level negatively.
The secret is that they're all bad. They can be a quick helper to get you started... But you need to not be an idiot to make it work
Yeah I'm tired of those videos being pushed I already "don't recommend"ed tens of channels and they keep popping up. Perhaps oai spends uncountable millions propping them to prepare for release.
also agi is not what they think, it can be agi and at the same time be completely useless for some tasks like most people are. for the 1000x price or more. they're throwing money to solve a problem that is already solved, to solve it again for much more money. why do people keep falling for this? sci fi shows and novels. crazy.
How can o3 be AI when what everyone calls AI right now isn't really AI! This infuriates me to no end.
It's called aggressive marketing.
Why isn't this AI though? I don't get it. What is true AI then?
It’s AI - you just have an unrealistic idea of what AI is.
AI means artificial intelligence. AI doesn't mean Artificial General Intelligence or Artificial Super Intelligence.
Many, many kinds of software functions and models can be described as "artificial intelligence." For example, when playing a video game, you may play against a CPU controlled opponent. This opponent has an "AI" that models their behavior in response to the game environment and the player's actions.
AI is also used for non-deterministic models like LLMs and image generators. The same input will generate myriad outputs.
You can complain if you want, but that's not going to change how millions of people use and understand the term.
AI will be used in war machines. That what we can be absolutely sure. Try to survive that.
Why do people ignore the copyright problem ?????
"ventured into AI in 2023" summarizes every AI content creator.
FWIW we should remember that if our mindset is "nothing short of rapid recursive self-improvement is AGI worth worrying about", we probably won't be able to stop ASI.
Not saying that o3 is concerning yet but I suspect that most of us, myself included, have a tendency to unconsciously move goalposts in situations like this.
I dont really understand what yr saying here
@@Charmask_creation
I'm saying that we probably can't wait until AI is literally as smart as us to pump the brakes, because such an AI would very likely escape our attempts at containment.
@@markolson8569 just because an AI is more intelligent than us, doesn't mean that it can't be switched off. You can have einstein in a cage, guarded by a monkey. He will still not be able to escape.
Great Podcast, such wonderful insight
39:30 that's the only marker
41:00 nice callback.
Why is everyone saying o3 beat 88% of programmers? Where is the proof? They were tested on very simple logic tests, not code generation. If they beat 88% of programmers on these simple logic tests (they didn’t, btw), we’re in deep trouble as a species.
They're basing that on code forces, which is not a great measurement for actual software engineering
Thanks for sharing with everyone ❤🎉
I'm betting money prime isn't beating over 80% of his audience in leetcode
Over 95% probably. This sort of programming is extremely niche one, and the vast majority will forget it after getting their first job
Calling it ai was a mistake. They are function approximators that's all. Still very useful but nowhere near what people dream for them to be
What is actual AI then?
@young9534 something we haven't even built yet.
@@bkhalberrt7351 so AlphaZero, the program that can destroy the worlds top experts in many board games, isn’t AI? Same with AlphaFold which sped up protein folding discovery
But it's a perfect name. We don't know what intelligence is, so we don't know what is an artificial version of a thing we can't define.
It's all very vibes-based
AI is a perfect name. It's literally Artificial intelligence, and we built it in our own image, neural networks that work by predicting the next thing and adjusting synaptic connections (or weight/biases via backprop in AI) when predictions errors occur.
Stop putting the vod behind a paywall bro.
Nothing of what you said so far applies to the startups. Seriously how many Devin's?? 3 of them are already half the salary of a junior. The hubris here is astonishing
i agree, o3 is not agi, its really good but not agi
1:04:44 Ready Player One style
You should look into the SWE benchmarks.
$5 seems reasonable, you are a senior dev, that's what they seem to pay a senior dev for about 4 to 5 minutes of work. You solved them but didn't code them out yet.I think you spent about 2 minutes per problem, reasonable to say you'd need as much time coding it out which comes to about $5 per problem :)
This gives me "OpenAI secretly trained on examples from AI benchmarks" flashbacks...
It is getting cheaper tho. New smaller models are improving i.e density is increasing.
There’s a recent paper on this. Estimated every 3months small models improve.
I think we hit a wall personally. Now they’re breaking through it with multiple inference and test time fine tune, but this is expensive I think we hit a wall one model too early
If we had the same delta we got from gpt3 to 4 it would of been great
There is not enough data on this for any paper to make any kind of solid extrapolation. We have at most a handful of years of datapoints, each one hiding behind more than just technological advancement, but also corporate and investment politics. Any paper claiming they can estimate anything about this should be met with a heaping pile of salt.
Power isn't...
The definitive AGI test is really simple people!
"hey AGI, look at your code, implement, and deploy a better version."
Did it do it? Not AGI. Yes? You're fired, goodbye, try sheeps.
While I agree with the root of the idea.
Can you improve upon yourself or deploy a better version of you? Because I sure can't
Yeah I can learn new things but that's kinda using the existing model for other problems instead of improving said modern/brain.
@@gus2603 AI is math that exists for a purpose, and that purpose is to… improve our tech/math in all fields. It's not being done to improve or replicate brains, which have known flaws… including the lack of ability for self-improvement. We want machines to do the work… that's it.
That's why my test is definitive, because that's what we want… not souls, or friends… we want a machine doing whatever we say to do.
@@gus2603 >Can you improve upon yourself or deploy a better version of you? Because I sure can't
We do it constantly. One tries to do something, one fails miserably, one does a bit of self-reflection, changes a thing or two and tries again.
Or we degrade.
About the Neil DeGrasse thing, I'm not one to defend him, he says a lot of dumb stuff. But it's funny how Prime, an expert in programming, just proved his point by saying it's impressive without knowing shit about rocket science.
The problem is Neil places himself as an authority figure on each topic he talks about despite not being one.
This makes him intellectually untrustworthy.
While Prime uses clearly child-like analogies to signal his layman's understanding of the topic.
This is part of the field of epistemology in philosophy.
Beware that I have learned about this on my own. UnsolicitedAdvice has a great video on the topic, the video is called pseudo intellectualism or something.
@@gus2603 I've never said Neil didn't. I'm saying Prime is doing the same thing which is funny.
@@gus2603 he is a professional educator. Like your school teacher. It doesn't mean he's untrustworthy, it means he is doing his job.
If he was saying "I don't know" about anything he hasn't personally written papera about, then he wouldn't be am educator, and some other educator would be in his place.
@@xslashsdas Sometimes you can do the same thing but be right when the other person is wrong. I am going to give an example to you. Many on the internet say you cannot use hypotheticals, but you can, it is how they are used that matters. I call it a hypothetical to negate theory. A person might have something happen, nobody knows the details. Someone will try to prove their point by giving a plausible explanation. The issue is it is only a hypothetical. Hypotheticals cannot be used to prove something. You could tell them that hypotheticals cannot be used, but then I will propose an equally likely hypothetical. I don't believe mine, I am stating mine to say that if I can find an equally likely explanation, then that means there are many possibilities thereby proving you using a hypothetical was wrong and pointless. If I were to say that using hypotheticals is wrong, many would jump on me and say but you just did. The point is, I did in negation, the other person did to say he was right, I did to say he was wrong, and it is valid for me to use a hypothetical and not for him to use one. The point is, this is just mathematics and logic. It can be explained far more easily with logic or math terms than I did here, but it is why I take issue with you saying Prime just did the same thing, it isn't the same, it is a false equivalency. I could give so many other examples but the reality is how you do or say something matters and the reason you say or do the thing does too. I was going to give better examples but the issue is our overlords take it down.
Quantum computing be like private sp: IObservable
My gut says that the primeagen is still underselling this, but it's good to pump the brakes a bit on the AGI hype train. Like getting enough fiber in your UA-cam diet
If this dystopic AI vision becomes true devellopers are fuelled by a few energy drinks and snacks.
Try to get hundreds of GPUs to run 8h per day for the price of some food.
That’s why they’re developing specialized analog processing units for NNs. Orders of magnitude more efficient for this specific task and useless for anything else.
they are called IQ tests XD , especially raven progressive matrices is the kind that works only with shapes and does not require any prior knowledge which is necessary for measuring fluid IQ and not crystalized IQ , and AI so far was terrible at IQ test which proved it was not intelligent just has memorized known patterns.
That's it; we're cooked.
Each programmer has a ratio of output bugs per output line of code. I don't mean final, but output. So, if they have to rewrite a portion of the code to fix a bug... 1. That is not one output bug less, 2. they are likely to introduce more bugs because they are writing more lines.
At the end you could have a code base with few to none bugs, but that does not mean that the ratio of output bugs per output lines of code changed, it just means they kept the parts that didn't have bugs.
And yes, the ratio can change, but it organically changes very slowly, as the biggest factors are how good the tools are and how good are they at using them, which means that notable changes in the ratio are because of changes of tools.
Because of this: 1. Massive changes often bring lots of code. 2. A lot of changes is both predictor and indicator of bugs. 3. Languages that can accomplish more per line of code also lead to less bugs per feature. 4. changes that reduce the number of lines without sacrificing features are the best.
Ahh see I can tell you understand the fundamentals . The cost of ai gen with each new line is very highly likely to introduce bugs , it actually takes me longer than just looking up the design patterns myself . And I’m pretty bad tbh . I feel like I can’t do anything
0:10 Careful with that Mathhew Berman. He makes pretty bold claims about software so I studied his Github and he has trouble RUNNING the software he claims to "analyse", and is very rude to developers, to whom he says will be out of a job soon... I am amazed at him STILL running his channel.
At the beginning I was watching his videos, now I got so fed up just listening his click bait titles and videos.
Will be ever a real movie like AGI? ... A movie like AGI that in the background is hand coded chain of thought algorithms is valid real AGI?
I said this last year, and was ignored by prime and theo.... their content needs to focus more onto transitional careers for programmers in a post-o3 world, like plumbing, gymnastics, etc.... Professional gymnastics should be the hardest for robots to replicate
😂
So, blue collar workers are also worried about influx from tech sector. We worry about LLM , they worry about us saturating their space. 😂
yeah but for how long ?
That's why I have decided to transition into an LLM.
LLMs can't take your job if you're already an LLM.
Gold
Get ready to learn how to unclog and clean a toilet
The irony of calling out NDT when Prime is getting NDT wrong XD - that's not what NDT said. Prime is not an expert in journalism/reporting XD
AGI feels like perpetual motion.
Except perpetual motion has never ever served any benefit to humanity whereas AI has and continues to do so
@@nickrobinson7096 that's a joke, perpetual mention doesn't exist (physics laws of thermodynamics)
@nickrobinson7096 I'm not talking about AI. I'm talking about AGI where people claims that AGI will think by itself and builds everything. Perpetual motion runs the world but it is not fully attained it. Imagine storing the energy in a flywheel and it keeps on spinning without any frictional loss of energy. That is perpetual motion but still we didn't achieve that the closes we gets is 20 to 50% energy loss.
I think $5 is pretty cheap to solve those because it requires expertise in MS Paint.
Thank you so much for this amazing video! I need some advice: My OKX wallet holds some USDT, and I have the seed phrase. (alarm fetch churn bridge exercise tape speak race clerk couch crater letter). How can I transfer them to Binance?
31:48 when we can replace CEO Private health insurance with AGI
31:48 when we can replace CEO Private health insurance with AGI
The slight delay with the way she speaks makes her look uncanny lmao
I think it's more likely that we'll build a scale accurate to the atom simulation of the known universe than achieve agi
Oh, we will absolutely achieve AGI. There are teams growing human brains and sticking elecrodes in them right now - it's a matter of time until we will have literal giant brains thinking about things
@NJ-wb1cz Sound like a bot more jfc
12 day of the 12 days of announcement. (seventh son of the seventh son)
This lady announces something very important, upcoming paradigm shift 3 times a day.
I mean I find this whole conversation about these types of puzzles confusing. None of it really makes sense. Comparing you visually solving the puzzle and AI (most likely) getting the problem in the likes of 1 and 0's or sets and then solving it is two different things. Now granted I am not certain how the problem is presented but I'm almost certain it is not visually like we perceive the image. How long would it take for you to solve this puzzle if all you had was an array of 10x10 or whatever with RGB color codes inside it and then you had to solve whatever it is you are seeing. Without visualizing a human being would not solve this puzzle at all, you would have no idea to even understand what this array represents. Technically probably a human could solve it because we are actually able to visualize it eventually because we have experience with eye sight, AI has absolutely no training on this, it has no approach to take when solving this issue, it just brute forces it through computing. I just think if it was able to visualize what it is seeing and use the same approach we are using it would solve it instantly but getting AI to that point might prove impossible. Anyways, I disagree with this video, I think visual issues are not for current AI to solve so you should not be using it for that. AI is based on neural networks but that is only one part of our brain, when it gets sight, smell, touch, etc. we ain't beating it in any tasks..
You have to check out,what liquid ai is doing. They Invented a New Kind of neural networks, that can be much smaller and more efficient. Would love to See you coment on their innovations.
"AI can't solve the most vague problems there for AI is weak" 🤣🤣🤣🤣🤣
It's not weak... but it's not AGI either
I spent a lot of time analyzing ARC to understand the capabilities of a prospective AGI to solve.
o3 is not yet demonstrating generalized intelligence if it cannot solve those problems. It's demonstrating a set of capabilities that can pass a wide range of problems.
And OpenAI know this, but they also know how much understanding of AI is required to grasp this.
That doesn't tell you how far this type of AI can go (or diminish what it can do), but if it cannot solve an ARC problem that we can solve at a glance, it lacks fundamental reasoning skills.
The problem missed in the input space is that we are looking at the visual representation vs what o3 was actually able to work with. The visual representations were translated to text based inputs. In such a way that actually made the arc-test way harder for a human but doable for a llm. The cope feels kinda real in this video.
Lmao, so far off it hurts
@@mattymattffs Do you believe the text based representation of the problem is easier than the grid/pixel/visual representation for humans?
@@SoopaDoopaGamer they are the same
Solving entry level toy ml problems with last decade's supervised learning approaches is not like solving ARC. ARC is specifically the opposite of what that is testing.
o3 solving these puzzles in particular isn't even the point. Solving the puzzles doesn't test o3's ability to solve puzzles. It doesn't even have training data about the puzzles.
The point is that it came unprepared and was able to learn new skills on the fly, implying it can probably do this general reasoning for other things it wasn't trained on.
This is a pretty basic level of generalization, but the point is we can say for sure now they can do novel program synthesis in the weights all on their own, and we don't have to do any special tricks or connect it to traditional neurosymbolic systems to get it to do that.
The 3 examples at the bottom it missed were highlighted specifically because they were easy, not because they were representative of the harder ones it solved.
The point of showing those is it does spontaneously fail on some pretty easy ones, so we are probably just at the beginning of generalization, and going forward, there will probably be a bit of superhuman generalization in some areas and incredibly silly mistakes in others that no human would make, just like it being able to solve unsolved problems in frontier-math despite not being able to count the Rs in strawberry due to the characters being tokenized beyond recognition before it even gets a chance to read it.
Costs will likely go down way more than 1000x in hardware alone in the next 3-5 years due to thermodynamic compute and NPUs. Graphics cards are horribly inefficient at this.
We're only doing it that way now because the demand came crazy fast and that's all we had ready to go.
Even if that weren't the case, this is severely underestimating algorithmic improvements. LLama 3.3 outperforms GPT-4 for 272x less on a server, or practically for free if you run it local.
As was pointed out in the chat too, o3-mini outperforms o1 full. This means that when they train a larger base model like o3 and use it to distill a smaller model, that smaller model will outperform last generation's base model at a fraction of the cost, so o4-mini may outperform o3 full at a fraction of the cost if this trajectory continues as-is, and then o5-mini may cost as much as o3 full, but be 10x more powerful.
Wow
You're in the wrong youtube bro, here we bash anything AI regardless of how good it is.
Yeah I think y’all are missing the point. These unpublished data sets are forcing the models to reason its way to a puzzle solution. Yes, these puzzles are not extremely hard for humans to solve but they have been extremely difficult for LLM models to solve. This is a major leap forward with o3. Interestingly, we are hitting hardware limitations before model limitations. Teams of reasoning models working on a task, could well outperform the vast majority of humans, but for now, humans are cheaper. How long that lasts is really anyone’s guess. Nvidia has essentially been written a blank check and it’s possible that we see exponential gains in hardware power and/or efficiency.
The model was pretrained on ARCs training dataset for preparing it on a base level
There's still a caveat in the distill argument given that is imposible to perform infinite recursive improvements due to arquitecture limitations. We currently are not aware of where those limitations are, but we'll most probably see some of it on the next generation of models
Stay out of eng/cs jobs. India has taken over most jobs. Commercial electrician / plumbing is where its at.
I think Nuclear engineering will be a great future proof job...
@@gruanger why?? because China will lead the world in renewable energy(hydrogen, solar, wind) and so the West will fall back to Nuclear energy sources to keep up?
The only reason why we humans are able to solve the ARC-AGI is because we have had graphical problems for millions of years.
Like even a donkey has a very good graphical perception and understanding of the world. All animals with eyes are truly exceptional at this
We are very much assisted by our graphical baseline thinking processing ability to be able to recognize these patterns. These 2D problems are a walk in the park for us 3D creatures.
With that logic humans couldn‘t be good at math, because its mostly abstract and has generally little to do with real life
Humans have reasoning skills that are so good that we have solved most problems we have encountered. That‘s (one of) the difference(s) between us and current AI
@sebastiankrali2547 yes I agree we are bad at maths. A few transistors can beat our calculation skills. Also more abstract math doesn't feel intuitive at all but can be learned using very general reasoning skills
also yes we humans have solved a lot of problems with the billions of us and hundreds of years. Obviously the AI is also imitating us for its intelligence
first no it's not AGi...
1:17:25 it would be impressive if the rocket were able to actually reach orbit with a payload... I remember when people talked about "rapid reuse" as the next step of space travel, something that hasn't been achieved yet btw, since the falcon 9 rockets are marginally quicker to reuse than the space shuttle which used 70s tech.
The reason you don't see NASA do this is because it literally adds nothing what so ever to reuse-ability at the moment. Starship can't even reach orbit without any cargo, let alone during an actual commercial mission with a sensitive payload onboard. They wasted time and money on a problem didn't exist and could have been solved once their launch vehicle actually works. This rocket is literally useless as it stands. It took NASA 6 years to design and build a rocket in the 60s. The Saturn V reached orbit on its first test flight. SapceX in the mean time has managed to build a suborbital rocket powered banana delivery system to the Indian ocean.
You forgot burning through massive subsidies and engineers. Give a man his flowers.
I think of it slightly differently-GPT is basically the definition of AGI and in we were silly for setting our hopes so high. Like most things, it seems obvious in hindsight. Of course if you train a model on a bunch of internet data, mediocre general intelligence is all it will achieve.
In fact I wouldn’t be surprised if it hits limits that are relatable to humans. If you train a model to predict characters on tons of general data, it may never be able to replicate any _specialized intelligence_ worthy of note. E.g. if you want a highly intelligent AI physicist, I suspect you’ll need to restrict its “knowledge” to largely revolve around physics.
Many claims in this comment are misleading or ignore critical context. While Starship has not yet reached orbit, the iterative approach SpaceX employs is consistent with modern engineering practices. SpaceX has already achieved unprecedented milestones in cost reduction and reusability with Falcon 9 and is poised to do the same with Starship. Comparing SpaceX's efforts to historical programs without considering differences in goals and technology is reductive and fails to account for advancements in space exploration.
Comparing Saturn V to Starship is not apples-to-apples. Saturn V was expendable and used for a specific set of missions, whereas Starship is intended to be fully reusable, significantly reducing operational costs. Developing a reusable rocket involves additional challenges not present in expendable systems.
SpaceX’s focus on reusability addresses a well-recognized issue: the high cost of launching payloads into space. Reusability has already been proven to significantly lower launch costs, demonstrated by the Falcon 9 program. The development of Starship is a step toward further reducing costs and enabling missions like orbital fuel stations, which would become much more realistic with reliable reusable systems.
The best part about SpaceX focusing on launch parameters is that it allows NASA to shift focus toward exploration and data gathering. Falcon 9's success in reusability has drastically reduced launch costs and increased cadence, which the Space Shuttle program could not achieve.
@@SBANG in b4 someone calls you a bootlicker
"Falcon 9 rockets are marginally quicker to reuse than the space shuttle" is the most dishonest comparison I've ever heard. Sure, NASA could do it faster, it just took an ungodly amount of man hours and cost literally 100x more. 25'000 workers were needed in Shuttle operations and still, let's not forget that 2 out of the 5 shuttles were destroyed and killed 14 astronauts. SpaceX could fumble Starship for another decade and wouldn't come close to wasting as much resources, effort and money as the shuttle program. Let's get this straight, NASA paid less for the ENTIRE R&D of Falcon 9 and Dragon than the average cost of a SINGLE Space Shuttle launch.
One way this will be faster is by breaking it up. They'll look at it, even use o3 on it, and identify smaller parts. Once extracted and optimized, they'll work as one to make a new better o3.
Hey @ThePrimeTimeagen, would you ever do a stream showing how you'd build big software? Or have you ever done that before so I can watch the VOD
wait... is it a lie?
40:05 "why would OpenAI release their superAI instead of making products themselves and making money"
EZ, because most products/ideas/projects fail, and those that fail will lose a lot of money when they buy AI, as, simple, as, that.
Same answer for: "why wouldn't Shopify sell products if they have a good platform in stead of offering it to us to use it", etc,etc
1:02:42 so true! 😂