Would we have GPUs today if our CPUs were able to keep scaling in GHz? Yes, because we started getting GPUs when CPUs were still very slow, around 100 MHz if I remember back to when I bought the first Voodoo card. So GPUs then just started speeding up as the CPUs sped up as well. By the time we had 4 GHz CPUs, we had already had 5 or 6 generations of GPU. If CPUs had just kept getting faster, the GPUs would have gotten faster alongside them, and developers would have included bigger and bigger polygon counts.
The first PCs had to have adapter cards to have video output. Those cards had chips on them that could be considered GPUs although display adapters could only produce text. So GPUs were a thing at the very beginning. It took almost 10 years from introduction for CPUs to reach 100 MHz. The first PC was only 4.77 MHz.
Dr. Colwell gave a talk at my university a few weeks ago. He mentioned seeing the CEO of Intel at the airport some time ago, and how the CEO was bragging about their new chip that was supposed to be so great because of all the cores. He just told the CEO that the product wasn't good, because it was built off a bunch of bad early parts and was too complex and inefficient to actually be viable. He mentioned how he was correct in the end. It was an incredible talk about many things, and this was just a very brief story he told. As a sophomore EE student it was so amazing to hear from him, though.
So to summarize (and compare): Moore's law is about the growth of transistor counts on a chip over time, while Dennard scaling is about how, as transistors shrink, voltage and current scale down with size, so speed goes up while power density stays roughly constant?
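For reference, a minimal sketch in Python, with made-up baseline numbers (nothing here is from the video), of what classical Dennard scaling asserts, next to Moore's law's separate claim about transistor counts:

```python
# A minimal sketch (illustrative constants only) of classic Dennard scaling:
# shrink linear dimensions by a factor k, drop the voltage by 1/k, and
# frequency rises by ~k while dynamic power per transistor (~ C*V^2*f)
# falls by ~1/k^2, the same factor the transistor's area shrinks by, so
# power *density* stays roughly constant. Moore's law is the separate
# observation that the transistor *count* per chip keeps doubling.
def dennard_step(dim_nm, v_volts, f_ghz, cap_ff, k=1.4):
    scaled = {
        "dimension_nm": dim_nm / k,      # gate length/width, oxide thickness
        "voltage_V": v_volts / k,
        "capacitance_fF": cap_ff / k,
        "frequency_GHz": f_ghz * k,
    }
    # relative dynamic power per transistor: (C/k) * (V/k)^2 * (f*k) = 1/k^2
    rel_power = (1 / k) * (1 / k) ** 2 * k
    rel_area = 1 / k**2                  # each transistor occupies 1/k^2 area
    scaled["rel_power_per_transistor"] = rel_power
    scaled["rel_power_density"] = rel_power / rel_area   # ~1.0, unchanged
    return scaled

# Example: one generation starting from a made-up 180 nm / 1.8 V / 1 GHz part
for key, value in dennard_step(180, 1.8, 1.0, 2.0).items():
    print(f"{key:>26}: {value:.3g}")
```

The last two numbers are the point: per-transistor power falls by about the same factor as per-transistor area, so you could raise the clock every generation without the chip getting hotter, until voltage could no longer keep shrinking and that balance broke.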
I really appreciate your videos! Could you please do a video on the IBM Power series of processors following the dissolution of the AIM alliance to present?
13:50 This sounds very familiar. After the P4 (and some minor iterations) came the Pentium D. All the magazines and reviewers complained of it getting very warm. The Core 2 Duo which followed got a much better response from the pundits.
13:10 you wonder, if CPUs had managed to get to those "53 GHz CPU speeds, would GPUs and the like be as dominant as they are now", and you're not the first to wonder. Tim Sweeney, CEO and co-founder of Epic Games, and lead programmer of Unreal Engine, had a prophetic message about CPUs becoming more parallel and GPUs becoming more general purpose:
Q: Finally, where do you think 3D hardware and CPU technology should be headed? Do you think we are likely see 3D hardware taking over some of the functions of the CPU, going beyond rendering?
A: I think CPU's and GPU's are actually going to converge 10 years or so down the road. On the GPU side, you're seeing a slow march towards computational completeness. Once they achieve that, you'll see certain CPU algorithms that are amicable to highly parallel operations on largely constant datasets move to the GPU. On the other hand, the trend in CPU's is towards SMT/Hyperthreading and multi-core. The real difference then isn't in their capabilities, but their performance characteristics. When a typical consumer CPU can run a large number of threads simultaneously, and a GPU can perform general computing work, will you really need both? A day will come when GPU's can compile and run C code, and CPU's can compile and run HLSL code -- though perhaps with significant performance disadvantages in each case. At that point, both the CPU guys and the GPU guys will need to do some soul searching!
- Tim Sweeney, Epic Games, 2004
beyond3d.com/content/interviews/18/4
For as long as the desire for high performance exists, and CPUs and GPUs still retain clear performance advantages over each other in important tasks, there's not gonna be any proper convergence.
The ending could explain most of what happened at Intel in the 2000s: the era before dual core, and the "dual core that wasn't really dual core" era. Thanks for the video.
You oversimplified the length-speed relation. The electric field itself propagates at essentially c; the issue is that a changing field induces an opposing voltage - self-inductance, measured in henries - and that, too, should decrease when the gate gets smaller.
It's indeed remarkable that in an industry where everything seemed to always get smaller and faster, one of the principal technical parameters - clock speed - has stalled for 18 years now. I remember the mid 90s to mid 2000s, when it seemed clock speed would only ever increase. And then it abruptly stopped.
Phase-change cooling isn't really practical for daily use, and without it the absolute best you can do is limited by the rate of heat transfer to ambient temperature. Air cooling and water cooling both run into this roadblock.
Better cooling can help, but we are hitting physical limits. I do expect liquid in direct chip contact to generalize. Thermal conductance alone just doesn't cut it any more. (Passive) phase-change is powerful, and especially keeps T° range small, helping local expansion strain. Active, below-ambient seems just forever too expensive & unreliable.
In-depth and insightful. Is it possible for you to add a mindmap/flowchart at the end of your video? That way the audience can get a summary of the entire video, as you share a lot of information that is difficult to connect mentally, and at the end it feels like 80-90% of the information you have shared has evaporated from memory. Please, please consider this request. I have tried to create some mind maps by recording your audio on my phone, then converting it into text and putting it into web apps like miron... if you would like to see how the mindmaps turned out, please let me know.
What are we going to see ten years from now? How many cores/threads will be in a mid-range PC? And how long can that increase in cores/threads - that increase in parallelism - keep going?
Utilizing parallelism is very, very hard. But currently the bottleneck is memory. It's possible to design a faster processor, but feeding it with data can't keep up. So ten years from now, memory will most likely be integrated into the CPU package. There are already CPU and GPU architectures that do this. It will become a lot more common in the future.
Cache sizes are going to get much, much larger to reduce latency, because system memory is simply too slow and modern processors waste a ton of cycles waiting for it. Adding more cores isn't really going to help much - the tasks that can be near-infinitely parallelized are already better shipped off to a device with a GPU-like architecture. We may start seeing consumer chips with a gigabyte of L3 cache or more.
The main game now is packaging. CPUs will balloon in total silicon because SRAM is no longer scaling and you can't get more performance without it. The tendency will also be to more tightly integrate memory, things like HBM on package; take a look at AMD's MI300C for example, and Strix Halo launching in January. Cores will continue to increase slowly over time; the main problem is that cost per transistor isn't improving much, which will be a massive problem soon.
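To see why the replies above keep coming back to memory, here is a small, illustrative experiment in plain Python (standard library only; the array size is an assumption, and interpreter overhead blunts the effect, but the gap usually still shows): touching the same data sequentially versus in a random order.

```python
# Walk the same buffer twice: once in address order (cache-friendly) and
# once in shuffled order (cache-hostile). The slowdown of the second walk
# is the memory latency the comments above are talking about.
import array
import random
import time

def walk(indices, data):
    t0 = time.perf_counter()
    total = 0
    for i in indices:
        total += data[i]
    return time.perf_counter() - t0

N = 1 << 22                        # ~4M 8-byte ints, well past typical L2
data = array.array("q", range(N))
seq = list(range(N))               # consecutive addresses
rnd = seq[:]
random.shuffle(rnd)                # effectively random addresses

t_seq = walk(seq, data)
t_rnd = walk(rnd, data)
print(f"sequential: {t_seq:.3f}s  shuffled: {t_rnd:.3f}s  ratio: {t_rnd / t_seq:.1f}x")
```

On a typical machine the shuffled walk comes out noticeably slower, purely because the cache stops helping and the core spends its time waiting on DRAM - the same stall that bigger L3 caches and on-package memory are meant to hide.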
I have a question. What would happen after the industry achieves the last physically possible node? Would the industry try to scale down the tools to reduce production cost? Would the industry start to work on quantum computers? Would the world use optical logic gates?
It's not physics, but economics! Producing smaller and smaller nodes will become more and more expensive, as fewer and fewer electrons switch a transistor. So, as we get smaller over the next decade, we will see different technologies emerging:
- Changes in architecture, to simply use the transistors we have better
- Moves to different types of computing with the same technology: analog, approximate, or neuromorphic
- Moves to different materials, i.e., abandoning CMOS
Industry is currently investing billions to scale up production, as demand is increasing; scaling down production to cut costs makes zero sense in a competitive market. Quantum computers and quantum networking are already being worked on, but these are solutions to different and very specific problems. Optical computing is still far off, for many reasons.
The RISC PowerPC's superscalar architecture leveled out at 1.8 instructions per clock at 1.2 GHz for around 10 watts. This was low enough to maintain a 105 °C Tj_max temp in a 60 °C military environment. MIPS per watt was the issue for reliable military single-board computers.
And as you get colder, the efficiency gets better and the power usage goes down. Funny thing is, low-leakage chips tend to scale better at ambient temps and leaky chips scale better cold.
I remember distinctly that Intel pushed clock speeds up to around 4 GHz back in the Pentium 4 (?) era, then they hit a wall - the chips were getting hotter and not faster - and lost the lead to AMD for a bit. A year or two later they came out with the next architecture at something like 1.4 GHz.
There are a few fairly major errors in this video. The most significant one is the claim that there was a rush for Dennard scaling in the mid to late 90s. This was not totally true. There was a lot of Dennard scaling during this period, but it was not the only thing that happened. It was combined with a lot of other scaling solutions such as standing gate, tri-gate, diagonal traces, copper interconnect and so on. Some of those both compacted the circuitry and reduced the power; some did just one or the other. This was when the actual lithography and the marketed number started to slip apart quite a lot; they're now apart by something like 30 times. The other error is the claim that multicore was introduced because of the cap in frequency. While this did happen around the same time, if we go back even all the way to the 70s, cores went from subscalar, to scalar (late 80s), to superscalar (mid 90s), to SIMD (late 90s), and making them multicore in the mid 00s was really just the next logical step.
I remember at some time an IBM guy said clock speeds would stop rising, making Intel stock drop. Intel was quick to deny there was such a limit, but bingo, their clock speeds never increased again.
Isn't the switching speed improvement due to lower input capacitance (gate to source and gate to drain) from the reduced area? I don't think the few-micrometer distance reduction makes much difference, as the field will propagate at near the speed of light.
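That's the usual picture: charging the gate capacitance dominates. Some rough order-of-magnitude arithmetic in Python (all values are assumptions, chosen only for illustration) comparing the two effects in the question:

```python
# Compare (a) how long the field takes to cross a micrometer-scale distance
# with (b) how long it takes to charge a gate through the driver's
# on-resistance. All numbers below are assumed, ballpark-only values.
distance_m = 1e-6          # ~1 micrometer of wire/channel
v_signal = 1.5e8           # assume the signal travels at roughly c/2 on-chip
t_prop = distance_m / v_signal

R_on = 10e3                # assumed driver on-resistance, 10 kOhm
C_gate = 1e-15             # assumed gate capacitance, 1 fF
t_rc = R_on * C_gate       # RC charging time constant

print(f"propagation ~{t_prop * 1e15:.1f} fs, RC charging ~{t_rc * 1e12:.1f} ps")
```

With numbers in that ballpark, the RC charging time is orders of magnitude larger than the field's travel time over a micrometer, which is why shrinking the capacitance (and the resistance driving it) is what actually buys switching speed.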
Boomers developed business methodology that has put all of humanity on its end path. All your videos demonstrate this glaring reality. Thank you, boomers.
Oh, the idea alone of making over 10 GHz single-core processors the standard... or 15 GHz... or 30 GHz... do I want to see that power bill? I am happy that wall was hit! Instead we have multi-core, and the ability to put some of those cores into low-power/sleep mode.
I wonder how much Itanium also factored into Intel’s decisions - it seems like they were laser focused on trying to execute as many instructions as possible on one chip, and only AMD introducing AMD64 snapped them out of it.
And you know what is still shitty? Having to listen to people who insist on having the latest multi-core machine while they're still using single-core software. Memory management isn't the best in that piece of software either. Take a 100 MB drawing (it's all vectors, so this should be a lot smaller anyway; their drawing format must suck): you need at least 24 GB of memory, otherwise it starts swapping to disk, and then it's plainly unusable. AutoCAD is an incredible load of toss.
@@JonathanMaddox No, but a few years ago a cheaper processor with fewer cores and a higher maximum clock was the better option. Some users would say that you need at least a quad-core processor. Mind you, this was a while ago, so I'll bet they would now be asking for 16 cores.
@@JonathanMaddox It was around 2015. Of course you are right that there were no single core processors available. It was just that to get the performance needed you needed to select a cpu that was the fastest (and affordable) in that situation. Even better would have been to ditch that stupid piece of software. Sigh.
What Intel got wrong with their NetBurst microarchitecture is that they thought CPU clock speeds were going to keep increasing; that's why they gave NetBurst such a long pipeline, which is very slow at the frequencies the Pentium 4 actually operated at, but would've been much, much faster at, say, 7 GHz. This is partly why PowerPC was so popular with high-end solutions like Mac workstations, mainframes and gaming consoles: it was both more efficient and more powerful per watt than what x86 could offer.
Jon, I have to say thanks for your in-depth coverage and even more so for keeping your graphs onscreen to give me enough time to smash the pause button. Too many creators flash a graph up for 3 seconds and assume the viewer is completely au fait with the narrative.
Agreed. However, I would appreciate it if Jon would add more padding after the “Alright everyone…” wrap up. The videos end so abruptly, and throw up the next one, that there’s almost not enough time to hit Pause so one can begin reading the comments. I know, First World problems. 🤷🏻♂️
My man loves brevity. Thanks to that, he's brought us the quickest Patreon shill on all of YouTube. 😂
@@glennac turn off autoplay
Intel Tejas and Jayhawk were canceled successors to Prescott. They were supposed to be 40-50 stages and 7+GHz. Intel always takes a long time to learn their lessons.
There was also a version of Nehalem that was NetBurst-based, aiming for 10 GHz. The name was reused later for the 45nm processor that actually launched.
@@mrbigberd Probably happened on the same cursed timeline as the G5 laptop
These work in the soothing cooling liquid atmospheres of Uranus
Interesting 😮
@@jrherita The codename for a dual-core Tejas, Cedar Mill, was also reused for the final 65nm revision of the Pentium 4, which also happened to be used for the dual core P4s.
Geez, this was excellent. I love these behind-the-scenes episodes, and, Jon, you have such a good blend of technical and literary storytelling. This was as engaging as the books about the penultimate history of the PC.
Don’t burn yourself out, we need you. I can’t comprehend how you can turn your research into world class episodes so quickly. We’re glad you do, but take care of yourself, friend.
Damn, not listening to the engineers who work with the technology day in, day out caused them to miss an obvious pitfall?
Crazy
It's unfortunately a very popular management strategy in many companies.
Now you know why P4 was a failed arch
hey, sometimes you gotta put those engineers in their place
Like decarbonising the grid here in the UK.
@@defeatSpace this is usually the first step towards bankruptcy.
The Pentium 4 was also an opening for Transmeta to show a different way. We were focused on lower power while still keeping up the IPC. We had developed a really clever clock tree architecture that allowed us to clock only the parts of the chip that needed to be updated that cycle (clock enables as far as the eye could see). This meant our active power was really, really low.
We also did frequency/voltage scaling before anyone else as far as I am aware. This got our active power even lower. Playing with the substrate voltage was another hack we had to lower our leakage current.
You should do an episode on Transmeta.
The "code morphing" always fascinated me, and I'm curious how it compares to modern CPU micro-code.
Having more of the instructions be "software defined" sounds like it came with a cost, but offered a lot of options.
One of the intriguing things about Transmeta was that theoretically you could emulate any architecture; you just had to make a version of CMS that supports it. However, as has been learned time and again over the years (see especially Intel's Itanium), sometimes the magical software doesn't work as well as you'd hope, if it even appears at all, and so you're stuck with clever hardware that doesn't really do much outside of a carefully tuned environment. Also, Transmeta chips really would've needed huge high-speed caches (32MB or greater) for the CMS if they were truly going to supplant any of the big players in markets outside of netbooks and web appliances; using main RAM for it wasn't going to cut it in the high-performance sector. And nowadays most CPUs have several sub-architectures (SSE, AVX, MMX, AES generators, etc.) that are hardware-optimized, something the Transmeta chips wouldn't have been able to efficiently emulate without a unique set of SIMD engines of their own.
They were definitely ahead of that curve, and I'm sure they contributed greatly to Intel, AMD, and Nvidia when they moved on.
Microcode and micro-ops seem like descendants.
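The two Transmeta tricks described above (clock gating and voltage/frequency scaling) are easy to see in a toy dynamic-power model. A sketch with invented constants - none of these figures are Transmeta's - using the usual P = a*C*V^2*f relation:

```python
# Dynamic power = activity factor * switched capacitance * voltage^2 * frequency.
# Clock gating lowers the activity factor "a" (only the blocks that need to
# update get a clock edge); DVFS lowers V and f together under light load.
def dynamic_power(a, c_farads, v_volts, f_hz):
    return a * c_farads * v_volts ** 2 * f_hz

C = 10e-9   # assumed effective switched capacitance, 10 nF (made up)
base = dynamic_power(a=1.0, c_farads=C, v_volts=1.5, f_hz=1e9)
gated = dynamic_power(a=0.3, c_farads=C, v_volts=1.5, f_hz=1e9)    # ~30% of the chip clocked each cycle
dvfs = dynamic_power(a=0.3, c_farads=C, v_volts=1.1, f_hz=600e6)   # drop V and f under light load
print(f"baseline {base:.1f} W, clock-gated {gated:.1f} W, gated + DVFS {dvfs:.1f} W")
```

Gating the clock attacks the activity factor a; DVFS attacks V and f together, and the V^2 term is why even a modest voltage drop pays off so well.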
Prescott, also known as "Preshot", was AMD's big opening. They'd realized that pushing for ever-higher clock speeds was a fool's errand, and designed a much smarter pipeline that could execute many more instructions per clock cycle. It was the Athlon XP, and it should have dominated the world. Why it didn't is a story best told by Jim from AdoredTV.
Ironically this is also when they started hashing initial plans for Bulldozer.
The first Pentium Ds (Smithfield, also 90 nm like Prescott) were an enormous gimmick (two separate dies) and would heat an auditorium. The AMD64 stuff of the era was just so clearly blowing it out of the water. I sometimes wonder what would have become of Intel if they hadn’t managed to get the Core architecture out the door in time and Apple had selected AMD for their x86 refresh.
@@jrherita No, that was several generations later. Bulldozer/Piledriver was a desperate attempt at selling cheaply but still making livable margins.
@@andersjjensen Unfortunately, Bulldozer was never meant to be cheap. Sandy Bridge cost less to make, and the intention with BD was never to clock high; they just didn't have a choice - they found out they could clock it high, and had to, since the end-result IPC was, well, poor.
@@JohnVance Intel was also very, very lucky that their mobile team in Israel kept iterating on the Pentium 3 design to produce the lineage that eventually became the Core processors. Without that alternative approach to fall back on once they finally realized the deep pipeline Pentium 4 approach was a dead end, they would have taken years longer to catch back up.
I sometimes wonder if the team that persevered with that design got the recognition they deserved for effectively saving the company from itself.
Nearly everything I both marveled and shook my head at from the sidelines of the semi, computer, and IT industry for the last 20 years (and all that time buried in those HW and chip articles), you pretty much condensed in 15 mins. Brings words to thought. Still excited and blind about what's around the corner, what's coming, and which player is going to turn over next...
Integrated vacuum tube transistors might have been able to make the single core multi-Gigahertz path more viable.
Love your videos! I'm a PhD student currently working on the RF test and measurement side of semiconductors at the on wafer level (e.g. S parameters, load pull, NVNA etc) - would be great to see a video on that at some point if possible!
Which university & Industry?
If you weren't around in the 90's you won't understand just how crazy it was. On average CPUs got 10% better IPC per year and 50% higher clocks per year. Power went from about 3 watts to about 60 watts. CPUs went from being bare ceramic packages without a heat sink, to having a tiny 8000 RPM ear-bleedingly awful fan, to having larger sinks with low-speed fans. Graphics cards went from being a pure 2D affair without any acceleration, to including more and more GUI acceleration features, to rasterizing and texturing triangles, to per-pixel lighting and hardware T&L, to having per-pixel programmable shaders in the early 00's. Games had not yet settled on what they could be, nor settled into making basically a reskin of the same AAA action-adventure type game with different art assets and settings and the same basic gameplay. Developers were trying to one-up each other with all kinds of crazy new ideas throughout the 90's. A lot of it was predictably awful; some of it was pure magic (like Ultima Underworld; that game is like finding rabbit fossils in Precambrian rock; it just shouldn't exist and shouldn't be possible on a 386). Without a technology to replace silicon CMOS and begin a new exponential leg of a technological S-curve, there will never be anything like it again.
To underline just how crazy the 90's were: if Dennard scaling had magically continued, somehow, we'd have 1.6 THz single-core CPUs with 80 kW TDP. That's the exponential we were on for over a decade, and 2002-2005 felt like crashing into a brick wall. In the 90's most people defined their hardware by the clock speed; it was nearly synonymous with performance. Before NetBurst, only Cyrix was really an outlier with terrible IPC. For marketing reasons it was obvious why Intel did what they did. They had pulled that same rabbit out of the hat a dozen times before; if they could do it again and AMD did not, they'd be years ahead. If they tried it and failed, at least they'd still have higher clocks and manage to sell some CPUs on outdated metrics.
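As a back-of-the-envelope check of that extrapolation (the 2002 baseline and the 50%-per-year rate are my assumptions, so treat the output as order-of-magnitude only):

```python
# Compound the late-90s clock trend (~50% per year) forward from an assumed
# ~3 GHz part in 2002 and see where the curve lands by the late 2010s.
clock_ghz, year = 3.0, 2002
while year < 2018:
    clock_ghz *= 1.5          # the 1990s pace of clock gains
    year += 1
print(f"{year}: ~{clock_ghz / 1000:.1f} THz, had the 90s trend somehow held")
```

Compounding the 90s pace for another 15 years or so indeed lands in terahertz territory, with power growing even faster than that.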
I cannot agree more. The 90s were the time of my late teens/early adulthood, and it was just crazy - and the time I made software development my professional career. It was a wild-west feeling, always some exciting new thing coming and going.
And extra credit for mentioning Ultima Underworld, I was speechless when I started it the first time on my 386DX-40... magic did not do it justice for its time.
I worked in a computer shop at the time.
Processors were hot potatoes; every week they sat unsold they lost tangible value.
Dissipators went from tack-on gimmicks to frame-mounted behemoths.
Software spell checking went from luxury to mainstream.
My favourite expression of this is that the single-core performance of my home computer rose about a million times (counting IPC) in the first half of my computing life, and about ten times in the last. (ZX81 FLOPS -> 2 GHz P4 -> 12600)
The AMD K6 also had horrible floating-point performance; a Celeron 300 was at least four times faster than a K6-2 450 MHz in, say, emulators.
I remember when I got a "massive" Thermalright heatsink... had to dremel off a couple fins to clear the capacitors on my motherboard, but you could mount an 80mm fan on that bad boy.
Asianometry: *uploads a video about a thing I've never even heard of*
me: oh yeah popcorn time
Thanks for covering this fun topic!
The idea of dark silicon was prevalent in the spacecraft electronics industry in the early ‘80s when I used it. Much cruder though. Here the issue was using TTL logic (all that was available for radiation hardened work) for complex functions. We would cycle power to individual chips to reduce the power demands on a very limited supply.
What kind of spacecraft? Satellites, rocket guidance or fuel control or launch, etc, Shuttle? If you’re going to tease us…
I think this is my favorite video so far! You condensed so much about why it’s hard to design semiconductors into a short and interesting presentation. Well done! 👏👏👏
Went from 16 MHz to 2 GHz in a bit over 10 years, still a good effort.
Fantastic effort, pushing the boundaries until it was clear it was a losing battle to push them any further.
Eh?
286, 16 MHz: February 1, 1982
P4, 2.0 GHz: Aug. 27, 2001
@@myne00 You could still easily buy a new 386 sx 16 in 1991
@@turbinegraphics16 probably can now. Can definitely get 8088s.
An Asianometry video is always welcome!
It's pretty crazy how little the average developer knows about concurrency/threading. 15 years ago it kinda made sense, but it's still mostly the same niche of C++/Rust nerds who get it. It's a big problem in things that DON'T need a ton of CPU. A lot of apps suck because people don't know how to work with a single UI thread so they're constantly blocking the UI with non-UI stuff.
Erlang would have helped, but........//an idea for a new video// #learnElixir
What a nonsense comment.
@@der.Schtefan Tell me you've never made a native app without telling me you've never made a native app.
Threading is used extensively in backend and gamedev, which is much more than just C++/Rust. I guess your statement makes sense if you only count frontend/web development
First rule of mobile development (so Obj-C, Swift, Java, Kotlin) is never to block the main thread. iOS and Android have had concurrency/threading systems as core parts of their SDKs basically forever (GCD and now Swift Concurrency on iOS and Handlers and now Kotlin Coroutines) and are a core part of mobile development for basically every engineer working on these apps.
The idea that only a few Rust and C++ devs understand concurrency and that everyone else is constantly blocking the main/UI thread is absurd.
Writing parallel software is challenging. It has also taken a long time for the tools to catch up with the need to support it well.
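A minimal sketch of the "keep the main thread free" rule discussed above, in Python/Tkinter (standard library only; the widget layout and the 3-second fake task are invented for illustration): the slow work runs on a worker thread and hands its result back through a queue, so only the main thread ever touches the UI and the window stays responsive.

```python
import queue
import threading
import time
import tkinter as tk

results = queue.Queue()

def slow_task():
    time.sleep(3)                       # stand-in for I/O or heavy computation
    results.put("done after 3 s")       # hand the result back via the queue

def start_work():
    status.set("working... (UI still responsive)")
    threading.Thread(target=slow_task, daemon=True).start()

def poll_results():
    try:
        status.set(results.get_nowait())  # only the main thread touches Tk
    except queue.Empty:
        pass
    root.after(100, poll_results)         # check again in 100 ms

root = tk.Tk()
status = tk.StringVar(value="idle")
tk.Button(root, text="Do slow thing", command=start_work).pack(padx=20, pady=10)
tk.Label(root, textvariable=status).pack(padx=20, pady=10)
poll_results()
root.mainloop()
```

The same shape shows up under different names on every platform: GCD or Swift Concurrency on iOS, Handlers or coroutines on Android, web workers in the browser.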
9:10 - “AMD’s dual core… in 2004”… “Intel’s Pentium D in 2005”
Sun Microsystems dual core UltraSPARC IV was released in 2004… a massive jump was made by Sun Microsystems octal core UltraSPARC T1 in 2005
16 & 32 core processors were later used by Sun Microsystems & then by Oracle Corporation… while handling 4 instructions per cycle & speeding to 5 GHz on the SPARC T8.
SPARC by TSMC pushed the market to incredible places, by leading often years ahead of their time.
Pentium 3 had dual-socket motherboards (non-server, I think). The Hitachi SH2 had a way to link two chips in a master/slave configuration, and by 1997 they made a package that had both cores under one hood. However, that one was not traditionally multi-core, just two chips shrunk into the same silicon.
Reminds me of the 5 GHz IBM monsters later on, made on SOI wafers. It's sad that that tech was left behind; we could gain some free clocks.
You are a video making machine! So, so good.
3:25 when we shrink the transistor we also generally lower the capacitance. This is a good piece of the lower-power story. Great video as always!
Optical is going to be interesting.
Dear mask man Jon,
you are WITHOUT QUESTION the Lone Ranger of tech video journalism. A lone ranger who identifies, researches, and produces high-quality content with a passion to educate the public is exemplary. Well done! Ty, masked man. Your knack for spotting trends and turning them into actionable insights is invaluable to so many.
LONE RANGER!
Your videos are a gift. I was a Photolithography Engineer in the 90's-2000's and it was go, go, go all the time. The work was fascinating, the money was fantastic, but the hours were LONG (pretty much 24/7). Being so busy making the stuff, it was hard to keep up with what was happening in the industry and with the technology at the macro level. When I watch your videos, I think to myself "Oh, that's why we did that!". Thanks for encapsulating what was going on in the Golden Age of Semiconductors in the USA (hopefully, there will be another one!).
Evolution in nature also followed this curve. Nerves started slow, but sped up using electrochemical reactions. It was a dead end. Nature didn't give up on speeding up neurons as much as discovered the advantages of parallelism. Your brain is a relatively slow communicating, but massively parallel machine.
It's also analog which is where the "it's a computer" paradigm breaks down.
@@brodriguez11000 Computers aren't necessarily digital. Early computers, and some today, are analog too. Our brains are indeed analog computers, just with a structure we don't understand.
@@brodriguez11000 So is AI.
Dude.... fascinating. This is everything i wanted to know about CPUs as a kid (~90s), but didn't have the resources to investigate 🙏
Happy to have found this video! Very interesting to-the-point channel.
I can't help being happy that you have started saying - correctly - D-RAM instead of "dram". Great channel btw!
Again: "5 nm" is a marketing term. The smallest transistor gate is 20-30 nm across.
Yes, most everybody paying attention to this stuff knows this by now. It doesn't need to keep being repeated.
great video! concepts were well explained and easy to understand and your jokes are always fun :)
1996 to 2001 was really an incredible era in computing. I remember 50 or 100 MHz computers being the norm, then frequency doubling every year from then on. There were even TV commercials around that time of a guy driving home from the electronics store with a new latest-model PC and seeing a billboard announcing the next generation. It did seem like that at the time.
Then, around that time, accelerated graphics were also created and advancing. DirectX versions advanced at around the same rate as CPUs. You had to buy a new video card every year or so because the next generation supported new compute features in the latest DirectX version.
I still think there's algorithmic and OS/language optimizations that are yet to be realized... As an old school embedded programmer I'm convinced that at least half of the clock cycles of a modern PC are wasted with bloat... but I'm probably oversimplifying the problem...
Open a Rust or Go binary in a debugger if you want to see what 99% wasted cycles looks like. Bloated mess.
Depends what you mean by bloat; lots of cycles are spent on non-critical things, but that's a tradeoff for simpler programming, more I/O, more OS features, etc.
If you consider anything more advanced than Windows XP to be bloat, then you can indeed run a whole PC in the single-digit-watts range.
You are most likely correct. The problem that has crept into all software is that we have the memory to add an extra layer of configuration to everything, so software never resolves to a "finalized automation" of the kind that you get in the embedded world, where every byte and clock cycle can be properly accounted for. Instead we have a vast infrastructure of dynamically adjusting budgets for computing resources, just-in-time recompilation, abstracted programming interfaces, fallbacks, etc. That spills over into even more incidental complexity. It's known that as software gets bigger, more of the code "goes dark" - it either executes once or never, while the "hot spots" become much tighter in nature - tiny sections of assembly code that are hammered. But when there's an excess of configuration, there's no easy way to get at the hot spots.
It's related to Conway's Law effects - it's not that we need all that configuration for an application, it's that having it is a way of designing for an adverse environment where you don't know who or what your program talks to, so everyone assumes the worst.
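The "hot spot" effect described above is easy to demonstrate with nothing but the standard library. A toy sketch (the functions are invented stand-ins): profile a program where one tight loop does the work and a config layer runs once, and watch the profile concentrate in the loop.

```python
# Profile a toy program: one "hammered" inner loop next to code that runs
# once, the way most configuration and glue paths do in large applications.
import cProfile

def hot(n):                      # the tight loop that gets hammered
    total = 0
    for i in range(n):
        total += i * i
    return total

def cold_config_layer():         # runs once, like most "dark" code paths
    return {"mode": "default", "retries": 3}

def main():
    cfg = cold_config_layer()
    return hot(5_000_000), cfg

cProfile.run("main()", sort="cumulative")
```

cProfile's output makes the skew obvious: nearly all the cumulative time lands in the one hammered function, while the rest of the code barely registers.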
“Nature hates a void”. As the architecture (hardware) makes improvements and optimizations software is there to eventually occupy the newly discovered space - also an oversimplification. 😄
Not wasting your time isn't "bloat". Making the most of a finite resource in which you have better things to do than banging electrons.
Dear sir, I love your videos. The information is presented so clearly and thoroughly that I feel I learn and retain something new after every one I watch! But please, I have one request I feel should be very easy for you to implement.
Could you PLEASE increase the volume of your voiceovers? In order for me to comfortably hear you I need to raise the volume of my TV to a level that makes the ads obscenely loud.
On my TV any volume level over 10 and the ads are ear splittingly loud. I need the volume at 13 to 15 on your videos to hear you clearly. I have to watch with the TV remote in my hand, finger on the mute button, ready to press it the instant an ad starts to play. If I'm off by just a few seconds the effect is so jarring my wife will hear it 3 rooms over through 2 closed doors. And if it wakes the baby, well, my day/night is just ruined, lol!
Thank you for all the amazing information you've provided me on so many topics that I find interesting! I truly do appreciate the effort you put in to these videos. If you could fix this one issue you would make me a VERY happy man!
Thank you for your time and keep up the great work.
You don't have to guess that Intel chased clock speed above all else. This was stated at the time. To crank up the clock they had to introduce long pipelines. Most code could not keep those pipelines filled. So a few pieces of code that fit the pipeline architecture well ran blazingly fast, but most code spent much of its time stalled. They started losing to AMD on real-world performance, and were saved by the guys from Israel, who had been working on a different approach.
Semiconductor companies being saved by the guys from Israel could be the basis for an interesting video. It has happened multiple times. It's unclear if the guys from Israel are smarter, or if, far from headquarters, they are not pressured to pursue dumb goals.
Intel's Israel-based researchers were charged with designing smaller, more efficient cores for laptops, so they didn't have the area, gate or cooling budgets for deep pipelining. This gave them the latitude to explore alternative approaches. However, most everyone really paying attention could see Intel's P4 designs were chasing pure "bigger number = better" GHz, which sounds impressive to the clueless but is, in reality, pretty terrible across broad workloads. Those deep-pipeline cache stalls, prediction and speculative execution misses just crushed performance.
All that deep pipelining infrastructure also consumed a lot of gates on its own. Not doing most of it freed up gates which could be applied toward doing real work.
It might also be a case of “Thinking outside of the Box” in more ways than one.
@@MarkishMr Intel was chasing it, and so were Intel fans. All anyone could talk about at the time was faster single-core performance. Understandable, because single-threaded code was easy to understand and write. Concurrency took a skilled programmer a lot more work to pull off effectively.
2:42 LOL, how many stopped the video to rewatch this?
I didn't notice that. Thanks for pointing it out! 😄✌️
Indeed, his wit is as sharp as his intellect. One of the best tech info channels on YT.
Yeah, great piece of editing and wit !
Wish I had my 53 GHz CPU... but another problem alongside physical-dimension downscaling was the way we handle code to make it run fast, i.e. pipelined with a lot of pre-emptive/speculative execution.
What Intel found out the hard way during the peak Pentium high-frequency era, before they gave up and went Core 2 Duo, was that the longer your pipeline gets to take advantage of this high frequency, the more branch misses, cache misses and pipeline flushes you end up having to deal with, in many workload types. So you reach a wall where it doesn't scale anymore anyway.
The Core 2 Duo architecture pipeline that replaced the last Pentium types was almost half as long and still outperformed it in virtually all workloads at a much lower frequency.
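A toy model of that wall, with every rate assumed rather than measured (the real trade-offs are far messier): clock frequency is taken as proportional to pipeline depth, each branch mispredict flushes roughly a pipeline's worth of cycles, and a cache miss costs a fixed amount of real time, which translates into more lost cycles the faster you clock.

```python
# Instructions per nanosecond as pipeline depth grows. The flush cost scales
# with depth, and memory latency in nanoseconds costs more cycles at higher
# clocks, so throughput flattens out even as frequency keeps rising.
def throughput(depth,
               ghz_per_stage=0.1,      # assume clock roughly proportional to depth
               branches=0.2,           # branch instructions per instruction
               mispredict=0.05,        # mispredict rate per branch
               miss_rate=0.01,         # cache misses per instruction
               mem_ns=100.0):          # main-memory latency in nanoseconds
    freq_ghz = ghz_per_stage * depth
    cpi = (1.0
           + branches * mispredict * depth      # flush cost grows with depth
           + miss_rate * mem_ns * freq_ghz)     # same ns, more cycles lost
    return freq_ghz / cpi                       # instructions per nanosecond

for depth in (10, 14, 20, 31, 40):              # 31 is roughly Prescott-class depth
    print(f"depth {depth:2d}: ~{throughput(depth):.2f} instr/ns")
```

The throughput numbers flatten out as depth grows, while the power cost of the higher clock (not modeled here) keeps climbing - which is roughly the corner NetBurst painted itself into.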
Also... what I really like about your work is the industry and expert perspective you cover. It's a very precious insight: not just a chronology of the macro effects that happened in the consumer world, but the reasons behind why something happened.
Great, and it's good that Dennard receives the credit that he deserves, as he is never mentioned in the press, thanks to its obsession with "Moore's law" driven by Intel's marketing machine.
Moore's law lasted a lot longer than Dennard scaling though
At the time multi-core processors came out, multi-processor computing - like multi-core, but with each processor in a separate package - had been around for quite a few years, not only for simulation but for real-life applications such as Photoshop, Maya and 3D rendering, including visualisation of MRIs in the medical industry. On Intel, MIPS, SPARC, PowerPC and Alpha platforms. On workstations and servers.
Man I love this channel, thank you for taking me back to my engineering lectures.
9:45 Hence we still struggle to reach 60 fps in Crysis' Ascension level
Not even zen 5 with v-cache? :p
No amount of CPUs can beat bad optimization.
Another great lecture. Thanks, guy.
@7:37 he said “turn of the century” and it sent me. I’m old.
11:45 is there really no updated version of this graph? It ends in 2015, which is now ten years ago. Would be very interesting to see how things developed since then.
Replying to see if someone searches for one
Colwell's oral history is amazing. Particularly the chapters about Itanium and VLIW. Honestly it's worth a video on its own.
It would be a great follow up to hear the tale of the Pentium M and how it evolved into the Core series!
I'd like to comment regarding the Pentium 4. It was a poor performer. I had several systems using these processors back then and they were noticeably sluggish. I am told that this sluggishness was a result of the deep pipelines in the processor. For example, I remember a Microsoft engineer at the time stating that a thread switch consumed roughly 7500 cycles, because the pipelines and instruction caches got dumped and had to be refilled. My understanding is that the follow-on to the Pentium 4 used older Pentium 3 technology. Anyway, I was not impressed that Colwell was a champion of the Pentium 4. To me, it was a failed processor.
babe wake up new End of Dennard Scaling just dropped
I'm so happy Sunday
Played out. Go back to bed
How long are people gonna push this stupid forced meme
@@colin351 babeless comment
Until it stops getting likes tbf
Great video! Made my Tuesday!
These kinds of difficulties always stimulate new inventions.
I'm pretty sure that even if we had those 50 GHz chips, systems wouldn't perform 10x faster, partly because much less effort would have gone into software optimization, and there would be less of the hardware acceleration that CPUs and GPUs are stuffed with today.
The development of parallel computing was an obvious step anyway, as it reflects the way heavily multi-tasking systems work.
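One hedged way to put a number on why a 10x clock wouldn't mean 10x performance is Amdahl's-law-style reasoning; the 80% figure below is just an illustrative assumption about how much of a workload actually scales with the clock.

```latex
% Amdahl's law: if only a fraction f of the work is sped up by a factor s,
% the overall speedup is
\[
  S = \frac{1}{(1 - f) + \frac{f}{s}}
\]
% Illustrative assumption: f = 0.8 of runtime scales with the clock and s = 10
% (a hypothetical "50 GHz" part):
\[
  S = \frac{1}{0.2 + 0.8/10} = \frac{1}{0.28} \approx 3.6
\]
```

So even a 10x clock bump would land well short of 10x on mixed workloads, before software optimization is even considered.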
Brilliant clip! Thank you so much for sharing that.
Yes! I was waiting for this. The challenges of doing more on a single package are what led to multi-core and chiplets.
Seymour Cray had a solution to the frequency barrier for CPUs back in the early 90s, but he was ignored. I guess all that investment in silicon chip manufacturing had to be milked to the maximum, and that's still the case to this day. I'm talking about moving away from silicon and using gallium, germanium, carbon, etc.
Seymour Cray had a lot of good ideas, and many of them made it into the mainstream. Moving away from silicon requires a level of investment that was, from an economic standpoint, completely impractical until now. Research labs and companies do look at such alternatives today, but it's not as simple as just swapping out the semiconductor material.
The closing minutes of the vid are great; Intel opened the door for AMD.
One thing that the Pentium 4 did right even with all its faults was hyperthreading. 1 core having 2 threads. AMD ignored this until 2017 when Zen 1 came out. Funny, Intel has abandoned hyperthreading with their new Arrow Lake CPUs.
Would we have GPUs today if our CPUs had been able to keep scaling in GHz? Yes, because we started getting GPUs when CPUs were still very slow, around 100 MHz if I remember back to when I bought the first Voodoo card. GPUs then just kept speeding up as the CPUs sped up as well. By the time we had 4 GHz CPUs, we had already been through 5 or 6 generations of GPU.
If CPUs just kept getting faster, the GPUs would have gotten faster alongside them, and developers would have included bigger and bigger polygon counts.
The first PCs needed adapter cards to get video output. Those cards had chips on them that could be considered GPUs, although early display adapters could only produce text. So GPUs were a thing at the very beginning. It took almost 10 years from the PC's introduction for CPUs to reach 100 MHz; the first PC ran at only 4.77 MHz.
If we had 50 GHz CPUs we would now have 30 GHz GPUs, and it would be quite cool; keeping those beasts fed would be quite the challenge, though.
Dr. Colwell gave a talk at my university a few weeks ago. He mentioned seeing the CEO of Intel at the airport some time ago, and how the CEO was bragging that their new chip was supposed to be so great because of all its cores. He just told the CEO the product wasn't good, because it was built from a bunch of bad early parts and was too complex and inefficient to actually be viable. He turned out to be right in the end.
It was an incredible talk about many things and this was just a very brief story he told. As a sophomore EE student it was so amazing to hear from him though.
So to summarize (and compare): Moore's law is about the growth of transistor counts on a chip over time, while Dennard scaling is about how shrinking transistor dimensions relates to speed, voltage, and power density?
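Roughly, yes. As I understand it, Dennard's observation can be sketched like this, using the scaling factor κ from the video (κ > 1):

```latex
% Shrink dimensions and voltage by 1/kappa, raise frequency by kappa:
\[
  L,\ W,\ t_{ox},\ V \;\to\; \tfrac{1}{\kappa}, \qquad
  f \;\to\; \kappa, \qquad
  \text{area} \;\to\; \tfrac{1}{\kappa^{2}}
\]
% Switching power per transistor, P ~ C V^2 f, with capacitance C scaling as 1/kappa:
\[
  P \;\propto\; \frac{1}{\kappa}\cdot\frac{1}{\kappa^{2}}\cdot\kappa
  \;=\; \frac{1}{\kappa^{2}}
  \quad\Rightarrow\quad
  \frac{P}{\text{area}} \;\approx\; \text{constant}
\]
```

Moore's law is the empirical transistor-count trend; Dennard scaling is the reason you could, for decades, shrink, clock higher, and keep power density flat.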
Interesting how it started with memory and then jumped to CPUs, considering memory doesn't push the barriers as hard as CPUs do.
Yes!
damn dude some genuine quality content, keep it up
Nice video.
4:36 Though, I am pretty sure that "K" is actually κ, or the Greek minuscule letter kappa. Not "K".
As always, fantastic video!
I really appreciate your videos! Could you please do a video on the IBM Power series of processors following the dissolution of the AIM alliance to present?
always chasing the next big bottleneck
13:50 This sounds very familiar. After the P4 (and some minor iterations) came the Pentium D. All the magazines and reviewers complained about it getting very warm. The Core 2 Duo which followed got a better response from the pundits.
The concept leads to Bucket Brigade Devices, the mainstay of "Analog Delay" guitar effects pedals.
13:10 You wonder whether, if CPUs had managed to reach those "53 GHz CPU speeds", GPUs and the like would be as dominant as they are now, and you're not the first to wonder.
Tim Sweeney, CEO and co-founder of Epic Games and lead programmer of Unreal Engine, had a prophetic message about CPUs becoming more parallel and GPUs becoming more general-purpose:
Q: Finally, where do you think 3D hardware and CPU technology should be headed? Do you think we are likely see 3D hardware taking over some of the functions of the CPU, going beyond rendering?
A: I think CPU's and GPU's are actually going to converge 10 years or so down the road. On the GPU side, you're seeing a slow march towards computational completeness. Once they achieve that, you'll see certain CPU algorithms that are amicable to highly parallel operations on largely constant datasets move to the GPU. On the other hand, the trend in CPU's is towards SMT/Hyperthreading and multi-core. The real difference then isn't in their capabilities, but their performance characteristics.
When a typical consumer CPU can run a large number of threads simultaneously, and a GPU can perform general computing work, will you really need both? A day will come when GPU's can compile and run C code, and CPU's can compile and run HLSL code -- though perhaps with significant performance disadvantages in each case. At that point, both the CPU guys and the GPU guys will need to do some soul searching!
- Tim Sweeney, Epic games - 2004
beyond3d.com/content/interviews/18/4
For as long as the desire for high performance exists, and CPUs and GPUs retain clear performance advantages over each other in important tasks, there isn't going to be any proper convergence.
The technical marvels that let me shoot virtual bad guys on my computer screen never cease to amaze.
I actually clicked on this video because I've never heard of Dennard Scaling 😆
The electric field icon is wrong, the field should be vertical between gate and substrate at 2:09.
Of all the random things to see in my feed, a Socket 423 proc was not expected
The ending could explain most of what happened at Intel in the 2000s, before dual core and during the "dual core that wasn't really dual core" era. Thanks for the video.
Well done, thanks!
You oversimplified the length-speed relation: the electric field itself moves at c, but a changing electric field induces a voltage the other way (self-inductance, L, measured in henries), and that should also decrease when the gate gets smaller.
Smaller transistor = smaller impedance = faster switching.
c is irrelevant here, because the material that the electrons propagate through in the solid state is what constrains the field strength.
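For what it's worth, the usual first-order picture is about charging the gate capacitance, not about how fast the field travels; a rough sketch, assuming classic Dennard-style scaling by a factor κ:

```latex
% First-order gate delay: time to charge the load capacitance C to the supply V
% with drive current I.
\[
  t_{d} \;\sim\; \frac{C\,V}{I}
\]
% Under Dennard-style scaling, C, V and I all shrink by roughly 1/kappa, so
\[
  t_{d} \;\to\; \frac{(C/\kappa)\,(V/\kappa)}{I/\kappa} \;=\; \frac{t_{d}}{\kappa}
\]
% i.e. smaller devices switch faster because there is less charge to move,
% not because the field has a shorter distance to cover at c.
```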
Thx great breakdown🎉
It's indeed remarkable that in an industry where everything seemed to keep getting smaller and faster, one of the principal technical parameters, clock speed, has stalled for 18 years now. I remember the mid 90s to mid 2000s, when it seemed clock speed would just keep increasing. And then it abruptly stopped.
Thank you - first time I’ve heard of Dennard and I’ve been following this sort of stuff most of my life - an extra 15 minutes of CPD 👍
One of the best UA-cam channel
8:24 The funniest part of this video. 53 GHz years ago. That would have been something!
Can the "Dark Silicon" problem be solved just by cooling the chip enough?
Phase-change cooling isn't really practical for daily use, and without it the absolute best you can do is limited by the rate at which heat can be moved to ambient. Air cooling and water cooling both run into this roadblock.
Better cooling can help, but we are hitting physical limits.
I do expect liquid in direct contact with the chip to become the norm; thermal conduction alone just doesn't cut it any more.
(Passive) phase change is powerful, and in particular keeps the temperature range small, which helps with local expansion strain.
Active, below-ambient cooling seems forever too expensive and unreliable.
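A hedged way to see why cooling alone doesn't make dark silicon go away: in steady state the junction temperature is roughly ambient plus power times the total thermal resistance, so the only lever conventional cooling gives you is that resistance term. The numbers below are illustrative assumptions, not specs for any particular cooler.

```latex
\[
  T_{j} \;\approx\; T_{a} + P \cdot \theta_{ja}
\]
% Illustrative numbers: T_a = 25 C ambient and T_j,max = 95 C give a 70 C budget.
% With a very good conventional cooler around theta_ja ~ 0.2 C/W, sustained power
% is capped near
\[
  P_{\max} \;\approx\; \frac{95 - 25}{0.2\ \mathrm{C/W}} \;=\; 350\ \mathrm{W}
\]
% no matter how many of the chip's transistors you would like to light up at once.
```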
Great vid, it's making me hungry for the next chapter: how ARM chips created better performance.
In-depth and insightful. Would it be possible for you to add a mind map/flowchart at the end of your videos, so that the audience gets a summary of the entire video? You share a lot of information that is difficult to connect mentally, and at the end it feels like 80-90% of what you shared has evaporated from memory. Please, please consider this request. I have tried to create some mind maps by recording your audio on my phone, converting it into text, and putting it into web apps like miron; if you would like to see how the mind maps turned out, please let me know.
Fellow nerds, unite! ✊
8:22
Not even in Stalin's dreams
What are we going to see ten years from now? How many cores/threads will be in a mid-range PC? And how long can that increase in cores/threads, that increase in parallelism, keep going?
Utilizing parallelism is very, very hard. But currently the bottleneck is memory: it's possible to design a faster processor, but feeding it with data can't keep up. So ten years from now, memory will most likely be integrated into the CPU package. There are already CPU and GPU architectures that do this, and it will become a lot more common in the future.
Cache sizes are going to get much, much larger to reduce latency, because system memory is simply too slow and modern processors waste a ton of cycles waiting for it. Adding more cores isn't really going to help much; the tasks that can be near-infinitely parallelized are already better shipped off to a device with a GPU-like architecture. We may start seeing consumer chips with a gigabyte of L3 cache or more.
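A rough sketch of why piling on last-level cache pays off, using the standard average-memory-access-time model; the latencies and miss rates below are illustrative assumptions, not figures for any real part.

```python
# Average memory access time (AMAT): hit time plus miss rate times miss penalty.
# All latencies and miss rates are illustrative assumptions.

def amat_ns(l3_hit_ns: float, l3_miss_rate: float, dram_ns: float) -> float:
    return l3_hit_ns + l3_miss_rate * dram_ns

DRAM_NS = 80.0  # assumed main-memory latency

small_l3 = amat_ns(l3_hit_ns=10.0, l3_miss_rate=0.30, dram_ns=DRAM_NS)  # e.g. a modest L3
huge_l3 = amat_ns(l3_hit_ns=14.0, l3_miss_rate=0.10, dram_ns=DRAM_NS)   # e.g. a hypothetical ~1 GB L3

print(f"modest L3: {small_l3:.1f} ns average access")      # 10 + 0.30 * 80 = 34.0 ns
print(f"much larger L3: {huge_l3:.1f} ns average access")  # 14 + 0.10 * 80 = 22.0 ns
# Even though the bigger cache is a little slower to hit, cutting the miss rate
# dominates, which is why stacking more SRAM keeps looking attractive.
```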
The main game now is packaging: CPUs will balloon in total silicon because SRAM is no longer scaling, and you can't get more performance without it.
The tendency will also be to integrate memory more tightly, things like HBM on package; take a look at AMD's MI300C for example, and Strix Halo launching in January.
Cores will continue to increase slowly over time. The main problem is that cost per transistor isn't improving much, which will become a massive problem soon.
That policy of Intel's from back then still echoes today... for the worse. Today's news, you tell me...
I have a question: what would happen after the industry reaches the last physically possible node?
Would the industry try to scale down the tools to reduce production cost?
Would the industry start working on quantum computers?
Would the world switch to optical logic gates?
It's not physics, but economics! Producing smaller and smaller nodes will become more and more expensive, as fewer and fewer electrons switch a transistor. So, as we get smaller over the next decade, we will see different technologies emerging:
- Changes in architecture to just use the transistors we have better
- Move to different types of computing with the same technology: analog, approximate, or neuromorphic
- Move to different types of materials, i.e., abandoning CMOS
Industry is currently investing billions to scale up production, as demand is increasing. Scaling down, which would increase costs, makes zero sense in a competitive market.
Quantum computers and quantum networking are already being worked on; they are solutions to different and very specific problems.
Optical computing is still far off, for many reasons.
So educational!
The RISC PowerPC superscalar architecture leveled out at about 1.8 instructions per clock at 1.2 GHz for around 10 watts. This was low enough to maintain a 105 °C Tj_max in a 60 °C military environment. MIPS per watt was the issue for reliable military single-board computers.
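Taking those figures at face value, the MIPS-per-watt arithmetic works out to roughly:

```latex
\[
  1.8\ \tfrac{\text{instr}}{\text{cycle}} \times 1.2\ \text{GHz}
  \;\approx\; 2.16 \times 10^{9}\ \tfrac{\text{instr}}{\text{s}}
  \;=\; 2160\ \text{MIPS},
  \qquad
  \frac{2160\ \text{MIPS}}{10\ \text{W}} \;\approx\; 216\ \tfrac{\text{MIPS}}{\text{W}}
\]
```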
Overclockers using liquid nitrogen often go above 4 GHz, so I guess the cooling tech needs more advancement to break the power wall.
CPUs now easily run 5.5 GHz on ambient cooling, but you're right, the problem is that there's no method that is both easy and cheap.
And as you get colder, the efficiency gets better and the power usage goes down. Funny thing is, low-leakage chips tend to scale better at ambient temps and leaky chips scale better cold.
I remember distinctly that Intel pushed clock speeds up to around 4 GHz back in the Pentium 4 (?) era, then they hit a wall: the chips were getting hotter but not faster, and they lost the lead to AMD for a bit. A year or two later they came out with the next architecture at something like 1.4 GHz.
Love this channel ❤
There are a few fairly major errors in this video.
The most significant one is the claim that there was a rush for Dennard scaling in the mid to late 90s. This is not entirely true.
There was a lot of Dennard scaling during this period, but it was not the only thing happening. It was combined with a lot of other scaling solutions such as standing gates, tri-gates, diagonal traces, copper interconnects and so on. Some of those both compacted the circuitry and reduced the power; some did just one or the other.
This was when the actual lithography and the marketed figures started to slip apart quite a lot. They're now apart by something like 30 times.
The other error is the claim that multi-core was introduced because of the cap on frequency. While this did happen around the same time, if we go back all the way to the 70s, cores went from subscalar, to scalar (late 80s), to superscalar (mid 90s), to SIMD (late 90s), and making them multi-core in the mid 00s was really just the next logical step.
I remember that at some point an IBM guy said clock speeds would stop rising, which made Intel's stock drop. Intel was quick to deny there was any such limit, but bingo, their clock speeds never increased again.
Isn't the switching speed improvement due to the lower input capacitance (gate-to-source and gate-to-drain) that comes from the reduced area? I don't think the few-micrometre reduction in distance makes much difference, as the field will propagate at near the speed of light.
WOW! When did Asianometry start saying D'ram instead of dram?! It's the start of a whole new era!
Boomers developed business methodology that has put all of humanity on its end path. All your videos demonstrate this glaring reality. Thank you, boomers.
Oh, the idea alone, making 10+ GHz single-core processors the standard... or 15 GHz... or 30 GHz... do I want to see that power bill?
I'm happy that wall was hit! Instead we have multi-core, and the ability to put some of those cores into low-power/sleep mode.
If you wait long enough you'll be in the stone age consumer group vector.
I wonder how much Itanium also factored into Intel’s decisions - it seems like they were laser focused on trying to execute as many instructions as possible on one chip, and only AMD introducing AMD64 snapped them out of it.
Good ol' socket-melting Pres-hott
And you know what is still shitty? Having to listen to people who insist on having the latest multi-core machine while they're still using single-core software. That piece of software isn't great at memory management either: take a 100 MB drawing (it's all vectors, so it should be a lot smaller anyway; their drawing format must suck) and you need at least 24 GB of memory, otherwise it starts swapping to disk and becomes plainly unusable. AutoCAD is an incredible load of toss.
It's 2024. There are no latest single-core machines, unless you're talking about microcontrollers.
Software is really the black sheep of the tech sector; the average software quality is really low.
@@JonathanMaddox No, but a few years ago a cheaper processor with fewer cores and a higher maximum clock was the better option. Some users would say that you need at least a quad-core processor. Mind you, this was a while ago, so I'll bet they would now be asking for 16 cores.
@@jaapaap123 true, if by "a few years ago" you mean "two decades ago".
@@JonathanMaddox It was around 2015. Of course you're right that there were no single-core processors available. It was just that, to get the performance needed, you had to pick the CPU that was the fastest (and still affordable) for that situation.
Even better would have been to ditch that stupid piece of software. Sigh.
What Intel got wrong with the NetBurst microarchitecture is that they assumed CPU clock speeds would keep increasing; that's why they gave NetBurst such a long pipeline, which was relatively slow at the frequencies the Pentium 4 actually ran at, but would have been much, much faster at, say, 7 GHz. This is partly why PowerPC was so popular in high-end solutions like Mac workstations, mainframes, and gaming consoles: it was both more efficient and more powerful per watt than what x86 could offer.