Thank you for this wonderful video! It helped me to finally understand why the L1/2 specs were such big determinants of the Pentiums' performance. And to hear the creators themselves talk about their precious baby is such a treat!
Wonderful video -- Crawford is truly a pioneer.
Really interesting to hear the history captured.
8:47 that was a good one haha :)
Excellent video
Unless I'm mistaken, branch prediction relied on speculative execution: the processor kept executing down the predicted path, and in instances where the prediction turned out to be wrong it would discard that speculative work, back-track, and rerun from the correct branch.
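A minimal sketch of that misprediction cost in C (a hypothetical microbenchmark, nothing from the video): run the same branchy loop over random data and then over the same data sorted. Sorted input makes the branch almost perfectly predictable; random input makes the predictor wrong about half the time, and every miss pays that back-track-and-rerun penalty. Compile with modest optimization (e.g. -O1) so the compiler doesn't replace the branch with branchless code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* the branch on data[i] is taken ~50% of the time on random bytes,
   which is the worst case for a branch predictor */
static long sum_big(const int *data, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        if (data[i] >= 128)
            sum += data[i];
    return sum;
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    clock_t t0 = clock();
    long s1 = sum_big(data, N);   /* unsorted: frequent mispredictions */
    clock_t t1 = clock();

    qsort(data, N, sizeof *data, cmp_int);

    clock_t t2 = clock();
    long s2 = sum_big(data, N);   /* sorted: predictor is almost always right */
    clock_t t3 = clock();

    printf("unsorted %ld (%.3fs)  sorted %ld (%.3fs)\n",
           s1, (double)(t1 - t0) / CLOCKS_PER_SEC,
           s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(data);
    return 0;
}
```

On typical out-of-order machines the sorted pass runs several times faster, even though both passes do identical arithmetic.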
So "performance" is just "speed" here... Never mind that your compiled code gets twice as large. Almost seems like a software fix for a lazy hardware design.
Yes, performance was a synonym for speed. It still is. If you take a modern processor and disable the caches, it will perform like a 486. Memory latency has improved only a little in the last couple of decades, while clock speeds have improved greatly. That's a real, physical limit and there's almost nothing you can do about it. The whole game is finding more parallelism somewhere and squeezing out the last drops by running more instructions in parallel, across more threads. There is just nothing else left.
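You can see that latency wall with a toy microbenchmark (hypothetical, assuming an array much larger than the last-level cache): touch the exact same elements sequentially and then in a shuffled order. The sequential pass lets the caches and prefetchers hide memory latency; the shuffled pass takes a miss on nearly every access and crawls at DRAM latency, roughly the "perform like a 486" scenario.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16 Mi ints = 64 MiB, far larger than any cache */

/* small xorshift PRNG so the shuffle doesn't depend on RAND_MAX */
static unsigned long rng = 88172645463325252UL;
static unsigned long xorshift64(void) {
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return rng;
}

int main(void) {
    int *a   = malloc((size_t)N * sizeof *a);
    int *idx = malloc((size_t)N * sizeof *idx);
    for (int i = 0; i < N; i++) { a[i] = 1; idx[i] = i; }

    /* Fisher-Yates shuffle: the second pass visits the exact same
       elements, just in a cache-hostile random order */
    for (int i = N - 1; i > 0; i--) {
        int j = (int)(xorshift64() % (unsigned long)(i + 1));
        int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }

    long s = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++) s += a[i];        /* sequential: caches + prefetch */
    clock_t t1 = clock();
    for (int i = 0; i < N; i++) s += a[idx[i]];   /* random: mostly cache misses */
    clock_t t2 = clock();

    printf("sum=%ld  sequential %.3fs  random %.3fs\n", s,
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a); free(idx);
    return 0;
}
```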
@@soylentgreenb "It still is"? No, it's not. And I don't even have to tell you why, because you already mentioned it: parallelism. Instead of spending a large number of transistors on the speed of a single core, there's an alternative nowadays: use the same number of transistors, or the same die area, to build multiple smaller cores instead of one big core. And that includes decisions about cache (also transistors) and therefore about things like unrolling loops. Nowadays it's not just "speed", it's "speed per square mm" or "speed per watt".
@@ArumesYT He's correct that each of those cores is basically a 486 speed-wise when you're done.