The legend that the Pentium Pro was a lemon helped me get the best computer deal I ever got, in 1996. I traded my Pentium 166 (which I could barely afford as a poor student) for a Pentium Pro 200 (both including mainboard) with a guy who got the PPro from his rich father. He was unhappy about the PPro's limited gaming performance under Windows 95 and jealous of my Pentium. So I offered him the trade and he was happy (and thought I was a bit stupid). But my computer was running NT4 and Linux, and my finite element simulations now ran twice as fast. I still own that Pentium Pro.
Yup, my single encounter with a Pentium Pro was when I was in college. They were literally using it as a door stop, so I threw Linux on it. They ran mostly Macs but had a few Windows users and a student lab with Windows systems in it. I threw CAP (Columbia AppleTalk Package) on there, and Samba, and they thought it was great to be able to have a shared area accessible from both the Mac and Windows systems. And they thought it was black magic when they asked if there was any possible way to unretire the retired but functional Apple Color LaserWriter they had to use from the student lab, which was all Windows PCs, and I was like "Yup!". (Pointed CAP at it, it showed up in lpr... no CUPS yet back then... as a usable printer, pointed Samba at that printer queue to share Windows-style. Voila! Up and running.)
Sounds like a win/win. He got a machine more suited to what he wanted to use it for, even though it was ostensibly a lower-end machine. The Pentium 166 was better at 16-bit code, and the deep pipeline and out-of-order execution of the 'Pro increased latency, and certain operations common in gaming were slow.
Great video. It was both a good memory for me and also triggered my PTSD. I was one of the 20 “product development” team leads on the P6 architecture, responsible for post-silicon platform bug hunting and validation. In retrospect, across all my time at Intel (almost 25 years before accepting an early severance), Intel's biggest blind spot was its own ego. The architects didn't listen to product planners and customers; they knew better. The architects at Intel wanted a seat at the industry “adult table” and Pentium was not it. It was supposed to be Itanium, but we all know how that ended. The team that did the P6 was Intel's first crack at an architecture out of Portland, and they really wanted to go beyond superscalar into speculative execution. It eventually was pushed into the desktop/mobile/gaming segment, but it was a bit of a square peg in a round hole. To your point, you were spot on that Marketing naming it Pentium Pro was a big part of the issue. What's funny is that internally, as you mentioned, there were the P4, P5, and P6, which mapped to the 486, Pentium (Pent=5), and then Pentium Pro (P6). I might have even been in the room when they were discussing the external name and it dawned on them that P6 would map to “Sextium” and they all went “well, crap”. But naming it Pentium Pro (and later Pentium 2), when the market believed anything called Pentium was meant for desktop/mobile/gaming, got it into trouble. The PPro and its follow-on, the P2, were really amazing architectures. The team that built the out-of-order speculative execution did such a good job that, as post-silicon manager, I never filed a single bug on that part of the design. And the floating point divide worked as well (whew!) Thanks again for sharing! (BTW you should work with “Dave's Garage”, as he was there on the Windows dev side of it all during that time.)
Thanks for posting the inside-Intel view. Everything you've said seems to chime with what the bulk of Intel's staff seem to have said over the years about attitudes inside. Nice to hear how bug-free it was too. I've seen a few of Dave's videos and enjoyed them.
I always thought of it as Pentium for Professionals... I didn't and still don't view gaming as a professional application for computing... because of that, I never had a PPro, as I was a gamer...
And later on they went with Xeon for their professional option. Which is ironic, as modern Xeons (or rather anything from the past 20 years) are pretty much identical to their desktop/home-user counterparts.
I had a dual Pentium Pro 200 from 1996 to 2002. Originally for modelling and rendering in 3D Studio Max... later just for rendering. It was a beast, one of my favourite computers ever.
Right, I was going to comment about the dual processor support. That was one of the more interesting aspects of it for me at the time. I think I was fully Linux/NT by the time the first Pro came out, but couldn't justify the upgrade until later.
At that time Linux was still in its infancy; the machine started as an NT 4.0 machine and it was a war machine. You could render on one CPU and still have power for modelling. Then the machine was upgraded to Windows 2000 Server. I tried Red Hat, but there was no serious support for OpenGL at that time. I sold it to a local shop, who used it for CAD until 2008 for sure. And btw, I never had a hardware problem even though I kept the machine on for years.
@SuperWasara I didn't have the money for a quad. I remember another guy in my region had a quad in 1997, as a rack system in a rendering farm (but he had multiple problems with that machine).
If you had a compiler targeting the P6 (as the video pointed out), the P6 arch was pretty amazing. Funny that Intel put so much energy into compilers for the P5, and many of those optimizations were not helpful for the P6. (Sausage making is never a good idea to watch.)
@@vardekpetrovic9716 Well yeah, because the K6 was a completely different architecture than the K5. The K6 wasn't an evolution of the K5 at all, it was based on the Nx586 architecture that AMD acquired when they purchased NexGen. The Nx586 (and by extension the K6) was more similar to the Pentium Pro in that it was internally a RISC CPU unlike the K5's oldskool fully CISC design.
@@kirillstp AMD got some narrow appreciation with the K5 - as it outperformed the Pentium on integer operations by a fairly wide margin. The half-speed floating point unit was its major failure. The Athlon Thunderbird generation saw AMD beating Intel across the board at the SAME clock speed for the first time - and it CLOCKED higher too.
@@kirillstp I had an AMD 486 DX4 133 for a couple of years. For its price point the performance was great, somewhere between a P75 and a P90. After I stopped using it as my main machine, it became our household router (dynamic dialup using diald), coupled with a 56k voice modem. We even used the voice features, so it was our answering machine. It also acted as a file server with Samba and NFS; we mostly stored MP3s on it. We later added SlimServer to it, for driving various internet radios around the house.
A small correction: the first x86 processor to use RISC-like µops was NexGen's Nx586. The second one was AMD's K5. NexGen's design was so good that AMD scrapped the K5 going forward, bought NexGen, and based the K6 and later chips on what NexGen had in the pipeline. About the PPro: it was pretty much out of reach for mere mortals, so the troubles running 8- or 16-bit code were a non-issue, mostly a storm in a teacup stirred by reviewers who did not have to pay for the machines being reviewed. So, there is that.
I got the Pentium Pro poster as a teenager and I put it on my wall and would drool over it. Neither I nor anybody I knew could afford it. I laughed out of spite when the reviews came in that it wasn't much faster running anything on Windows 95.
Yup. I had a K5-75, and later on a K6-2 450. That thing was FAST. I don't know how it performed in Windows, but code built by GCC agreed with its branch predictor (and whatever else) VERY well; it benched about dead even with a 1 GHz Pentium 3.
It would have been, if the Itanic wasn't awful. If CPUs got 2-3 times faster they could just recompile x86 and essentially emulate it on the Itanium. If the Itanium had genuinely been much faster, new code would have been IA-64. They didn't want to make x86-64 / x64 because it would hurt the chances for IA-64.

The situation was similar with the Pentium 4/NetBurst. Super-deep pipelining would have been a good thing if the process engineers had once again pulled a rabbit out of the hat and managed to forestall the end of Dennard scaling for another die shrink or two. They were expecting 8 GHz by 2003 when they were designing the architecture in the late 90s. It's the same bet they had made over and over and won against AMD. The late 80s and 90s saw pipelines getting ever longer: superscalar, out-of-order execution, clock multipliers, adding and then integrating caches into the core to deal with latency, etc. Intel was more aggressive than AMD at this and also a process node ahead. By the Athlon, AMD had caught up; the P4 would massively deepen pipelines yet again and add more, wider SIMD yet again, but Dennard scaling did not hold and it ran up against a thermal wall a bit below 4 GHz, which is around the same clock CPUs still use today. Now this deeply pipelined chip was only going at 3 GHz, only a little higher clocked than AMD's 2 GHz-ish Athlon XP and Athlon 64 chips. Had they managed to run at 8 GHz by 2003 as originally intended, they would have been hailed as having the foresight to boldly deepen pipelines and do all these things to hide latency better; instead they looked like fools, having just run into a wall and bribing Dell and others not to sell AMD chips.
@@Bluecedor some engineering business. I think I saved a dual Pentium 3 server and a dual Pentium 3 workstation from the same place. Server was the older P3 that maxed out at 550 MHz and workstation was the newer kind that maxed out at 1.4 GHz.
About 25 years later, by the late 2010s, the tables had turned: Sybase on x64 outpaced Sybase on SPARC... In a way, the P6 architecture never truly went away. After the P3 came the dead end called the P4, plagued by power and heat dissipation problems. By the time it had died its inevitably gruesome death, the Core architecture, a continuation of the (mobile) P6 architecture, saw the light of day. Through that, the PPro's legacy still lives on.
No joke about running into a power brick wall. It was an existential crisis for a company that leaned heavily on its silicon-side expertise for faster and faster silicon, while the architects just counted on 2x clock every 18 months thanks to Gordon Moore. Very, very big projects at Intel were canceled when they finally came to terms with the fact that they had to “scale out in cores” vs “scale up in MHz”.
The Pentium M mobile processors (based on the Pentium 3) didn't just bring laptops up to desktop-class performance, they turned it on its head (if we ignore the Athlon 64). At least one company (AOpen?? - iirc) produced a desktop motherboard for the Pentium M, and seriously upset Intel in the process, as it outperformed Pentium 4s running at approaching double the clock speed while using a fraction of the power and cooling requirements. But Intel did not want the Pentium M used in anything except mobile computers and tried to block it.

They repeated the trick again when the first generation of Core CPUs was released (based on the Pentium M, mostly dual core). These again offered vastly better performance than a Pentium 4 (which really was a bad joke by then) and could have seen Intel become the more desirable CPU for enthusiasts for the first time in years (since before the Athlon was released), despite being 32-bit only, but again Intel didn't want them used in desktop computers.

This wasn't the first time a motherboard maker had gone off-script and upset Intel by making a motherboard which allowed CPUs to be used outside their intended market. When Intel moved the budget Celeron processors from Slot 1 to Socket 370, someone (think it was Abit) made a dual-socket motherboard for Socket 370, giving enthusiasts who wanted one an inexpensive dual-CPU system.
The Pentium 4 isn't the dead end; it's the Pentium D and Pentium M that are the true end. The Pentium 4 is merely the warning sign and speed bump before the wall it crashes into.
@@patg108 The Pentium D basically consisted of two P4 dies on a single package. The Pentium M is actually P6-based, so it was a continuation of the Pentium Pro, and later evolved into the Core architecture; as such, it was not a dead end.
I worked for a company back in the late 90s that did early internet work. We had racks of Pentium Pro systems running Linux in our server farm. I was told we standardized on the PPro as it was cheap at the time because of the bad press. I didn't hear about the supposed "flaws" until later, when I became a Windows admin. That community seemed to think they were horrible, and I never knew why. Thanks for shining a light on the reason the Windows community thought they were rubbish!
The Pentium Pro is what saved Intel. During the NetBurst disaster, Asus released an adaptor to let you run a Pentium M laptop CPU on a Socket 478 board. It destroyed every Intel desktop CPU in performance, and it was descended from the Pentium Pro, as NetBurst was a double disaster in laptops. Shortly after that, Intel canned the Pentium 4, brought the mobile architecture to the desktop, and released the gen 1 Core Solos and Duos. The rest is history.
And the NetBurst disaster was one of process-engineering expectations, not of architecture design. They expected Dennard scaling to continue and designed the chip accordingly. It was designed to run at 8 GHz by 2003; if Dennard scaling had continued, it would have, and it would have been hailed as a forward-looking design that pulled out all the stops to get higher clocks and better performance, instead of a conservative/iterative design like the Athlon XP and Athlon 64. That didn't happen, and they got the worst of both worlds: a very deep pipeline and not much to show for it. To run at 8 GHz you needed liquid nitrogen and a golden sample to deal with the thermal wall, plus a lot of tweaking to get the motherboard stable and the power delivery beefed up.
I think the PPro would have been regarded more highly if Intel could have ramped up the clock speed. They were unable to do so reliably due to the way the L2 cache was placed in the same package as the CPU die. Intel learned from this with the P2 which is why those processors used the slot 1 format with the L2 located physically very close to the CPU die but separate from it. That along with running the L2 cache at a fraction of the CPU speed allowed for much higher clock speeds than the PPro. The 90s were a very interesting and exciting time in the CPU world!
Around that time Intel was having trouble making SRAM run fast enough to keep up with the processor and had to use a different process node for the cache. I don't think it had anything to do with the placement of the cache; it had essentially the same result whether it was in the same chip package or placed on a PCB together.
@@nzoomed It was more a size issue. SRAM wasn't yet compact enough to fit an L2 cache on the same die as the CPU, so they placed multiple dies on a carrier in the same package. That made production a lot more costly due to yield and binning problems. The Pentium II solved the cost issue by using a PCB and separately packaged (and binned) dies. By the time of the Pentium III, on-die L2 was possible. Admittedly, I don't know the reasons for the cache clockspeed choices.
@@ghostbirdofprey Yeah, quite likely. They are actually facing the same issues today when it comes to scaling down memory. I was watching a video recently that explains the reasons why, but in a nutshell you can't just shrink down the memory without causing issues, and the transistor design has to be changed completely.
In my experience "so bad at" means "not much faster than a regular Pentium at the same clock speed." I mean, the full-clock-speed L2 cache can often actually make up some of the shortfalls. And once you move over to "mostly 32-bit" code, it can really shift. I'm running OS/2 Warp 3 on mine, and it's really fast. I've also seen people run Windows 95 game benchmarks on it, and it almost keeps up with the Pentium II 233. So I don't think it was a lemon at all. I think people simply used it wrong.

It's like those people complaining that the Pentium 60 wasn't as fast as a 486DX4 100 MHz. I remember a scientist posting in a computer magazine, saying that he'd been testing some calculation-heavy software he was using on the DX4 vs. the Pentium 60, and the DX4 outperformed the Pentium. The problem was that the code had been heavily optimised for the 486 architecture over years and years, and didn't take advantage of any of the new features of the Pentium. However, I've seen a Pentium 60 absolutely stomp on a DX4 in software that actually is Pentium-optimised. So how good something is will often depend on whether or not you're using it right.
Yeah, a DX4 will outperform a P5 60 with narrow code. Without using the superscalar nature of the Pentium, it basically becomes a 60 MHz 486. And even there some differences showed up: a DX-50 can and will outperform a DX2-66.
Hello fellow OS/2 user! Really cool to hear that Warp 3 runs great on the PPro. I temporarily got out of computers in late 1994 for a couple of years (got in trouble for running a “bad BBS”), but was a big OS/2 2.1 user. I had pre-ordered OS/2 Warp 3, and I think my last system using 2.1 was a 486DX2/66 running at 80 MHz, and that ran great! I'd imagine Warp 3 would scream on a PPro…
Brings back memories... The CAD team at the company I worked for back in 1996 had Pentium Pro equipped Dell workstations, AutoCAD, and huge Eizo monitors 👌
The Pentium Pro is STILL running GrandAdmiralThrawn's blog, using WordPress of all things, just fine. Granted, it's a dual-socket system, but still - it manages to handle modern web applications. Impressive beyond all doubt.
Another great video, thanks. My own memories were that the Pentium Pro wasn't so much a lemon as it was a feat of engineering that nobody knew what to do with. You had to admire that it was very clever, but in the same breath, you also realised a nice fast Pentium (or Pentium MMX) would be a better bet for everyday computing. This is perhaps an odd thing about computing. We want the latest and greatest tech to do the tasks we already do. So, CPU engineers have to design brilliant new technology for tomorrow, to do everything people were doing yesterday. As Itanium and the Pentium Pro demonstrate, I think it took Intel a while to figure this out.
The clue is in the name - Pentium "Pro". Criticising it for backwards compatibility issues is like criticising Windows NT 4.0 Professional for not running your old DOS games. It wasn't designed for that. It was designed for the future, and made the necessary changes to show us that future given the limited resources at its disposal. I can see why it got the reputation it had, but it's a case of Caveat Emptor. They should have done better research before buying. But then, I'm biased. I also had a (dual) Pentium Pro server at my workplace, and it flew. It ran Windows NT Server and Microsoft SQL Server 6.5, so all compiled for it. I have fond memories of that beast. But would I have bought one for home use? Hell no. Wrong choice at that time. But without it moving us towards the future, I'd never have had the awesome AMD Athlon I eventually bought to replace my old 386... and that was an excellent machine. (Which ran Windows 2000 Professional, naturally, because I'm not some kind of monster.) Thanks for covering this. An excellent video that's well researched and presented. And thanks to Matt for his footage.
I actually used to have a DEC Alpha Workstation that came shipped with NT 4.0. I had never seen NT run so stably and it ran DOS games quite well. It may have to do with the excellent x86/DOS emulator that shipped with NT 4.0 for Alpha.
You'd be surprised at the number of people I knew who hated Windows NT 4.0 for not running games :D That being said, it doesn't mean they were in the right; it's just that a lot of people like that existed. They heard Windows NT was better than regular Windows, they installed it, it was indeed more stable etc., but then it didn't run old games. Heck, for more professional stuff than games, I myself used Windows NT (then Windows 2000 later), which I had on a different drive than Windows 95/98/98SE. I won't mention "Millennium", because it was the worst system Microsoft ever released. Thankfully in the 2000s XP became standard, and it was based on NT.
Nope, it's not unreasonable for a consumer to expect backwards compatibility when it has the same name. The Pentium 2 and all subsequent processors are backwards compatible; the entire PC platform had been successful due to its compatibility, so your comment makes no sense. It's like someone making a product and then blaming the consumers for "not getting it"... no excuse for failure. They accepted their mistakes, then went on to make a better one (but at least not calling it the same name this time).
@@DarkShroom I get that the Pentium was perhaps named a bit confusingly, but it wasn't totally without clues, with "PRO" being a visible part of the name. Also, it was backwards compatible - just not that fast when doing the "old stuff". And yes, they created a "better one" in the Pentium II, but they went with the "Pentium PRO" ideas to make the Pentium III. So in a way, the "Pentium PRO" was ahead of its time for the desktop. But it was fantastic for workstations back then already.
Cheers John, you’ve taught me a thing or two about the PPro and I was glad to lend a hand, literally, to this video. I’m off to install NT4 onto the Barn Find Pentium Pro machine
I read that as PP Bro and you giving a hand for some reason... I may be a huge child..... 😊.... I appreciate your time and effort! I hope you are doing well and having a great day or night!
Not watched the video yet, but it cannot be considered a lemon, as Intel's current Core series of CPUs are still based on the Pentium Pro's architecture (the Core series is based on the Pentium M; the Pentium M was mostly a die-shrunk, power-usage-optimised Pentium Pro with MMX and SSE; the CPUID instruction even identifies Core CPUs as being in the PPro family). The Pentium 2 fixed most of the Pro's problems by ramping clock speeds and dropping prices - who cares about 16-bit speed per clock when you're running at 450 MHz and your 32-bit and floating point performance is that good? Though Intel's choice of clock speeds would make it difficult to compare Pentiums and Pentium 2s clock for clock.
Well, the Pentium Pro was also introduced more than a year before MMX was a thing. And the Pentium II was already out half a year after the Pentium MMX. I highly doubt a lot of programs made use of MMX instructions in that timespan - except for maybe some games, via DirectX supporting it.
Agreed I'm no computer buff but I soon cottoned on to the Pentium...WAY outpaced the C..E..L..E..R..O..N... I say to anyone get a Pentium if nothing else. Runs basic computing very well. Neither know, nor care, about gaming. Pentiums work well.
You're mixing the architecture with its implementation. P6 was insanely good. The PPro was a total disaster: too expensive either to buy or to manufacture, too many bad assumptions (16-bit code was a massive performance problem). The Pentium II was just PPro 1.1.
We ran PPros at work for CAD, running NT 4.0 and AutoCAD. They performed very well. But tried Win95 on one for fun, and it was slower than the Pentium I had at home. So, lemon if you bought it for the wrong reasons. It was actually a "Pro" product, not a name thrown on to make it look more expensive like we see on so many products nowadays.
10:32 note: 4-byte alignment mostly only matters for 32-bit accesses (unless dealing with atomics or similar). Even RISC chips support arbitrary byte-level reads and writes.
I remember this, and just looked up the details: To be more explicit, partial register writes (e.g., updating just the lower 8 bits of a 32-bit register) introduced complexity in maintaining data consistency and coherence within the renamed registers. This didn't necessarily flush the pipeline, though. That could happen if the out-of-order engine couldn't cope with the complexity of the "hazards". More generally, it took more steps to merge partial registers and maintain coherence, and if the processor used the full value after a partial write, the dependency tracking caused a pipeline *stall* while it waited for the partial write operation to complete. I got the impression (in the day) that this was just a punt on the designers' part -- it could be done (and would be in later CPUs) but was considered unimportant.
The people that were salty about the Pentium Pro were probably the ones that paid a premium for the 200 MHz 1M cache variant in late 1997. Its launch price was $2675, more than twice the cost of the original 200 MHz 256k cache variant released two years earlier for $1375. And despite having four times the cache, it was at best 10% faster, but more realistically 3-5% faster. The Pentium Pro wasn't a design that required large caches to perform well, unlike the later Netburst Pentium 4 that was cache starved for its entire existence. In late 1997, you could buy a 300 MHz Pentium II for $700-800 less, which ran circles around the Pentium Pro, and get them in dual processor configurations. The Pentium II was better with mixed 16/32 bit code, faster and cheaper, making the 200 MHz 1M Pentium Pro a waste of sand. There was a 333 MHz "overdrive" chip, but that was basically a Pentium II in a Socket 8 package. And by the time it was available in mid 1998, far faster options were available.
I have a factory tray of new Pentium Pro 1MB cache with their black anodized tops. They look sharp. I also kept two 333MHz OverDrive processors NIB, though I installed two others in a Linux system that ran for many years. Jumped back in at dual P3 1.4GHz Tualatin as they were definitely worth the sand.
The DX had an integrated math co-processor while the SX did not. For the 486 DX2, it also meant the processor was clock-doubled internally and ran at twice the system bus speed. For the earlier 386 line, I believe it just referred to the external data bus width (the SX was 16-bit while the DX was 32-bit).
I've never in my life heard that Pentium Pro was a "lemon" let alone as a "common" belief. It was marketed by Intel as a business machine, it was built specifically to be the fastest CPU in the world measured by one specific benchmark. It even had "pro" in the name before Apple took that name for iPads. It was super expensive. Who on earth would buy one to game on?
Unfortunately, a non-trivial number of people; many took "Pro" just to mean better. With certain PC magazines singing its praises as to how good it was at running 32-bit code, without mentioning its 8- and 16-bit performance, a whole bunch of people took a very different view of it. Then there were the businesses that ran 95/98 on it; they had a rude awakening. The Linux and OS/2 users were very happy, and the NT users were also happy, assuming they were not running Win16 applications.
My experience with P-Pros: I was trying to save up for a PC in the mid 90s, and someone from my church wanted to sell his Pentium Pro PC to me. When I mentioned the issue of running 16-bit DOS code, he showed off a DOS game on it. It ran... OK, but it felt more like an early Pentium than a newer one. For professional workstations running NT it was great! But most people at home were still running 16-bit Windows applications and DOS games, so that was the wrong market... Until the Pentium II made it all a moot point and I ended up getting one for my new PC.
Windows XP was released well into the Pentium III's life - and well past the Pentium II. Most of those platforms started with Windows 98. Heck, even the Pentium 4 was released in 2000. Not to say people didn't run XP on Pentium III systems, but I'd say very few Pentium IIs were running it.
I must have lived in a parallel universe, where PPro was regarded as a wonder powerhouse, a machine to dream about. This is the first time I heard someone calling it lemon.
Well, it depended completely on how you used it. After all, if you only used it for 32-bit software it was a boss super-power chip for the time, but if you needed 8- and 16-bit, that's where it was a lemon.
Definitely not a lemon, but also definitely not a desktop CPU; they were a beast of a CPU for the day. Ran a dual PPro box as a server for a number of years and it was working just fine when it was 'obsoleted' as part of a HW refresh. Still got one of the CPUs on my desk.
Definitely a desktop CPU. WTF are you talking about? I had a dual PPro as well, but it ran Windows NT 4 Workstation and was, for a time, the best platform you could buy for running Office Professional.
A couple of decades ago I got my hands on a Digital workstation with this processor, a nice SCSI drive, and some other goodies. It had belonged to the record company EMI, and there was NT4 with MS Office stuck on it. It worked, sure. But with a more modern BSD it works really well. I still have it around for reading out DAT, QIC tapes and such.
I have an old Pentium Pro machine that was still running last time I checked. I used to program on that thing daily. It was a beast at the time, I loved it. I never experienced any of the mentioned drawbacks, but probably because I was programming on it, compiling my own software, and in general not running unoptimized code. I can totally see the problem if you were running canned software on it. Especially software hand-optimized for older platforms. But to anyone using it for its intended purpose, the P-Pro was amazing. And insanely durable. That thing outlasted all my other machines, even newer ones.
I worked as an engineer throughout the 90s, and while I never had a Pentium Pro in one of my computers, many of my peers did, and being engineers, we often discussed them and compared them - both on paper and in the real world. The Pentium Pro was a unique processor, with unique ways of doing things. The result was a processor that was never going to be good at everything, but that might just blow everything else away under the right circumstances. Many of my peers loved their Pentium Pro based machines, and they were noticeably faster than other Intel-based computers when doing real engineering work. Its main benefit was that its L2 cache was enormous compared to other processors of the day. About the time the Pentium Pros became a darling CPU for engineers, I inherited a DEC Alpha based workstation. They were a lot more $$$, but when it came to real-world number crunching, that machine blew every other desktop off the map. I was a VERY happy camper!!
I had a Dell PowerEdge 4100, a dual Pentium Pro system that could be made to fit in a mere 14U of server rack space... but in that 14U it would handle more than most other things of the era. I finally retired it in 2005 after it ran 24x7x365 non-stop for 8 years; the only things replaced in that time were 2 hard drives (of the 6 it had) and one of the power supplies, which had a fan failure. It was truly a beast of a machine.
I had the P Pro 200, 256K when they first came out. I worked at a local computer store as a tech at the time and the owner would let us have anything we wanted and just take a small amount out of our check each month... when he remembered.
Great video, I really enjoy your stuff. Small comment, I believe the processor stalls from partial registers happen when you *write* to a partial register and then later read the larger register. For example, write al, then read ax. Reading partial registers, as in your example, I don't think caused stalls. Reason being, if you wrote to al and then read ax, it needs to put together al and ah, which might not be in the same physical register due to renaming, causing a stall. Writing ax and then reading al or ah shouldn't get a penalty, since the value required is already present in a register. Unfortunately I don't have a ppro or other p6 that I can test on.
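A minimal sketch of the two cases described above, in NASM-style x86 assembly (illustrative only - like the commenter, I haven't timed this on real P6 silicon, and `byte_src`/`word_src` are just placeholder labels):

```asm
; Case 1: write a partial register, then read the full register.
; AL gets its own renamed physical register; the later read of AX
; forces a merge with the old AH, which is what stalls the P6 pipeline.
mov  al, [byte_src]
add  dx, ax            ; reads AX after a write to AL -> partial-register stall

; Case 2: write the full register, then read a partial one.
; The needed bits already live in a single physical register,
; so no merge is required and no penalty is expected.
mov  ax, [word_src]
mov  cl, al            ; reads a subset of AX -> no stall
```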
I used a dual Pro 200 system (with the 512k cache) as a workstation for a number of years and was very happy with it. Ran great under Linux and kept me going until the Athlon was released many years later. Never knew it had performance problems with 8/16 bit code as I only ever ran linux on it. Amazing system with great longevity at release!
I remember at work there was a Pentium Pro workstation which was used for video editing. It came with Win95 and it was an unstable lemon. But the rig also came with NT4 Workstation. I installed NT4 and then this computer ran very stable and fast.
I am sorry, but I don't think the explanation about the cause of the PPro's 16-bit deficiency is correct. According to the Pentium Pro and Pentium II System Architecture Second Edition book (p393), the reason is that the segment registers are not aliased like the general purpose registers. So a write to the DS register will stall the pipeline, because the value is set immediately to the physical register, rather than to an aliased register that can be used by subsequent speculative operations. And it just so happens that 16-bit code does a lot of writes to DS. It shouldn't be a problem with 16-bit-like code running as a 32-bit process, because with most OSs you would never really change the DS register. Do you have some benchmarks to prove your theory?
9:50 - That largely depends on what kind of 486. An SX/20 was trash, but some of the later DX4s with write-through cache, and even AMD's 5x86 offerings, were faster than a P75, and even a P90, in certain benchmarks. From my own experience with Pentium Pros running DOS applications (let's call them games ;) ), I WOULD rather have a DX4/100, 120, or 133. Having said all that, a regular Pentium 100 would still be better suited for 8/16-bit processing.
Itanium was basically a double down. Ah, the compilers are going to fix it. They didn't. Even today you would be hard pressed to make an LLVM backend that does that; you would have to decompile things back to LLVM IR and deoptimize them. No one is deploying IR code, except perhaps .NET and Java.
@4:34 Oh! I had been running 32-bit code on OS/2 since 1992 by the time the Pentium Pro came along.... silly to think of going back to 8-bit or 16-bit at that point! It worked very well (PPro 200 MHz) with OS/2.
Betteridge's Law of Headlines. Nope, the PentiumPro was pretty good. At the time it came out, it was the first x86 processor that could, with smaller datasets, perform faster than some of the load/store processors that were in our compute servers... one of our researchers bought a personal PentiumPro system and showed his codes running on it faster on the CFD codes than the MIPS and SPARC processors on the smaller datasets (the ones that could fit in 32-bit address space... that was the big drawback because our compute servers were 64-bit so could handle lots larger datasets). He drew some ire from the admin when he showed that. He said that this showed that the writing was on the wall for the non-x86 machines and he was right.
My experience with the Pentium during my assembly writing years was that it too shared a lot of these drawbacks. Real/virtual-8086 mode was slower than 32-bit protected mode, AL/AH/AX were slower than EAX and bit shifts, and compilers that output badly "optimised" code were the standard of the day. I started with Borland C++ and the code it output was shockingly bad. Teenage me with an x86 reference manual could beat it without really knowing what I was doing. DJGPP was markedly better, but I felt like it optimised for 386 compatibility more than really fitting your code to the Pentium, so its use of AL/AH/AX seems less like a "no penalty" optimisation and more of a "we did this years ago for the 386 and nobody wants to touch it" kind of deal. I never got a chance to touch a Pentium Pro, and so its non-32-bit penalty very likely was much worse, but my experience was that the Pentium didn't like 8/16-bit compatibility code either and you really don't want to use it.
It kinda lives on today, since the "Core 2" architecture is almost a copy-paste of the Pentium 3 with some tweaks. The "Core i" is basically the same thing with HT. Only the most recent Intels changed enough that you can say it's something different, but to what extent? I don't think there's ever been a change quite like the one from the 486 to the Pentium Pro. It was a fundamental change. Today it seems more like tweaks upon tweaks.
I was building new cloned PCs as a teenager back when the Pro came out. I remember us all being very disappointed in the first Pro powered machines we built when running games. Cool to finally know exactly why.
I still have about ten of them lying around. NT 3.51 and 4.0 servers ran great on them. I gave away about 15 dual boards a few years ago and regret that, as I'd like to build one now just for the memories. :)
I still have an IBM PC Server 330 with dual Pentium Pros. I sometimes turn it on just for kicks, but it's not useful for much these days. But I love the way the case looks.
On the note of your sponsor: I used PCBWay once, and it was pretty good. I ordered a whole bunch of PCBs for a project, though I probably ordered way too many copies of them. I could monitor the progress of the boards' creation online. Apparently very few people order yellow PCBs; it took a couple of days before they even started on that one. The only real problem was the import tax. The price of the boards was reasonable, and the postage a little less reasonable but still okay (that's down to me ordering a whole heap of boards), but about a month later I got a bill from the government for the import tax, which was an unexpected extra cost.
Windows XP came out in the late Pentium III and early Pentium 4 era, so saying "by the time we got to the Pentium II and III, Windows XP came out" isn't really accurate. The P3 fits more with Windows 98.
Back in the 90's I used to work for a small computer manufacturer. We used to build custom Pentium Pro portable video editing machines for the TV/video industry, with dual SCSI HDDs. I thought the CPU, in that use case, was fantastic for the time. Me... I was stuck with a Pentium MMX.
Got a second-hand Pentium Pro machine in 2002; it became my home server between 2002 and 2007. It was fun: when Linux 2.6 arrived, it used to run more smoothly on the P-Pro compared to a 500MHz K6-II. Not faster overall, just some tasks went smoother. Also got a dual P-Pro machine (with 2x 1M-cache black beasts); it could run Win2000 and WinXP smoothly, and under Linux certain things didn't work 'cause it used APM instead of ACPI, and that was disabled by default on SMP systems. Still, it was usable as a desktop in 2006-7, with RAM maxed out to 512 megs (4x 128MB 5V EDO DIMMs, yay!)
Is it only me wondering why not load the 4 bytes into EAX, then AND with 0xFF000000 to check the highest byte, then shift left by 8 bits and recheck? That way every operation is fully 32-bit, while still loading the whole 32 bits in one go (although you'd need to handle unaligned access first).
23:50 - hahha yeah I've got one in a shadow box for display purposes only. This came from a recycler 15 years ago though, and I'm never parting with it.
Wasn't the AMD K5 the first CISC-to-RISC CPU? In fact, the UMC Green 486's incredible speed gains came from more efficient microcode for each CISC instruction.
Moved from a P90 to a Pro 200 at the time, and moved to NT 4.0 at home for better performance. DirectX on the lower NT service packs was quite bad, but around SP4 it got much better and I could game with little to no issues. With the full-speed cache, it dominated Seti@Home vs much higher clocked PII processors!
When Sun released Solaris x86, I tested it on an old IBM station with dual Pentium Pros, and it worked very well. There were two LEDs to monitor processor activity, and SMP was activated as soon as the installation disk was booted, which was unusual at the time; on Linux, the kernel had to be rebuilt.
While desktop PCs were melting down in the NetBurst apocalypse, the Pentium M/Core Solo became a blessing for laptop PCs. It connected the Pentium 3 core to the Pentium 4 chipset bus. So in reality, P6 continued for a little longer.
Wow. This video really brings back a lot! I was working at a Kinkos in the mid 90s and it was a huge cross section of older mainframe guys who never moved past Fortran and younger guys working in computer services in the process of getting their CS degrees. I'm sure I had more than a few spirited convos about the Pent Pro! But for sure the whole upshot was that, while it was supposed to be for 32-bit, Intel was naïve to assume that the NT ecosystem would be ready for it. It wasn't the casual users that really got the shaft, but the smaller business users who assumed that they could just run anything on the Pro
The 8 and 16 bit performance did indeed suck on the Pentium Pro, but it was thankfully offset by the huge amount of L2 cache in the same package, running at full processor clock rate. That L2 cache also made it sing on 32-bit code, in particular on the models with 1 MB of L2 cache at 200 MHz. The Pentium 2/3 moved the L2 cache outside the CPU, running at a divider from the processor clock. For a handful of tasks, the fastest Pentium Pros could outrun the first Pentium 2 chips, despite the latter's higher clocks, because of the differences in the L2 cache. This is the same reason why Coppermine Pentium 3's were seen as superior to some Katmai Pentium 3's despite having less L2 cache overall. 4-way SMP support was also a huge factor in the Pentium Pro's early success. Often at the time, workstations were only sold with a single CPU but had unpopulated sockets for upgrades down the road. This was seen as a good upgrade path, even compared to the first wave of Pentium 2 systems. As for pricing, these chips and complete systems weren't cheap, but they definitely undercut various RISC competitors by a lot. I will say that while technically possible, running Linux on a Pentium Pro wasn't that popular at the time of its release: only the 1.0 kernel had been released prior to the Pentium Pro's launch. Its popularity didn't explode until kernel 2.2 many years later, after Intel had moved on to the Pentium 2 and 3. Granted, Linux builds from the late '90s and early '00s would run perfectly fine on a Pentium Pro (and technically even some today, given enough RAM and patience), but they were never really sold/installed together when the hardware was new and relevant.
A neighbour of mine had a dual PPro computer. It was absolutely beastly in 3DS Max, which required WinNT at the time as well. I mean, Max could run on Windows 95, but it would run into UI handle pool issues there. I am very dubious about the explanation in this video, though. Using the h, l and x subregisters is still a common thing that a lot of compiled code does; it obviously shouldn't cause a pipeline flush, even if it does introduce some register renaming and microcode overhead because, say, write operations become read-modify-write operation sequences instead. The overhead can also create a pipeline stall if the hidden register file is small enough and the density of the code which accesses subregisters is high enough, but not a flush. I think the problem was in segment:offset addressing performance, which is a mode used exclusively by 16-bit code and not by any 32-bit code. It being uncached, together with a very long memory latency compared to processor throughput, would cause a substantial performance regression; then again, most of that code was not really performance critical any longer, it sort of mostly lingered. You also cannot mix 16-bit and 32-bit code in a process context; you can't just load a 16-bit DLL. Also, you keep saying "8-bit code", but that's plain not a thing on the PC, for lack of a corresponding memory model.
I had a Pentium Pro as a kid in '96 or '97. They were such "lemons" they went for dirt cheap at PC flea markets; I made a Pentium 2 class machine for dang near 486 money.
7:00 This is NOT precise. Intel 286-586 do have more than 4 registers: A, B, C, D, but also ESI and EDI, which can be used as general purpose, except each has only one 16/32-bit port (depending on addressing mode), so their low bytes aren't accessible separately. There are also two special-purpose registers for the stack pointer and base pointer. So while it's true there are only 4 fully versatile registers, there are 8 in total, and x86-64 added another 8. On ARM, some of the 16 registers have predefined special purposes, making ARM and x86-64 comparable. x86 also has separate ALU and AGU units, which other architectures don't, so it can actually do more than 1 operation per cycle. Plus memory copy or comparison is way faster than on other architectures and can be done with ONE instruction: ESI is the source, EDI the destination, and ECX the counter. It reminds me a bit of DMA: set the contents, hit REP, and it runs automatically. The Pentium Pro wasn't capable of MMX, which extended x86 with 8 x 64-bit integer registers, accessible in 32/16/8-bit chunks. Later AMD introduced 3DNow!, which was just packed floating-point arithmetic on the MMX registers - the upside-down of SSE1, which introduced 32-bit floating-point operations; SSE2 allowed integer operations, SSE3 exchanged data between SSE and MMX, and SSE4 added some string operations, bit manipulation, bit counting, etc. - if I remember correctly. The only problem is that the amount of data to be stored/restored grew so much that the CPU needs to shuffle up to 8 kB of register state each time the OS switches context. In fact: more registers, more problems. It never was free.
Haha I just happened to build a PPro machine earlier this week, specifically to play with it and see exactly how bad it is. On a 440FX motherboard, 128MB RAM and dual-boot Win98SE and Win2k. So far it meets my expectations. :) (edit) and yes, AHA2940UW with a nice 18GB SCA80 drive from IBM and a NEC SCSI CDROM.
The PPro project was almost a failure, but Intel pulled it off just in time. The ceramic dual-chip package was a pain in the ass, i.e. expensive, with many half-working CPUs thrown away for recycling simply because either the CPU die or the L2 die was defective while both were already bonded together. By the time of the PPro's release, Intel was already hard at work on the Pentium II and its new packaging, moving away from the ceramic MCM.
I got a Pentium Pro tower from Dell when they came out, and it was no lemon. It was among the first of the PCs that could comfortably handle large vocabulary speech recognition. It had a lot of compute power and plenty of expansion capacity. I compiled my scientific computing applications for 32 bit and things ran very well compared to other processors of the day. I couldn't afford a Sun or SGI workstation, but the Pentium Pro was competitive with those higher end RISC workstations of the day.
To be fair, the Pentium Pro had its 256K (or even 512K) L2 cache integrated and running at full CPU speed. Probably this reduced the "flushing" penalty quite a bit.
I worked at Intel SSD on the partial Touchstone Paragon system that first broke the 300 GFLOP barrier, and on the rest of the system that broke the 600 GFLOP barrier. I had left to go back to school before the whole system was combined to break the 1 TFLOP barrier. IIRC, that was done at the customer site, because we didn't have the space to assemble the entire system. I've talked to quite a few of the engineers and technicians over the years who have many memories from those days. I wish I could live another 300 years to see what this all turns into.
I think the retail positioning of the Pentium Pro was contrived to fill a gap, intentional and unintentional, between the MMX Pentium and the Pentium 2. The retail market had tiny gaps but huge incremental price opportunity. It took longer than it should have to get the P2 out, and the PPro was capable with its large onboard cache. Also, they were hugely overclockable: with the couple I had, I ran a 166 PPro at 233 for a couple of years on NT 3.5 and 4. The K6 really upset the plans Intel had in place and caused them to make some changes that included the PPro. Also, I think you may have confused gig with meg when referring to the RAM capability.
32 bit = 4GB of memory in theory. 36 bit = 64GB of memory in theory. In 1995 it was rare that a PC had more than 16MB of RAM. Servers might have more, but usually at most 64MB. I'm not sure about servers, because back then I was still mostly in the desktop space. But for really large projects (supercomputers), this 32-bit addressing limit (being able to address "only" 4 GB of RAM) might have been an issue. If I'm misunderstanding what you are saying, please correct me.
The other advantage was the PPro's cache ran at the same clock speed as the CPU where the PII's cache ran at half clock speed. That was a big difference for certain types of workloads when comparing the PPro at 150 or 200MHz vs the first PII's at 233MHz (so 116.5MHz cache speed). Also the PPros came in variants with up to 1MB of L2 cache and the PII's only had 512KB.
I vaguely remember that Pentium Pro was popular for scientific computing, which typically benefits a lot from long pipelines and where critical code (linear algebra, FFT) would be hand-written assembly
As it relates to the Pro's legacy, you might also note the P2>P3>Pentium M>Core Series progression that we know today. P4 Netburst couldn't scale and ran too hot, so they went back and iterated on the previously abandoned P3/Pentium M to create the Core and Core Duo (which then went on to be i3/5/7).
I worked for a networked library info system at the time. We tried running our mainframe application on a 370 emulation over multiple Pentium Pro processors. Didn't go very smoothly, and when Y2K forced us to merge with a larger outfit, our CPU's were sold off with filing cabinets, etc. I still have the six Pentium Pro chips, with cooling fans, etc. (Opened one up to view the large dies inside -- impressive!)
i know i've said this before in your comments, but you are SUCH an incredible educator and presenter. always love these videos and your style, this would've been on broadcast TV if it were still the early 2000s.
@3:10 The PentiumPro wasn't first; the NexGen Nx586 was the first x86 with a RISC core (about half a year earlier). Also, I think it was not a full lemon. At the time I worked at a PC-building company, and in the server market (Windows NT) it was adopted widely - but only there ;)
This brought back memories. In the late 90s, I was still on a 486 when a friend of mine offered me a Pentium Pro for free. His work had binned it because its performance was appalling. Happily for me, I'd just left uni and was missing having a Sun box to play around on. So my mate gave me the junked box, which was dual processor, and I threw Linux on it. Had to recompile the kernel, but it then served me for years. I was very happy with the performance, especially for a free system that was supposed to be "junk"!
Another great video, thank you. We got a bunch of Pentium Pro desktops and we loved them, but we were running NT4 and a 32-bit modelling application. It was part of the justification to switch from SGI workstations to PCs shortly afterwards. We kept the same budget but got a lot more machines, so people didn't have to share.
Interesting video, but there's a lot of confused terminology in this, and some incorrect basic assumptions, probably stemming from the terminology confusion. Starting with the terminology. When we say "16-bit code" or "32-bit code" on x86, we don't mean that it uses the 16-bit parts of the registers or only the 32-bit parts. We refer to the "mode" the CPU is running the code in. Specifically the 8086-compatible 16-bit "real mode" (in which we can still use 32-bit registers), or the 32-bit "protected mode". Therefore, there is no such thing as "8-bit code" on x86. There is no x86 mode in which it emulates an 8080. The incorrect assumption is that a "pure" 32-bit program, or a well-written one, or one produced by a good compiler, will not use 'ax' or 'al'. That's just not true. Doing 16-bit or 8-bit loads/stores/operations is unavoidable in many use cases, and not just something compilers used to do as a trick and no longer do. Current 64-bit x86 code *will* use 8-bit and 16-bit parts of registers; it's a matter of semantics, not just performance tradeoffs. When you do arithmetic on "short"s or "char"s in C, most of the time you're forcing the compiler to do that, and many times you have to, because of file format constraints, network packet formats, or just because you're dealing with text. Finally, a couple of very minor points: 1. you said the P2 added "segment register caches"; if by that you meant "segment descriptor caches", those have existed since the 386. 2. Even though most games until the late 90s ran on DOS, at that stage (since the early 90s) they usually were fully 32-bit, running in protected mode, usually by relying on a DOS extender like DOS/4GW to do so.
I remember having a development instance of a well-known German ERP system running on a „dual lemon board". Fortunately all 32-bit, but really impressive compared to the typical HP/SUN/DEC boxes, especially comparing the price.
The CISC direct memory instructions use registers internally, but modern RISC and CISC processors also use register renaming internally, so the number of physical registers far exceeds the number of logical registers, and so there's no real bottleneck in either case. The big difference is that CISC processors will use a lot more energy in circuitry to handle all their complicated instructions (decoding, predicting, side-effects, etc...)
ASCI Red, a name I haven't heard in a long time... BTW, it was SNL. I worked in ICL at UTK/ORNL and we managed a 2.144 TF Linpack run on ASCI Blue Pacific, an IBM PowerPC based system, only to have that year's Top500 crown taken by an upgraded Red machine after it also broke the 2 TF threshold. My code was modified by LLNL, and the changes might even be declassified by now... good times. There was also a Blue Mountain machine built on SGI hardware. Nowadays, post-petaflop, we have two exascale machines, but they rely on GPU-like accelerators.
I do remember the reviews back in the day. I laughed a lot at my friend for doing exactly what you said. Myself I was never an intel fanboy and was still using an old Macintosh with a Motorola 68030 running at a phenomenal 16mhz. Ah the nostalgia. After moving to PC myself I was more AMD, although I did briefly have a pentium II which I’d inherited but i was already in the middle of building myself a high spec gaming rig, well high spec for the day that is. Anyway good video as always. Have you got that bbs working yet?
I never thought it was a lemon. I couldn't afford one anyway so it never got any consideration. Windows 95 forced me into buying one of those Evergreen CPU upgrades which put an AMD 586 or some such in the slot for the 486 on my motherboard at the time. About the same time I discovered Linux thanks to a CD on the front of a magazine and by the time Windows 98 came out I'd switched to Unix like systems and I've never looked back. Eventually the 486 gave way to a Pentium 133 once they were reasonably old hat but you could walk in to a computer shop and walk out with a motherboard, processor, 8Mb of RAM, and a crap graphics card and a sound card for about £130. Add a really shitty case which could slice your fingers up and a cheap PSU that made you fear electrocution if you got too close and you had something reasonable for not much money (1996/1997 ish). Love your videos btw, excellent work :-)
I recently did a project to demonstrate the concepts used in superscalar CPUs. Register renaming is the most important one. They could've got excellent performance all-round if they had used 8-bit registers internally. That would be 4x the number of registers though. Presumably they didn't do that because the silicon area and renaming logic grow steeply with the size of the register file (the number of distinct registers and the ports into it).
So was there much of a performance difference then between the Pentium Pro and the Pentium MMX? (Not the II or III but the Pentium MMX) - I did have an MMX 200Mhz back in the day but always wanted to have a Pro system if only because 2+ CPU's = Fun times... and bragging rights.
On 32-bit application code, the Pentium Pro did run a fair bit quicker. For maths operations that could make use of the SIMD instructions MMX provided, those particular instructions would outpace the PPro; also, on mixed 32/16-bit code the regular Pentium was faster.
I think I ran SuSE Linux on a used CAD station. I'm sure it wasn't my first SuSE installation (a version 6 box?), but it was one I used for a while for serving files and experimenting with Apache and PHP.
@@enilenis That's a cool story! Nice score!
@@enilenisdamn, that's a beast of a machine for the time. Excellent choices were made.
Nice! @@enilenis
Thanks for posting the inside-Intel view. Everything you've said seems to chime with what the bulk of Intel's staff seem to have said over the years about attitudes inside. Nice to hear how bug-free it was, too. I've seen a few of Dave's videos and enjoyed them.
I always thought of it as Pentium for Professionals... I didn't and still don't view gaming as a professional application for computing... because of that, I never had a PPro, as I was a gamer...
And later on they went with Xeon for their professional option. Which is ironic, as modern Xeons (or rather anything from the past 20 years) are pretty much identical to their desktop/home-user counterparts.
ahhhhhh pentium is cos of the number 5 💡!.... it only took me a good ten seconds, the "sextium"... lulwat?
Why not Hextium? 😅
I had a dual pentium pro 200 from 1996 to 2002. Originally for modelling and rendering in 3d studio max ... later just for rendering. it was a beast, one of my favourite computers ever.
Right, I was going to comment about the Dual processor support. That was one of the more interesting aspects of it for me at the time. I think I was fully Linux/NT at the time first Pro came out but couldn’t justify the upgrade until later.
At that time Linux was still in its infancy, the machine started as an NT 4.0 machine and it was a war machine. You could render on one CPU and still have power for modelling. Then the machine was upgraded to Windows 2000 Server, I tried redhat but there was no serious support for OpenGL at that time. I sold it to a local shop who used it for CAD until 2008 for sure. And btw, I never had a hardware problem even though I kept the machine on for years.
@@75slaine P1 and P2/3 can also do dual CPU.. quad is another story..
@SuperWasara I didn't have the money for a quad, I remember another guy in my region had a quad in 1997 as a rack system as a rendering farm (but he had multiple problems with that machine)
If you had a compiler targeting P6 (as the video pointed out), the P6 arch was pretty amazing. Funny that Intel put so much energy into compilers for P5, and much of those optimizations were not helpful for P6. (Sausage making is never a good thing to watch.)
The k5 - k6 - athlon saga would be a good topic for this explanation style
@@vardekpetrovic9716 Well yeah, because the K6 was a completely different architecture than the K5. The K6 wasn't an evolution of the K5 at all, it was based on the Nx586 architecture that AMD acquired when they purchased NexGen. The Nx586 (and by extension the K6) was more similar to the Pentium Pro in that it was internally a RISC CPU unlike the K5's oldskool fully CISC design.
Yes, AMD is underappreciated before they came up with amd64
@@kirillstp AMD got some narrow appreciation with the K5 - as it outperformed the Pentium on integer operations by a fairly wide margin.
The half-speed floating point unit was its major failure.
The Athlon Thunderbird generation saw AMD beating Intel across the board at the SAME clock speed for the first time - and it CLOCKED higher too.
@@kirillstp I had an AMD 486 DX4 133 for a couple of years. For its price point the performance was great, somewhere between a P75 and a P90. After I stopped using it as my main machine, it became our household router (dynamic dialup using diald), coupled with a 56k voice modem. We even used the voice features, so it was our answering machine. It also acted as a file server with Samba and NFS; we mostly stored MP3s on it. We later added SlimServer to it, for driving various internet radios around the house.
I got about 10 years use out of it.
A small correction: the first x86 processor to use RISC-like µOps was NexGen's Nx586. The second one was AMD's K5. NexGen's design was so good that AMD scrapped the K5 going forward, bought NexGen, and based the K6 and later chips on what NexGen had in the pipeline.
About the PPro: it was pretty much out of reach for mere mortals, so the troubles running 8- or 16-bit code were a non-issue, mostly a storm in a teacup stirred by reviewers who did not have to pay for the machines being reviewed. So, there is that.
I got the Pentium Pro poster as a teenager and I put it on my wall and would drool over it. Neither I or anybody I knew could afford it. I laughed out of spite when the reviews came in that it wasn't much faster running anything on Windows 95.
Yup. I had a K5-75, and later on a K6-2 450. That thing was FAST. I don't know how it performed in Windows, but the code built by GCC agreed with its branch predictor and whatever VERY well; it benched about dead even with a 1 GHz Pentium 3.
I couldn't afford a P200 MMX but got a K5-100. I was more than happy; Duke Nukem was flying on that machine 😊
Sounds like their experience with the PPro was what made them think Itanium's terrible 32-bit x86 compatibility was a good idea.
Some foreshadowing for sure
It would have been if the Itanic wasn't awful. If CPUs got 2-3 times faster, they could just recompile x86 code and essentially emulate it on the Itanium. If the Itanium had genuinely been much faster, new code would have been IA-64. They didn't want to make x86-64 / x64 because it would hurt the chances for IA-64.
The situation was similar with the Pentium 4/NetBurst. Super deep pipelining would have been a good thing if the process engineers had once again pulled a rabbit out of the hat and managed to forestall the end of Dennard scaling for another die shrink or two. They were expecting 8 GHz by 2003 when they were designing the architecture in the late 90's. It's the same bet they had made over and over and won against AMD. The late 80's and 90's saw pipelines getting ever longer: superscalar, out-of-order execution, clock multipliers, adding and then integrating caches into the core to deal with latency, etc. Intel was more aggressive than AMD at this and also a process node ahead. By the Athlon, AMD had caught up; the P4 would massively deepen pipelines yet again and add more, wider SIMD yet again, but Dennard scaling did not hold and it ran up against a thermal wall a bit below 4 GHz, which is around the same clock CPUs still use today. Now this deeply pipelined chip was only going at 3 GHz, only a little higher clocked than AMD's 2 GHz-ish Athlon XP and Athlon 64 chips. Had they managed to run at 8 GHz by 2003 as originally intended, they would have been hailed as having the foresight to boldly deepen pipelines and do all these things to hide latency better; instead they looked like fools, having just run into a wall and bribed Dell and others not to sell AMD chips.
Itanium's slow x86 emulation was a symptom of its more basic problem of the architecture making it difficult for compilers to optimize code.
Probably 20 years ago, I saved an e-waste Compaq Workstation with a Pentium Pro and Windows ME. I was surprised how well it ran with only 200 MHz.
OMG who was running WinME on PPros??
@@Bluecedor some engineering business. I think I saved a dual Pentium 3 server and a dual Pentium 3 workstation from the same place. Server was the older P3 that maxed out at 550 MHz and workstation was the newer kind that maxed out at 1.4 GHz.
@@David-he6uj I had a dual 550 at one point. Great times.
About 25 years later, by the late 2010s, the tables had turned. Sybase on x64 outpaced Sybase on SPARC...
In a way, the P6 architecture never truly went away. After the P3 came the dead end called the P4, plagued by power and heat dissipation problems. By the time it had died its inevitably gruesome death, the Core architecture, which was a continuation of the (mobile) P6 architecture, saw the light of day. Through that, the PPro's legacy still lives on.
No joke about running into a power brick wall. It was an existential crisis for a company that leaned heavily into their silicon side expertise for faster and faster silicon while the architects just counted on 2x clock every 18 months thanks to Gordon Moore. Very very big projects at intel were canceled when they finally came to terms that they had to “scale out in cores” vs “scale up in MHz”.
Yep, that's why the Core & Core 2 series of CPUs didn't support hyperthreading when their immediate predecessors did.
The Pentium M mobile processors (based on the Pentium 3) didn't just bring laptops up to desktop class performance, it turned it on its head (if we ignore the Athlon 64). At least one company (Aopen?? - iirc) produced a desktop motherboard for the Pentium M, and seriously upset Intel in the process as it outperformed Pentium 4s running at approaching double the clock speed while using a fraction of the power & cooling requirements. But Intel did not want the Pentium M to be used in anything except mobile computers and tried to block it.
They repeated the trick again when the first generation of Core CPUs were released (based on the Pentium M, mostly dual core). These again offered vastly better performance than a Pentium 4 (which really was a bad joke by then) and could have seen Intel become the more desirable CPU for enthusiasts for the first time in years (since before the Athlon was released) despite being 32-bit only, but again Intel didn't want them used in desktop computers.
This wasn't the first time a motherboard maker had gone off-script and upset Intel by making a motherboard which allowed CPUs to be used outside their intended market. When Intel moved the budget Celeron processors from Slot 1 to socket 370, someone (think it was Abit) made a dual-socket motherboard for socket 370. Giving enthusiasts who wanted one an inexpensive dual CPU system.
The Pentium 4 isn't the dead end; it's the Pentium D and Pentium M that are the true end. The Pentium 4 is merely the warning sign and speed bump before the wall that it crashes into.
@@patg108 The Pentium D basically consisted of two P4 dies on a single package. The Pentium M is actually P6-based, so it was a continuation of the Pentium Pro, and later evolved into the Core architecture, and as such, thus not a dead-end.
I worked for a company back in the late 90s that did early internet work. We had racks of Pentium Pro systems running Linux in our server farm. I was told we standardized on the PPro as it was cheap at the time because of the bad press. I didn't hear about the supposed "flaws" until later, when I became a Windows admin. That community seemed to think they were horrible and I never knew why. Thanks for shining a light on the reason the Windows community thought they were rubbish!
Pentium Pro is what saved Intel.
After the NetBurst disaster Asus released an adaptor to let you run a Conroe laptop CPU on a socket 370 board.
It destroyed every intel CPU in performance and was based on Pentium Pro as NetBurst was a double disaster in laptops.
Shortly after that, intel canned Pentium 4, released Merom on desktop and then released the gen 1 Core Solos and Duos.
Rest is history.
And the NetBurst disaster was one of process engineering expectations, not of architecture design. They expected Dennard scaling to continue and designed the chip accordingly. It was designed to run at 8 GHz by 2003; if Dennard scaling had continued, it would have, and it would have been hailed as a forward-looking design that pulled out all the stops to get higher clocks and better performance, instead of a conservative/iterative design like the Athlon XP and Athlon 64.
That didn't happen and they got the worst of both worlds with a very deep pipeline and not much to show for it. To run at 8 GHz you needed liquid nitrogen and a golden sample to deal with the thermal wall and a lot of tweaking to get the motherboard stable and power delivery beefed up.
I think the PPro would have been regarded more highly if Intel could have ramped up the clock speed. They were unable to do so reliably due to the way the L2 cache was placed in the same package as the CPU die. Intel learned from this with the P2 which is why those processors used the slot 1 format with the L2 located physically very close to the CPU die but separate from it. That along with running the L2 cache at a fraction of the CPU speed allowed for much higher clock speeds than the PPro.
The 90s were a very interesting and exciting time in the CPU world!
It was 200 MHz when the more common Pentiums were 90/133/166
Around that time Intel was having trouble making SRAM run fast enough to keep up with the processor speed and had to use a different process node for the cache. I don't think it had anything to do with the placement of the cache; it essentially had the same result whether it was in the same chip package or placed on a PCB together.
@@KodiakWoodchuck I think the Pentium Pro came in 2 speeds: 180 MHz or 200 MHz.
@@nzoomed It was more a size issue. SRAM wasn't yet compact enough to fit an L2 cache on the same die as the CPU, so they placed multiple dies on a carrier in the same package. That made production a lot more costly due to yield and binning problems. The Pentium II solved the cost issue by using a PCB and separately packaged (and binned) dies. By the time of the Pentium III, on-die L2 was possible.
Admittedly, I don't know the reasons for the cache clockspeed choices.
@@ghostbirdofprey Yeah, quite likely. They are actually facing the same issues today when it comes to scaling down memory. I was watching a video recently that explains the reasons why, but in a nutshell you can't just shrink down the memory without causing issues, and the transistor design has to be changed completely.
In my experience "so bad at" means "not much faster than a regular Pentium at the same clock speed." I mean, the full clock speed L2 cache can often actually make up some of the shortfalls. And once you move over to "mostly 32-bit" code, it can really shift. I'm running OS/2 Warp 3 on mine, and it's really fast. I've also seen people run Windows 95 game benchmarks on it, and it almost kept up with the Pentium II 233. So I don't think it was a lemon at all. I think people simply used it wrong. It's like those people complaining that the Pentium 60 wasn't as fast as a 486DX4 100 MHz. I remember a scientist posting in a computer magazine, saying that he'd been testing some calculation-heavy software he was using on the DX4 vs. the Pentium 60, and the DX4 outperformed the Pentium. The problem was that the code had been heavily optimised for the 486 architecture over years and years, and didn't take advantage of any of the new features of the Pentium. However, I've seen a Pentium 60 absolutely stomping on a DX4 in software that actually is Pentium optimised.
So, how good something is, will often depend on whether or not you're using it right.
Yeah, a DX4 will outperform a P5 60 with narrow code. Without using the superscalar nature of the Pentium, it basically becomes a 60 MHz 486.
And even there some difference showed up. A DX-50 can and will outperform a DX2-66
Hello fellow OS/2 user! Really cool to hear that Warp 3 runs great on PPro. I temporarily got out of computers in late 1994 for a couple of years (got in trouble for running a “bad BBS”), but was a big OS/2 2.1 user. I had pre-ordered OS/2 Warp 3, and I think my last system using 2.1 was a 486DX2/66 running at 80 MHz, and that ran great!. I’d imagine Warp 3 would scream on a PPro…
Brings back memories... The CAD team at the company I worked for back in 1996 had Pentium Pro equipped Dell workstations, AutoCAD, and huge Eizo monitors 👌
which is better a 1.4 ghz pentium 3 or a Pentium pro??🤔
We used Pros in the Gfx lab at Uni. They were definitely a bit more powerful than the non pro machines at rendering tasks.
The Pentium Pro is STILL running GrandAdmiralThrawn's blog, using WordPress of all things, just fine. Granted, it's a dual socket system, but still - it manages to handle modern web applications. Impressive beyond all doubt.
The what?
@@LagrangePoint0 What didn't you understand?
@@samiraperi467 GrandAdmiralThrawn, is that an active website?
Where is this website because modern search engines are broken and I can't find it.
@@vardekpetrovic9716 pretty sure it runs on 2k server
Another great video, thanks. My own memories were that the Pentium Pro wasn't so much a lemon as it was a feat of engineering that nobody knew what to do with. You had to admire that it was very clever, but in the same breath, you also realised a nice fast Pentium (or Pentium MMX) would be a better bet for everyday computing. This is perhaps an odd thing about computing. We want the latest and greatest tech to do the tasks we already do. So, CPU engineers have to design brilliant new technology for tomorrow, to do everything people were doing yesterday. As Itanium and the Pentium Pro demonstrate, I think it took Intel a while to figure this out.
The clue is in the name - Pentium "Pro". Criticising it for backwards compatibility issues is like criticising Windows NT 4.0 Professional for not running your old DOS games.
It wasn't designed for that. It was designed for the future, and made the necessary changes to show us that future given the limited resources at its disposal.
I can see why it got the reputation it had, but it's a case of Caveat Emptor. They should have done better research before buying.
But then, I'm biased. I also had a (dual) Pentium Pro server at my workplace, and it flew. It ran Windows NT Server and Microsoft SQL Server 6.5, so all compiled for it. I have fond memories of that beast. But would I have bought one for home use? Hell no. Wrong choice at that time.
But without it moving us towards the future, I'd never have had the awesome AMD Athlon I eventually bought to replace my old 386... and that was an excellent machine. (Which ran Windows 2000 Professional, naturally, because I'm not some kind of monster.)
Thanks for covering this. An excellent video that's well researched and presented. And thanks to Matt for his footage.
I actually used to have a DEC Alpha Workstation that came shipped with NT 4.0. I had never seen NT run so stably and it ran DOS games quite well.
It may have to do with the excellent x86/DOS emulator that shipped with NT 4.0 for Alpha.
You'd be surprised at number of people I knew who hated Windows NT 4.0 for not running games :D
That being said, it doesn't mean they were in the right. It's just that a lot of people like that existed. They heard Windows NT was better than regular Windows, they installed it, it was indeed more stable etc., but then it didn't run old games. Heck, for more professional stuff than games, I myself used Windows NT (then Windows 2000 later), which I had on a different drive than Windows 95/98/98SE. I won't mention "Millennium" because it was the worst system Microsoft ever released. Thankfully in the 2000's XP became the standard, and it was based on NT.
Nope, it's not unreasonable for a consumer to expect backwards compatibility when it has the same name. The Pentium 2 and all subsequent processors are backwards compatible; the entire PC platform had been successful due to its compatibility.
So your comment makes no sense. It's like someone making a product then blaming the consumers for "not getting it"... no excuse for failure. They accepted their mistakes, then went on to make a better one (but at least not calling it the same name this time).
@@DarkShroom I get that the Pentium was perhaps named a bit confusingly, but it wasn't totally without clues, with "PRO" being a visible part of the name. Also, it was backwards compatible - just not that fast when doing the "old stuff". And yes, they created a "better one" in the Pentium II, but they went with "Pentium PRO" ideas to make the Pentium III. So in a way, the "Pentium PRO" was ahead of its time for desktop. But it was fantastic for workstations back then already.
The important bit of your reply was 'I may be biased'. Yes.
Cheers John, you’ve taught me a thing or two about the PPro and I was glad to lend a hand, literally, to this video. I’m off to install NT4 onto the Barn Find Pentium Pro machine
I read that as PP Bro and you giving a hand for some reason... I may be a huge child..... 😊.... I appreciate your time and effort! I hope you are doing well and having a great day or night!
So not lemon, maybe orange? Mandarin?
Not so lemon 🍋…. More like: Pentium Pro, was it a potato 🥔 ?
A lime perhaps
@@vardekpetrovic9716 Blood Oranges..
tangerine
Not watched the video yet, but it cannot be considered a lemon, as Intel's current Core series of CPUs are still based on the Pentium Pro's architecture (the Core series is based on the Pentium M; the Pentium M was mostly a die-shrunk, power-usage-optimised Pentium Pro with MMX and SSE; the CPUID instruction even identifies Core CPUs as being in the PPro family). The Pentium 2 fixed most of the Pro's problems by ramping clock speeds and dropping prices - who cares about 16-bit speed per clock when you're running at 450 MHz and your 32-bit and floating point performance is that good? Though Intel's choice of clock speeds would make it difficult to compare Pentiums and Pentium 2s clock for clock.
Well, Pentium Pro was also introduced more than a year before MMX was a thing ) And Pentium II already was there half a year after Pentium MMX.
I highly doubt a lot of programs have made use of MMX instructions in that timespan. Except for maybe some games via DirectX supporting it.
Agreed I'm no computer buff but I soon cottoned on to the Pentium...WAY outpaced the C..E..L..E..R..O..N... I say to anyone get a Pentium if nothing else. Runs basic computing very well. Neither know, nor care, about gaming. Pentiums work well.
You're mixing up the architecture with its implementation. P6 was insanely good. The PPro was a total disaster: too expensive either to buy or to manufacture, too many bad assumptions (16-bit code was a massive performance problem). The Pentium II was just PPro 1.1.
Current Intel is fast… and that's it.
Otherwise it's a mess.
Correct. The actual lemon was the P4 with its failed Netburst architecture.
We ran PPros at work for CAD, running NT 4.0 and AutoCAD. They performed very well. But tried Win95 on one for fun, and it was slower than the Pentium I had at home. So, lemon if you bought it for the wrong reasons. It was actually a "Pro" product, not a name thrown on to make it look more expensive like we see on so many products nowadays.
10:32 note: 4-byte alignment mostly only matters for 32-bit accesses. (unless dealing with atomics or similar) even RISC chips support arbitrary byte-level reads and writes
I remember this, and just looked up the details: To be more explicit, partial register writes (e.g., updating just the lower 8 bits of a 32-bit register) introduced complexity in maintaining data consistency and coherence within the renamed registers.
This didn't necessarily flush the pipeline, though. That could happen if the out-of-order engine couldn't cope with the complexity of the "hazards". More generally, it took more steps to merge partial registers and maintain coherence,
and if the processor used the full value after a partial write, the dependency tracking caused a pipeline *stall* while it waited for the partial write operation to complete. I got the impression (in the day) that this was just a punt on the designers' part -- it could be done (and would be in later CPUs) but was considered unimportant.
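To make the partial-register point above concrete, here is a toy model of the merge hazard. This is my own simplification for illustration: the register set is tiny and the stall accounting is invented, not measured P6 behaviour.

```python
# Toy model of partial-register merge stalls on a register-renaming core.
# After renaming, AL and AH may live in different physical registers, so
# reading EAX after writing AL forces the hardware to merge them first.
# The accounting is invented for illustration, not measured P6 data.

PARTIAL_VIEWS = {"al", "ah", "ax"}  # partial aliases of eax

def merge_stalls(trace):
    """Count merge stalls in a trace of ('w'|'r', register) accesses."""
    pending_partial = False  # an un-merged partial write is outstanding
    stalls = 0
    for kind, reg in trace:
        if kind == "w":
            # A full-width write replaces the whole value: nothing to merge.
            pending_partial = reg in PARTIAL_VIEWS
        elif kind == "r" and reg == "eax" and pending_partial:
            stalls += 1             # full read after partial write: stall
            pending_partial = False # merge done; later reads are free
    return stalls

print(merge_stalls([("w", "al"), ("r", "eax")]))   # write AL, then read EAX
print(merge_stalls([("w", "eax"), ("r", "al")]))   # the cheap direction
```

The model reflects the asymmetry the comment describes: writing a partial register and then reading the wide one costs a merge, while the reverse direction is free.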
The people that were salty about the Pentium Pro were probably the ones that paid a premium for the 200 MHz 1M cache variant in late 1997. Its launch price was $2675, more than twice the cost of the original 200 MHz 256k cache variant released two years earlier for $1375. And despite having four times the cache, it was at best 10% faster, but more realistically 3-5% faster. The Pentium Pro wasn't a design that required large caches to perform well, unlike the later Netburst Pentium 4 that was cache starved for its entire existence.
In late 1997, you could buy a 300 MHz Pentium II for $700-800 less, which ran circles around the Pentium Pro, and get them in dual processor configurations. The Pentium II was better with mixed 16/32 bit code, faster and cheaper, making the 200 MHz 1M Pentium Pro a waste of sand. There was a 333 MHz "overdrive" chip, but that was basically a Pentium II in a Socket 8 package. And by the time it was available in mid 1998, far faster options were available.
I have a factory tray of new Pentium Pro 1MB cache with their black anodized tops. They look sharp. I also kept two 333MHz OverDrive processors NIB, though I installed two others in a Linux system that ran for many years. Jumped back in at dual P3 1.4GHz Tualatin as they were definitely worth the sand.
I had one and it was way faster than anything else. It was the ThreadRipper of its day.
Even sized like a thread ripper, the chip was massive
Question: what was the "DX" in the 486DX line? Was it some technical feature or just some marketing term?
The DX had an integrated math co-processor while the SX did not. For the 486 DX2 it also meant that processor was clock doubled internally and ran at twice the system bus speed. For the earlier 386 line I believe it just referred to the external data bus width (SX was 16-bit while the DX was 32-bit).
@@ilovethe70s Much appreciated!
@@cavvieirayou could buy a 487 to supplement your 486SX - but it was actually a complete 486DX that disabled your 486SX!
I've never in my life heard that Pentium Pro was a "lemon" let alone as a "common" belief. It was marketed by Intel as a business machine, it was built specifically to be the fastest CPU in the world measured by one specific benchmark. It even had "pro" in the name before Apple took that name for iPads. It was super expensive. Who on earth would buy one to game on?
Unfortunately, a non-trivial number of people; many took "pro" just to mean better, with certain PC magazines singing its praises as to how good it was at running 32-bit code, without mentioning its 8/16-bit performance. A whole bunch of people took a very different view of it. Then there were the businesses that ran 95/98 on it; they had a rude awakening. The Linux and OS/2 users were very happy; the NT users were also happy, assuming they were not running Win16 applications.
My experience with P-Pros: I was trying to save up for a PC in the mid 90s and someone from my church wanted to sell his Pentium Pro PC to me. When I mentioned the issue of running 16-bit DOS code, he showed off a DOS game on it. It ran... OK, but it felt more like an early Pentium than a newer one.
For professional workstations running NT it's great! But most people at home were still running 16-bit Windows applications and DOS games so that was the wrong market..
... Until the Pentium II made it all a moot point and I ended up getting one for my new PC.
Windows XP was released well into the Pentium III's life - and well past the Pentium II. Most of those platforms started with Windows 98. Heck, even the Pentium 4 was released in 2000.
Not to say people didn't run XP on Pentium III systems, but I'd say very few Pentium IIs were running it.
I must have lived in a parallel universe, where the PPro was regarded as a wonder powerhouse, a machine to dream about. This is the first time I've heard someone calling it a lemon.
Well, it depended completely on how you used it. After all, if you only used it for 32-bit-only software, it was a boss superpower chip for the time, but if you needed 8- and 16-bit, that's where it was a lemon.
still got 2 of these bad boys in my gold collection....they came from an old HP server
If I'm not wrong the P6 also formed the basis of Intel's excellent Core architecture which brought Intel back from their Netburst induced malaise.
Definitely not a lemon, but also definitely not a desktop CPU, they were a beast of a CPU for the day.
Ran a dual PPro box as a server for a number of years and it was working just fine when it was 'obsoleted' as part of a HW refresh, still got one of the CPUs on my desk.
I wished i kept my ppro as a trophy but sadly I didn’t. But i do have my P2 brick proudly on display in my home office.
Definitely a desktop CPU. WTF are you talking about? I had a dual PPro as well, but it ran Windows NT 4 Workstation and was, for a time, the best platform you could buy for running Office Professional.
A couple of decades ago I got my hands on a Digital workstation with this processor, a nice SCSI drive and some other goodies. It had belonged to the record company EMI, and there was NT4 with MS Office stuck on it. It worked, sure. But with a more modern BSD it works really well. I still have it around for reading out DAT, QIC tapes and such.
I have an old Pentium Pro machine that was still running last time I checked. I used to program on that thing daily. It was a beast at the time, I loved it. I never experienced any of the mentioned drawbacks, but probably because I was programming on it, compiling my own software, and in general not running unoptimized code. I can totally see the problem if you were running canned software on it. Especially software hand-optimized for older platforms. But to anyone using it for its intended purpose, the P-Pro was amazing. And insanely durable. That thing outlasted all my other machines, even newer ones.
I worked as an engineer throughout the 90's, and while I never had a Pentium Pro in one of my computers, many of my peers did, and being engineers, we often discussed them and compared them - both on paper, and in the real world. The Pentium Pro was a unique processor, with unique ways of doing things. The result was a processor that was never going to be good at everything, but that might just blow everything else away under the right circumstances. Many of my peers loved their Pentium Pro based machines, and they were noticeably faster than other Intel based computers when doing real engineering work. Its main benefit was that its L2 cache was enormous compared to other processors of the day. About the time the Pentium Pro became a darling CPU for engineers, I inherited a DEC Alpha based workstation. They were a lot more $$$, but when it came to real world number crunching, that machine blew every other desktop off the map. I was a VERY happy camper!!
I had a Dell PowerEdge 4100, a dual Pentium Pro system that could be made to fit in a mere 14U of server rack space... but in that 14U it would handle more than most other things of the era. I finally retired it in 2005 after it ran 24x7x365 non-stop for 8 years; the only things replaced in that time were 2 hard drives (of the 6 it had) and one of the power supplies, which had a fan failure. It was truly a beast of a machine.
The Pentium Pro was the first mainstream commercial out-of-order x86 CPU. It was tremendously faster than the best Pentium; it was just expensive.
it's all about the Pentiums Baby!!!!
I had the P Pro 200, 256K when they first came out. I worked at a local computer store as a tech at the time and the owner would let us have anything we wanted and just take a small amount out of our check each month... when he remembered.
Great video, I really enjoy your stuff.
Small comment, I believe the processor stalls from partial registers happen when you *write* to a partial register and then later read the larger register. For example, write al, then read ax. Reading partial registers, as in your example, I don't think caused stalls.
Reason being, if you wrote to al and then read ax, it needs to put together al and ah, which might not be in the same physical register due to renaming, causing a stall.
Writing ax and then reading al or ah shouldn't get a penalty, since the value required is already present in a register.
Unfortunately I don't have a ppro or other p6 that I can test on.
I used a dual Pro 200 system (with the 512k cache) as a workstation for a number of years and was very happy with it. Ran great under Linux and kept me going until the Athlon was released many years later. Never knew it had performance problems with 8/16 bit code as I only ever ran linux on it. Amazing system with great longevity at release!
I remember at work there was a Pentium Pro workstation which was used for video editing. It came with Win95 and it was an unstable lemon. But the rig also came with NT4 Workstation. I installed NT4 and then this computer ran very stable and fast.
I am sorry, but I don't think the explanation about the cause of the PPro's 16-bit deficiency is correct. According to the Pentium Pro and Pentium II System Architecture, Second Edition book (p393), the reason is that the segment registers are not aliased like the general purpose registers. So a write to the DS register will stall the pipeline, because the value is set immediately in the physical register, rather than in an aliased register that can be used by subsequent speculative operations. And it just so happens that 16-bit code does a lot of writes to DS. It shouldn't be a problem with 16-bit-like code running as a 32-bit process, because with most OSs you would never really change the DS register. Do you have some benchmarks to prove your theory?
It was both the lack of segment descriptor caches and the existence of partial register stalls. Neither caused pipeline flushes. The video is wrong.
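For what it's worth, the segment-register explanation above can be sketched as a toy cost model. All figures here are my own invented assumptions (the penalty number is hypothetical, not a measured P6 cost); the point is only that un-renamed segment writes serialize, and 16-bit code does far more of them than flat-model 32-bit code:

```python
# Toy cost model: segment registers on the P6 were not renamed, so each
# write to one (e.g. "mov ds, ...") serializes the pipeline. The penalty
# figure below is invented for illustration, not a measured P6 number.

SEG_WRITE_PENALTY = 20  # hypothetical cycles per serializing DS write

def serialization_cost(instructions):
    """Sum the invented penalty over every segment-register write."""
    return sum(SEG_WRITE_PENALTY
               for insn in instructions
               if insn.startswith("mov ds"))

# 16-bit style code reloads DS constantly to reach different segments;
# 32-bit flat-model code sets DS once at startup and never touches it.
code16 = ["mov ds, ax", "mov si, [0]", "mov ds, bx", "mov di, [2]"]
code32 = ["mov esi, [eax]", "mov edi, [ebx]"]

print(serialization_cost(code16))
print(serialization_cost(code32))
```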
So if I may summarize: the software is the lemon!🍋
9:50 - That largely depends on what kind of 486. An SX/20 was trash, but some of the later DX4s with write through cache and even AMD's 5x86 offerings were faster than P75, and even P90s in certain benchmarks. From my own experience with Pentium Pros running DOS applications (let's call them games ;) ), I WOULD rather have a DX4/100, 120, or 133.
Having said all that a regular Pentium 100 would still be better suited for 8/16 bit processing.
16:57 Itanium foreshadowing: users do not adapt to your new tech stack as quickly as you hope.
Itanium was basically a double-down. "Ah, the compilers are going to fix it." They didn't. Even today you would be hard pressed to make an LLVM backend that does that; you would have to decompile things back to LLVM IR and deoptimize it. No one is deploying IR code, except perhaps .NET and Java.
@4:34 Oh! I had been running 32-bit code on OS/2 since 1992 by the time the Pentium Pro came along.... silly to think of going back to 8-bit or 16-bit at that point! It worked very well (PPro 200 MHz) with OS/2.
The Pentium Pro's register renaming scheme was way ahead of its time.
Betteridge's Law of Headlines. Nope, the PentiumPro was pretty good. At the time it came out, it was the first x86 processor that could, with smaller datasets, perform faster than some of the load/store processors that were in our compute servers... one of our researchers bought a personal PentiumPro system and showed his codes running on it faster on the CFD codes than the MIPS and SPARC processors on the smaller datasets (the ones that could fit in 32-bit address space... that was the big drawback because our compute servers were 64-bit so could handle lots larger datasets). He drew some ire from the admin when he showed that. He said that this showed that the writing was on the wall for the non-x86 machines and he was right.
Did you watch the actual video? He was pretty clear about the specific circumstances in which it performed poorly.
My experience with the Pentium during my assembly writing years was that it too shared a lot of these drawbacks. Real/virtual-8086 mode was slower than 32-bit protected mode, AL/AH/AX were slower than EAX and bit shifts, and compilers that output badly "optimised" code were the standard of the day. I started with Borland C++ and the code it output was shockingly bad. Teenage me with an x86 reference manual could beat it without really knowing what I was doing. DJGPP was markedly better, but I felt like it optimised for 386 compatibility more than really fitting your code to the Pentium, so its using AL/AH/AX seems less like a "no penalty" optimisation and more of a "we did this years ago for the 386 and nobody wants to touch it" kind of deal.
I never got a chance to touch a Pentium Pro and so its non-32bit penalty very likely was much worse, but my experience was that the Pentium didn't like 8/16bit compatibility code either and you really don't want to use it.
It kinda lives on today, since the Core 2 architecture is almost a copy-paste of the Pentium 3 with some tweaks. The Core i series is basically the same thing with HT. Only the most recent Intels changed enough that you can say it's something different, but to what extent? I don't think there's ever been a change quite like the one from the 486 to the Pentium Pro. That was a fundamental change. Today it seems more like tweaks upon tweaks.
I was building new cloned PCs as a teenager back when the Pro came out. I remember us all being very disappointed in the first Pro powered machines we built when running games. Cool to finally know exactly why.
The Pentium Pro wasn't for running games; it was for servers and workstations.
I left for college with a PPRO 200 system when everybody had regular P200. The difference was massive.
I had no money. I bought a Cyrix P200L+, 16 MB EDO RAM, S3 Virge DX 2 MB, 2 GB HDD, Sound Blaster 16. I did not buy a CD drive to cut costs.
A Pentium Pro 200 was still viable years after a Pentium 200 could no longer keep up. The full speed on die L2 cache was a game changer.
I still have about ten of them lying around. NT 3.51 and 4.0 servers ran great on them. I gave away about 15 dual boards a few years ago and regret that, as I'd like to build one now just for the memories. :)
I still have an IBM PC Server 330 with dual Pentium Pros. I sometimes turn it on just for kicks, but it's not useful for much these days. But I love the way the case looks.
On the note of your sponsor:
I used PCB way once, it was pretty good. I ordered a whole bunch of PCBs for a project, though probably ordered way too many copies of them.
I could monitor online the progress of the creation of the boards. Apparently, very few people order yellow PCB, it took a couple days before they even started on that one.
The only real problem with it was the import tax. The price of the boards was reasonable, the postage a little less reasonable but still okay, but that's down to me ordering a whole heap of boards, but about a month later, I got a bill from the government about the import tax, which was an unexpected extra cost.
Windows XP came out in the late Pentium III and early Pentium 4 era, so saying "by the time we got to the Pentium II and III, Windows XP came out" isn't really accurate. The P3 fits more with Windows 98.
I was going to say the same. If I'm not mistaken the PII was even before Windows 98.
@@eDoc2020 It was, the Pentium II was released in 1997.
@eDoc2020 I believe it was. I was running 98 on a PII before moving to a new PIII machine running Win2k Pro.
pentium 2 runs xp without breaking a sweat.
FYI: Intel was not a newcomer to the supercomputing/high-performance market. They were making supers back in the '80s as well.
Back in the '90s I used to work for a small computer manufacturer. We used to build custom Pentium Pro portable video editing machines for the TV/video industry, with dual SCSI HDDs. I thought the CPU, in that use case, was fantastic for the time. Me... I was stuck with a Pentium MMX.
Got a second-hand Pentium Pro machine in 2002; it became my home server between 2002 and 2007. It was fun: when Linux 2.6 arrived, it used to run more smoothly on the P-Pro compared to a 500MHz K6-II. Not faster overall, just some tasks went smoother. Also got a dual P-Pro machine (with 2x 1M-cache black beasts); it could run Win2000 and WinXP smoothly, though under Linux certain things didn't work because it used APM instead of ACPI, which was disabled by default on SMP systems. Still, it was usable as a desktop in 2006-7, with RAM maxed out to 512 megs (4x 128MB 5V EDO DIMMs, yay!)
Is it only me wondering why not load the 4 bytes into EAX, then AND with 0xFF000000 to check the highest byte, then shift left 8 bits and recheck?
That way every operation is fully 32-bit, and you still load the whole 32 bits in one go (although you'd need to handle unaligned access first).
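The byte-scan suggested above can be sketched in C; a hypothetical illustration (function name my own), using only masks and shifts on the full 32-bit word so no AL/AH-style partial-register access is needed:

```c
#include <stdint.h>

/* Return the index (0-3, highest byte first, as in the suggestion above)
 * of the first zero byte in a 32-bit word, or -1 if there is none.
 * Every operation is a full-width 32-bit mask or shift. */
static int find_zero_byte_hi(uint32_t w)
{
    for (int i = 0; i < 4; i++) {
        if ((w & 0xFF000000u) == 0)  /* mask the highest byte instead of
                                        reading a subregister like AH */
            return i;
        w <<= 8;                     /* shift the next byte into position */
    }
    return -1;                       /* no zero byte present */
}
```

Unaligned loads would still need handling before this, as the comment notes.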
23:50 - hahha yeah I've got one in a shadow box for display purposes only. This came from a recycler 15 years ago though, and I'm never parting with it.
Wasn't the AMD K5 the first CISC-to-RISC CPU? And in fact, the UMC Green 486's incredible speed gains came from more efficient microcode for each CISC instruction.
Moved from a P90 to a Pro 200 at the time. Moved to NT 4.0 at home for better performance. DirectX on the lower NT service packs was quite bad, but around SP4 it got much better and I could game with little to no issues. With the full-speed cache, it dominated Seti@Home vs much higher-clocked PII processors!
When Sun released Solaris x86, I tested it on an old IBM station with dual Pentium Pros, and it worked very well. There were two LEDs to monitor processor activity, and SMP was activated as soon as the installation disk was booted, which was unusual at the time; on Linux, the kernel had to be rebuilt.
While desktop PCs were melting down in the NetBurst apocalypse,
the Pentium M/Core Solo became a blessing for laptop PCs.
It connected the Pentium 3 core to the Pentium 4 chipset bus.
So in reality, P6 continued on for a little longer.
Wow. This video really brings back a lot! I was working at a Kinkos in the mid '90s and it was a huge cross-section of older mainframe guys who never moved past Fortran and younger guys working in computer services in the process of getting their CS degrees. I'm sure I had more than a few spirited convos about the Pentium Pro! But for sure the whole upshot was that, while it was supposed to be for 32-bit, Intel was naïve to assume that the NT ecosystem would be ready for it. It wasn't the casual users that really got the shaft, but the smaller business users who assumed they could just run anything on the Pro.
It's a bit like with IA-64/Itanium. Old code was slow but still around.
The 8- and 16-bit performance did indeed suck on the Pentium Pro, but it was thankfully offset by the huge amount of L2 cache in the same package running at full processor clock rate. That L2 cache also made it sing on 32-bit code. Particularly notable were the models with 1 MB of L2 cache at 200 MHz.
The Pentium 2 and early Pentium 3 moved the L2 cache outside the CPU die and ran it at a divider of the processor clock. For a handful of tasks, the fastest Pentium Pros could outrun the first Pentium 2 chips, despite the PII's higher clocks, because of the differences in the L2 cache. This is the same reason Coppermine Pentium 3s were seen as superior to some Katmai Pentium 3s despite having less L2 cache overall.
4 way SMP support was also a huge factor in the Pentium Pro's early success. Often at the time, workstations were only sold with a single CPU but had unpopulated sockets for upgrades down the road. This was seen as a good upgrade path, even compared to the first wave of Pentium 2 systems. As for pricing, these chips and complete systems weren't cheap but they definitely undercut various RISC competitors by a lot.
I will say that while technically possible, running Linux on a Pentium Pro wasn't that popular at the time of its release: only the 1.0 kernel had been released prior to the Pentium Pro's launch. Its popularity didn't explode until kernel 2.2 many years later, after Intel had moved on to the Pentium 2 and 3. Granted, Linux builds from the late '90s and early '00s would run perfectly fine on a Pentium Pro (and technically even some today, given enough RAM and patience), but they were never really sold/installed together when the hardware was new and relevant.
A neighbour of mine had a dual PPro computer. It was absolutely beastly in 3DS Max which required WinNT at the time as well. I mean Max could run in Windows 95 but would run into UI handle pool issues there.
I am very dubious about the explanation in this video, though. Using the h, l, and x subregisters is still a common thing that a lot of compiled code does; it obviously shouldn't cause a pipeline flush, even if it does introduce some register renaming and microcode overhead because, say, write operations become read-modify-write sequences instead. The overhead can also create a pipeline stall if the hidden register file is small enough and the density of code accessing subregisters is high enough, but not a flush. I think the problem was in segment:offset addressing performance, which is a mode used exclusively by 16-bit code and not by any 32-bit code. It being uncached, together with a very long memory latency compared to processor throughput, would cause a substantial performance regression; then again, most of that code was no longer really performance critical, it sort of mostly lingered. You also cannot mix 16-bit and 32-bit code in a process context; you can't just load a 16-bit DLL.
Also you keep saying "8-bit code" but that's plain not a thing on PC, for the lack of a corresponding memory model.
I had a Pentium Pro as a kid in '96 or '97. They were such "lemons" that they went for dirt cheap at PC flea markets; I made a Pentium 2-class machine for dang near 486 money.
7:00 This is NOT precise. The Intel 286-586 have more than 4 registers: there are A, B, C, D, but ESI and EDI can also be used as general-purpose registers, except they have only one 16/32-bit port (depending on addressing mode), so their low bytes aren't separately accessible. There are also two special-purpose registers for the stack pointer and base pointer. While it's true only 4 are fully versatile, there are 8 in total, and x86-64 added another 8. On ARM, some of the 16 registers have predefined special purposes, making ARM and x86-64 comparable. x86 has separate ALU and AGU units, which other architectures don't, so it can actually do more than one operation per cycle. Plus memory copies and comparisons are way faster than on other architectures: they can be done in ONE instruction. ESI is the source, EDI the destination, and ECX the counter; it reminds me a bit of DMA. Set the contents, hit REP, and it runs automatically.
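The REP string-copy described above (ESI as source, EDI as destination, ECX as counter) has semantics roughly like this C loop; a sketch of what the hardware does, with the function name my own invention:

```c
#include <stddef.h>
#include <stdint.h>

/* What `REP MOVSB` does, expressed in C: ESI plays the role of src,
 * EDI of dst, and ECX of count. The CPU performs this entire loop as
 * a single instruction, decrementing the counter until it hits zero
 * (assuming the direction flag is clear, i.e. pointers increment). */
static void rep_movsb(uint8_t *dst, const uint8_t *src, size_t count)
{
    while (count--)
        *dst++ = *src++;
}
```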
The Pentium Pro wasn't capable of MMX, which extended x86 with 8 x 64-bit integer registers, accessible in 32/16/8-bit chunks. Later AMD introduced 3DNow!, which added packed 32-bit floating-point arithmetic on the MMX registers. SSE1 then introduced 32-bit floating-point operations on new registers, SSE2 allowed integer operations, SSE3 added data exchange between SSE and MMX, and SSE4 added some string operations, bit manipulation, bit counting, etc. - if I remember correctly. The only problem is that the amount of state needing to be stored/restored grew so much that the CPU has to shuffle kilobytes of register contents every time the OS switches context. In fact: more registers, more problems. It was never free.
Haha I just happened to build a PPro machine earlier this week, specifically to play with it and see exactly how bad it is. On a 440FX motherboard, 128MB RAM and dual-boot Win98SE and Win2k. So far it meets my expectations. :) (edit) and yes, AHA2940UW with a nice 18GB SCA80 drive from IBM and a NEC SCSI CDROM.
The PIII was YEARS before XP. It was released in '99, while XP was released in late 2001 and really didn't catch on for a few more years.
Meant for NT, the cheap Itanium!
I worked quite a bit in Sonic Foundry Acid on a ppro 200, it was nice and snappy. It actually ran that program better than my k6-2 (333?) did.
The PPro project was almost a failure, but Intel pulled it off just in time. The ceramic dual-chip package was a pain in the ass, i.e. expensive, with many half-working CPUs thrown away for recycling simply because either the CPU die or the L2 die was defective while both were already soldered together. By the time of the PPro's release, Intel was already hard at work on the Pentium II and its new packaging, moving away from the ceramic MCM.
I got a Pentium Pro tower from Dell when they came out, and it was no lemon. It was among the first of the PC's that could comfortably handle large vocabulary speech recognition. It had a lot of compute power and plenty of expansion capacity. I compiled my scientific computing applications for 32 bit and things ran very well compared to other processors of the day. I couldn't afford a SUN or SGI workstation, but Pentium Pro was competitive with those higher end RISC workstations of the day.
To be fair, the Pentium Pro had its 256K (or even 512K) L2 cache integrated and running at full CPU speed. Probably this reduced the "flushing" penalty quite a bit.
I worked at Intel SSD on the partial Touchstone Paragon system that first broke the 300 GFlop barrier, and on the rest of the system that broke the 600 GFlop barrier. I had left to go back to school before the whole system was combined to break the 1 TFlop barrier. IIRC, that was done at the customer site because we didn't have the space to assemble the entire system. I've talked to quite a few of the engineers and technicians over the years who have many memories from those days. I wish I could live another 300 years to see what this all turns into.
I think the retail market positioning of the Pentium Pro was contrived to fill a gap, intentional or not, between the MMX Pentium and the Pentium 2. The retail market had tiny gaps but huge incremental price opportunity. It took a while longer than it should have to get the P2 out, and the PPro was capable with the larger onboard cache. They were also hugely overclockable with the couple I had, and I ran a 166 PPro at 233 for a couple of years on NT 3.5 and 4. The K6 really upset the plans Intel had in place and caused them to make some changes that included the PPro. Also, I think you may have confused gig with meg when referring to the RAM capability.
32-bit = 4GB of memory in theory. 36-bit = 64GB of memory in theory. In 1995 it was rare that a PC had more than 16MB of RAM. Servers might have had more, but usually at most 64MB. I'm not sure about servers, because back then I was still mostly in the desktop space. But for really large projects (supercomputers), this 32-bit addressing limit (being able to address "only" 4GB of RAM) might have been an issue.
If I'm misunderstanding what you are saying, please correct me.
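The address-space arithmetic above can be checked directly; a trivial sketch (the function is mine, just illustrating the powers of two):

```c
#include <stdint.h>

/* Bytes addressable with an n-bit address: 2^n.
 * 32 bits -> 4 GiB; the PPro's 36-bit physical addressing -> 64 GiB. */
static uint64_t addr_space_bytes(int bits)
{
    return (uint64_t)1 << bits;
}
```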
The other advantage was that the PPro's cache ran at the same clock speed as the CPU, whereas the PII's cache ran at half clock speed. That was a big difference for certain types of workloads when comparing the PPro at 150 or 200MHz vs the first PIIs at 233MHz (so 116.5MHz cache speed). Also, the PPros came in variants with up to 1MB of L2 cache, while the PIIs only had 512KB.
I vaguely remember that Pentium Pro was popular for scientific computing, which typically benefits a lot from long pipelines and where critical code (linear algebra, FFT) would be hand-written assembly
As it relates to the Pro's legacy, you might also note the P2>P3>Pentium M>Core Series progression that we know today. P4 Netburst couldn't scale and ran too hot, so they went back and iterated on the previously abandoned P3/Pentium M to create the Core and Core Duo (which then went on to be i3/5/7).
I worked for a networked library info system at the time. We tried running our mainframe application under 370 emulation on multiple Pentium Pro processors. It didn't go very smoothly, and when Y2K forced us to merge with a larger outfit, our CPUs were sold off with the filing cabinets, etc.
I still have the six Pentium Pro chips, with cooling fans, etc. (Opened one up to view the large dies inside -- impressive!)
i know i've said this before in your comments, but you are SUCH an incredible educator and presenter. always love these videos and your style, this would've been on broadcast TV if it were still the early 2000s.
@3:10 The Pentium Pro wasn't first; the NexGen Nx586 was the first x86 with a RISC core (about half a year earlier).
Also, I think it was not a full lemon. At the time I worked at a PC-building company, and in the server market (Windows NT) it was adopted widely, but only there ;)
Weren't Dothan and Banias (and Core) also P6-based?
This brought back memories. In the late '90s, I was still on a 486 when a friend of mine offered me a Pentium Pro for free. His work had binned it because its performance was appalling. Happily for me, I'd just left uni and was missing having a Sun box to play around on. So my mate gave me the junked box, which was dual processor, and I threw Linux on it. Had to recompile the kernel, but it then served me for years. I was very happy with the performance, especially for a free system that was supposed to be "junk"!
Another great video, thank you.
We got a bunch of Pentium Pro desktops and we loved them, but we were running NT4 and a 32-bit modelling application. It was part of the justification to switch to PCs shortly afterwards from SGI workstations. We kept the same budget but got a lot more machines, so people didn't have to share.
Interesting video, but there's a lot of confused terminology in this, and some incorrect basic assumptions, probably stemming from the terminology confusion.
Starting with the terminology: when we say "16-bit code" or "32-bit code" on x86, we don't mean that it uses the 16-bit parts of the registers or only the 32-bit parts. We refer to the mode the CPU is running the code in, specifically the 8086-compatible 16-bit "real mode" (in which we can still use 32-bit registers), or the 32-bit "protected mode". Therefore, there is no such thing as "8-bit code" on x86. There is no x86 mode in which it emulates an 8080.
The incorrect assumption is that a "pure" 32-bit program, or a well-written one, or one produced by a good compiler, will not use 'ax' or 'al'. That's just not true. Doing 16-bit or 8-bit loads/stores/operations is unavoidable in many use cases, and not just something compilers used to do as a trick and no longer do. Current 64-bit x86 code *will* use the 8-bit and 16-bit parts of registers; it's a matter of semantics, not just performance tradeoffs. When you do arithmetic on "short"s or "char"s in C, most of the time you're forcing the compiler to do that, and many times you have to, because of file format constraints, network packet formats, or just because you're dealing with text.
Finally, a couple of very minor points: 1. You said the P2 added "segment register caches"; if by that you meant "segment descriptor caches", those have existed since the 386. 2. Even though most games until the late '90s ran on DOS, at that stage (since the early '90s) they were usually fully 32-bit, running in protected mode, usually by relying on a DOS extender like DOS/4GW.
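To illustrate the point above about char arithmetic: a C function like this (a hypothetical example, name my own) leaves an x86 compiler little choice but to emit 8-bit operations, which typically land on AL, exactly the partial-register access under discussion:

```c
#include <stddef.h>
#include <stdint.h>

/* An 8-bit checksum with wraparound-mod-256 semantics. Because the
 * accumulator is a uint8_t, the language requires truncation to 8 bits
 * at every step, which on x86 maps naturally to arithmetic on AL. */
static uint8_t checksum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);  /* wraps at 256 by definition */
    return sum;
}
```

File formats and network protocols are full of fields with exactly these semantics, which is why "just don't use subregisters" was never an option for compilers.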
I remember having a development instance of a well-known German ERP system running on a "dual lemon board". Fortunately it was all 32-bit, but really impressive compared to the typical HP/Sun/DEC boxes, especially comparing the price.
We used to run them at our ISP in the 90s..
They ran the RADIUS servers, mainly.
How are RISC processors not bottlenecked by the number of registers in them if they have no direct access to memory?
The CISC direct memory instructions use registers internally, but modern RISC and CISC processors also use register renaming internally, so the number of physical registers far exceeds the number of logical registers, and so there's no real bottleneck in either case. The big difference is that CISC processors will use a lot more energy in circuitry to handle all their complicated instructions (decoding, predicting, side-effects, etc...)
ASCI Red... a name I haven't heard in a long time. BTW, it was SNL. I worked at ICL at UTK/ORNL, and we managed a 2.144 TF Linpack run on ASCI Blue Pacific, an IBM PowerPC-based system, only to have that year's Top500 crown taken by an upgraded Red machine after it also broke the 2 TF threshold. My code was modified by LLNL, and the changes might even be declassified by now... good times. There was also a Blue Mountain machine built on SGI hardware. Nowadays, well past the petaflop era, we have two exascale machines, but they rely on GPU-like accelerators.
Great video! I hope you'll do a video on how the PPC did its 68k magic. Looks like there's a story there to be told.
I do remember the reviews back in the day. I laughed a lot at my friend for doing exactly what you said. Myself, I was never an Intel fanboy and was still using an old Macintosh with a Motorola 68030 running at a phenomenal 16MHz. Ah, the nostalgia. After moving to PC myself I was more AMD, although I did briefly have a Pentium II which I'd inherited, but I was already in the middle of building myself a high-spec gaming rig, well, high spec for the day, that is. Anyway, good video as always. Have you got that BBS working yet?
I never thought it was a lemon. I couldn't afford one anyway so it never got any consideration. Windows 95 forced me into buying one of those Evergreen CPU upgrades which put an AMD 586 or some such in the slot for the 486 on my motherboard at the time. About the same time I discovered Linux thanks to a CD on the front of a magazine and by the time Windows 98 came out I'd switched to Unix like systems and I've never looked back. Eventually the 486 gave way to a Pentium 133 once they were reasonably old hat but you could walk in to a computer shop and walk out with a motherboard, processor, 8Mb of RAM, and a crap graphics card and a sound card for about £130. Add a really shitty case which could slice your fingers up and a cheap PSU that made you fear electrocution if you got too close and you had something reasonable for not much money (1996/1997 ish). Love your videos btw, excellent work :-)
I recently did a project to demonstrate the concepts used in superscalar CPUs. Register renaming is the most important one. They could've got excellent performance all-round if they had tracked the 8-bit registers separately internally. That would be 4x the number of rename targets, though. Presumably they didn't do that because the silicon area of the register file grows steeply with the number of distinct registers (and ports).
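A toy sketch of the renaming idea the comment above refers to: each write to a logical register gets a fresh physical register, so overwriting a logical register doesn't create false dependencies on earlier in-flight instructions. All names here are illustrative, and the naive wraparound allocator stands in for a real free list:

```c
/* Toy register renamer. Logical registers (e.g. EAX..EDI) map into a
 * larger physical file; the PPro's reorder buffer had 40 entries. */
#define NUM_LOGICAL  8
#define NUM_PHYSICAL 40

typedef struct {
    int map[NUM_LOGICAL];  /* logical -> current physical register */
    int next_free;         /* naive allocator; real HW uses a free list
                              and recycles registers on retirement */
} renamer;

static void renamer_init(renamer *r)
{
    for (int i = 0; i < NUM_LOGICAL; i++)
        r->map[i] = i;     /* identity mapping at reset */
    r->next_free = NUM_LOGICAL;
}

/* Rename a write to `logical`: allocate a new physical register so the
 * write doesn't conflict with older uses of the same logical name. */
static int rename_dest(renamer *r, int logical)
{
    int phys = r->next_free++ % NUM_PHYSICAL;
    r->map[logical] = phys;
    return phys;
}

/* A read just looks up the most recent mapping. */
static int rename_src(const renamer *r, int logical)
{
    return r->map[logical];
}
```

The partial-register problem is visible even in this sketch: writing AL would have to merge into whatever physical register currently holds EAX, instead of simply allocating a fresh one.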
So was there much of a performance difference then between the Pentium Pro and the Pentium MMX? (Not the II or III but the Pentium MMX) - I did have an MMX 200Mhz back in the day but always wanted to have a Pro system if only because 2+ CPU's = Fun times... and bragging rights.
On 32-bit application code, the Pentium Pro did run a fair bit quicker. For math operations that could make use of the SIMD instructions MMX provided, those particular instructions would outpace the PPro; also, on mixed 32/16-bit code, the regular Pentium was faster.
Isn't the Core architecture that Intel is still using based more on the PIII than the P4?
It was great. My first post-Amiga PC. Served me for years with a few upgrades.
I think I ran SuSE Linux on a used CAD station. I'm sure it wasn't my first SuSE installation (a version 6 box?), but it was one I used for a while for serving files and experimenting with Apache and PHP.