Oh hey! It's Tom! Love your learnfun and playfun videos and programs, it's really fun editing the code and seeing if I can optimize the behavior, and it's also just plain hilarious to run them on hacked roms, especially speedhacked ones. Have you considered doing more work on them?
Thanks for the talk! You used such fascinating methodology to carry out this project. Definitely learned a lot, not just about hidden instructions but also software searching techniques.
what a talk! phenomenally creative, important, and useful. i understood almost all of it despite knowing next to nothing about x86, barely anything about process/OS security schemes and how their traps/exceptions are passed around, what the rings mean, and just generally being very new to OS and hardware stuff.
TL;DR: This tool is awesome, thanks for writing it, good thinking, all the props. So, some thoughts. Sometimes undocumented instructions can be implemented across multiple architectures because they are simply a logical extension of the architecture that isn't actually documented, and putting in the logic to block them is more work than it is worth (i.e. undocumented, or "behaviour undefined", register combinations). Similarly, they can come from a shortcut in implementation (pinning a behaviour-changing mux to a specific bit of the opcode, rather than actually decoding it). A good example here is the SL1 instruction on the Z80 processor, also written SLL. The three documented instructions SLA, SRA, SRL sit in a quad structure with a gap where a fourth shift-left instruction would go. Actually executing the opcodes in that gap results in a shift left followed by setting the bottom bit, likely as a result of opcode bits being pinned to muxes. Additionally, there is still worth in specifications. Specifications detail what is _expected_ behaviour, and what one can rely upon. As such, there's a giant chasm of instructions that may work at the time, but are not guaranteed to work in the future, or even potentially after a microcode update. So we get this horrible area where, as with the 66 e9 instruction, good software should never include the instruction, because it is non-standard and "a valid implementation of undefined behaviour is to make monkeys fly out of the user's butt", but meanwhile the processor still accepts these instructions, and they can then potentially be used maliciously. At the same time, you can't _really_ trust that only malicious code uses these instructions… so… poop.
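To make the Z80 example concrete, here is a minimal Python sketch of the undocumented shift described above (usually written SLL or SL1) next to the documented SLA. The function names are made up for illustration, and only the result byte and the carry are modelled, not the other flags.

```python
def z80_sla(value):
    """Documented SLA: shift left, bit 0 becomes 0, old bit 7 goes to carry."""
    carry = (value >> 7) & 1
    result = (value << 1) & 0xFF
    return result, carry


def z80_sll(value):
    """Undocumented SLL/SL1: same shift, but bit 0 is forced to 1."""
    carry = (value >> 7) & 1
    result = ((value << 1) | 1) & 0xFF
    return result, carry


if __name__ == "__main__":
    for sample in (0x01, 0x40, 0x80):
        print(hex(sample), "SLA ->", z80_sla(sample), "SLL ->", z80_sll(sample))
```

The only difference between the two is the `| 1`, which fits the idea that the extra behaviour falls out of how the opcode bits are wired rather than being a separately designed instruction.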
I should imagine a lot of these undocumented instructions would be work in progress, perhaps left there for eventual future use, perhaps used to reduce the cost of prototyping, but the coordination between x86 manufacturers does raise some serious concerns. These could be anything from hyperoptimised inverse square root calculations to deliberate holes in x86 security, put in place for "the right people"... See "idiocy of back doors"... It could also be as simple as Micro$oft (or Apple?) paying them a handsome sum of money to implement a custom instruction set just for them without telling anyone.
The graphics card does the blinking. The cursor blinking was literally done by special circuitry back in the days of text-only computers decades ago when the CPU couldn't possibly spare CPU cycles to blink a cursor. And since there's been a continuous evolution of compatible machines ever since, the graphics hardware has never been relieved of that responsibility. Of course, modern GPUs can handle that in their sleep. (If the bug was demonstrated in a Graphical UI, then it probably would freeze.)
@Brick It really is, not only because it would have been difficult to come up with, but also because it would be easy to think something else entirely. I thought it would only stop executing instructions in a specific ring or something like that.
I had a glitch with the NVIDIA drivers on Ubuntu that would show some image sprites that had previously been on my Windows screen, after I restarted and dual-booted into it. It was something related to my dual-monitor setup (it didn't work out of the box, and I never bothered to fix it). I have no idea how that could have happened, how some sort of buffer could have survived a system restart, but somehow some internal GPU buffer did, and I've never seen anyone talking about that online. It also seems like the cursor is accelerated by the GPU too, and there is probably a lot more going on. Check this video out: [PS2 vector unit demonstration Using only 16K of working space; NONE of the ps2 main CPU was used: ua-cam.com/video/qlQhJCuBYsE/v-deo.html] Also, in other video game consoles there is a lot of GPU compute that happens independently of the main CPU; search for the architectures of the Game Boy Advance, Nintendo DS, NES, etc.
Undocumented instructions have been around at least since the Z80 and perhaps before then. This is an 8-bit CPU which uses a separate 16-bit address bus. The Z80 has two 16-bit index registers, IX and IY, intended solely for indirectly accessing memory, but people noticed that the binary code for accessing these registers was just one byte to say "Use IX" or "Use IY" followed by exactly the same instructions used to access the HL register pair, which are two 8-bit registers which can be used together as a 16-bit address. Since the H and L registers can be used as separate 8-bit registers, people decided to try adding the "Use IX" or "Use IY" byte in front of the 8-bit H or L register instructions and discovered that they could access IXH, IXL, IYH and IYL as 8-bit registers. Many programmers then wrote an include file which defined these undocumented instructions as macros so they could use them directly in their programs.
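The prefix trick described above can be sketched in a few lines of Python. The opcode bytes are the standard Z80 encodings (DD selects IX, FD selects IY, 7C is LD A,H and 7D is LD A,L); the helper name and the table are just illustrative, in the spirit of the macro include files people used to write.

```python
IX_PREFIX = 0xDD  # "use IX instead of HL"
IY_PREFIX = 0xFD  # "use IY instead of HL"

# Documented one-byte opcodes that touch H or L as 8-bit registers.
LD_A_H = 0x7C
LD_A_L = 0x7D


def with_index_prefix(prefix, opcode):
    """Build the two-byte encoding: prefix byte followed by the H/L opcode."""
    return bytes([prefix, opcode])


# Undocumented forms obtained purely by adding the prefix byte.
UNDOCUMENTED = {
    "LD A,IXH": with_index_prefix(IX_PREFIX, LD_A_H),
    "LD A,IXL": with_index_prefix(IX_PREFIX, LD_A_L),
    "LD A,IYH": with_index_prefix(IY_PREFIX, LD_A_H),
    "LD A,IYL": with_index_prefix(IY_PREFIX, LD_A_L),
}

if __name__ == "__main__":
    for name, encoding in UNDOCUMENTED.items():
        print(f"{name:10s} {encoding.hex(' ')}")
```

An assembler that doesn't know these mnemonics can still emit them as raw bytes, which is essentially what the macro approach did.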
Nice to hear about the Z80, learned asm programming on that long time ago. My coding style comes from this book by professor Miller (bing will find it online); 8080-Z80AssemblyLanguage-TechniquesForImprovedProgramming.pdf Appendix J mentions those instructions.
With simple processors like the 6502, you could enable parts of more than one instruction because the opcodes weren't discretely decoded. On a modern CPU, a working instruction is most likely intentional.
bryede the thing is: we don't know the exact internals of x86 like we do with 6502 because it's closed hardware and it's constantly changing with each new microarchitecture iteration.
Right, but the x86 is fully microcoded and it can even be updated in the field. This means instructions point to specific internal sequences to be executed. When you find a sequence that works without throwing an exception, it probably means there's something in the table. It doesn't mean it's top secret functionality, but it might be. On the 6502, the undocumented opcodes came from enabling more than one instruction because the sequencing ROM was optimized for size and is built out of fragments that can overlap with illegal bit combinations.
bryede several millions of undocumented instructions would require a fairly big microcode ROM, so we can speculate that there's overlap between some of the documented and some of the undocumented.
Right. Because of how the instructions are prefixed and expanded in a byte-wise fashion, you don't need a table supporting all bit combinations, but rather you move from one 256-entry table to another. I'm just saying all unused byte entries are supposed to throw an exception. Just modifying instructions with prefix bytes won't.
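The "one 256-entry table pointing to another" idea can be sketched as a toy decoder. This is not how any real x86 decoder is built; the table layout, the names, and the tiny set of opcodes are assumptions for illustration. The opcode bytes themselves (90 NOP, F4 HLT, 0F as the two-byte escape, 0F A2 CPUID, 0F 31 RDTSC) are the documented encodings.

```python
class UndefinedOpcode(Exception):
    """Stands in for the #UD exception a real CPU would raise."""


def empty_table():
    # Every entry starts out as "nothing here": decoding it must fail.
    return [None] * 256


PRIMARY = empty_table()
TWO_BYTE = empty_table()

PRIMARY[0x90] = "NOP"
PRIMARY[0xF4] = "HLT"
PRIMARY[0x0F] = TWO_BYTE      # escape byte: continue in the next table
TWO_BYTE[0xA2] = "CPUID"
TWO_BYTE[0x31] = "RDTSC"


def decode(code):
    """Walk the tables byte by byte; return (mnemonic, length)."""
    table = PRIMARY
    for index, byte in enumerate(code):
        entry = table[byte]
        if entry is None:
            raise UndefinedOpcode(bytes(code[:index + 1]).hex())
        if isinstance(entry, list):
            table = entry     # move from one 256-entry table to another
            continue
        return entry, index + 1
    raise ValueError("ran out of bytes mid-instruction")


def is_defined(code):
    """True if the toy tables hold an entry for this byte sequence."""
    try:
        decode(code)
    except UndefinedOpcode:
        return False
    return True
```

In this model a hidden instruction is simply a table slot that is filled in but missing from the manual, which is why a sequence that runs without an exception is worth a second look.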
32:56 *_This man is claiming that it would be possible to write a malicious program that is completely benign and undetectable on any x86_64 computer that is not using a physical Intel CPU, but when executed on a computer with an Intel CPU it could arbitrarily execute whatever otherwise deliberately avoided lines of ASM the programmer desired, and also vice versa by inverting the principle._* That's the killing blow of this video and, if his demonstration is true, means that nobody should be using x86 CPUs for cryptography of any meaningful level of security.
actually if you click on the timestamp at the beginning of my comment he is literally saying that a difference between intel and amd (+via) is what causes the vulnerability to exist. it could be said that intel is at fault because their noncompliance with their own documentation is what causes the problem. however all cpu manufacturers are at fault for everything he finds with his method, as he goes on to reveal a pentium f00f-like CPU hardlock he purportedly discovered in an AMD CPU.
The behavior of 0x66 with jumps has been well documented for quite some time. This is just tools not being correct vis-à-vis the spec, which is pretty common to be honest for a complex spec. The fact that AMD processors behave differently compounds the problem, however. QEMU is not an instruction set reference by a long shot; it is a best-effort project, and there are likely many other similar bugs to be found. When using VT-x to run, you would get the right results.
I think this was 80% nifty and 20% scary. Nifty for coming up with that approach for finding undocumented opcodes. Scary for the fact that so many were found.
Oh boy. And now it's been revealed that Intel chips created in the past decade have a kernel memory leak "bug"/backdoor. Well, at least the Intel CEO sold as many shares as he could during Q4...
This is awesome! This guy's very resourceful to come up with these tools. Questions tho... Why wouldn't a processor's Instruction Cache prefetch generate the page-fault much earlier than when the straddling instruction was hit?
Ultrajamz technically not part of the ISA, a whole other layer of shit... Joanna Rutkowska has a very long review of the problems with the arch, x86 considered harmful. There's also a great paper on doping attacks in the fabrication process that can reduce the RNG entropy, so even if the spec is good the chip itself might not be... It's all so rotten to its core, we really need open hardware and auditable fabrication.
That happens with older ME versions as well, anything past Penryn has a 30 minute reset timer. mail.coreboot.org/pipermail/coreboot/2016-September/082021.html
8086/8088 processors had an instruction prefix interrupt bug which wasn't fixed until the 80c88 was introduced. There was also a rather critical bug in early 8086/8088 processors which would cause an interrupt immediately after a MOV SS or POP SS to push data off the stack into an incorrect memory location, causing data corruption.
Well have fun writing fast software for RISC. CISC is the way to go because you need fewer instructions to get more stuff done. Like square root in 1 op.
HD_Picard Thanks to significant microarchitectural advances in CPU design in the last couple decades, the number of instructions executed is no longer relevant to execution speed.
I have good news for you: x86 has been a RISC CPU for decades. The bad news: it's running a proprietary real-time OS known as "microcode" that emulates the CISC ISA and does all manner of shenanigans behind your back.
Thank you very much. You have rightly guided me to pursue a career in reverse engineering software. You are actually a genius, and open sourcing those software tools makes you a true master in your field. You could turn those software tools into a million dollar startup.
Couldn't help but notice the venue for your seminar. Kind of puts a different slant on things, doesn't it? What a difference a couple of months can make.
This is truly disturbing. This guy was able to do this single-handedly (of course we can assume that a lot of people helped him; I'm too lazy to watch the video again to see if he said that he was not alone in this project). Imagine what a group of malicious people who literally do this all day can actually do. I imagine these "malicious people" already have all the backdoors in every single piece of [Intel, AMD] hardware (processors). If the vendors were to find and fix one vulnerability, the "malicious people" would still have the rest. But what do I know :) I'm just a student, and I am sorry for my English. I hope you are still able to get my point of concern.
His search scheme seems to exclude some possibilities. From what he described, if say 00 xx is a two-byte instruction, he checks 00-FF for xx. Then he goes back and checks the length of 01. If it's also two bytes as before, he then doesn't search 01 xx, assuming that the last byte won't do anything special. But what if 01 00 is a three-byte instruction, or 01 23 is a three-byte instruction? Essentially, he seems to assume that there won't be two different types of instructions of the same length, next to each other in his search order. That is, if say 00 is a two-byte instruction, 01 is a one-byte instruction, and then 02 is a two-byte instruction, he will properly search both two byte instructions' second bytes, but if instead 00 and 01 are the two-byte instructions, he won't search the second one. Later he says he has the instruction at the end of a page to use a page fault to detect instruction length. I'd be wary of trusting that mechanism, especially for hidden instructions. I'm sure I've read errata for processors where instructions near or crossing a page boundary have issues.
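For anyone trying to picture the page-fault length trick mentioned above, here is a rough Python simulation. It is a sketch of the idea only, not the tool's actual code: `toy_true_length` is a hypothetical stand-in for real silicon, and `run_at_page_end` pretends to place the candidate so that only its first few bytes sit on an executable page.

```python
MAX_INSTRUCTION_LENGTH = 15  # architectural limit for x86 instructions


def toy_true_length(code):
    """Hypothetical stand-in for the CPU: the length it 'secretly' decodes."""
    first = code[0]
    if first == 0x90:   # nop
        return 1
    if first == 0x0F:   # two-byte opcodes such as cpuid (0F A2)
        return 2
    if first == 0xB8:   # mov eax, imm32
        return 5
    return 1


def run_at_page_end(code, bytes_on_executable_page):
    """Pretend to execute with only the first N bytes before the boundary.

    If the decoder needs more bytes than are available on the executable
    page, the fetch crosses into the non-executable page and faults.
    """
    if toy_true_length(code) > bytes_on_executable_page:
        return "page_fault_on_fetch"
    return "executed_or_other_exception"


def probe_length(code):
    """Slide the instruction back one byte at a time until the fetch fault stops."""
    for visible in range(1, MAX_INSTRUCTION_LENGTH + 1):
        if run_at_page_end(code, visible) != "page_fault_on_fetch":
            return visible
    return None
```

The prober never looks inside `toy_true_length`; it only observes whether the fetch faulted, which is why the same approach works on instructions nobody has documented. It also shows where the worry above comes from: everything rests on the fault being reported faithfully at the boundary.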
The only true way to know is to reverse engineer the die. There might be some secret knock codes that are outside the instruction space that could enable special functionality.
I'm kind of curious what sort of similar tools the processor developers have. When I've made processors in uni we would generally brute force/cherrypick things we thought may break stuff.
My guess is many undocumented opcodes just do nothing. My guess is that the number of really secret opcodes is small, and the rest of the potential execution paths are just treated as NOPs. Complex CISC CPUs use microcode to decode instructions, so just routing them to NOP is more economical. However, I am still concerned that the OEMs have included opcodes which may only be used for debugging but are vectors for abuse. What's worse, some of the vulnerabilities may require several undocumented opcodes to do something, so just poking single opcodes isn't enough to scope the problem set.
Not super familiar with these architectures, but isn't it feasible to have certain combinations of instructions be used to cause specific, hidden behaviors? Seems like something that this tool would miss.
I am reminded of Joe and Gordon running current through every pin of an early 80s IBM, resulting in that 2 inch thick blue binder, which Cameron was subsequently legally barred from reading while she coded the BIOS for "The Giant."
I like this. But my approach would be more like: OK, put the candidate processor to be scanned on its own isolated mainboard/breadboard/experimental board, and feed the candidate processor instructions from another machine. That way everything is nice and isolated during crashes, which could make scanning the ENTIRE instruction set faster and more reliable.
Not a layer, but you could make the loader refuse to load programs that have an undocumented instruction, or that generate code that they later execute. Among other things, you could not use the JIT compilers in Java or ECMAscript runtimes.
Torpcoms Thanks for the answer. I was hoping for something in a lower level and not language restricted, blocking anything in the kernel level and above to use such instructions.
You could execute suspicious software on a fully virtualizing VM patched for this purpose. The VM would have to scan all code scheduled for execution for "undocumented" instructions. Problem is, you will lose a lot of performance. And there would still be the slight issue of hypervisor and disassembler bugs creating false positives/negatives. Watch the video for what can go wrong with such approaches.
@@bakedbillybacon The detail about JIT compilers in Java and ECMAscript runtimes kind of is low-level in ways. Some processors have instructions that allow a subset of JVM bytecode instructions to be used kind of natively... as in in the assembly for the processor. It's not a documented feature in the ARM specification, but I think the name nowadays is Thumb2EE or something like that... the technology known by about as many previous names as a cheap hooker has teeth... so this proposal would indeed strip these devices (like your mobile phone) of their ability to quickly execute various JVM bytecode operations... the JIT compiler would have to simulate using software instead, which would be much slower... Similarly, I think you would lose a lot of multiscalar instructions that Intel has developed which aren't x86-standard...
Interesting! Generally speaking, should the problem of undocumented instructions also be relevant for RISC as well? Even with an open architecture, implementation can be (and likely will be) done secretly. Manufacturers are risking their brand name to have these hidden instructions that they may lose track of. Agree with the concept of developing more universal tools to audit our hardware. We shouldn't only audit software.
About 6m40s in it says #40 is a single byte instruction `inc eax`. I assumed an #40 would cause a situation where if `ax` reached #ffff it would go back to #0000 and set the carry flag, but #66 #40 should increment eax to #00010000. There are several other minor factually confusing statements throughout the video. On the whole this is a great talk and seems to be presented by an absolute expert so now I'm doubting myself.
First of all, inc and dec do not affect the carry flag. An overflow will set the zero flag, but not the carry flag. As for the rest, i think it depends on whether you're executing in 16 or 32 bit mode. If you're in 16 bit mode, then 40 is inc ax, and 66 40 is inc eax, as you describe. If you're in 32 bit mode, then 40 is inc eax, and 66 40 is inc ax, as he describes.
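The 16-bit versus 32-bit point can be written out as a small Python model. This is only a sketch: the function names are made up, it covers just opcode 40 with an optional 66 prefix, and it ignores 64-bit mode, where 40 is a REX prefix rather than an increment.

```python
def register_for_opcode_40(default_bits, has_66_prefix):
    """Which register opcode 40 increments in 16-bit or 32-bit mode.

    The 66 prefix toggles the operand size away from the mode's default.
    """
    bits = default_bits
    if has_66_prefix:
        bits = 32 if default_bits == 16 else 16
    return "eax" if bits == 32 else "ax"


def inc(value, bits):
    """Increment with wraparound; returns (result, zero_flag).

    The carry flag is deliberately not modelled because INC leaves it alone.
    """
    result = (value + 1) & ((1 << bits) - 1)
    return result, result == 0


if __name__ == "__main__":
    for mode in (16, 32):
        for prefix in (False, True):
            name = register_for_opcode_40(mode, prefix)
            print(f"{mode}-bit mode, 66 prefix={prefix}: inc {name}")
```

So incrementing 0xFFFF as a 16-bit value wraps to 0 and sets the zero flag, while the same increment as a 32-bit value gives 0x00010000.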
The real question is whether these instructions yield some result because of the inherent physical logic-fabric and the way the op-codes are implemented or if it is a designed instruction.
Fantastic thinking, using PF to find the instruction length, reminds me of when I learnt how traceroute works, ping with TTL 1, ping with TTL 2 etc. And then learning the DNS of assembly is BSing you! And cross platform (Intel.AMD, VIA) GOD BLESS AMERICA
15:40 How is that true? I thought we don't rely on what the documentation says, and it sounds like a good way to hide backdoors/hidden instructions would be to have them continuously throw exceptions.
A few things come to my mind: He talks a lot about "trusting" the processor. I don't think that anyone truly trusts the processor any more than they trust the software. We just have fewer options when it comes to the processor. We can either use a computer or not use it. If I were nefarious and wanted to hide a secret instruction, a couple good candidates would be an "undefined" opcode with ESI and EDI set to special values or DS: MOV AL, AL (an effective no-op that no one would ever use) again with ESI and EDI set to special values. The gaps in the op-code table are supposed to be values that do not correspond to an instruction. They may be filled in by later processors. This is, after all, how the processors have evolved. He says he is doing the entire thing in ring-3. I happen to know that accessing the CR2 register requires ring-0 access. Maybe the operating system is facilitating some of these things. But it still struck me as odd. Setting all the registers to zero is a good start. But some of those instructions include address offsets, which can still overwrite your "supervisor" code. (Okay, he addresses this one.) As for the priority error for undefined opcode vs page fault: Yes, it is an erratum. They decided it was a documentation error and fixed their documentation. First off, I can see where they might miss this. Very few people, outside of maybe myself, are going to deliberately execute an undefined instruction over a page-faulting area. Admittedly, I do a few unusual things. I have used MOV CS, AX as a processor check (runs fine on 8088, undefined opcode on 286, man I'm old.) I miss the days when computers would detect a processor shutdown and just reset the processor. You could use certain memory locations to tell the BIOS where to resume execution. Ah, good times, shut the processor down 20 times and return to DOS like nothing ever happened.
What concerns me is let's say he executes one of these unknown instructions and it alters the way the processor executes a currently benign instruction. Perhaps changing a jump and read to a jump and execute. I'm not saying it's changing the hardware, I'm saying maybe this instruction is inside the processor for a reason that we're not supposed to know about. Such as this instruction flips a bit that diverts an execution pipeline to a part of the processor that has been idle for years that now is operating under an altered instruction set. So I suppose that when this scanning tool is run, you should probably run it numerous times to see if you find some different results during the second pass, indicating a new instruction or instruction set has been enabled. (I'm talking like NSA level stuff)
Another thing that needs to be addressed is capturing what microcode level is being used in the processor. When you flash your bios on your computer, the motherboard vendor will frequently add microcode revisions to the bios that patch bugs in the CPU, but they could also activate or disable these things you weren't supposed to know about. The same thing with silicon revisions and steppings. Those are hardware alterations to the CPU internally.
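Recording the microcode level alongside scan results is easy to do on Linux, at least. A minimal Python sketch, assuming an x86 Linux machine where /proc/cpuinfo reports a "microcode" field per logical CPU; the function names are illustrative and other platforms would need a different source for the same information.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values from /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "microcode":
            revisions.add(value.strip())
    return sorted(revisions)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the raw text; the path is the usual Linux location."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    try:
        print("microcode revisions:", microcode_revisions(read_cpuinfo()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

Storing this value with every result set would let two scans of the same CPU model be compared across BIOS updates.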
It seems like exploiting a f00f-like bug (where it stops the processor) could be a more efficient DDOS-like attack provided you can manage to run unsigned code. Just get a company's servers to execute the processor stop instruction and they'll have to reboot all their computers. If you set it to run on startup, then they'll be locked out of their OS, forcing them to solve the issue from the BIOS.
it's DoS, not DDoS! I don't understand why your comment is here, since all of this was said in the talk. Also, your nickname... is this the stop instruction? Did you not hear about responsible disclosure ;)
Just tried the call with 66 prefix in IDA 7 on an i7-6700K and both IDA *and* the CPU interpret it as a call with a 16 bit offset. What CPUs were you testing on?
Is there anything to be learned any more through physical disassembly of the chip? I know the Soviets used to grind the chip, minuscule layer by minuscule layer, but eventually the chips became too tightly packed for this to be viable. I wonder if technological advances have been made in the last 25 years that could re-enable the technique, even if exotic technology like electron microscopes is needed to observe the chip...
I'm a late viewer of this video. It's some brilliant research, though not unexpected at all. However, CPU instruction fetchers and decoders are not static in this day and age, so the theoretical malformed-but-working opcode space is potentially much larger. There is this theory called "opcode knocking" or "instruction knocking": a hidden key sequence opens instruction decoding for a class of non-disclosed instructions which have valid and probably security-relevant behavior. This opcode space would not be visible unless the knocking sequence has been entered. So while this helps, there is absolutely no way of knowing about backdoor instructions, even with an exhaustive variation of the opcode space.
Oh man, that page fault analysis is genius.
Guy Smith Interesting, it's the first time I'm hearing of it though. Just thought it was such a creative way to test for instructions.
@@Essometer why do the two of you have the same pfp
This guy is impressively good at what he does.
Most of us suspected processors of having some dirty instructions hidden inside them, but searching for them is certainly not an easy task. Sincerely, hats off to this guy!
Not really. It's well known that Intel has an entire independent RISC processor for the purpose of remote monitoring; even when the battery is off, your Intel PC can access the microphone and listen to you talk.
marshalcraft how? If your battery is off then the microphone won’t work...
The battery may not be completely off. If the motherboard was designed to provide power to the microphone in the "off" state and provide enough power to the CPU to run the necessary instructions, remote monitoring may be possible. The only way to prevent it would be to remove the battery completely.
However, monitoring may not be feasible. I expect the process would have to dump data to a hard drive, which would require a lot of power.
Well, we occasionally find out that "secure" products have back doors and we know that the gov't frequently approaches tech companies requesting spy features, so we shouldn't be surprised if hidden ring bypass instructions turn up one day. I'm sure Intel was well compensated for them if they exist.
The use of No Execute on memory to find instruction length made me go "oh man" out loud. Genius.
nop nop
and then AMD's #ud before #pf "bug" to circumvent this
It's kind of scary how good some of these guys are at figuring these things out.
And this is just "some guy". Think of what a team of people like him with an evil motive could do.
@@lbgstzockt8493 especially if given a bottomless budget
I am jealous. Not only is this guy much more capable of mentally processing this complex information than I am, he's also incredibly good at presenting it!
thats why he is speaking!! dum dum dum dum duummmmmmmmmmmmmmmmmmmmmmmmm
no such thing as jealoux or capabx or good or not, cepu, do, say, think any nmw anda ny be perfect
He's a Genius.
He makes it so easy to understand that I thought I was good. :-)
cause of hard work and countless hours spent hacking/reverse-engineering...
With Intel's ZombieLoad, PlunderVolt, Meltdown and more severe vulnerabilities these days, this video is more relevant than ever
They should crowdsource the undocumented instructions the same way that PassMark gets benchmarks. Everyone uploads their results and a website shows a nice breakdown of which instructions run on which processors.
I feel like running sandsifter already, to be one of the first to upload to that site when it inevitably starts existing.
github.com/rigred/sandsifter-tests
+Josh Beach damn cool, i'd like to run this shit on the brazilian voting machines using cyrix..
wiipronhi but if they do stumble across a secret instruction there’s a good chance it will mess up the OS operation. And an emulator wouldn’t have the instructions programmed into it anyway.
My god that page fault technique is so awesome. So clever
I just keep thinking how poor Terry Davis has wasted all his time.
He needs to make his own silicon as well as his own compiler and OS.
damn i'm in, at least we got risc v
RIP
May his soul find rest and find the peace he deserves.. RIP dear Terry! ☹
And all this in his own universe :)
@@tsnp423 no not Universe.. the correct word would be *Reality*
It looks like the "halt and catch fire" instruction he describes starting at 38:46 was never described publicly, at least that I could find. As he explains in this talk there was no time for vendors to address the issue under responsible disclosure so he couldn't give all the details at Black Hat 2017, but I couldn't find any reference to publications in the following months as he said would happen. There's a question about it on the Reverse Engineering StackOverflow site, but no one seems to know. I'm not particularly looking for a way to execute this instruction, but I'm curious to know if any mitigation was possible for CPUs that had already shipped. If anyone knows and has a reference this would be much appreciated.
Thanks for following this up! I was extremely curious as well.
Also hoping to hear more about it eventually. It's been a while.
Where are you, Domas? What did you learn?!
A brilliant talk showing persistence and out of the box thinking. As pointed out, the time is well past for trusting the docs of closed hardware designs!
I need to go now and think about working the ideas into my disasm and emulator.
Really good. Having done some Z80 programming once upon a time I know that some undocumented instructions are merely side effects of the microcode and not necessarily intentional. But the point is to figure out the instructions that really are intentional and undocumented.
Unintentional undocumented instructions could of course be fun too if you want to do some "smart" programming, but don't expect them to work in the next generation of processors.
Goddamn, did Intel send assassins after this guy or what?
They hired him.
@@morwar_ To expose themselves?
@@AbhishekBM No, my guess is that what he knows is valuable.
He works for Intel now
Wow. This just shows me I need to brush up on my awful, awful assembly skills. Hats off to Christopher Domas.
This is so well done in every aspect. There are smart people and there are people like this, who go beyond that. Well performed presentation.
As of 2021 I find surprisingly little follow-up info on that Halt and Catch Fire instruction. Dunno if that's me sucking at searching or whether the disclosure restriction still applies.
This is why open source is so important. I would love to see a viable open CPU alternative emerge, on the scale of Linux in the software world. It's not impossible, but it would be a much different and more challenging problem to solve.
already exists. SPARC.
Thanks..I was aware of SPARC but didn't realize it was open source. I'm also aware of some other open source CPUs. Still, I don't think it quite meets the mark for a genuine competitor to x86. I mean, I can go out and buy a laptop running Linux but I don't think I can go buy a laptop with a SPARC so easily....
Problem here is licensing. It can't be GNU. Would need to be BSD based to entice innovation and protection of intellectual property. Which means the option to not release certain intellectual property for free. Or rather, the freedom to not be compelled to do so at the cost of the innovator.
Kunou RISC-V is BSD-licensed ISA.
RISC V. Linus tech tips made a video on it recently...
Neat talk!
Oh hey! It's Tom! Love your learnfun and playfun videos and programs, it's really fun editing the code and seeing if I can optimize the behavior, and it's also just plain hilarious to run them on hacked roms, especially speedhacked ones. Have you considered doing more work on them?
Thanks for the talk! You used such fascinating methodology to carry out this project. Definitely learned a lot, not just about hidden instructions but also software searching techniques.
what a talk! phenomenally creative, important, and useful. i understood almost all of it despite knowing next to nothing about x86, barely anything about process/OS security schemes and how their traps/exceptions are passed around, what the rings mean, and just generally being very new to OS and hardware stuff.
TL;DR: This tool is awesome, thanks for writing it, good thinking all the props.
So, some thoughts. Sometimes undocumented instructions can be implemented across multiple architectures because they are simply a logical extension of the architecture that isn’t actually documented, and putting in the logic to block those undocumented instructions is more work than it is worth (e.g. undocumented or “behaviour undefined” register combinations). Similarly, they can come from a shortcut in implementation (pinning a behaviour-changing mux to a specific bit of the opcode, rather than actually decoding it). A good example here is the SL1 instruction on the Z80 processor. The three documented shift instructions SLA, SRA and SRL sit in a group of four, with a gap where a second shift-left instruction would be. Actually executing the opcodes in that gap results in a shift left followed by setting the bottom bit, likely as a result of opcode bits being pinned to muxes.
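A minimal Python sketch of that Z80 shift group, for anyone who wants to see the difference concretely. The function names are made up for illustration, only the CB-prefixed register shifts are modelled, and the sign/zero/parity flags are ignored.

```python
def sla(value):
    """Documented SLA: shift left, bit 0 becomes 0. Returns (result, carry)."""
    carry = (value >> 7) & 1
    return (value << 1) & 0xFF, carry


def sll(value):
    """Undocumented SLL/SL1: shift left, bit 0 becomes 1. Returns (result, carry)."""
    carry = (value >> 7) & 1
    return ((value << 1) | 1) & 0xFF, carry


# For CB-prefixed opcodes 0x20..0x3F, bits 5..3 of the second byte pick the shift.
SHIFT_GROUP = {
    0b100: "SLA",
    0b101: "SRA",
    0b110: "SLL (undocumented)",
    0b111: "SRL",
}


def shift_name(opcode):
    """Name the shift selected by the byte after the CB prefix, or None."""
    if opcode >= 0x40:  # 0x40 and above are BIT/RES/SET, not shifts
        return None
    return SHIFT_GROUP.get((opcode >> 3) & 0b111)


if __name__ == "__main__":
    print(shift_name(0x30), sll(0x81))
    print(shift_name(0x20), sla(0x81))
```

The slot at CB 30..37 falls out of the same three selector bits as the documented shifts, which is why it does something coherent instead of trapping.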
Additionally, there is still a worth in specifications. Specifications detail what is _expected_ behaviour, and what one can rely upon. As such, there’s a giant chasm of instructions that may work at the time, but are not guaranteed to work in the future, or even potentially after a microcode update.
So, we get this horrible area, where like the 66 e9 instruction, good software should never include this instruction, because it is non-standard and “a valid implementation of undefined behaviour is to make monkeys fly out of the user’s butt”, but meanwhile, the processor still accepts these instructions, and can potentially then be malicious.
At the same time, you can’t _really_ trust that only malicious code uses these instructions… so… poop.
This talk takes a while to get interesting but it's worth it
I should imagine a lot of these undocumented instructions would be work in progress, perhaps left there for eventual future use, perhaps used to reduce the cost of prototyping, but the coordination between x86 manufacturers does raise some serious concerns. These could be anything from hyperoptimised inverse square root calculations to deliberate holes in x86 security, put in place for "the right people"... See "idiocy of back doors"...
It could also be as simple as Micro$oft (or Apple?) paying them a handsome sum of money to implement a custom instruction set just for them without telling anyone.
Maybe a stupid question, but how does the cursor keep blinking if the CPU is locked up?
The graphics card does the blinking. The cursor blinking was literally done by special circuitry back in the days of text-only computers decades ago, when the CPU couldn't possibly spare cycles to blink a cursor. And since there's been a continuous evolution of compatible machines ever since, the graphics hardware has never been relieved of that responsibility. Of course, modern GPUs can handle that in their sleep. (If the bug was demonstrated in a graphical UI, then it probably would freeze.)
No such thing as a stupid question.
@@TheHuesSciTech I would have died without ever knowing that
@Brick It really is, not only because it would have been difficult to come up with, but also because it would be easy to think something else entirely. I thought it would only stop executing instructions in a specific ring or something like that.
I had a glitch with the nvidia drivers on ubuntu where, after I restarted and dual booted into it, it would show image sprites from my previous windows session. It was something related to dual monitors (which didn't work out of the box, and I never bothered to fix it). I have no idea how that could have happened, how some sort of buffer could have survived a system restart, but somehow some internal gpu buffer did, and I've never seen anyone talking about that online. It also seems like the cursor is accelerated by the GPU too, and there's probably a lot more going on.
Check this video out:
[PS2 vector unit demonstration Using only 16K of working space; NONE of the ps2 main CPU was used: ua-cam.com/video/qlQhJCuBYsE/v-deo.html]
Also, in other video game consoles, there is a lot of GPU compute that happens independently of the main CPU. Search for the architectures of the Game Boy Advance, Nintendo DS, NES, etc.
When I was a kid guys like this were the rockstars and astronauts to me.
I still feel the same way today in 2024 and I never grew up.
Toys R Us kid till the end!
*"jk i'm malicious af"*
wrr
Finally! Someone speaking at a tech event who isn't a stuttering incompetent mess onstage!
You gotta realize these sorts of people are specialized hard into 1 thing.
Undocumented instructions have been around at least since the Z80 and perhaps before then. This is an 8-bit CPU which uses a separate 16-bit address bus.
The Z80 has two 16-bit index registers, IX and IY, intended solely for indirectly accessing memory, but people noticed that the binary code for accessing these registers was just one byte to say "Use IX" or "Use IY" followed by exactly the same instructions used to access the HL register pair, which are two 8-bit registers which can be used together as a 16-bit address.
Since the H and L registers can be used as separate 8-bit registers, people decided to try adding the "Use IX" or "Use IY" byte in front of the 8-bit H or L register instructions and discovered that they could access IXH, IXL, IYH and IYL as 8-bit registers.
Many programmers then wrote an include file which defined these undocumented instructions as macros so they could use them directly in their programs.
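A rough Python sketch of the prefix trick described above. The opcode bytes are the standard Z80 encodings for the H and L instructions, but the table and the helper name are just for illustration, not anyone's actual macro file.

```python
# Documented single-byte instructions that touch H or L.
HL_OPCODES = {
    "LD A,H": bytes([0x7C]),
    "LD A,L": bytes([0x7D]),
    "INC H": bytes([0x24]),
    "INC L": bytes([0x2C]),
}

# "Use IX" / "Use IY" prefix bytes.
PREFIX = {"IX": 0xDD, "IY": 0xFD}


def index_half_variant(mnemonic, index_reg):
    """Return (name, encoding) of the undocumented IXH/IXL/IYH/IYL form."""
    op, _, operands = mnemonic.partition(" ")
    renamed = [
        index_reg + operand if operand in ("H", "L") else operand
        for operand in operands.split(",")
    ]
    name = op + " " + ",".join(renamed)
    return name, bytes([PREFIX[index_reg]]) + HL_OPCODES[mnemonic]


if __name__ == "__main__":
    for mnemonic in HL_OPCODES:
        for reg in PREFIX:
            name, encoding = index_half_variant(mnemonic, reg)
            print(f"{name:12} {encoding.hex(' ')}")
```

The macros people wrote amounted to the same thing: emit the prefix byte, then the ordinary H or L opcode.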
Nice to hear about the Z80, learned asm programming on that long time ago.
My coding style comes from this book by professor Miller (bing will find it online);
8080-Z80AssemblyLanguage-TechniquesForImprovedProgramming.pdf
Appendix J mentions those instructions.
This guy is a genius
Still my favourite talk
Most of those undefined opcodes could be what LAX was on 6502: opcodes without explicit microcode.
With simple processors like the 6502, you could enable parts of more than one instruction because the opcodes weren't discretely decoded. On a modern CPU, a working instruction is most likely intentional.
bryede the thing is: we don't know the exact internals of x86 like we do with 6502 because it's closed hardware and it's constantly changing with each new microarchitecture iteration.
Right, but the x86 is fully microcoded and it can even be updated in the field. This means instructions point to specific internal sequences to be executed. When you find a sequence that works without throwing an exception, it probably means there's something in the table. It doesn't mean it's top secret functionality, but it might be. On the 6502, the undocumented opcodes came from enabling more than one instruction because the sequencing ROM was optimized for size and is built out of fragments that can overlap with illegal bit combinations.
bryede several millions of undocumented instructions would require a fairly big microcode ROM, so we can speculate that there's overlap between some of the documented and some of the undocumented.
Right. Because of how the instructions are prefixed and expanded in a byte-wise fashion, you don't need a table supporting all bit combinations, but rather you move from one 256-entry table to another. I'm just saying all unused byte entries are supposed to throw an exception. Just modifying instructions with prefix bytes won't.
I watched this video 𝟓 𝐓𝐈𝐌𝐄𝐒!... not because I didn't understand it but because it's just wonderful and so INTERESTING. Amazing Black Hat
10:25 when you make the processor roll over for you, you know you're next level
32:56 *_This man is claiming that it would be possible to write a malicious program that is completely benign and undetectable on any x86_64 computer that is not using a physical Intel CPU, but when executed on a computer with an Intel CPU it could arbitrarily execute whatever otherwise deliberately avoided lines of ASM the programmer desired, and also vice versa by inverting the principle._* That's the killing blow of this video and, if his demonstration is true, means that nobody should be using x86 CPUs for cryptography of any meaningful level of security.
His claim applies to all processors. Intel just happens to be the most popular, thus relevant one.
actually if you click on the timestamp at the beginning of my comment he is literally saying that a difference between intel and amd (+via) is what causes the vulnerability to exist. it could be said that intel is at fault because their noncompliance with their own documentation is what causes the problem. however all cpu manufacturers are at fault for everything he finds with his method, as he goes on to reveal a pentium f00f-like CPU hardlock he purportedly discovered in an AMD CPU.
The behavior of 0x66 with jumps has been well documented for quite some time. This is just tools not being correct vis-à-vis the spec, which is pretty common, to be honest, for a complex spec. The fact that AMD processors behave differently does compound the problem, however. Qemu is not an instruction set reference by a long shot; it is a best-effort project, and there are likely many other similar bugs to be found. When using VT-x to run, you would get the right results.
Remarkable person with a unique mind.
I think this was 80% nifty and 20% scary. Nifty for coming up with that approach for finding undocumented opcodes. Scary for the fact that so many were found.
i couldn't agree more with your words, literally.
Wow, that is amazing! Just curious though, any updates on that Ring3 DoS instruction that locked up his CPU?
Has the hardware bug been disclosed yet?
Not to me, at least.
Bump.
Dunno yet, but today some people hinted at a whole new family of Spectre-class vulnerability.
@@y__h This aged quite well haha
22:26 ok this part genuinely looks like what you would see in hollywood movies
ok woah.. this guy has some impressive talks here. a bright mind for sure.
Information dense talk in a manner of John Carmack.
Now i need time to digest.
Awesome.
I understood nothing, but enjoyed it.
Great talk, nice ideas!
So did he ever post an update to the undisclosed bug he talks about at 38:43?
holyfuck. I might be high, but tell me if that wasn't the best blackhat talk yet?
Save the best for last as they say.
damn yeah what kind of person doesn't get excited over ring -1 stuff....
Oh boy. And now it's been revealed that Intel chips created in the past decade have a kernel memory leak "bug"/backdoor. Well, at least the Intel CEO sold as many shares as he could during Q4...
so much respect for those smart people
great stuff! hopefully resulting in some good microcode updates and more secure cpus in the future.
😂
Incredible job done presenting this critical topic. Keep up the good work :)
This is awesome! This guy's very resourceful to come up with these tools. Questions tho... Why wouldn't a processor's Instruction Cache prefetch generate the page-fault much earlier than when the straddling instruction was hit?
Link to GitHub repositories!
github.com/xoreaxeaxeax/sandsifter
THANK YOU!
Github test results repo
github.com/rigred/sandsifter-tests
Intel ME, AMD PSP...
Ultrajamz technically not part of it the ISA, a whole other layer of shit... Joanna Rutowska has a very long review of the problems with the arch, x86 considered harmful. there's also a great paper on doping attacks in the fabrication process that can reduce the rng entropy, so even if the spec is good the chip itself might not be... It's all so rotten to its core, we really need open hardware and auditable fabrication
Johanna *
Rutkowska*
That happens with older ME versions as well, anything past Penryn has a 30 minute reset timer. mail.coreboot.org/pipermail/coreboot/2016-September/082021.html
Yes; also nothing except ME11 is even using x86. AMD's ASP/PSP is an ARM core, and ME versions 10 and below were ARC cores.
8086/8088 processors had an instruction prefix interrupt bug which wasn't fixed until the 80c88 was introduced. There was also a rather critical bug in early 8086/8088 processors which would cause an interrupt immediately after a MOV SS or POP SS to push data off the stack into an incorrect memory location, causing data corruption.
Oh my. That's why I love my RISC CPU. Those x86 CPUs are so complex!
Well have fun writing fast software for RISC. CISC is the way to go because you need fewer instructions to get more stuff done. Like square root in 1 op.
HD_Picard Thanks to significant microarchitectural advances in CPU design in the last couple decades, the number of instructions executed is no longer relevant to execution speed.
I have good news for you: x86 has been a RISC CPU for decades. The bad news: it's running a proprietary real-time OS known as "microcode" that emulates the CISC ISA and does all manner of shenanigans behind your back.
Waiting and waiting and waiting for some company to push out some RISC-V ISA based CPU for consumers.
If you completely disable the microcode, could you run code on "bare metal"?
Thank you very much, you have rightly guided me to pursue a career in reverse engineering software. You are actually a genius, and open sourcing those software tools makes you a true master in your field. You could turn those software tools into a million dollar startup.
Couldn't help but notice the venue for your seminar. Kind of puts a different slant on things, doesn't it? What a difference a couple of months can make.
18:30 Why won't fuzzing only 1 random byte corrupt memory state? I would have assumed that there would be a specific byte that we should not change.
Extremely captivating!
This is truly disturbing. This guy was able to do this single handed (of course we can assume that a lot of people helped him, im too lazy to watch the video again to see if he said that he was not alone in this project), imagine what a group of malicious people, that literally do this all day, can actually do? I imagined these "malicious people" have literally all the backdoors in every single [Intel, Amd] hardware (processors) . If they were to find and fix a vulnerability, the "malicious people" still have the rest. But what do i know :) im just a student, and I am sorry for my english, I hope you still be able to get my concerning point.
His search scheme seems to exclude some possibilities. From what he described, if say 00 xx is a two-byte instruction, he checks 00-FF for xx. Then he goes back and checks the length of 01. If it's also two bytes as before, he then doesn't search 01 xx, assuming that the last byte won't do anything special. But what if 01 00 is a three-byte instruction, or 01 23 is a three-byte instruction?
Essentially, he seems to assume that there won't be two different types of instructions of the same length, next to each other in his search order. That is, if say 00 is a two-byte instruction, 01 is a one-byte instruction, and then 02 is a two-byte instruction, he will properly search both two byte instructions' second bytes, but if instead 00 and 01 are the two-byte instructions, he won't search the second one.
Later he says he has the instruction at the end of a page to use a page fault to detect instruction length. I'd be wary of trusting that mechanism, especially for hidden instructions. I'm sure I've read errata for processors where instructions near or crossing a page boundary have issues.
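For reference, the page-boundary trick can be mocked up in a few lines of Python. This is a toy model, not sandsifter's code: the "CPU" is a lookup table loosely modelled on three 32-bit x86 opcodes, and all the names are hypothetical. It only shows why growing the number of bytes in front of an unmapped page reveals the instruction length.

```python
# Hypothetical toy ISA: first byte -> total instruction length in bytes.
TOY_LENGTHS = {0x90: 1, 0x04: 2, 0xB8: 5}


class PageFault(Exception):
    """The decoder tried to fetch a byte from the unmapped page."""


class UndefinedOpcode(Exception):
    """The decoder rejected the instruction."""


def toy_execute(visible):
    """Run one instruction when only `visible` bytes precede the unmapped page."""
    if not visible:
        raise PageFault()
    length = TOY_LENGTHS.get(visible[0])
    if length is None:
        raise UndefinedOpcode()
    if len(visible) < length:
        raise PageFault()  # the instruction straddles the page boundary
    return length


def measure_length(candidate, max_len=15):
    """Expose 1, 2, 3... bytes until the fetch stops page-faulting."""
    for k in range(1, min(max_len, len(candidate)) + 1):
        try:
            toy_execute(candidate[:k])
        except PageFault:
            continue  # the decoder still wants more bytes
        except UndefinedOpcode:
            return k, "undefined"
        return k, "executed"
    return None, "unresolved"


if __name__ == "__main__":
    print(measure_length(bytes([0xB8, 1, 2, 3, 4, 0, 0])))
```

The concern about errata near page boundaries applies to the real hardware, not to this model: here the decoder behaves identically wherever the instruction sits.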
You are my favorite! Love your stuff.
The only true way to know is to reverse engineer the die. There might be some secret knock codes that are outside the instruction space that could enable special functionality.
probably not. Why would they create something like that, invest their time and money and never use it again? Idk
@@maksymiliank5135 espionage backdoors. There's a reason china wants its own silicon so badly
good stuff
This has been fascinating!
I'm kind of curious what sort of similar tools the processor developers have. When I've made processors in uni we would generally brute force/cherrypick things we thought may break stuff.
Hidden instructions BETWEEN chip manufacturers = NSA. 26:00
... Or simply one manufacturer licensing the instruction set to the other, and implementing them all for inter-compatibility reasons.
My guess is many undocumented opcodes just do nothing. My guess is the number of really secret opcodes is small; they just treat the potential execution paths as NOPs.
Complex CISC CPUs use microcode to decode instructions so just routing them to NOP is more economical.
However I still am concerned that the OEMs have included opcodes which may only be used for debugging but are vectors for abuse. What’s worse, some of the vulnerabilities may require several undocumented opcodes to do something so just poking single opcodes isn’t enough to scope the problem set.
42:20 Mine didn't literally halt and catch fire but maybe yours will, _that would be awesome_
Not super familiar with these architectures, but isn't it feasible to have certain combinations of instructions be used to cause specific, hidden behaviors? Seems like something that this tool would miss.
Does anybody have a link for full disclosure?
Mind blowing!! Neat and complete... bravo!! :)
2:31 - secret functionality, list of hardware bugs .. 20:30 Capstone disassembler ..
i wonder if AI could be used to make correlations and or do testing or variations of things/data?
The man is a genius!
I am reminded of Joe and Gordon running current through every pin of an early 80s IBM, resulting in that 2 inch thick blue binder, which Cameron was subsequently legally barred from reading while she coded the BIOS for "The Giant."
can you send a link...?
He did fuzzing bytes up to 15, seeing how the instruction pointer register incremented.
Results start at 25:00
I like this.
But my approach would be more like: OK, put the candidate processor to be scanned on its own, isolated mainboard/breadboard/experimental board, and feed the candidate processor instructions from another machine.
That way everything is nice n isolated during crashes and could make scanning the ENTIRE instruction set faster and more reliable.
i saw some shit like that already... don't remember where.
This is really well explained.
Couldn't you create a "layer" in the linux kernel to block all instructions that are not documented?
Not a layer, but you could make the loader refuse to load programs that have an undocumented instruction, or that generate code that they later execute. Among other things, you could not use the JIT compilers in Java or ECMAscript runtimes.
Torpcoms Thanks for the answer. I was hoping for something in a lower level and not language restricted, blocking anything in the kernel level and above to use such instructions.
You could execute suspicious software on a fully virtualizing VM patched for this purpose. The VM would have to scan all code scheduled for execution for "undocumented" instructions. Problem is, you will lose a lot of performance. And there would still be the slight issue of hypervisor and disassembler bugs creating false positives/negatives. Watch the video for what can go wrong with such approaches.
+David Kofler vm's can't prevent magic numbers...
@@bakedbillybacon The detail about JIT compilers in Java and ECMAscript runtimes kind of is low-level in ways. Some processors have instructions that allow a subset of JVM bytecode instructions to be used kind of natively... as in in the assembly for the processor. It's not a documented feature in the ARM specification, but I think the name nowadays is Thumb2EE or something like that... the technology known by about as many previous names as a cheap hooker has teeth... so this proposal would indeed strip these devices (like your mobile phone) of their ability to quickly execute various JVM bytecode operations... the JIT compiler would have to simulate using software instead, which would be much slower... Similarly, I think you would lose a lot of multiscalar instructions that Intel has developed which aren't x86-standard...
Interesting! Generally speaking, should the problem of undocumented instructions also be relevant for RISC as well? Even with an open architecture, implementation can be (and likely will be) done secretly. Manufacturers are risking their brand name to have these hidden instructions that they may lose track of.
Agree with the concept of developing more universal tools to audit our hardware. We shouldn't only audit software.
26:53 **Puts on tin foil hat**
No seriously, that shit's scary
About 6m40s in it says #40 is a single byte instruction `inc eax`. I assumed an #40 would cause a situation where if `ax` reached #ffff it would go back to #0000 and set the carry flag, but #66 #40 should increment eax to #00010000. There are several other minor factually confusing statements throughout the video. On the whole this is a great talk and seems to be presented by an absolute expert so now I'm doubting myself.
First of all, inc and dec do not affect the carry flag. An overflow will set the zero flag, but not the carry flag. As for the rest, i think it depends on whether you're executing in 16 or 32 bit mode. If you're in 16 bit mode, then 40 is inc ax, and 66 40 is inc eax, as you describe. If you're in 32 bit mode, then 40 is inc eax, and 66 40 is inc ax, as he describes.
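A small Python model of the point above, with made-up helper names. It only tracks ZF and CF (real INC also updates OF, SF, AF and PF), and it models the operand size directly rather than decoding the 0x66 prefix.

```python
def inc(value, width, carry_in):
    """Increment a `width`-bit value. ZF reflects the result; CF is left untouched."""
    mask = (1 << width) - 1
    result = (value + 1) & mask
    flags = {"ZF": result == 0, "CF": carry_in}
    return result, flags


def inc_register(eax, operand_bits, carry_in=False):
    """Apply INC to the low `operand_bits` of a 32-bit register.

    With a 16-bit operand (INC AX) the upper half of EAX is preserved.
    """
    low_mask = (1 << operand_bits) - 1
    low, flags = inc(eax & low_mask, operand_bits, carry_in)
    return (eax & ~low_mask & 0xFFFFFFFF) | low, flags


if __name__ == "__main__":
    # 32-bit mode: opcode 40 is INC EAX, 66 40 is INC AX.
    print(inc_register(0x0000FFFF, 32))  # INC EAX carries into bit 16
    print(inc_register(0x0000FFFF, 16))  # INC AX wraps to zero and sets ZF
```

In 16-bit mode the two encodings swap roles, so the same two calls describe 66 40 and 40 respectively.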
Well this video is certainly a great deal more relevant. Fuck.
The real question is whether these instructions yield some result because of the inherent physical logic-fabric and the way the op-codes are implemented or if it is a designed instruction.
did that brick the processor or only until power cycled?
Technically only until a power cycle, unless it has some non-volatile status memory
He said he tried it on multiple os and it always locked the cpu. So sounds like it's just locked until power cycles.
Fantastic thinking, using PF to find the instruction length, reminds me of when I learnt how traceroute works, ping with TTL 1, ping with TTL 2 etc. And then learning the DNS of assembly is BSing you! And cross platform (Intel.AMD, VIA) GOD BLESS AMERICA
15:40 How is that true? I thought we don't rely on what's said in the documentation, and it sounds like continuously throwing exceptions would be a good way to hide backdoor/hidden instructions.
Are there registers they don't tell us about?
A few things come to my mind:
He talks a lot about "trusting" the processor. I don't think that anyone truly trusts the processor any more than they trust the software. We just have fewer options when it comes to the processor. We can either use a computer or not use it.
If I were nefarious and wanted to hide a secret instruction, a couple good candidates would be an "undefined" opcode with ESI and EDI set to special values or DS: MOV AL, AL (an effective no-op that no one would ever use) again with ESI and EDI set to special values.
The gaps in the op-code table are supposed to be values that do not correspond to an instruction. They may be filled in by later processors. This is, after all, how the processors have evolved.
He says he is doing the entire thing in ring-3. I happen to know that accessing the CR2 register requires ring-0 access. Maybe the operating system is facilitating some of these things. But it still struck me as odd.
Setting all the registers to zero is a good start. But some of those instructions include address offsets, which can still overwrite your "supervisor" code. (Okay, he addresses this one.)
As for the priority error for undefined opcode vs page fault: Yes, it is an erratum. They decided it was a documentation error and fixed their documentation. First off, I can see where they might miss this. Very few people, outside of maybe myself, are going to deliberately execute an undefined instruction over a page-faulting area. Admittedly, I do a few unusual things. I have used MOV CS, AX as a processor check (runs fine on 8088, undefined opcode on 286, man I'm old.)
I miss the days when computers would detect a processor shutdown and just reset the processor. You could use certain memory locations to tell the BIOS where to resume execution. Ah, good times, shut the processor down 20 times and return to DOS like nothing ever happened.
Did he release info on his f00f bug discovery?
Awesome. Pretty clever tricks!
Ever heard of the Talpiot Program?
Incredible talk
What concerns me is let's say he executes one of these unknown instructions and it alters the way the processor executes a currently benign instruction. Perhaps changing a jump and read to a jump and execute. I'm not saying it's changing the hardware, I'm saying maybe this instruction is inside the processor for a reason that we're not supposed to know about. Such as this instruction flips a bit that diverts an execution pipeline to a part of the processor that has been idle for years that now is operating under an altered instruction set. So I suppose that when this scanning tool is run, you should probably run it numerous times to see if you find some different results during the second pass, indicating a new instruction or instruction set has been enabled. (I'm talking like NSA level stuff)
Another thing that needs to be addressed is capturing what microcode level is being used in the processor. When you flash your bios on your computer, the motherboard vendor will frequently add microcode revisions to the bios that patch bugs in the CPU, but they could also activate or disable these things you weren't supposed to know about. The same thing with silicon revisions and steppings. Those are hardware alterations to the CPU internally.
It seems like exploiting a f00f-like bug (where it stops the processor) could be a more efficient DDOS-like attack provided you can manage to run unsigned code. Just get a company's servers to execute the processor stop instruction and they'll have to reboot all their computers. If you set it to run on startup, then they'll be locked out of their OS, forcing them to solve the issue from the BIOS.
it's DoS, not DDoS! I don't understand why your comment is here, since all of this was said in the talk. Also, your nickname... is this the stop instruction? Did you not hear about responsible disclosure ;)
They carefully wrote "DDoS-like" and were talking about a specific scenario
This is actually something amazing
Just tried the call with 66 prefix in IDA 7 on an i7-6700K and both IDA *and* the CPU interpret it as a call with a 16 bit offset. What CPUs were you testing on?
Outstanding work
Is there anything to be learned any more through physical disassembly of the chip? I know the Soviets used to grind the chip down, minuscule layer by minuscule layer, until the chips eventually became too tightly packed for this to be viable. I wonder if technological advances made in the last 25 years could re-enable the technique, even if exotic technology like electron microscopes is needed to observe the chip...
Look for talks by Chris Tarnovsky, he describes these procedures on modern chips in great depth.
Is this separate from the custom instruction sets added for individual large customers but disabled for everyone else?
I'm a late viewer of this video. It's some brilliant research, however not unexpected at all.
However, CPU instruction fetchers and decoders are not static in this day and age.
So. The theoretical malformed but working opcode space is potentially much larger.
There is this theory called "Opcode knocking" or "Instruction knocking".
This hidden key sequence opens instruction decoding for a class of non-disclosed instructions which have valid and probably security implicating behavior.
This opcode space would not be visible unless the opcode key knocking sequence has been entered.
So while this helps, there is absolutely no way of discovering all backdoor instructions, even with a complete perturbation of the opcode space.
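To make the "knocking" idea concrete, here is a purely hypothetical toy in Python. The knock sequence, the hidden opcode and the `ToyCpu` class are all invented for illustration; nothing here describes a real processor.

```python
KNOCK = [0xA1, 0xB2, 0xC3]  # hypothetical unlock sequence
HIDDEN_OPCODE = 0xF1        # hypothetical gated instruction


class ToyCpu:
    """A decoder that only accepts HIDDEN_OPCODE after seeing KNOCK in order."""

    def __init__(self):
        self.progress = 0
        self.unlocked = False

    def execute(self, opcode):
        if self.unlocked and opcode == HIDDEN_OPCODE:
            return "hidden"
        if opcode == KNOCK[self.progress]:
            self.progress += 1
            if self.progress == len(KNOCK):
                self.unlocked = True
                self.progress = 0
        else:
            # A wrong byte restarts the knock (it may itself be the first knock byte).
            self.progress = 1 if opcode == KNOCK[0] else 0
        return "undefined" if opcode == HIDDEN_OPCODE else "ok"


def scan_single_opcodes():
    """Probe every one-byte opcode in ascending order, like a naive scanner."""
    cpu = ToyCpu()
    return [op for op in range(256) if cpu.execute(op) == "hidden"]


if __name__ == "__main__":
    print("scanner found:", scan_single_opcodes())
    cpu = ToyCpu()
    for byte in KNOCK:
        cpu.execute(byte)
    print("after knock:", cpu.execute(HIDDEN_OPCODE))
```

The ascending scan never finds the gated opcode because the knock bytes are never executed back to back, which is the gap in coverage the comment is pointing at.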