Chris Domas used this same method of fuzzing x86 chips for backdoors many years ago. By measuring how many cycles certain instructions take to execute with all interrupts disabled, you can gain insight into how much "work" is being done behind the scenes when an instruction is run. Passing random input to certain instructions and then watching how long each one takes to execute is a very clever way of detecting hidden behavior, or even finding hidden instructions, in a given architecture.
uhhh interesting, thanks for sharing
There's also SandSifter, which fuzzes "by systematically generating machine code to search through a processor's instruction set, and monitoring execution for anomalies."
Chris Domas has a few blackhat talks on youtube and they are all worth a watch!
man, is that guy still alive? a month or so ago i tried to find whether he's done anything recently and couldn't find anything. hope he's doing alright.
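The timing probe described a few comments up can be sketched in ordinary software terms. This is a purely illustrative Python simulation, not Domas's actual tooling (which read the TSC on bare metal with interrupts disabled); `covert_op` and the magic constant are made up for the demo:

```python
import random
import statistics
import time

def time_once(op, x):
    t0 = time.perf_counter_ns()
    op(x)
    return time.perf_counter_ns() - t0

def probe(op, inputs, trials=5):
    """Time `op` on each input (best of `trials` runs, to reduce noise)
    and flag inputs whose best time is far above the overall median:
    those inputs trigger hidden extra "work"."""
    best = {x: min(time_once(op, x) for _ in range(trials)) for x in inputs}
    median = statistics.median(best.values())
    return [x for x, t in best.items() if t > 10 * median]

def covert_op(x):
    # Toy "instruction" with hidden behavior on one magic input.
    if x == 0xDEADBEEF:
        sum(range(200_000))  # the hidden extra work
    return x ^ 0x5A5A5A5A

rng = random.Random(1)
inputs = [rng.getrandbits(32) for _ in range(200)] + [0xDEADBEEF]
suspicious = probe(covert_op, inputs)
```

The real hardware version replaces `time.perf_counter_ns()` with a cycle counter and the toy function with an actual instruction under test, but the statistics are the same: outliers against a baseline are the signal.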
Kind of, but not exactly the same. Chris Domas has done two similar things.
1) Sandsifter, which looks for undocumented opcodes, although it primarily used access violations for detection rather than a timing side channel.
2) His research on readmsr, where he used the TSC (time stamp counter) to detect whether reading a certain MSR did something covertly in microcode.
Tavis' research in this project is similar but distinct, in that it seems to only focus on valid opcodes and uses performance counters beyond just timing.
This reminds me of an old Defcon talk (I think) where a guy fuzzed CPUs to discover undocumented instructions. He did it by exploiting a quirk in the memory controller: I believe he put the instructions across a page boundary in such a way that a valid instruction would go through but an invalid one generated a page fault. Through this he could generate instructions and compare them to the known instruction set to find the undocumented ones.
That must be Chris Domas you're thinking of. His work is outstanding.
@@5555Jacker I looked him up and I didn't realise he also did the MOVfuscator, one of my all-time favourite tech talks! I also highly recommend that one to anyone that hasn't seen it.
That's how he discovered the length of an instruction. By putting the instruction at the end of the page you can adjust it byte by byte and discover whether the CPU wants to read the next page as part of the instruction or not
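That length-finding trick can be modeled with a toy simulation. The real technique maps the next page inaccessible and watches where the fetch faults; here a fake "CPU" simply raises when its instruction fetch would cross the page boundary (all names below are made up for the sketch):

```python
PAGE_SIZE = 4096

class FetchFault(Exception):
    """Stands in for the page fault raised when the instruction
    fetch crosses into the inaccessible next page."""

def fake_cpu(insn_len):
    # Fake CPU: decoding an instruction placed at `offset` reads
    # exactly `insn_len` bytes; reading past the page boundary faults.
    def execute(offset):
        if offset + insn_len > PAGE_SIZE:
            raise FetchFault()
    return execute

def instruction_length(execute, max_len=15):
    """Slide the instruction toward the end of the page, one byte at a
    time. The first placement that does NOT fault tells us how many
    bytes the CPU actually wanted: the instruction's length."""
    for room in range(1, max_len + 1):   # bytes left before the boundary
        try:
            execute(PAGE_SIZE - room)
        except FetchFault:
            continue                      # CPU wanted more bytes; back up
        return room
    return None
```

This works even for undocumented opcodes, because it never needs a disassembler: the page-fault boundary itself is the length oracle.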
I just want to leave a compliment on the very good animations/visuals in this video. They are well done and intuitive, and they kept me engaged to the point where I actually noticed their positive impact, without being too flashy or distracting in any way.
This was *way* more interesting than I assumed. Tavis managed to bring a batch of new angles to the CPU fuzzing, by being a not-CPU fuzzer! I'm glad both you and Tavis do what you do!
Fascinating story! Thanks for walking us through the discovery process 😊
Awesome way to explain the discovery process of Zenbleed. The whole bug discovery process became very intuitive as you compared the fuzzing components of a software bug and a CPU bug side by side. A great way to explain!
the bounce light from your shirt is crazy ^^
As you said, brilliant methodology to get to that bug. Thanks for the video and Tavis interview.
You know the guy's brilliant when he speaks at 250 wpm
Updated my BIOS after watching this video. It put into perspective for me how serious this vulnerability is.
But the fix would cost performance
@@saadahmed688 The fix you're talking about is different, as that disables a CPU feature. The AMD microcode was never said to affect performance.
Super cool! Also I love the rollercoaster pic lol.
Fascinating, you are hands down my favorite content creator and matching my intrest perfectly :)
In these kinds of videos, it would also be really nice to learn how much time it takes to find these kinds of bugs. Sometimes it could be just a couple of hours or a lucky idea, but in most cases it takes several months. In any case, I love your videos :)
Great video, nicely summarized, wonderful guests, I also like the new background.
This is indeed very interesting. Love getting to know more about the development behind finding bugs
I think it shouldn't be considered medium severity at all. The PoC can catch nearly all important strings placed in XMM registers by a browser, simply because of memcpy, strcpy and strcmp.
It's still just a data leak, not root escalation or code injection. On a client system, there are easier ways to steal browser secrets.
Sounds like security critically software could compile in oracle serialization mode to prevent these side channel attacks (at the expense of execution speed/efficiency)
This technique is already used to an extent by some modern compiler hardening flags (introduced after the Spectre/Meltdown shitstorm). However, blindly disabling speculation is unacceptable from a performance standpoint. You can of course do this if, e.g., a 10x loss of performance is not an issue for you, but it is better to resolve this in the CPU microcode if possible, for the sake of binary backward compatibility.
does it? afaiu the issue is not victim programs using speculation, it's the attacking programs. it's all well and good if *your* program runs absolutely 100% correctly, but if *i* can still abuse the CPU to get a side channel, you're just throwing away performance for no gain
@@jotch_7627 In some cases, the attacking program is a victim program, too. The prime example is a web browser. The browser executes JavaScript code, which is not trustworthy in itself, but it is run in a sandbox that tries to prevent the JavaScript code from accessing anything worth protecting. If the JavaScript code manages to trick the browser (like Chrome or Firefox) into running Spectre attack code (which was possible), the JavaScript code can read parts of browser memory it must not access (like stored passwords). So compiling Chrome or Firefox in a way that it won't speculate "too much" prevents these programs from being abused as Spectre attackers by JavaScript code.
Actually, this idea is more general: any program that processes complex untrustworthy input can be turned into a Spectre attack utility by feeding it malicious input, as long as the processor does enough speculation while processing that input. Spectre mitigation is meant to prevent that by reducing speculation when processing untrustworthy data.
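One concrete example of "reducing speculation" without serializing everything is branchless index clamping, the idea behind the Linux kernel's `array_index_nospec()` helper. Below is a rough Python model of the value it computes; real implementations are bit-level C, and the 64-bit masking here just mimics unsigned arithmetic:

```python
BITS = 64
UMAX = (1 << BITS) - 1

def array_index_mask_nospec(index, size):
    """Model of the branchless mask: all-ones when 0 <= index < size,
    zero otherwise. Real implementations compute this with pure
    arithmetic so the CPU has no bounds-check branch to misspeculate
    past; Python here just reproduces the resulting value."""
    diff = (index - size) & UMAX          # wraps like unsigned subtraction
    # index < size  =>  the subtraction "borrows", setting the top bit.
    return -((diff >> (BITS - 1)) & 1) & UMAX

def safe_load(table, i):
    # Even if an earlier bounds check was speculated past, the mask
    # forces an out-of-range index to 0 instead of letting a
    # misspeculated load touch attacker-chosen memory.
    i &= array_index_mask_nospec(i, len(table))
    return table[i]
```

The point is the trade-off the thread is discussing: you pay one AND per untrusted index instead of a full speculation barrier around the whole code path.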
I totally agree, thanks for sharing this fascinating story!
The zenbleed article by the authors is genius and easy to understand! I can't wait for your video on this
I like the red aura surrounding you using this background
9:00 those damn threads, threatening us the moment they aren't in halt states
Really looking forward to a new part of this!! :) Brilliant video! 5 a.m. here and I am more awake due to this video than the last 5 hours trying to sleep. 😂
You explain stuff so well! You're excellent 🎉🎉🎉🎉
Excellent video about someone with a different approach coming to a new field. Pretty epic.
Great work, quite interesting
Awesome video, thank you 👍
Wo wow wow. So interesting!🌟
I'd be very interested in following the development of automated systems for identifying interesting performance counters. Human review can often overlook innocuous solutions, and I feel like this is JUST up the alley for interesting machine learning classification.
thanks for realizing this explanation in itself was interesting and sharing it, very interesting indeed
t-shirt so wrinkled it glows xD
love your work man, thank you very much for expanding our knowledge.
The best : Europapark xD
Awesome content! Thx a lot!!!
Beautiful studio.
Thank you for this video
Phew.
When I saw this video I got scared and began researching a bit. Luckily it seems to only affect Zen 2.
My laptop CPU is too old, as it's Zen (2 generations older), and my desktop CPU is too new, as it's Zen 3 (1 generation newer).
I lucked out with this one.
this vuln is only scary when multiple people share a cpu. if someone can execute code on your laptop you are most likely screwed anyway already
Also, such vulnerabilities are typically possible to fix in microcode / CPU firmware. And chances are that when you hear about one, your system is already patched, given you update your OS regularly.
For this very vulnerability, AMD already released a fix, but only for server platforms (Epyc, where it's a big issue). Maybe the overhead is a tad too high and they hope to find a fix with less performance impact for consumer platforms.
HW vulnerabilities, I would argue, are generally vulnerabilities at the HW/SW interface, and very few exploits of HW vulnerabilities come at it from the HW side. In general, the most critical HW exploits take advantage of inconsistencies between the HW implementation of the ISA and the SW formalization of it.
Very interesting topic! I am always more fascinated by the process. Wouldn't the serialization instructions affect the performance counters used as coverage information?
Yes, it would. But the serialized reference code isn't executed to detect "whether something interesting happens with these instructions" (which is what you need the performance counters for), but to provide the expected reference result for the non-instrumented code. You only use the counters to measure the non-serialized code. Then you compare the CPU state (e.g. register contents) after executing the non-serialized code and the serialized code. If there is any difference, the CPU does unexpected things without serialization.
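As a software analogy of that compare-against-the-serialized-reference loop (purely illustrative Python, with made-up function names; the real fuzzer compares architectural register state on the actual CPU, not return values):

```python
import random

def fast_path(xs):
    # Stands in for the speculatively optimized execution. Toy bug:
    # it silently drops the last element for one unlucky length,
    # the way a mishandled vzeroupper leaves a stale register half.
    if len(xs) == 7:
        xs = xs[:-1]
    return sum(xs)

def serialized_oracle(xs):
    # Stands in for the serialized reference: simple, slow, trusted.
    total = 0
    for x in xs:
        total += x
    return total

def find_discrepancy(trials=1000, seed=0):
    """Differential fuzzing loop: random inputs through both paths;
    any disagreement in observable state is, by definition, a bug."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(1, 99) for _ in range(rng.randint(0, 10))]
        if fast_path(list(xs)) != serialized_oracle(list(xs)):
            return xs   # the "architectural state" disagreed: report it
    return None

bug_input = find_discrepancy()
```

Note that this only catches wrong results, not timing; it answers the coverage question above because the counters are only read on the fast path, while the oracle just supplies the expected answer.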
the sandsifter project does cpu fuzzing, very interesting defcon talk about it.
Regarding Oracle Serialization, I don't quite understand how it's supposed to work.
For example, if some information is leaked based on timing, how do you detect it with serialized code? You run the non-serialized code and measure that it takes 10-100 cycles, then you measure the serialized code and it always takes 1000 cycles. Did the non-serialized code take a variable amount of time because it was leaking a bit of information about some internal state, or was it just random chance?
yeah it's a bit tricky and I'm also not 100% sure. But check out my RIDL video to learn how cache timing leaks are measured. You could add such a measurement after the fuzzed code (this measurement code would obviously not be serialized)
Disclaimer: didn't read the writeup yet, might be wrong.
If I understand correctly, the fuzzer had to generate code that somehow output the leaked information. They were checking if force-serialized code was giving the same output as code without forced serialization. In the abstract model, this is guaranteed. Any discrepancy is a bug.
I am assuming they relied on just outputs, as it has to have been something guaranteed to not be affected by serialization and run-to-run variance, so they couldn't check for timings directly.
@@chedatomasz That's how I understood it too. You either output the data your instructions retrieve or otherwise use that data in further execution in some way. Any discrepancy between what the speculative execution variant does and what the serialised oracle variant does means there is, by definition, a speculative execution bug. You might even be able to use some of the same performance counters to measure the degree of discrepancy.
@@chedatomasz Yeah, this is also my impression of the technique, but the video makes it seem like the Oracle Serialization is supposed to catch side channel attacks, not just regular data leaks (for example, 11:31 and 12:11). This is probably a mistake in the video, unless I am missing something.
@@ruroruro I read the writeup. I think they are catching it by the macroarchitectural state (registers, performance counters, etc., those guaranteed not to change under optimizations) not agreeing between the raw and oracle versions. This pointed them to vzeroupper not being optimized correctly. The fuzzer's contribution ended here; the escalation to a side-channel data leak attack was done manually on top of that.
I guess the weeks of tuning work they refer to were spent choosing the elements of macroarchitectural state that are guaranteed by the standard not to change, until they finally came across one that did change when it shouldn't.
Great!
tavis is a legend
If only our manager didn't have the "PhD in computer science". R.I.P.
Hardware has so many security holes.
It would be interesting to hack DMA instead. DMA chips may be programmable and can access anything.
But what if you cannot touch them?
Maybe your video card or hard-disk controller can read directly from memory.
Can you read a hard-disk sector from the controller cache before it is actually read, so you can read the data from another process?
I wonder if there is a mathematical theorem (or quasi-theorem) showing, for a CPU and a set of instructions, that beyond a certain well-defined complexity in hardware and software, security can never be guaranteed, even in principle?
I wonder if this can be used to leak information from below ring 0, SMM and such.
Can you cover CVE-2023-39848?
Why is there no add for hextree?
Did you find any bugs in the rollercoaster?
Why is your shirt glowing
The light from above is strong, and the wall behind is quite glossy. Light changes color when reflecting off an object, especially on the first bounce.
How to fix it? Get a more matte wall, change the angle of the lights, get a more matte shirt, etc.
he's glowing agent
Push!
See how amazing this is?
Why is there advertisement in the top right
The video is sponsored by Google
Is fuzzing just brute-forcing inputs in a special way?
Lets gooooo
Am I the only one that thinks these CPU vulnerabilities are pretty scary? I think there should be a safe mode, where for example all the instructions are run serialized.
Wouldn't that absolutely annihilate performance? And most CPU exploits are fixed quickly via CPU microcode and/or in future CPU generations
I think performance is overrated, at least when you are talking about critical systems like webservers. These bugs may be discovered by blackhats long before whitehats find them.
@@helgesupernova788 Then you would turn the minuscule chance of somebody else on your server accidentally stumbling over your information in the cache into making DDoS attacks far easier. You underestimate how much performance we gain from running instructions simultaneously and out of order; it has rendered the CPU clock irrelevant for comparisons. For example, the i7-6950X, a pretty old CPU, measures at ~10.6 instructions per cycle. You would lose that factor of ~10.6 in performance with such a 'safe mode' enabled.
Sure, set your CPU to use only 1 core on the motherboard settings.
@@notalostnumber8660 That's not what we're discussing here. CPUs can delegate multiple instructions to multiple logic units in one cycle.
Can someone explain to me how performance counters work? Do they compare 2 CPUs running the same processes, and if the counts are very different, they know which error is coming?
why is your shirt glowing!!!!
Shirt : RTX ON
That’s amazing. As a software guy this excites me that hardware vulnerabilities could be not so far from my realm. Tavis is legend. ❤
Isn't sandsifter a hardware fuzzer?
for the algorithm
Discribe the sticker on your laptop
Thanks for covering this, very useful. AMD patched the EPYC CPUs but not the desktop CPUs (ETA is December). So is the TL;DR for now to use a VM without network access if it's being used to run potentially hostile code? Publicly this is an info-disclosure bug, so it should be mitigatable this way. But if there is a risk of writes as well, disconnecting the network alone wouldn't help.
I was thinking about running the code on a simulator and comparing the results to the real thing, without realizing the code can be run on the CPU itself without any optimization 😂
I'm not a hardware guy either, but I wonder if fuzzing somehow VHDL/Verilog simulations would also tell us anything ...
Running validation and fuzzing on the simulation could catch bugs like this, which are logic bugs. It would also be more efficient, as you could do feedback more directly. The problem would be with accurately modeling timing and other side channels in the simulator.
That’s an active research problem
Fuzzing simulations are done in the industry, but they are like 1000x slower than done on actual silicon.
Soo most performance counters are just debug tools left in a production product.
Will the minecraft series ever return ?
Of course he works for Google. I still can't believe how Google can be so big in the industry yet still can't maintain such a basic service as Google Domains, selling it off to another company instead. I'm so f*cking mad now I have to transfer ~100 domains to another registrar.
i think it'd come down to bad management and leadership
Maybe it's a stupid question, but how does this translate to VMs?
Why is your t-shirt bleeding onto the background?
Maybe that has something to do with bubble sorting, or a bubble-sorted list. Is it possible that people start talking about an attack that doesn't exist yet because they found it, and thereby cause a vulnerability? That is to say, are they announcing a vulnerability before the update is released?
Auto-magically lol
Hello sir, we love you very much, but we have a request. It would be great if your editors could write up an article for these videos.
AMD haters ..l..
This is not a new approach at all; it's been actively used by many.
can you share more about that? Any talks, papers or blog posts about it?
Second ❤
First!
Third
NEED HELP: how do I create a no-clip hack for Need for Speed Undercover (ULES 01145) on PPSSPP and create a CWCHEAT with it? The same for Obscure: The Aftermath (ULES 01340), also a PPSSPP game, plus a free camera (the camera also has no clip). Thanks
xd