😮 Very tempted by this assembly course. I’ve done a bit of assembly in some really low-level optimisation work (comparing what different Rust functions compile to), very nice very cool
people that figure this stuff out are so amazing. like I understand it, after you explain it, and am like "yep I get it," but I could never actually figure it out beforehand or even consider that it exists.
@@c.ladimore1237 I’m not claiming that it is easy by any means, but these people spend every day searching for bugs like these. Surely, at some point, they develop some kind of intuition.
@@c.ladimore1237 I don’t professionally find exploits, but I have found unique ways of using things in unintended ways. My understanding is exploits like this are either people looking at how things work and being like “wait, that means theoretically it will do this thing too” or people being like “I wonder if it will also do this thing too” and trying it. So to me, it seems more akin to educated experimentation with the scientific method, while software development (although there is experimentation) is more akin to writing a book.
Every time I hear the phrase 'speculative execution', I am reminded of what a late friend of mine used to say: "CPU designs should never incorporate speculative execution or branch prediction. They will inevitably lead to security vulnerabilities." He was also a big fan of the ARM architecture, because it did not do this at the time. He passed away about fifteen years ago, but as it turns out he was right...
Only in architectures where it was added long after the instruction set was finalized. The problem is not that CPUs have speculative execution, but that the 8080 they're based on didn't.
@@darrennew8211 Not true. The ARM ISA is not based on the 8080 architecture and now also seems to suffer from it. My friend was very adamant about this at the time, that this would not be restricted to architectures that weren't built around it.
@@juhotuho10 That is the counterargument that I put before him all those years ago and I was treated to a lecture about why the benefits could never outweigh the costs and why especially in multiprocessor/multicore systems this would lead to all kinds of security vulnerabilities. And he pointed out exactly the kind of security vulnerabilities that were discovered in the past decade or so.
@@juhotuho10 It brings huge performance benefits if your architecture is such that it pretends to execute one instruction at a time in order. You don't need it if your instruction set is designed from the ground up to keep every computational unit busy all the time. You need it because you execute one load instruction then one add instruction and then one multiply instruction then one store instruction and expect the CPU to behave like it's not doing all that in parallel.
My God. I guess it's time to check "security vulnerability found in something you worked on" off my bucket list. I was an intern at Arm, on the team that worked on MTE. I did some work around the generation of the tags, and on simulating the overhead they would have in caches and memory. I have such mixed feelings right now. :D This seems like something we could have thought of. Meltdown and Spectre were fresh on our minds and a major topic of discussion in the company. I can imagine an alternate universe where I told my manager (or someone else on the team) "hey, have we thought about if tag mismatches could be a cache side channel?" Yet I don't think we ever discussed anything related to this? At least not in any of the meetings I was in. But hindsight is 20/20. In retrospect, these things always seem obvious. We were mostly focused on minimizing the performance overhead of memory tagging, because we were worried it would get in the way of adoption. We wanted our new optional security features to be supported by hardware manufacturers, who might not be happy if there was too much perf or memory overhead, extra hardware complexity, or cost / die area increase. Though, I guess, despite this new vulnerability, it still delivers on its goals. MTE was supposed to be something that offers substantial security improvements for cheap. A "better than nothing" optional feature which, when enabled, has a good chance of catching some bugs that might not be found otherwise. It is probabilistic: even if it worked perfectly, there is still a small chance a memory bug might go undetected by it (if different allocations happen to be assigned the same tag by chance). It was not meant to be perfect, or any sort of bulletproof defense. Just a way to hopefully catch more bugs in the wild. If a vulnerability makes it less effective, that's still better than every other CPU that does not have something like MTE at all.
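To put a rough number on the "small chance" mentioned above, here is a minimal sketch. It assumes tags are 4 bits wide and drawn uniformly and independently from all 16 values, which is a simplification; real allocators may reserve or exclude some tags, and the function names are made up for illustration.

```python
import random

TAG_BITS = 4                  # assumed tag width
NUM_TAGS = 1 << TAG_BITS      # 16 possible tag values


def collision_probability(num_tags: int = NUM_TAGS) -> float:
    """Chance that two independently, uniformly chosen tags are equal."""
    return 1.0 / num_tags


def simulate(trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same probability by drawing random pairs of tags."""
    rng = random.Random(seed)
    hits = sum(
        rng.randrange(NUM_TAGS) == rng.randrange(NUM_TAGS)
        for _ in range(trials)
    )
    return hits / trials


print(f"analytic:  {collision_probability():.4f}")
print(f"simulated: {simulate():.4f}")
```

Under those assumptions a bad access between two unrelated allocations slips through about one time in sixteen, which fits the "better than nothing" framing rather than a hard guarantee.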
It has its value as a hardware address sanitizer. I used it on C code within an Android App on the Google Pixel 8, which supports MTE, and it helped to figure out and fix a hidden memory management bug (a use after free).
@@olafschluter706 Yep. "Hardware ASAN" is pretty much how we thought of it when designing it. The motivation for MTE was "imagine ASAN but with low enough overhead that you could deploy it in release/production builds and just enable it everywhere, and hopefully also catch bugs in the wild instead of just during development."
@@inodedentry8887 yeah. Arm have said that the tags aren't secret. The video is somewhat misleading. Not all Arm CPUs have MTE, and it isn't used much, it seems.
@@HayesHaugen I think if it helps to catch memory management bugs, it helps to reduce the attack surface and the number of possible exploits of software checked by it.
@@HayesHaugen Yeah well, there are several different benefits to MTE. "Stopping an attacker" is certainly one of them, but IMO not even the most important one. There should hopefully be many other security measures on the system, and MTE would only be one piece of the puzzle. Given this new exploit, this use case has been compromised. A determined attacker will be able to bypass MTE. But, like I said, "Hardware-accelerated ASAN" is the other major use case. Traditional compiler ASAN is very slow and only really useful as a dev tool. The idea is that you can ship your production/release software with MTE, and, at an almost-negligible perf cost, the user's CPU will validate all memory accesses and catch memory safety bugs out in the wild for you. If those bugs are reported back to the developers (probably via some automated crash report system), they can hopefully be fixed, before anyone has even tried using them for an attack. In this sense, MTE can be a valuable tool to discover bugs in production software. That in itself is a pretty big benefit. It can help software be more secure in general, regardless of whether anyone is actually trying to exploit anything.
Except neither were backdoors. In the first case it's just a standard buffer overflow bug, except because you're running directly in ring -3 there's no ASLR to save you. The ARM bug is actually a feature that speeds up the CPU, which is good, but accidentally was implemented wrong. The difference is that buffer overflows can be patched by a software update (if you haven't downloaded the UEFI security update please do so right now), but a bug in the CPU itself means you need a new CPU.
I am a (retired) professional programmer. I never wanted my programs to run as fast as possible. I wanted them to run as reliably as possible, i.e. rock-solid reliably. I have seen countless examples of programmers being led astray by the siren song of premature optimization.
It depends. ARM processors are often used in embedded devices with few resources and hard real-time requirements, and programs that are not as efficient as possible may not be appropriate.
@@NoSpeechForTheDumb This is a hard blanket statement to make because a lot of embedded systems will prefer stability over speed. You don't want life critical systems failing due to software bugs that can be mitigated at compile time.
@@TheMixedupstuff there are some instances of embedded systems where reliability is most important, of course. That's why I said it depends. The blanket statement was made by OP who said he ALWAYS wants his systems as reliable as possible when for some applications this may not be necessary or possible.
The saddest thing is that even though it is possible to write both robust and fast software, the capitalist system, that we have now basically everywhere, disincentivizes that by putting "make the most money" above all else, especially something that requires spending more time and effort. In other words, businesses like easy money, and making robust and fast software is not easy.
Great breakdown! Not surprised to see that speculative execution is causing vulnerabilities on more than just x86 - really feels like it was only a matter of time before something like this was uncovered. The way it was done, though, is absolutely wild.
@@alexturnbackthearmy1907 Not doing speculative execution isn't really an option though... That would cause a FULL pipeline stall after every branch. And not doing prefetching is even worse. Complex problems require complex solutions and those oversights are sadly the cost of that. We can only hope that most things are found and fixed before they can turn into widespread exploits in the wild or hope for memory to suddenly get 1000x faster without any other downsides.
@@Momi_V Eh, if things were actually done the right way, we wouldn't be having this conversation at all. At least there is hope that they don't sweep it under the rug (just like "superior" Windows ARM hardware, which isn't really).
@@alexturnbackthearmy1907 modern CPUs without any branch prediction won't stand a chance in terms of performance against one that has all mitigations enabled, even the non-applicable ones
With the speed of modern processors, who cares if we turn off the MTE functionality? Why inject a process that has a vuln? Surely there are other ways to sandbox for bugs during development.
CPU vulnerabilities usually need relatively low hardware access in order to work. But when I heard you saying somebody managed to exploit it from within V8 (being a web dev) it literally just hit me - We're f**d. JS isn't as much of a toy these days. You can easily manipulate raw binary data in JavaScript. Some more tinkering and this would easily escalate to a sandbox escape and really, really low-level code injection... From within a browser...
@@theairaccumulator7144 Yes, but in general, first you have to escape the sandbox, then find a way to execute your code in something like a shell, and then gain admin access. The paper covered in this video describes how it was done all in one step.
also: does WebAssembly still exist? It is lower level than JS, so it should be easier to predict what native machine code each wasm instruction compiles to, making side-channel attacks even easier and more reliable than using JS.
This reminds me of PAC introduced in iOS 14 that made jailbreaking very difficult. Eventually a couple Chinese researchers found a way to sign the pointers themselves to bypass it, but I still was fascinated enough by it that I did a college presentation on it in my computer architecture class.
Found Ed thru John Hammond, but since John doesn't seem to do vids that aren't just straight ads anymore, I'm excited this is still here to learn from. Thank you, sir!
You have, in my opinion, some of the best content hosted on YouTube. If this had existed in 2004, my early programmer self would have had a much easier time learning how to exploit for fun ;).
IA64 had a ton of problems, but I really believe that explicit speculation was a great idea. So many of these attacks would be impossible on Itanium. (Insert joke about them not being attacked because no one used them)
@@deusexaethera IA64 puts the work of avoiding problems due to parallel execution in the hands of the compiler. I.e., no mechanism to back out unexplored paths like with speculative execution. The idea was to run the CPU fast and loose, and just force compiler writers to deal with the burden to take advantage of full speed. Problem is, there are lots of languages and compilers, and not everyone wants to incorporate this stuff into code generation, and not everyone is good at it.
@@MadsterV more correctly, the CPU didn’t hard-code any of the behaviors: the pathways existed in similar ways to x86, but required explicit control via ultra-wide instructions (VLIW architecture) which meant explicit, multi-instruction parallelism. In some ways, this arguably complicated the CPU as it made instruction parsing many times more complicated; on other archs those features would run mostly on autopilot while the instructions remained easy to parse and prevent collisions/weird behavior.
spec. execution is not only about filling up the cache to be ready, it can actually execute part of the code in different execution units but later either keep or discard the results depending on the path taken
OK, interesting, but this is a way to defeat a secondary defence. The program still has to contain an exploitable memory corruption in the first place. I think describing it as an unfixable bug is to some extent click-bait.
@@sylviaelse5086 I agree. It's also not close to EVERY ARM CPU. Only newer Cortex-A CPUs, no M devices at all. Seems like a bad bug, but color me underwhelmed after that title.
Given how many "unfixable bugs" have been found and, voilà, fixed in one way or another, yeah, clickbait. Clickbait doesn't win subscriptions, it wins unsubscriptions.
Sending my appreciation. Sometimes when searching for work you have a not so wonderful interview for various reasons including just forgetting a term you couldn't recall in a moment. Sometimes a few can affect your mental health especially if not handled with understanding that it has nothing to do with your worth. I had known and worked with assembly. I had known and worked with memory, pointers, understanding buffer overflows, operating systems, and so on building up to a good, extensive software engineering mastery, ethics, and leadership. All of the concepts you mentioned as part of my education. I felt so let down as it seemed no one cared that I knew this stuff and it made me question if I should have specialized in a different path (CE, CS, EE even, physics, etc) when feeling like things weren't working out. I was lifted up as I could follow everything you noted and that I was able to see how worthwhile my time and degree were at my university. I just mean to say I appreciated so much having a reminder when you feel a job struggle to see that you have value and no one can take that away, including in this small way like having an education even if no one is acknowledging it yet. 🙌🏾
The crazy thing about this attack is that the basic concept isn’t even that complicated but being able to come up with that idea in the first place and really pull this off in real life is still impressive and mind blowing.
Who would've thought that doing insane things just so you wouldn't have to admit to yourself that Moore's Law has been dead for a lot longer than people imagine would've caused so many security issues?
Spectre broke literally nothing. It was a hype wave that lingered for a couple weeks and went away. Nothing ever was heard about any hacks exploiting it after. I expect the same is going to happen to this bug too.
Access to leaked tags doesn't ensure exploitation. It simply means that an attacker capable of exploiting a particular memory bug on an affected device wouldn't be thwarted by MTE.
But since this re-opens the door for buffer overflows, which after all are the most commonly found attack vector, we're basically back to square one. If someone finds an exploitable buffer overflow bug in the V8 sandbox, then you're looking at unprivileged code execution, which can be problematic enough. If someone finds one in both V8 and a kernel call then you have complete device pwnage. This smells a lot like how the PS3 was pwned.
@@andersjjensen or uglier, crash-o-matic, one runs into race conditions if the software didn't return a clean abort. Still, code should be able to work around, like all of the other "unfixable bugs" over the years. I am Pentium of Borg, you will be approximated.
The door was never "shut" to buffer overflows by MTE. It's a second line of defence, and to breach it you still need a memory vulnerability in a target program (which MTE in this specific case will never catch anyway, it's not designed to be perfect) and an incredibly niche one at that for this exploit. Problems like this can be better prevented when we move towards safer languages for userspace, like Rust and the lot. As is usual with security, you can't rely on any one countermeasure, you need defense in depth.
I suspect we're heading towards a fundamentally unpatchable, ubiquitous and catastrophically effective exploit that forces us to fundamentally re-think chip design. With software moving faster than hardware this has always been inevitable, but it's still crazy to think this is probably coming in my lifetime.
@@mfaizsyahmi If an r0 exploit can for example manipulate any memory, nothing running on that system is secure, at any level. Not rust, not other drivers, literally every computer state can be manipulated - the entire stack even the bios.
@@74Gee A vulnerability is not automatically an exploit. If your computer only ran rust programs compiled with a trusted compiler, the chance of an r0 vulnerability leading to an exploit would be drastically reduced. Similarly, if I had a fully secure interpreter I could run untrusted interpreted programs on a CPU architecture without any hardware/firmware security features at all and still be secure. Ergo any hardware vulnerability can theoretically be patched in software, with a certain performance penalty. In practice, any sufficiently severe exploit could take down the internet causing untold damage.
It's a classic side-channel attack, more exactly a timing attack. It's pretty well-known in cryptography. Nice work, in a way. That's hardly a bug, but I suppose the title is more catchy.
Somewhere I read and/or saw John Hennessy and David Patterson. They discussed the limitations of current processor designs, emphasizing that security vulnerabilities like Spectre and Meltdown, as well as diminishing performance returns, stem from reliance on techniques such as speculative execution. They propose a shift towards domain-specific architectures (DSAs) and processors capable of executing high-level language constructs directly. This approach would enhance security, performance, and energy efficiency by reducing the need for complex compiler translations and leveraging the open-source ecosystem for rapid innovation. But then legacy support as we have it now digging back to the 70s would be hard to maintain .. ;)
Thank you for your vids. Any update on that php vulnerability? Couldn't find further info on the details of it, beyond being related to language/encoding.
AI will destroy this world. Not humanity. The surge in mediocrity and destruction of novelty brought on by AI will destroy everything humanity has worked tirelessly to create. AI won't be terminator. It will be an invisible drain on society until every product from the US to China is so dumbed down it might as well be trash.
Thank you, nice and simple. Not so much about the hack, but rather the details of the limitations of the hardware implementation. We need better hardware developers. Which is to fire the crappy software developers. What a wasted effort, on the part of ARM, in the realm of address security. So remove the tag, remove the interrupt, or remove the look forward. We should quit worrying about speed, and actually do the job that is required. But no, OMG we used 4 bits more than before, we used 3 clock cycles. I believe in perfection before speed or space. Anyway thank you so much for the details that you supplied, I really enjoyed your talk. Keep up the good work.
Damn this is such a good video, thanks for explanation. I have only recently started learning stuff abt comp architecture and security and this video is still explaining the paper in the most crystal clear way possible that even I understood it.
The existence of these kinds of bugs reinforces why most of these hardware security features are often not worthwhile. All these "secure enclaves", "secure boot" and such are just waiting to be exploited and broken, and the fixes just make things even more complicated or slow. In the past we had viruses and such, but at least that was just software that could be fixed with patches and at most reinstallation. Now we have hardware that will be perpetually flawed, and even closing some of the bugs through microcode updates might not be 100% effective. Now we have to live in fear that something has permanently exploited our systems because the hardware itself is breakable.
Yup, overthink the plumbing making it easier to stop up the drain - to paraphrase a particular engineer. I think you hit a key point that these are permanently baked-in features. Zero day one of these and let the fun begin! 😲
@@meltysquirrel2919 speculative execution specifically has such a massive impact on performance that not doing it just ain't an option. It was to the point where users would go out of their way to disable spectre/meltdown patches and see a *significant performance increase* until the patches were improved. And it's not like speculative execution was disabled, it was just reduced. And even that was noticeable enough to be a concern. So yeah, in a case like this, the plumbing is simply complex, no way around it. That's just how computers are at the lower levels. You aren't piping a sink to a drain. You're piping a thousand sinks to a thousand drains in real time according to a set of given instructions. And as it turns out, it isn't easy. And the incentive for breaking that plumbing is massive, so a lot of people are working on doing so. The end result is what we're seeing here. Complex plumbing getting broken by people with massive interest in doing so. Fun...
There's always a trade-off. You can have a simple, provably safe hardware architecture if you're willing to accept an arbitrary performance impact in return. You can have fast, secure hardware if you're willing to pay significantly more for overprovisioned hardware. You can run on insecure hardware with no risk if you airgap your system, drastically crippling its usefulness. Sure, you can cut out a feature you think is unsafe. But what are you willing to sacrifice in exchange - security? Performance? Flexibility? Compatibility? The tradeoffs are where the real engineering happens.
All those hardware vulnerabilities require a software vulnerability first. That software vulnerability would still exist even if the hardware had no security measures to speak of. At worst, the hardware security features do nothing and lull you into a false sense of security. However, they never directly decrease security. Persistent (against re-install) viruses can only be stopped if you make the firmware read-only at a hardware level. That is one area where I agree with your assessment. A little toggle switch to write-protect all firmware would go a long way. Then if you think the hardware security does more harm than good you can still permanently disable firmware updates making persistence impossible.
Reminds me of my introduction to Java. How to get rid of most security holes? Bounds checking. References, not pointers. Fantastic I thought. Security built into the virtual machine! But here we are literally decades later and we're still in the C/C++ paradigm. Billions of dollars a year this costs, yet we're unwilling to abandon thinking in terms of pointers and unwilling to make things like runtime bounds checking mandatory.
@@denysvlasenko1865 you're already paying a performance cost for the CPU trying to fix your mistakes, and bounds checks can also be compiled away in a lot of cases. Also, I feel like you made up the 3 to 10 times number: if the bounds check always succeeds, isn't branch prediction just going to be right every time, so there would be no impact? In hot loops at least.
Pro tip: show hex values (like pointers with embedded info for tags or virtual memory) in a monospaced font. Programmers can visually parse the fields much more easily. Thanks.
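As an illustration of the tip above, here is a small sketch that splits a tagged pointer into fields and prints them as fixed-width hex, which lines up neatly in a monospaced font. It assumes the MTE layout where the logical tag sits in pointer bits 59:56; the pointer value and the helper names are made up for the example.

```python
TAG_SHIFT = 56   # assumed position of the 4-bit logical tag (bits 59:56)
TAG_MASK = 0xF


def split_pointer(ptr: int) -> tuple[int, int]:
    """Return (tag, address with the tag bits cleared)."""
    tag = (ptr >> TAG_SHIFT) & TAG_MASK
    addr = ptr & ~(TAG_MASK << TAG_SHIFT)
    return tag, addr


def describe(ptr: int) -> str:
    """Render the pointer and its fields as fixed-width hex."""
    tag, addr = split_pointer(ptr)
    return f"ptr=0x{ptr:016x}  tag=0x{tag:x}  addr=0x{addr:016x}"


example = 0x0A00_7FFF_1234_5678   # made-up pointer carrying tag 0xA
print(describe(example))
```

Zero-padding every value to 16 hex digits keeps the tag nibble in the same column from line to line, which is most of what makes the fields easy to scan.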
I find it amazing that people can speak about such advanced subjects, while I am simply trying to fit an excess-127 code for a normal overflow fix in a VHDL DSP FPU unit. My God, where do you find the time to read about these subjects?
0:09 You know that there's three computers in the term "ARM computer"? First, the obvious "computer". Second, "ARM" stands for "ACORN RISC Machine", "Machine" referring to a computer. Third, "RISC" stands for "Reduced Instruction Set Computer", revealing the third computer. Almost blew my mind when I first realized that XD
Arm no longer stands for anything. It stopped standing for Acorn and moved to Advanced RISC Machine in the mid 90s. And in 2017 it moved from ARM to Arm. (Source: I'm an employee.)
I think calling speculative execution "execution in the future" is misleading, as it conveys the idea of a "front-running thread", which is a very distinct and different thing. The processor simply runs a program, and if it needs to make a branch/turn and does not know which way to go, it speculates. To keep a proper program state, this speculative execution cannot do certain things, but once the speculation is confirmed to be correct, the accumulated speculated results can be committed. From the perspective of the processor running the program, it's just executing current code, only on a speculated branch. There is of course a lagging program state that represents the validated non-speculative outcomes. It can restart from this state when the speculated code turns out to be the wrong code and resume with the correct code instead. A processor is thus not "executing future code". It might run the wrong code and discard the results, but it's not running ahead of the actual program. That is a lot less mystic and magical to me.
@@be8090 From what I could gather from Wikipedia, it’s in ARM Cortex X2 through X4, which means ALL Android-based smartphones of 2024 and 2023 and a good number of Android-based smartphones from 2022 (especially Samsung Galaxy S22 and co.). Note: usually only the performance cores are X2 or later. Interestingly, MTE was introduced with ARMv8.5-A (so really all architecture revisions from 8.5-A through 9.4-A have MTE (though 9.0-A is really just 8.5-A with additional features); whether this bug was ever patched in any of the later revisions, I do not know). This means MTE has been on Apple A-series SoC since A14 Bionic and on EVERY Apple M-series SoC since the first. This means for Apple smartphones *and tablets,* it’s been present since iPhone 12, 3rd gen iPhone SE, 10th gen iPad, 4th gen iPad Air, 6th gen iPad Mini and 5th gen iPad Pro. For Macs, it’s been present since 2020 for MacBook Air, MacBook Pro and Mac Mini, 2021 for iMac, 2022 for Mac Studio, 2023 for Mac Pro and 2024 for Vision Pro.
This is fundamentally similar to a hash collision exploit, so the solution is the same. Increase the entropy on the memory tags so that the reuse is practically impossible.
Increasing the entropy of a random generator that can generate only 16 distinct tags? Not possible. To make it practically unexploitable, the randomized tags would have to be significantly longer, which means shrinking the significant part of 64-bit pointers, and therefore the maximum size of the usable virtual memory space. On a phone where you can only have about 128GB of memory (including virtual memory, kernel space and I/O space), only 37 significant bits are needed for the virtual space, so tagging addresses is possible with up to 27 bits (instead of just 4 in the hardware implementation of MTE in ARMv8 chips). The problem is that the MTE hardware was too cheap and does not reserve at least 1 bit of its tag to segregate cache use between the kernel and user space. A basic fix that would at least prevent the MTE exploit would be for MTE not to assign all 4 bits of its tags pseudo-randomly, but to reserve 1 bit as a constant: set to 1, it gives 8 distinct MTE tags for the kernel or other rings (such as external drivers, security tools, or GPU internal work buffers), leaving 8 distinct tags for user space, e.g. for JavaScript in V8. This reduces hardware memory corruption detection in each ring to 7/8 instead of 15/16, but other asynchronous detection is still possible in software (e.g. to detect heap corruption, or stack-allocated buffer overflows caused by software bugs and unchecked boundaries). Segregating cache usage is the ultimate solution to prevent sandbox escapes. Now ARM should think about allowing larger tags (and, if needed, using additional internal bits for tagging pointers with, for example, an 80-bit address space, with the help of a supplementary cache for virtual page tags) and making sure it still uses a strong enough random generator for these much larger tags.
Users may then complain that a 128GB virtual space is "not enough" for their smartphones (and their embedded GPUs), but by the time this really matters, enough years will have passed that we may see the end of 64-bit processors, replaced by 80-bit or 128-bit processors. At that point, speculative execution will no longer be a problem, as the "cache vulnerabilities caused by speculative execution" will no longer be exploitable in any reasonable time across such a huge cache-tagging space. So maybe all this is a sign that the computer industry must think about upgrading its hardware to 128-bit processors (even if the usable significant virtual memory space will, for a long time, remain far below the 64-bit limit). At that time MTE-like tagging will be extremely useful and in practice extremely secure (especially for mobile devices).
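The bit-budget arithmetic in the comment above can be checked with a few lines. This is only a sketch that takes the comment's premise at face value: that every pointer bit not needed for addressing could be spent on a uniformly random tag. The function names are made up for illustration, and nothing here reflects how any shipping hardware actually partitions pointer bits.

```python
def address_bits(space_bytes: int) -> int:
    """Bits needed to give every byte in the space a distinct address."""
    return (space_bytes - 1).bit_length()


def detection_rate(random_tag_bits: int) -> float:
    """Chance that a mismatched access carries a different tag,
    assuming tags are uniform over 2**random_tag_bits values."""
    return 1.0 - 1.0 / (1 << random_tag_bits)


POINTER_BITS = 64
GIB = 1 << 30

needed = address_bits(128 * GIB)   # bits for a 128 GiB space
spare = POINTER_BITS - needed      # bits left over for a tag

print(f"address bits: {needed}, spare bits: {spare}")
for bits in (3, 4, spare):
    print(f"{bits:2d} random tag bits -> detection rate {detection_rate(bits):.8f}")
```

With 4 random bits the rate is 15/16, with 3 it is 7/8, and with every spare bit used it is so close to 1 that blind guessing stops being practical, which is the point the comment is making.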
There's a lot of 'IF's in there. If you can find the right code, if you can find the tag, if you can change it, if... if... if... Whilst this is a possible route for an attack, has anyone actually used this in the real world, not just in the research lab?
Speculative execution was always Pandora's box. This is quite clear after Spectre and Meltdown. It's damn hard for chip designers and ISA designers to get it 100% correct.
@@BrendonGreenNZL Yes true. My statement above is not precise enough. Spectre lives from the behavior of the cache itself in combination with speculative execution and branch prediction.
Oh man, this is an amazing bug. I was there in the 90's when "smashing the stack" hit. It was above my pay-grade at the time, but it was clear in the late 90's that you could get wrecked by a few bad bytes on the wire. Overflow after overflow into the new century, race conditions all over kernels, you sure you want a multi-user system? Nowadays, multi-tenant systems suffer similar problems with any shared resources. You really can't have everything in one package.
It seems like this is very similar to PACMAN, except that paper breaks pointer authentication codes instead of memory tags. Both take the approach of brute-forcing a short secret by abusing speculation (a 16-bit PAC in that paper, a 4-bit tag here).
Not *EVERY* ARM cpu! I moved into developing 32 bit asm on the ARM2 and even had a go at an original ARM1 BBC Micro cheese wedge which never was really a product, just a dev system. I can categorically say that this exploit will not work on either of those CPUs as they had exactly zero kilobytes of cache :) With 4k cache on the ARM3, and a 24/26 bit address bus and processor status stuffed into the remaining 6/8 of 32 bits... I still think you'd find it impossible.
it's another "we speculated, rewound and forgot to invalidate the cache" error. When will CPU designers learn to make cache invalidation the default behavior on a speculation rewind if there was a cache swap during the speculative block?
@@fluffy_tail4365 except they never got cached, and that's how they figure out what the memory tag is: they iterate through the numbers and see which one was in cache, because that's the real one. The real exploit here is the side-channel memory access.
The issue here is that there is no cache fill happening for the speculated code, which can be detected later on. And as the wrong speculation generates no error, they can keep trying with new tags until they find the correct one. For me the real question is how they consistently fool the branch predictor into speculatively executing code for a branch that is never taken! Because that is what bypasses the security here. I would not call this a timing attack, but a branch algorithm attack.
@@TheEVEInspiration It's in the paper. You can see it in the short glimpse you see of the page before he zooms in (around 6:48). It says that they run the code multiple times with correct pointers and *cond_ptr true, to condition the branch predictor. They then make one guess with *cond_ptr false that triggers the speculative execution.
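To make the train-then-guess loop described above a bit more concrete, here is a toy model in Python. It is only a sketch: `ToyCore`, `train`, `speculate`, `probe` and `leak_tag` are made-up names, nothing here touches real hardware, and the assumption that a matching tag leaves a cache footprint while a mismatch leaves none (and raises no fault) is taken from the description in this thread, not from the paper's actual gadgets.

```python
import random

TAG_BITS = 4  # MTE tags are 4 bits wide; the rest of this model is hypothetical


class ToyCore:
    """Hypothetical core: a speculative load only leaves a cache footprint
    when the guessed tag matches the real one, and it never faults."""

    def __init__(self, real_tag):
        self.real_tag = real_tag
        self.predictor_trained = False
        self.cache_filled = False

    def train(self, rounds=8):
        # Run the gadget with the condition true and a valid pointer so the
        # branch predictor learns to expect "taken".
        self.predictor_trained = rounds > 0

    def speculate(self, guessed_tag):
        # The condition is now false, so the access only happens speculatively
        # and a wrong tag is squashed without a tag-check fault.
        self.cache_filled = self.predictor_trained and guessed_tag == self.real_tag
        self.predictor_trained = False  # the misprediction un-trains the predictor

    def probe(self):
        # Stand-in for timing a load: True means "fast, the line was cached".
        hit, self.cache_filled = self.cache_filled, False
        return hit


def leak_tag(core):
    """Try every possible tag and return the one that left a cache footprint."""
    for guess in range(1 << TAG_BITS):
        core.train()
        core.speculate(guess)
        if core.probe():
            return guess
    return None


secret = random.randrange(1 << TAG_BITS)
print(leak_tag(ToyCore(secret)) == secret)  # prints True
```

With only 16 possible tags the loop needs at most 16 guesses, which is why a fault-free way to test a guess matters so much more than the size of the search space.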
@@FrankHarwald Ah but in this case the vulnerability only bypasses a security system used to mitigate memory corruption vulnerabilities. If your program is written in rust chances are that there are no memory corruption vulnerabilities to begin with, so the attack is possible but useless. Edit: Changed "prevent" to "mitigate".
Sometimes I imagine the biggest security flaw ever, one that will wreck almost every computer and grind the world to a halt for a decade as companies had to bootstrap back up to the kinds of machines capable of making more computers since those were affected too. I imagine that this security flaw is being implemented around now, by some guy in an office making a small arbitrary decision in some new architecture that nobody thinks to question and eventually makes its way into the industry standard. Eventually leading to that security flaw being discovered decades from now.
Speculative execution really is a double edged sword. On the one hand it made x86 what it is today (performance wise) but on the other hand introduces a lot of complexity and attack surface. And now ARM is affected too. Although this is not nearly as bad as Spectre/Meltdown.
@@trens1005 they apparently created fuzzers you can run to find these, but it is also a challenge to even know what you are looking for. But who would have thought that MTE is vulnerable. This was probably months of research
@@trens1005 They did create fuzzers true, but it is also a challenge to even know what you are looking for, they probably did months of research, like who would have thought MTE was vulnerable
It may do wonders for performance and optimisation, but nondeterministic processing is abysmal in terms of security. Cache management, branch prediction, and speculative execution, what an unholy trinity.
The only time I ever hear about speculative exec is as a security vulnerability😂 Speaking of which, could you do a video on the *benefits* of spec exec? I’m really curious now lol
In a nutshell, branch prediction and speculative execution exist to prevent the performance hit that would come from stalling the processor until the correct outcome of the branch instruction is known. Ever since the 486 and Pentium, CPUs have been prefetching instructions from memory and decoding them in anticipation of executing them; the difference being that the 486 would stall its pipeline until it knew which way a branch would go. The Pentium was faster in part because it would predict which way a branch would go and continue fetching and decoding (but not executing) instructions along that path. It was also able to execute instructions up to the jump point, as long as all the inputs were known (out-of-order execution). Speculative execution takes this mechanism further by out-of-order executing the instructions ahead of the branch, placing the results into temporary registers; committing them to real registers (and saving execution time) if the branch was predicted correctly. Out-of-order execution on the Pentium was interesting, because well-optimized assembly code could actually arrange to have the inputs to a jump instruction available just as the CPU was ready to execute the jump; simply by changing the order of seemingly unrelated instructions.
I mean this was all fixed in the 80's with capability systems, but then C programmers wouldn't have the ability fuck themselves over so here we are... Absolutely no reason for software to have access to pointers. Just... lol man In a system designed... not by a c programmer.... even if the pointers were printed out it wouldn't help because you can't address memory directly via pointers. Its not a thing there are ISA commands for.
Could you please elaborate on such ISAs ? I find that quite interesting, but from a quick search nothing quite like it came up. (either now classical ISAs or capability based security disconnected from ISAs)
@@filip0x0a98 i don't have a link but i saw a pdf study about implementing capabilities on current processors with changes to the compiler and kernel, and even some possible compatibility with older software
@@AK-vx4dy If it is done in software it's broken. Arm, x86, and I think RiscV, all grant you access to anything you want if you have the magic memory address. The Flex System, Tendra, and others used things like object addressed memory, where you can ask for a memory object, so you can't use after free, be out of bounds etc, as it's all mediated by hardware ensuring the interaction is correct and permitted. The other advantage of hardware memory management and scheduling is you don't spend thousands of cycles context switching as you negotiate with the OS, you just focus on computation the whole time.
How is it possibly less intensive to try to predict the future and have it loaded than it is to just do the thing you want to do when you decide to do it? That makes no sense... Sounds like speculative execution is a built in attack vector for anything running on the device, that is meant to have some plausible deniability... Like oops sorry the whole entire system is a giant security vulnerability... Why don't they hire the people that find this stuff and have them make an OS that doesn't have these problems ffs
@@WaltH-sv6to You go to McDonald's, and you order a Big Mac. What's faster, them starting to cook it when you arrive at the counter, or them realizing there is a line of 20 people and deciding to bulk crank out burgers ahead of time? Speculative execution is one of the major architectural speed improvements of modern CPU design. The fact you claim it "makes no sense", suggests you haven't even taken a few seconds to understand its purpose. Engineers didn't just add it in for funsies.
@@adamsoft7831 I’d say it’s more like you see someone heading towards the entrance so you decide to start making a big mac but there is a chance the customer will order chicken nuggets instead in which case you will discard the big mac and start making nuggets from scratch
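The burger analogy can be put into rough numbers with a toy Python model. This is a sketch under assumed figures: the 15-cycle penalty, the `run` function and the loop-shaped branch pattern are all invented for illustration, and the predictor is a plain 2-bit saturating counter, far simpler than anything in a real CPU.

```python
def run(outcomes, predict, penalty=15):
    """Count cycles for a stream of branches: 1 cycle each, plus `penalty`
    cycles whenever the front end has to wait or throw work away."""
    cycles = 0
    counter = 2  # 2-bit saturating counter, 0..3; >= 2 means "predict taken"
    for taken in outcomes:
        cycles += 1
        if not predict:
            cycles += penalty  # no prediction: always stall until the branch resolves
        else:
            if (counter >= 2) != taken:
                cycles += penalty  # wrong guess: discard the big mac, start the nuggets
            counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return cycles


# A loop branch that is taken 99 times and then falls through, repeated 10 times.
loop_branch = ([True] * 99 + [False]) * 10
print(run(loop_branch, predict=False), run(loop_branch, predict=True))  # 16000 1150
```

The predictor still pays for the ten loop exits it guesses wrong, but it skips the stall on the other 990 branches, which is the whole argument for speculating in the first place.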
Hi @LowLevelLearning I just took your course from Low-level academy... Would be great if u can add a detailed OS course to that... Also add more content for ARM and C
@LowLevelLearning Just because I'm not sure if I've understood everything correctly. This memory tagging is just an additional security mechanism in ARM processors and not the only one? So this design flaw doesn't make ARM processors less secure than other processor architectures, it just makes them less secure than intended. Correct? Or do ARM processors lack other security mechanisms that other architectures have?
So much of this complexity was invented in the 1970s when computers were small and expensive and we had to make computers more efficient. In an age of 64-bit SOCs with gigahertz clocks for less than $10 we need to jettison the unneeded complexity and shift to much simpler architectures.
@@tedspradley809 what are you talking about, we're not doing the same things as in the 70s, we still benefit from cache and branch predictions etc. Virtual memory is a security concern and thank god (or engineers) it exists
Soon enough devs will be making even slower software in Python translated into JS running in a web browser in a virtual machine that is running in a browser (and call the framework Pentaquark). And that would be just a HelloWorld, so you'll need that juicy speculative execution. Although I would argue that another type of execution might be necessary these days to ensure the bright future of fast software.
haha wow that lowlevel.academy guy seemed pretty cool huh?
Whos that?
Never heard of that guy... Does anyone know that guy?
Yeah, I like his hair
😮 Very tempted by this assembly course. I’ve done a bit of assembly in some really low-level optimisation work (comparing what different Rust functions compile to), very nice very cool
my bitdefender gives a warning on that website.
people that figure this stuff out are so amazing. like I understand it, after you explain it, and am like "yep I get it," but I could never actually figure it out beforehand or even consider that it exists.
@@c.ladimore1237 I’m not claiming that it is easy by any means, but these people spend everyday searching for bugs like these. Surely, at some point, they develop some kind of intuition.
That's also part of the skill of the presenter. A good presenter can easily make you feel like you know more than you do.
@@c.ladimore1237 I don’t professionally find exploits, but I have found unique ways of using things in unintended ways.
My understanding is exploits like this are either people looking at how things work and being like “wait, that means theoretically it will do this thing too” or people being like “I wonder if it will also do this thing too” and trying it.
So to me, it seems more akin to educated experimentation with the scientific method, while software development (although there is experimentation) is more akin to writing a book.
Because it was a team of hundreds of people working on it
If you know how a cpu works on the low level, I guess you can think up these things?
Every time I hear the phrase 'speculative execution', I am reminded of what a late friend of mine used to say: "CPU designs should never incorporate speculative execution or branch prediction. They will inevitably lead to security vulnerabilities." He was also a big fan of the ARM architecture, because it did not use to do this thing. He passed away about fifteen years ago, but as it turns out he was right...
Only in architectures where it was added long after the instruction set was finalized. The problem is not that CPUs have speculative execution, but that the 8080 they're based on didn't.
the problem is that speculative execution / branch prediction brings huge performance benefits, there is a reason as to why we have it and still use it
@@darrennew8211 Not true. The ARM ISA is not based on the 8080 architecture and now also seems to suffer from it.
My friend was very adamant about this at the time, that this would not be restricted to architectures that weren't built around it.
@@juhotuho10 That is the counterargument that I put before him all those years ago and I was treated to a lecture about why the benefits could never outweigh the costs and why especially in multiprocessor/multicore systems this would lead to all kinds of security vulnerabilities. And he pointed out exactly the kind of security vulnerabilities that were discovered in the past decade or so.
@@juhotuho10 It brings huge performance benefits if your architecture is such that it pretends to execute one instruction at a time in order. You don't need it if your instruction set is designed from the ground up to keep every computational unit busy all the time. You need it because you execute one load instruction then one add instruction and then one multiply instruction then one store instruction and expect the CPU to behave like it's not doing all that in parallel.
Modern day computing is too unsafe lets all go be amish.
lmfao yea
when i retire i'm building chairs in a log cabin
@@WarDucc amish computing is too unsafe, let's go back to stone tablets 😅
@@LowLevelTV i will be reinventing the wheel see you when you retire!
You are confusing the Amish with Luddites.
"There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors." (Leon Bambrick)
Let me add two other hard problem. Memory allocation and bounds checking, hunter2
What a quote lmao
Don't forget cache invalidation
@@BobFlats7 cache invalidation is 0th in the list!
Funny, but naming things isn't hard at all.
My God. I guess time to check off "security vulnerability found in something you worked on" off my bucket list.
I was an intern at Arm, on the team that worked on MTE. I did some work around the generation of the tags, and on simulating the overhead they would have in caches and memory.
I have such mixed feelings right now. :D
This seems like something we could have thought of. Meltdown and Spectre were fresh on our minds and a major topic of discussion in the company. I can imagine an alternate universe where I told my manager (or someone else on the team) "hey, have we thought about if tag mismatches could be a cache side channel?" Yet I don't think we ever discussed anything related to this? At least not in any of the meetings I was in.
But hindsight is 20/20. In retrospect, these things always seem obvious.
We were mostly focused on minimizing the performance overhead of memory tagging, because we were worried it would get in the way of adoption. We wanted our new optional security features to be supported by hardware manufacturers, who might not be happy if there was too much perf or memory overhead, extra hardware complexity, or cost / die area increase.
Though, I guess, despite this new vulnerability, it still delivers on its goals. MTE was supposed to be something that offers substantial security improvements for cheap. A "better than nothing" optional feature which, when enabled, has a good chance of catching some bugs that might not be found otherwise. It is probabilistic: even if it worked perfectly, there is still a small chance a memory bug might go undetected by it (if different allocations happen to be assigned the same tag by chance). It was not meant to be perfect, or any sort of bulletproof defense. Just a way to hopefully catch more bugs in the wild. If a vulnerability makes it less effective, that's still better than every other CPU that does not have something like MTE at all.
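For a sense of scale on the "probabilistic" part, here is a back-of-the-envelope calculation in Python. It is a sketch that assumes 4-bit tags drawn independently and uniformly at random; real allocators may reserve or exclude some tag values, so treat the numbers as an upper-level illustration only. `p_miss` and `p_caught` are names made up for this example.

```python
from fractions import Fraction

TAG_BITS = 4              # MTE tag width
NUM_TAGS = 1 << TAG_BITS  # 16 possible tag values

# Assumption: the stale pointer and the new allocation carry independent,
# uniformly random tags, so they collide with probability 1/16.
p_miss = Fraction(1, NUM_TAGS)


def p_caught(attempts):
    """Chance that at least one of `attempts` bad accesses trips a tag check."""
    return 1 - p_miss ** attempts


print(p_miss, float(p_caught(1)))  # 1/16 0.9375
```

So under these assumptions a single bad access slips through about 6% of the time, and a bug that fires repeatedly in the wild is very likely to be caught sooner or later.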
It has its value as a hardware address sanitizer. I used it on C code within an Android App on the Google Pixel 8, which supports MTE, and it helped to figure out and fix a hidden memory management bug (a use after free).
@@olafschluter706 Yep. "Hardware ASAN" is pretty much how we thought of it when designing it. The motivation for MTE was "imagine ASAN but with low enough overhead that you could deploy it in release/production builds and just enable it everywhere, and hopefully also catch bugs in the wild instead of just during development."
@@inodedentry8887 yeah. Arm have said that the tags aren't secret. The video is somewhat misleading. Not all arm CPUs have mte and it isn't used much it seems
@@HayesHaugen I think if it helps to catch memory management bugs, it helps to reduce the attack surface and the number of possible exploits of software checked by it.
@@HayesHaugen Yeah well, there are several different benefits to MTE.
"Stopping an attacker" is certainly one of them, but IMO not even the most important one. There should hopefully be many other security measures on the system, and MTE would only be one piece of the puzzle.
Given this new exploit, this use case has been compromised. A determined attacker will be able to bypass MTE.
But, like I said, "Hardware-accelerated ASAN" is the other major use case.
Traditional compiler ASAN is very slow and only really useful as a dev tool.
The idea is that you can ship your production/release software with MTE, and, at an almost-negligible perf cost, the user's CPU will validate all memory accesses and catch memory safety bugs out in the wild for you.
If those bugs are reported back to the developers (probably via some automated crash report system), they can hopefully be fixed, before anyone has even tried using them for an attack.
In this sense, MTE can be a valuable tool to discover bugs in production software. That in itself is a pretty big benefit. It can help software be more secure in general, regardless of whether anyone is actually trying to exploit anything.
Weeks ago UEFI, now ARM. Last year I joked about hardware backdoors this year.
STOP JOKING! :D
THANKS FOR JINXING IT XD
Please stop helping...
Except neither were backdoors. In the first case it's just a standard buffer overflow bug, except because you're running directly in ring -3 there's no ASLR to save you. The ARM bug is actually a feature that speeds up the CPU, which is good, but accidentally was implemented wrong. The difference is that buffer overflows can be patched by a software update (if you haven't downloaded the UEFI security update please do so right now), but a bug in the CPU itself means you need a new CPU.
You are the guy that says "q***t day" in the office/chat aren't you
"EVERY ARM cpu" article shows that it was introduced in arm v8.5
And everyone talks about Cortex A and forgets that Cortex R and Cortex M realtime and microcontrollers are massively different.
@AndyGraceMedia and even inside the A, R, M family there is a huge variety depending on what usecase they are designed for.
it's not every ARM processor, only V9? so title is kinda clickbait
I am a (retired) professional programmer. I never wanted my programs to run as fast as possible. I wanted them to run as reliably as possible, i.e. rock-solid reliably. I have seen countless examples of programmers being led astray by the siren song of premature optimization.
It depends. ARM processors are often used in embedded devices with few resources and hard real-time requirements, and programs that are not as efficient as possible may not be appropriate.
@@NoSpeechForTheDumb This is a hard blanket statement to make because a lot of embedded systems will prefer stability over speed. You don't want life critical systems failing due to software bugs that can be mitigated at compile time.
@@TheMixedupstuff there are some instances of embedded systems where reliability is most important, of course. That's why I said it depends. The blanket statement was made by OP who said he ALWAYS wants his systems as reliable as possible when for some applications this may not be necessary or possible.
Depends, if you make video games, you kinda want the game to look and run as good as possible on cheap hardware.
The saddest thing is that even though it is possible to write both robust and fast software, the capitalist system, that we have now basically everywhere, disincentivizes that by putting "make the most money" above all else, especially something that requires spending more time and effort. In other words, businesses like easy money, and making robust and fast software is not easy.
Great breakdown! Not surprised to see that speculative execution is causing vulnerabilities on more than just x86 - really feels like it was only a matter of time before something like this was uncovered. The way it was done, though, is absolutely wild.
Let's wait for dozens of fixes that will decrease productivity compared to leaving the feature off. No lessons learned whatsoever.
@@alexturnbackthearmy1907 Not doing speculative execution isn't really an option though...
That would cause a FULL pipeline stall after every branch. And not doing prefetching is even worse.
Complex problems require complex solutions and those oversights are sadly the cost of that.
We can only hope that most things are found and fixed before they can turn into widespread exploits in the wild or hope for memory to suddenly get 1000x faster without any other downsides.
@@Momi_V Eh, if things were actually done the right way, we wouldn't be having this conversation at all. At least there is hope that they don't sweep it under the rug (just like "superior" windows ARM hardware which isn't really).
@@alexturnbackthearmy1907 modern cpus without any branch prediction won't stand a chance in terms of performance against one that has all mitigations enabled, even the non applicable ones
With the speed of modern processors, who cares if we turn off the MTE functionality. Why inject a process that has a vuln. Surely there are other ways to sandbox for bugs during development.
CPU vulnerabilities usually need relatively low hardware access in order to work.
But when I heard you saying somebody managed to exploit it from within V8 (being a web dev) it literally just hit me - We're f**d.
JS isn't as much of a toy these days. You can easily manipulate raw binary data in JavaScript. Some more tinkering and this would easily escalate to a sandbox escape and really, really low-level code injection... From within a browser...
reject modernity, let's go back to monke! err... I mean DHTML
Tbh v8 0-days are being discovered every week now. It's easy to get RCE without some crazy CPU bug.
@@theairaccumulator7144 Yes, but for good results you'd need to escalate privileges, injecting direct CPU instructions omits that completely.
@@theairaccumulator7144 Yes, but in general, first you have to escape the sandbox, then find a way to execute your code in something like a shell, and then gain admin access.
The paper covered in this video describes how it was done all in one step.
also: does web assembly still exist? This is lower level than js so it should be easier to predict which native machine code a wasm instruction compiles to, making side-channel attacks even easier & more reliable than using js.
This reminds me of PAC introduced in iOS 14 that made jailbreaking very difficult. Eventually a couple Chinese researchers found a way to sign the pointers themselves to bypass it, but I still was fascinated enough by it that I did a college presentation on it in my computer architecture class.
Chinese researchers discovering secret silicon-level back doors always makes me laugh at the bad day someone at the NSA is having
The way you explain in these videos even a golden retriever can grok these topics. No pun intended
Golden Retriever Open Knowledge
Golden Retriever Operating Kubernetes
🤔
Found Ed thru John Hammond, but since John doesn't seem to do vids that aren't just straight ads anymore, I'm excited this is still here to learn from. Thank you, sir!
Yea John hasn't been a reliable source of info in years, bros sold for real.
You have in my opinion some of the best content ever hosted on YouTube. If this existed in 2004 my early programmer self would have had a much easier time learning how to exploit for fun ;).
IA64 had a ton of problems, but I really believe that explicit speculation was a great idea. So many of these attacks would be impossible on Itanium. (Insert joke about them not being attacked because no one used them)
What is explicit execution?
@@deusexaethera IA64 puts the work of avoiding problems due to parallel execution in the hands of the compiler. I.e., no mechanism to back out unexplored paths like with speculative execution. The idea was to run the CPU fast and loose, and just force compiler writers to deal with the burden to take advantage of full speed. Problem is, there are lots of languages and compilers, and not everyone wants to incorporate this stuff into code generation, and not everyone is good at it.
so the "feature" was it didn't do anything special?
@@MadsterV more correctly, the CPU didn’t hard-code any of the behaviors: the pathways existed in similar ways to x86, but required explicit control via ultra-wide instructions (VLIW architecture) which meant explicit, multi-instruction parallelism. In some ways, this arguably complicated the CPU as it made instruction parsing many times more complicated; on other archs those features would run mostly on autopilot while the instructions remained easy to parse and prevent collisions/weird behavior.
Hitachi SH5 also had a very nice explicit branch prediction architecture. Unfortunately that went nowhere :/
spec. execution is not only about filling up the cache to be ready, it can actually execute part of the code in different execution units but later either keep or discard the results depending on the path taken
Tf is your pfp
Exactly. See Lex Fridman's first podcast with Jim Keller for a really good explanation of how modern processors work in this way.
OK, interesting, but this is a way to defeat a secondary defence. The program still has to contain an exploitable memory corruption in the first place. I think describing it as an unfixable bug is to some extent click-bait.
@@sylviaelse5086 I agree. It's also not close to EVERY ARM CPU. Only newer Cortex-A CPUs, no M devices at all. Seems like a bad bug, but color me underwhelmed after that title.
Given how many "unfixable bugs" have been found and voilà, fixed in one way or another, yeah, clickbait.
Clickbait doesn't win subscriptions, it wins unsubscriptions.
from what i understand, you need to achieve arbitrary code execution to achieve arbitrary code execution. it is a little silly.
@@nocakewalk the M chips already have their own vulnerability lmao, they don't need this one
@@not_kode_kun which vulnerability?
My jaw dropped when you said it works inside the V8 sandbox. Bless the researchers for finding this.
I think Spectre and Meltdown also worked in JS, in the browser. The speculation engine will see any code that runs on the cpu.....
Sending my appreciation. Sometimes when searching for work you have a not so wonderful interview for various reasons including just forgetting a term you couldn't recall in a moment. Sometimes a few can affect your mental health especially if not handled with understanding that it has nothing to do with your worth. I had known and worked with assembly. I had known and worked with memory, pointers, understanding buffer overflows, operating systems, and so on building up to a good, extensive software engineering mastery, ethics, and leadership. All of the concepts you mentioned as part of my education. I felt so let down as it seemed no one cared that I knew this stuff and it made me question if I should have specialized in a different path (CE, CS, EE even, physics, etc) when feeling like things weren't working out. I was lifted up as I could follow everything you noted and that I was able to see how worthwhile my time and degree were at my university. I just mean to say I appreciated so much having a reminder when you feel a job struggle to see that you have value and no one can take that away, including in this small way like having an education even if no one is acknowledging it yet. 🙌🏾
Misleading title, there are ARM "chips" that do not have these extensions, and a lot of them do not even support virtual memory
The crazy thing about this attack is that the basic concept isn’t even that complicated but being able to come up with that idea in the first place and really pull this off in real life is still impressive and mind blowing.
Who would've thought that doing insane things just so you wouldn't have to admit to yourself that Moore's Law has been dead for a lot longer than people imagine would've caused so many security issues?
Seriously underrated comment.
Remember Pointer is the variable holding the address not the address itself, Dope content, massive respect …
This is why I use an abacus. Granted, AR/VR apps are tricky, but no viruses!
i just use my fingers
V8 engine screams to me : "you can do this on your phone right now"
OMG It's amazing! When you said they did it in V8 it was... OMG, incredible! How many layers of security they got to bypass!
The PACMAN vulnerability has existed for a few years, the big takeaway from this paper is that they found a pattern to exploit it in other code.
the "hats off" right after talking about a hair cut was accidentally brilliant 😂
nice sponsor, heard good things about that dude
Spectre broke literally nothing. It was a hype wave that lingered for a couple weeks and went away. Nothing ever was heard about any hacks exploiting it after. I expect the same is going to happen to this bug too.
Impressive how people may find all those vulnerabilities!
Thanks for the video!
Access to leaked tags doesn't ensure exploitation. It simply means that an attacker capable of exploiting a particular memory bug on an affected device wouldn't be thwarted by MTE.
But since this re-opens the door for buffer overflows, which after all is the most commonly found attack vector, we're basically back to square one. If someone finds an exploitable buffer overflow bug in the V8 sandbox, then you're looking at unprivileged code execution, which can be problematic enough. If someone finds one in both V8 and a kernel call then you have complete device pwnage. This smells a lot like how the PS3 was pwned.
@@andersjjensen or uglier, crash-o-matic, one runs into race conditions if the software didn't return a clean abort.
Still, code should be able to work around, like all of the other "unfixable bugs" over the years.
I am Pentium of Borg, you will be approximated.
The door was never "shut" to buffer overflows by MTE, it's a second line of defence, and to breach it you still need a memory vulnerability in a target program (which MTE in this specific case will never catch anyway, it's not designed to be perfect) and an incredibly niche one at that for this exploit. Problems like this can be better prevented when we move towards safer languages for userspace like rust and the lot.
As is usual with security, you can't rely on any one countermeasure, you need defense in depth.
Amazing find by these researchers! This is the beauty of our community: ppl take time and try new things and find these bugs like this!
I suspect we're heading towards a fundamentally unpatchable, ubiquitous and catastrophically effective exploit that forces us to fundamentally re-think chip design.
With software moving faster than hardware this has always been inevitable but it's still crazy to think this is probably coming in my lifetime.
Even crazier to think that the chip that's supposed to solve all these problems may end up being the Mark of the Beast described in the Bible
This just defeats a defense in depth measure. The computer is still secure.
The answer is rust. Rust all the way down.
@@mfaizsyahmi If an r0 exploit can for example manipulate any memory, nothing running on that system is secure, at any level. Not rust, not other drivers, literally every computer state can be manipulated - the entire stack even the bios.
@@74Gee A vulnerability is not automatically an exploit. If your computer only ran rust programs compiled with a trusted compiler, the chance of an r0 vulnerability leading to an exploit would be drastically reduced. Similarly, if I had a fully secure interpreter I could run untrusted interpreted programs on a CPU architecture without any hardware/firmware security features at all and still be secure.
Ergo any hardware vulnerability can theoretically be patched in software, with a certain performance penalty. In practice, any sufficiently severe exploit could take down the internet causing untold damage.
It's a classic side-channel attack, more exactly a timing attack. It's pretty well-known in cryptography. Nice work, in a way. That's hardly a bug, but I suppose the title is more catchy.
Somewhere I read and/or saw John Hennessy and David Patterson. They discussed the limitations of current processor designs, emphasizing that security vulnerabilities like Spectre and Meltdown, as well as diminishing performance returns, stem from reliance on techniques such as speculative execution. They propose a shift towards domain-specific architectures (DSAs) and processors capable of executing high-level language constructs directly. This approach would enhance security, performance, and energy efficiency by reducing the need for complex compiler translations and leveraging the open-source ecosystem for rapid innovation. But then legacy support as we have it now digging back to the 70s would be hard to maintain .. ;)
Love this guy. Incredibly smart, incredibly articulate. Really impressed. An inspiration to us all.
Thank you for your vids. Any update on that php vulnerability? Couldn't find further info on the details of it, beyond being related to language/encoding.
@@kiverismusic iconv chinese extended character bug, the fix is with a glibc update
Love that they’re called gadgets, like in hardness proofs
Jeez. What's up with all of those serious recent exploits?
honestly this is common, i'm just making more people aware of it. bugs are everywhere
Probably recency bias. Exploits come out all the time, but due to the big ones early this year people are on edge and more of them go mainstream.
@@LowLevelTV all these code issues is why I'm waiting for the day computers program computers. Humans arguably suck at it, as we've seen.
@@IncertusetNescio this kid really thinks AI is going to take over😂😂😂
AI will destroy this world. Not humanity. The surge in mediocrity and destruction of novelty brought on by AI will destroy everything humanity has worked tirelessly to create. AI won't be terminator. It will be an invisible drain on society until every product from the US to China is so dumbed down it might as well be trash.
Thank you, nice and simple. Not so much about the hack, but rather the details of the limitations of the hardware implementation. We need better hardware developers. Which is to fire the crappy software developers. What a wasted effort, on the part of ARM, in the realm of address security. So remove the tag, remove the interrupt, or remove the look forward. We should quit worrying about speed, and actually do the job that is required. But no, OMG we used 4 bits more than before, we used 3 clock cycles. I believe in perfection before speed or space. Anyway thank you so much for the details that you supplied, I really enjoyed your talk. Keep up the good work.
If you can run arbitrary tik tag code on the cpu, you don't need to break the memory tagging, just run whatever arbitrary code you want on the cpu.
Half true, this can be used for privilege escalation.
Damn this is such a good video, thanks for explanation. I have only recently started learning stuff abt comp architecture and security and this video is still explaining the paper in the most crystal clear way possible that even I understood it.
The existence of these kinds of bugs reinforces why most of these hardware security features are often not worthwhile. Making all these "secure enclaves" "secure boot" and such are all just waiting to be exploited and broken, and the fixes just make it even more complicated or slow. In the past we had viruses and such but at least that was just software that could be fixed with patches and at most reinstallation. Now we have hardware that will be perpetually flawed, and even closing some of the bugs through microcode updates might not be 100% effective. Now we have to live in fear that something has permanently exploited our systems because the hardware itself is breakable.
Yup, overthink the plumbing making it easier to stop up the drain - to paraphrase a particular engineer. I think you hit a key point that these are permanently baked-in features. Zero day one of these and let the fun begin! 😲
@@meltysquirrel2919 speculative execution specifically has such a massive impact on performance that not doing it just ain't an option.
It was to the point where users would go out of their way to disable spectre/meltdown patches and see a *significant performance increase* until the patches were improved.
And it's not like speculative execution was disabled, it was just reduced. And even that was noticeable enough to be a concern.
So yeah, in a case like this, the plumbing is simply complex, no way around it. That's just how computers are at the lower levels.
You aren't piping a sink to a drain. You're piping a thousand sinks to a thousand drains in real time according to a set of given instructions.
And as it turns out, it isn't easy. And the incentive for breaking that plumbing is massive, so a lot of people are working on doing so.
The end result is what we're seeing here. Complex plumbing getting broken by people with massive interest in doing so. Fun...
There's always a trade-off. You can have a simple, provably safe hardware architecture if you're willing to accept an arbitrary performance impact in return. You can have fast, secure hardware if you're willing to pay significantly more for overprovisioned hardware. You can run on insecure hardware with no risk if you airgap your system, drastically crippling its usefulness.
Sure, you can cut out a feature you think is unsafe. But what are you willing to sacrifice in exchange - security? Performance? Flexibility? Compatibility? The tradeoffs are where the real engineering happens.
All those hardware vulnerabilities require a software vulnerability first. That software vulnerability would still exist even if the hardware had no security measures to speak of.
At worst, the hardware security features do nothing and lull you into a false sense of security. However, they never directly decrease security.
Persistent (against re-install) viruses can only be stopped if you make the firmware read-only at a hardware level. That is one area where I agree with your assessment. A little toggle switch to write-protect all firmware would go a long way. Then if you think the hardware security does more harm than good you can still permanently disable firmware updates making persistence impossible.
Uefi has entered the chat
Assembly code since the 70s here .. and yes, we're still longhaired and play music .. approaching 62 :)
Reminds me of my introduction to Java. How to get rid of most security holes? Bounds checking. References, not pointers. Fantastic I thought. Security built into the virtual machine! But here we are literally decades later and we're still in the C/C++ paradigm. Billions of dollars a year this costs, yet we're unwilling to abandon thinking in terms of pointers and unwilling to make things like runtime bounds checking mandatory.
"We are unwilling to take 3 to 10 times performance impact"? I wonder why.
@@denysvlasenko1865 That's ancient news.
@@denysvlasenko1865 Not to mention Java's propensity to just not properly garbage collect.
but the JVM is based on C/C++ ain't it ?
@@denysvlasenko1865 ur paying the performance impact with the cpu trying to fix ur mistakes, bounds checks can also be compiled away in a lot of cases
also i feel like u made up the 3 to 10 times number, if the bounds checking always succeeds then isnt branch prediction just gonna be always right and u would have no impact? in hot loops at least
Pro tip: show hex values (like pointers with embedded info for tags or virtual memory) in a monospaced font. Programmers can visually parse the fields much more easily. Thanks.
2024 - The year of the backdoor and the vulnerability
hold your popcorn... AI is coming hard
I find it amazing that people can speak about such advanced subjects, while I simply try to fit an excess-127 code for a normal overflow fix in a VHDL DSP FPU unit. My God, where do you find the time to read these subjects?
0:09 You know that there's three computers in the term "ARM computer"?
First, the obvious "computer". Second, "ARM" stands for "ACORN RISC Machine", "Machine" referring to a computer. Third, "RISC" stands for "Reduced Instruction Set Computer", revealing the third computer.
Almost blew my mind when I first realized that XD
@@Lampe2020 so spell it out, Acorn Reduced Instruction Set Computer Machine Computer 😂
@@nicholasvinen
Exactly.
That brings to mind the people who say things like, "ATM machine" and "PIN number".
Arm no longer stands for anything.
It stopped standing for Acorn and moved to Advanced RISC Machine in the mid 90s. And in 2017 moved from ARM to Arm.
(Source: I'm an employee.)
@@m1geo Your message explains a lot.
TMA = Too Many Acronyms
Seriously, the people behind that paper need to be praised as heroes.
wow, only option now is templeos
Always has been
Time Cube security is unmatched
I think calling speculative execution "execution in the future" is misleading, as it conveys the idea of a "front-running thread", which is a very distinct and different thing.
The processor simply runs a program and if it needs to make a branch/turn and does not know which way to go, it speculates.
To keep a proper program state, this speculative execution cannot do certain things, but once the speculation is confirmed to be correct, the accumulated speculated results can be committed.
From the perspective of the processor running the program, it's just executing current code, just on a speculated branch.
There is of course a lagging program-state that represents the validated non-speculative outcomes.
It can restart from this state when the speculated code turned out to be the wrong code and resume with the correct code instead.
A processor is thus not "executing future code".
It might run the wrong code and discard the results, but it's not running ahead of the actual program.
That is a lot less mystic and magical to me.
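A toy model of the commit/rollback behaviour described in this comment, as a rough sketch only. The `ToyCore` class and its method names are made up for illustration and do not correspond to any real CPU interface: speculative results sit in a scratch buffer and reach the validated state only if the branch guess turns out to be right.

```python
class ToyCore:
    """Hypothetical model: validated state plus a speculative scratch buffer."""

    def __init__(self):
        self.committed = {}     # validated, non-speculative register state
        self.speculative = {}   # results produced while a branch is unresolved

    def speculate(self, reg, value):
        # Work done past an unresolved branch is held aside, not committed.
        self.speculative[reg] = value

    def resolve(self, guess_was_right):
        if guess_was_right:
            # Speculation confirmed: the buffered results become real.
            self.committed.update(self.speculative)
        # Either way the scratch buffer is emptied. On a wrong guess the
        # results are discarded and execution resumes from `committed`.
        self.speculative.clear()


core = ToyCore()
core.committed["r0"] = 1

core.speculate("r0", 99)
core.resolve(guess_was_right=False)
wrong_guess_state = dict(core.committed)    # r0 keeps its old value

core.speculate("r0", 42)
core.resolve(guess_was_right=True)
right_guess_state = dict(core.committed)    # r0 takes the speculated value
```

The model deliberately leaves out caches. In real hardware, cache contents touched during a discarded speculation are generally not rolled back, and that leftover state is what side-channel attacks observe.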
It’s NOT in every ARM CPU! Change this clickbait title. 😒
@@ArnaudMEURET in which arm cpus are they in? the snapdragon x cpus?
@@be8090
From what I could gather from Wikipedia, it’s in ARM Cortex X2 through X4, which means ALL Android-based smartphones of 2024 and 2023 and a good number of Android-based smartphones from 2022 (especially Samsung Galaxy S22 and co.). Note: usually only the performance cores are X2 or later.
Interestingly, MTE was introduced with ARMv8.5-A (so really all architecture revisions from 8.5-A through 9.4-A have MTE (though 9.0-A is really just 8.5-A with additional features); whether this bug was ever patched in any of the later revisions, I do not know). This means MTE has been on Apple A-series SoC since A14 Bionic and on EVERY Apple M-series SoC since the first. This means for Apple smartphones *and tablets,* it’s been present since iPhone 12, 3rd gen iPhone SE, 10th gen iPad, 4th gen iPad Air, 6th gen iPad Mini and 5th gen iPad Pro. For Macs, it’s been present since 2020 for MacBook Air, MacBook Pro and Mac Mini, 2021 for iMac, 2022 for Mac Studio, 2023 for Mac Pro and 2024 for Vision Pro.
Why not?
The first sponsorship I’ll click and use in my life 😆 thanks for your awesome content! 💪
Spectre and meltdown did not break the internet.
This is fundamentally similar to a hash collision exploit, so the solution is the same. Increase the entropy on the memory tags so that the reuse is practically impossible.
Increasing the entropy of a random generator that can only generate 16 distinct tags? Not possible. To make it practically unexploitable, the randomized tags should be significantly longer, and this means decreasing the significant length of 64-bit pointers, meaning you need to decrease the maximum size of the usable virtual memory space. On a phone where you can only have about 128GB memory (including virtual memory and the kernel space and I/O space), only 37 significant bits are needed for virtual space, so tagging addresses is possible with up to 27 bits (instead of just 4 in the hardware implementation of MTE in ARMv8 chips). The problem is that the MTE hardware was too cheap and does not reserve at least 1 bit of its tag to segregate the cache use for the kernel and the user space.
A basic fix that would at least prevent the MTE exploit would be for MTE not to assign all 4 tag bits pseudo-randomly, but to reserve 1 bit as a constant (set it to 1 for the kernel or other rings such as external drivers, security tools, or GPU internal work buffers, giving them 8 distinct MTE tags, and leave 8 distinct tags for user space, e.g. in JavaScript V8). This reduces the hardware memory-corruption detection rate in each ring to 7/8 instead of 15/16, but other asynchronous detection is still possible in software (e.g. to detect heap corruption, or stack-allocated buffer overflows caused by software bugs and unchecked boundaries).
Segregation of cache usage is the ultimate solution to avoid escaping the sandboxing.
Now ARM should think about allowing tags to be larger (and if needed, to use additional internal bits for tagging pointers with for example a 80-bit address space, with help of a supplementary cache for virtual pages tags) and making sure it still uses a strong enough random generator for these much larger tags.
Users may then complain that a 128GB virtual space is "not enough" for their smartphones (and their embedded GPUs), but by the time this is really significant, enough years will have passed that we may see the end of 64-bit processors, replaced by 80-bit or 128-bit processors. At that point, speculative execution will no longer be a problem, as the "cache vulnerabilities caused by speculative execution" will no longer be exploitable in any reasonable time across the very large cache-tagging space.
So maybe all this is a sign that the computer industry must think about upgrading its hardware to 128-bit processors (even if the usable significant virtual memory space will still, for a long time, remain largely below the 64-bit limit). At that time MTE-like tagging will be extremely useful and in practice extremely secure (especially for mobile devices).
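The arithmetic in this comment (16 tags, 15/16 versus 7/8 detection, 37 address bits for 128GB, 27 spare bits) can be checked with a few lines. This is only a back-of-the-envelope sketch. The `expected_guesses` helper is an added assumption: it models an attacker who can try tags one at a time with no penalty for a wrong guess.

```python
import math

TAG_BITS_MTE = 4
tags = 2 ** TAG_BITS_MTE                 # 16 distinct tags
detect_all = (tags - 1) / tags           # chance a random wrong tag mismatches

# Reserving one bit to separate kernel from user space leaves 8 tags per side.
tags_per_ring = 2 ** (TAG_BITS_MTE - 1)
detect_split = (tags_per_ring - 1) / tags_per_ring

# Address bits needed to cover 128 GiB of virtual space.
addr_bits = int(math.log2(128 * 2**30))
spare_bits = 64 - addr_bits              # bits left over for a longer tag


def expected_guesses(bits):
    """Average tries to hit a uniformly random tag, trying each value once."""
    return (2 ** bits + 1) / 2


guesses_4_bit = expected_guesses(TAG_BITS_MTE)
guesses_27_bit = expected_guesses(spare_bits)
```

With 4 bits the average is 8.5 tries; with 27 bits it is roughly 67 million, which is the gap the comment is pointing at.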
There's a lot of 'IF's in there. If you can find the right code, if you can find the tag , if you can change it, if... if.. if...
Whilst this is a possible route for an attack, has anyone actually used this in the real world, not just in the research lab?
@@kevintedder4202 if anyone did, it would probably be state level threat actors. These are the kind of zero days that sell for tens of millions.
Speculative execution was always Pandora's box. This is quite clear after Spectre and Meltdown. It's damn hard for chip designers and ISA designers to get it 100% correct.
Even if they do get it 100% correct, it's still going to be vulnerable to a cache timing side channel attack.
@@BrendonGreenNZL Yes true. My statement above is not precise enough. Spectre lives from the behavior of the cache itself in combination with speculative execution and branch prediction.
holy authentication man, just tried to enroll in your arm course and I had to log in like 5 times. Would be worthwhile for you to look into that.
we are overhauling the auth
@@LowLevelTV time to roll out Firebase Auth haha
@@LowLevelTV after purchase I also had to log out and log back in
Oh man, this is an amazing bug. I was there in the 90's when "smashing the stack" hit. It was above my pay-grade at the time, but it was clear in the late 90's that you could get wrecked by a few bad bytes on the wire. Overflow after overflow into the new century, race conditions all over kernels, you sure you want a multi-user system? Nowadays, multi-tenant systems suffer similar problems with any shared resources. You really can't have everything in one package.
I just assume that all computers are inherently insecure and act accordingly
It seems like this is very similar to PACMAN, except that paper breaks pointer authentication codes instead of memory tags. Both take the approach of brute-forcing a short secret (a 16-bit PAC there, a 4-bit tag here) by abusing speculation.
Isn't this the exact thing that happened with Apple Silicon?
Or at least very similar?
Apple Silicon is Arm
Well, it is ARM, so I'm assuming yes.
@@gljames24 not really. Arm based but very custom.
Not *EVERY* ARM cpu! I moved into developing 32 bit asm on the ARM2 and even had a go at an original ARM1 BBC Micro cheese wedge which never was really a product, just a dev system.
I can categorically say that this exploit will not work on either of those CPUs as they had exactly zero kilobytes of cache :) With 4k cache on the ARM3, and a 24/26 bit address bus and processor status stuffed into the remaining 6/8 of 32 bits... I still think you'd find it impossible.
it's another "we speculated, rewound and forgot to invalidate the cache" error. When will CPU designers learn to have cache invalidation be the default behavior in case of speculation rewind if there was a cache swap during the speculative block?
@@fluffy_tail4365 except they never got cached, and that's how they figure out what the memory tag is: they iterate through the numbers and see which one was in cache, because that's the real one. The real exploit here is the side-channel memory access.
performance hit from failed speculations would be a dog
This issue here is that there is no cache fill happening for the speculated code, which can be detected later on.
And as the wrong speculation generates no error, they can keep trying with new tags until they find the correct one.
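A toy simulation of the guessing loop described here, under heavy assumptions: `make_tag_oracle` is a hypothetical stand-in for the whole speculation gadget plus cache-timing measurement, reduced to a yes/no answer. Nothing in it touches real hardware.

```python
import random


def make_tag_oracle(secret_tag):
    """Hypothetical stand-in for the gadget plus cache-timing probe.

    Returns True when the guessed tag matched (modelled as: the dependent
    load left a line in the cache) and False otherwise. No fault is raised
    on a wrong guess, which is the property the comment highlights.
    """
    def oracle(guess):
        return guess == secret_tag
    return oracle


def recover_tag(oracle, tag_bits=4):
    # Try every possible tag; a wrong guess is silent, so we just keep going.
    for guess in range(2 ** tag_bits):
        if oracle(guess):
            return guess
    return None


secret = random.randrange(16)
recovered = recover_tag(make_tag_oracle(secret))
```

Because a 4-bit tag has only 16 values, the loop finishes in at most 16 probes, so the silent failure matters more than the speed of any single probe.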
For me the real question is how they consistently fool the branch predictor to speculatively execute code for a branch never taken!
Because that is what bypasses the security here.
I would not call this a timing attack, but a branch algorithm attack.
@@TheEVEInspiration It's in the paper. You can see it in the short glimpse you see of the page before he zooms in (around 6:48). It says that they run the code multiple times with correct pointers and *cond_ptr true, to condition the branch predictor. They then make one guess with *cond_ptr false that triggers the speculative execution.
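The training trick can be illustrated with the textbook 2-bit saturating counter. This is a sketch under the assumption of that simple scheme. Real branch predictors are far more elaborate, and `TwoBitPredictor` is a made-up name rather than anything taken from the paper.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter (illustrative only)."""

    def __init__(self):
        self.counter = 0    # 0-1 predict "not taken", 2-3 predict "taken"

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


predictor = TwoBitPredictor()

# Training phase: the condition is true on every run.
for _ in range(8):
    predictor.update(taken=True)

# Attack run: the condition is now false, but the predictor still says "taken",
# so the branch body would be executed speculatively before the rewind.
predicted_taken = predictor.predict()
actually_taken = False
mispredicted = predicted_taken != actually_taken
predictor.update(actually_taken)
```

After the single misprediction the counter only drops from 3 to 2, so in this model one more training run is enough to restore the saturated state before the next guess.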
@@HerrNilssonOmJagFarBe Interesting, that is just changing data out after a few tries, so simple.
The JavaScript V8 engine uses a technique called NaN Boxing and Pointer Tagging which attaches the variable type inside the pointer address
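For readers who have not seen pointer tagging, here is a generic sketch of the idea. It is not V8's actual layout: the 3-bit tag width and the helper names are assumptions for illustration. Because 8-byte-aligned addresses always have three zero low bits, a small type tag can ride along in them.

```python
TAG_BITS = 3                        # 8-byte alignment leaves 3 low bits free
TAG_MASK = (1 << TAG_BITS) - 1


def tag_pointer(addr, tag):
    """Pack a small tag into the unused low bits of an aligned address."""
    assert addr & TAG_MASK == 0, "address must be 8-byte aligned"
    assert 0 <= tag <= TAG_MASK, "tag does not fit in the free bits"
    return addr | tag


def untag_pointer(word):
    """Split a tagged word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK


word = tag_pointer(0x7FFF_1234_5670, 0b101)
addr, tag = untag_pointer(word)
```

MTE applies the same idea at the other end of the word, keeping its 4-bit tag in the otherwise unused high bits of a 64-bit pointer.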
Unfixable bug? More like NSA engineered backdoor 😂
are you surprised? all cpu's have back doors. x86 has theirs, so arm having them is no surprise at all to me, it makes sense🤣🤣
Kirin CPU + Harmony OS NEXT 💪
I bet it has something to do with pointer authentication (control flow).
3:55 I wasn't that far off LOL
@@tablettablete186 you were actually on the mark since tags are used to authenticate pointers.
I remember reading that from Aleph1 back in the day 😯seeing that paper just took me way back!
Should've used rust /j
:D unfortunately side-channel attacks are impervious to whatever rust throws at it if the hardware is unfit to provide for such security.
@@FrankHarwald Ah but in this case the vulnerability only bypasses a security system used to mitigate memory corruption vulnerabilities. If your program is written in rust chances are that there are no memory corruption vulnerabilities to begin with, so the attack is possible but useless.
Edit: Changed "prevent" to "mitigate".
copium
Sometimes I imagine the biggest security flaw ever, one that will wreck almost every computer and grind the world to a halt for a decade as companies had to bootstrap back up to the kinds of machines capable of making more computers since those were affected too. I imagine that this security flaw is being implemented around now, by some guy in an office making a small arbitrary decision in some new architecture that nobody thinks to question and eventually makes its way into the industry standard. Eventually leading to that security flaw being discovered decades from now.
Speculative execution really is a double edged sword. On the one hand it made x86 what it is today (performance wise) but on the other hand introduces a lot of complexity and attack surface. And now ARM is affected too. Although this is not nearly as bad as Spectre/Meltdown.
So this is bruteforcing tag speculating on cpus' assumption of outcome of a code to be ran? Brilliant!
Is this the year of exploits?
Apple: "It's not our Apple Silicon ARM chip, you're using your Macbook wrong"
My phone was hacked just by going outside. These videos are nice and all, but ppl need to understand just how vulnerable our devices really are!!!
Apple also have vulnerability on its M series
probably because it's ARM.
Snapdragon also. & pretty much every recent major ARM CPU save the lower-end models (e.g. Cortex M).
Man just write the code so convoluted that the CPU won't be able to speculatively predict the branch. Problem solved?
How the hell they find these
Automated tools
@@trens1005 they apparently created fuzzers you can run to find these, but it is also a challenge to even know what you are looking for. But who would have thought that MTE is vulnerable. This was probably months of research
@@trens1005 They did create fuzzers true, but it is also a challenge to even know what you are looking for, they probably did months of research, like who would have thought MTE was vulnerable
fr
why did 2 of my own replies get deleted
Thank you, very distinctive explanation ! Keep up ! Good luck ! I have some different CPU boards (AllWinners family) but luckily they are v6 and v7.
x86, M1, ARM, we're just building a collection of vulnerabilities.
It may do wonders for performance and optimisation, but nondeterministic processing is abysmal in terms of security. Cache management, branch prediction, and speculative execution, what an unholy trinity.
Speculative execution was a mistake
I do not like the death penalty in general, but they should at least properly trial people first.
A poison tree
The only time I ever hear about speculative exec is as a security vulnerability😂
Speaking of which, could you do a video on the *benefits* of spec exec? I’m really curious now lol
In a nutshell, branch prediction and speculative execution exist to prevent the performance hit that would come from stalling the processor until the correct outcome of the branch instruction is known.
Ever since the 486 and Pentium, CPUs have been prefetching instructions from memory and decoding them in anticipation of executing them; the difference being that the 486 would stall its pipeline until it knew which way a branch would go. The Pentium was faster in part because it would predict which way a branch would go and continue fetching and decoding (but not executing) instructions along that path. It was also able to execute instructions up to the jump point, as long as all the inputs were known (out-of-order execution). Speculative execution takes this mechanism further by out-of-order executing the instructions ahead of the branch, placing the results into temporary registers; committing them to real registers (and saving execution time) if the branch was predicted correctly.
Out-of-order execution on the Pentium was interesting, because well-optimized assembly code could actually arrange to have the inputs to a jump instruction available just as the CPU was ready to execute the jump; simply by changing the order of seemingly unrelated instructions.
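A back-of-the-envelope cost model for the stall-versus-predict trade-off described above. Every number here (instruction count, branch count, 15-cycle latency, 95% accuracy) is an illustrative assumption, not a measurement of any real CPU, and the model ignores everything except branch handling.

```python
def cycles_stalling(instructions, branches, resolve_latency):
    # Every branch stalls the pipeline until its outcome is known.
    return instructions + branches * resolve_latency


def cycles_predicting(instructions, branches, accuracy, flush_penalty):
    # Only mispredicted branches pay a penalty (the pipeline flush).
    mispredicts = round(branches * (1 - accuracy))
    return instructions + mispredicts * flush_penalty


stall = cycles_stalling(1_000_000, 200_000, resolve_latency=15)
predict = cycles_predicting(1_000_000, 200_000, accuracy=0.95, flush_penalty=15)
speedup = stall / predict
```

Under these made-up numbers the predicting core needs 1,150,000 cycles against 4,000,000 for the stalling one, which gives a feel for why designers are reluctant to drop the feature.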
I mean this was all fixed in the 80's with capability systems, but then C programmers wouldn't have the ability to fuck themselves over so here we are... Absolutely no reason for software to have access to pointers. Just... lol man
In a system designed... not by a C programmer.... even if the pointers were printed out it wouldn't help, because you can't address memory directly via pointers. It's not a thing there are ISA commands for.
Lol man...this tag thing is kind of capability system...
Also for capability system to work efficiently you need caching and speculation too...
@@AK-vx4dy It's a patch trying to be a capability system... but it's not, which is why it got crushed like tin foil.
Could you please elaborate on such ISAs ? I find that quite interesting, but from a quick search nothing quite like it came up. (either now classical ISAs or capability based security disconnected from ISAs)
@@filip0x0a98 i don't have a link but i saw pdf study about realisation of capabilites on current processors with changes to compiler and kernel and even some possible compatibility with older software
@@AK-vx4dy If it is done in software it's broken.
Arm, x86, and I think RiscV, all grant you access to anything you want if you have the magic memory address.
The Flex System, Tendra, and others used things like object addressed memory. Where you can ask for a memory object, so you can't use after free, be out of bounds etc, as its all mediated by hardware ensuring the interaction is correct and permitted.
the other advantage of hardware memory management and scheduling is you don't spend thousands of cycles context switching as you negotiate with the OS, you just focus on computation the whole time.
Pretty awesome find by the team
Man, please let RISC-V be somehow safer...
If they dont include the feature...but they definitely will because it gives a much needed boost in performance. At first.
@@alexturnbackthearmy1907 CPU's don't need so much performance. For example, in the 90's the CPUs were much slower and it was enough.
The mere mention of "speculative" and "prediction" already makes my neck hair stand up...
How is it possibly less intensive to try to predict the future and have it loaded than it is to just do the thing you want to do when you decide to do it? That makes no sense... Sounds like speculative execution is a built-in attack vector for anything running on the device, that is meant to have some plausible deniability... Like oops sorry the whole entire system is a giant security vulnerability... Why don't they hire the people that find this stuff and have them make an OS that doesn't have these problems ffs
Because of parallelism? It traverses multiple paths, and keeps going with the required path. It's kind of genius, except super exploitable apparently.
@@WaltH-sv6to You go to McDonald's, and you order a Big Mac. What's faster, them starting to cook it when you arrive at the counter, or them realizing there is a line of 20 people and deciding to bulk crank out burgers ahead of time?
Speculative execution is one of the major architectural speed improvements of modern CPU design. The fact you claim it "makes no sense", suggests you haven't even taken a few seconds to understand its purpose. Engineers didn't just add it in for funsies.
@@adamsoft7831 good analogy.
@@adamsoft7831 I’d say it’s more like you see someone heading towards the entrance so you decide to start making a big mac but there is a chance the customer will order chicken nuggets instead in which case you will discard the big mac and start making nuggets from scratch
Despite your wonderful presentation, why does the initial lowercase in the title bother me so badly? Thanks for the content
Great video and information!
Hi @LowLevelLearning I just took your course from Low-level academy... Would be great if u can add a detailed OS course to that... Also add more content for ARM and C
@LowLevelLearning
Just because I'm not sure if I've understood everything correctly.
This memory tagging is just an additional security mechanism in ARM processors and not the only one?
So this design flaw doesn't make ARM processors less secure than other processor architectures, it just makes them less secure than intended. Correct?
Or do ARM processors lack other security mechanisms that other architectures have?
Kinda neat explanation of virtual memory, wish had it when wrote driver for Armv8 MMU. Also not the speculative execution exploit again
So much of this complexity was invented in the 1970s when computers were small and expensive and we had to make computers more efficient. In an age of 64-bit SOCs with gigahertz clocks for less than $10 we need to jettison the unneeded complexity and shift to much simpler architectures.
@@tedspradley809 what are you talking about, we're not doing the same things as in the 70s, we still benefit from cache and branch predictions etc. Virtual memory is a security concern and thank god (or engineers) it exists
Soon enough devs will be making even slower software in Python translated into JS running in a web browser in a virtual machine that is running in a browser (and call the framework Pentaquark). And that would be just a HelloWorld, so you'll need that juicy speculative execution. Although I would argue that another type of execution might be necessary these days to ensure the bright future of fast software.
Excellent explanation. However I would say it's not a bug, it's a data vulnerability.