The Unsafe Chronicles: Exhibit A: Aliasing Boxes

  • Published 18 Oct 2024

COMMENTS • 68

  • @oskarberndal5310
    @oskarberndal5310 3 years ago +15

    I think this highlights a nice convention of the current Rust community: Being considered 'safe' requires that it is sound _whenever a user can compile the code_, regardless of how "stupid" the user's code is. If the user can break safety by doing something that is "obviously" really stupid it is considered unsound. High integrity of the API surface. Libraries that "just work" even for idiots like me.

  • @tekneinINC
    @tekneinINC 1 year ago +2

    As a relative newcomer to Rust, I’ve been working through a backlog of your videos in my spare time, and I think this might have been one of the most interesting so far!
    Many of these points brought up here make a lot of sense just looking at it from a memory layout perspective.
    But in particular those ‘explain it like I’m 5’ descriptions of the casting syntax really helped me… I might have to rewatch those chunks a couple more times after this 😂.

  • @aqua3418
    @aqua3418 2 years ago +6

    Learned a lot from this! Finally somebody who explains things in detail!

  • @FTropper
    @FTropper 3 years ago +3

    Your channel is actually the best "How Rust works (under the hood)" Information source I could find. Keep it up. :-)

  • @guillaumeaugustoni4814
    @guillaumeaugustoni4814 3 years ago +15

    This is really helpful. Thank you for making these.

  • @random6434
    @random6434 3 years ago +6

    This was extremely informative, thank you. Towards the end, some of this stuff started to feel a lot like security, with the crate user acting more like a hacker of some kind, determined to violate your invariants through any means.

  •  3 years ago +4

    I don't even write a lot of Rust nowadays but this is super interesting and thought-provoking for me.

  • @s1ck23
    @s1ck23 3 years ago +7

    Thanks for this new series and the usual high quality content. I’d love to see more unsafe topics in the future :)

  • @mokuzzai8906
    @mokuzzai8906 3 years ago +3

    I'm pretty sure transmute between Wrapper<T> and Wrapper<U> is always unsound if T != U and Wrapper is #[repr(Rust)], regardless of whether it's safe to transmute between T and U, with the exception of ZSTs.
    or to put it simply:
    #[repr(transparent)]
    struct A { a: bool }
    #[repr(transparent)]
    struct B { b: u8 }
    // implicit #[repr(Rust)]
    struct Wrapper<T>(T);
    // implicit #[repr(Rust)]
    struct RefWrapper(..); // DANGER
    }
    even for
    struct Foo {
    _0: PhantomData,
    vl: u8,
    }
    unsafe {
    transmute::(..); // unsound
    }
    struct C;
    struct D;
    unsafe {
    transmute::(..) // sound but may lead to unsafety if `D` has invariants to uphold.
    }
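For contrast with the `#[repr(Rust)]` case described above, here is a minimal sketch of a wrapper whose layout is guaranteed. The names `Wrapper` and `wrap_ref` are illustrative assumptions and not code from the video; the point is only that `#[repr(transparent)]` is what makes the pointer cast defensible.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical wrapper. `repr(transparent)` guarantees that it has the
/// same layout as its single field, which plain `repr(Rust)` does not.
#[repr(transparent)]
struct Wrapper<T>(T);

/// Reinterprets `&T` as `&Wrapper<T>` without copying.
fn wrap_ref<T>(value: &T) -> &Wrapper<T> {
    // SAFETY: `Wrapper<T>` is `repr(transparent)` over `T`, so both types
    // have identical size, alignment, and validity requirements.
    unsafe { &*(value as *const T as *const Wrapper<T>) }
}

fn main() {
    // These hold by the `repr(transparent)` guarantee, not by luck.
    assert_eq!(size_of::<Wrapper<u64>>(), size_of::<u64>());
    assert_eq!(align_of::<Wrapper<u64>>(), align_of::<u64>());

    let x = 7u64;
    let wrapped = wrap_ref(&x);
    println!("{}", wrapped.0);
}
```

Without the `#[repr(transparent)]` attribute the same cast would rely on layout details the compiler does not promise, which is the situation the comment warns about.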

  • @sagnikbhattacharya1202
    @sagnikbhattacharya1202 3 years ago +22

    This is super interesting, please give us more :D

  • @elahn_i
    @elahn_i 3 years ago

    This type of video helps stuff soak into my brain. Thanks, Jon! I look forward to more Unsafe Chronicles.

  • @MostafaSaad
    @MostafaSaad 3 years ago +2

    I'm not even a rust developer but i really enjoy your videos, and every time after the video i feel like i should try rust :D

  • @dantenotavailable
    @dantenotavailable 3 years ago +2

    0:20 ("feel like i should be wearing a scary outfit") - Crash helmet... broken crash helmet even better.
    12:00 (optimisations due to noalias) - Based on my understanding, another example would be if you had an if block conditioned on left.real == right.real it might be smart enough to say "these are pointers that can't be aliased therefore left.real can never equal right.real therefore i can just delete this entire if block". There was an article series on the LLVM project blog from 2011 called "What every c programmer should know about undefined behaviour" with surprising examples like that.
    1:14:55 - I feel this relates to the Actix unsafe code kerfuffle that resulted in the change in project ownership. As far as I could tell, no one was saying there was an issue with the code as it was, just that because the code in question could result in multiple mutable references, someone developing on Actix's safe internals could write something that relies on Rust's single-mutable-reference guarantee, and then there would be issues.
    This VOD was super interesting. My takeaway is that I classify unsafe much the same as inline asm in C: there are moments where it is absolutely the right thing to do, but they should be the exception, not the rule, and if you can hide it behind a facade so that other people generally don't have to think about it, so much the better.
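To make the 12:00 point about `noalias` concrete, here is a small sketch. The function `add_twice` is a made-up example, and whether a given compiler version actually performs the optimization is an assumption that depends on the backend and optimization level.

```rust
/// Adds `*src` to `*dst` twice.
///
/// A `&mut i32` and a `&i32` that are live at the same time can never point
/// at the same memory, so the compiler is allowed to read `*src` once and
/// reuse it. With two raw pointers in C it would have to reload `*src`
/// after the first write, in case the write changed it.
fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}

fn main() {
    let mut total = 1;
    let step = 10;
    add_twice(&mut total, &step);
    println!("{total}"); // prints 21
}
```

If unsafe code ever manufactured a `&mut` that did alias `src`, the "read once" assumption would silently produce a different result, which is why breaking the aliasing rules is undefined behavior rather than just bad style.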

    • @nordgaren2358
      @nordgaren2358 6 months ago

      I don't think it's like inline ASM, where it's the exception, not the rule. Unsafe Rust should only be used when it's the only option available.
      Unsafe means that you are writing safe code that the borrow checker cannot check for safety. Unsafe means that you, the programmer, are going to uphold Rust's invariants. It really should be called "unchecked".
      Inline asm just means you are writing assembly directly. There are no optimizations applied to inline assembly. There are still optimizations that can be applied in unsafe blocks, just not to the unsafe actions themselves. I'm not even sure that is entirely true; there are probably still some optimizations on unsafe actions.
      Unsafe code has a specific place. There are things you cannot do with safe Rust, because the compiler cannot possibly check and guarantee their safety. There are safe programs that the Rust compiler won't let you write in safe code because of this.
      I would say that it being the only option is sometimes true with inline ASM, but the majority of inline ASM I have seen is optimization. Sure, that specific optimization requires you to drop down to ASM, but the actual task is possible without inline ASM.
      Unsafe Rust really shouldn't be placed so distantly. I think it should be understood by Rust developers, because at its core it's dealing with Rust's invariants, which are a very important thing to understand.
      Do you need to understand assembly for the architecture you are programming for in C? Probably not. It's good to know, but it's not going to have as much of an effect on your programming in C. Misunderstanding or ignorance of Rust's invariants can lead to tough times when writing even safe Rust. At worst, you write something in safe Rust that is actually a bug the compiler didn't check, and you don't know it's a bug because you didn't know about Rust's validity invariants: say you were somehow using -1 as a bool value, the compiler made an assumption about a bool field in a struct, and it caused UB because your structure had an incorrect bit pattern. This is just an example I made up, but it's not too far-fetched to say that something like it could happen. I think Jon had a similar example in the video.
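The `bool` validity invariant mentioned above can be illustrated with a short sketch. `bool_from_byte` is a hypothetical helper, not something from the video; it shows the checked alternative to transmuting an arbitrary byte into a `bool`.

```rust
/// Converts a raw byte into a `bool` only if it is a valid bit pattern.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Any other value is not a valid `bool`. Producing one, for example
        // with `std::mem::transmute`, is undefined behavior even if the
        // value is never inspected afterwards.
        _ => None,
    }
}

fn main() {
    println!("{:?}", bool_from_byte(1));
    println!("{:?}", bool_from_byte(255));
}
```

The compiler relies on `bool` only ever being 0 or 1, for instance when it packs `Option<bool>` into a single byte, so a stray 255 can break code far away from where it was created.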

  • @swapode
    @swapode 2 years ago +1

    The private types sound like a specialized version of encoding state in the type system, which allows for some neat API design with compile time sanity checks.
    When you have a struct Something you can then impl Something and only add functionality applicable to that specific state, meaning that the compiler will alert you when you try to call a function that's not applicable, not only making the developer's life easier but also removing the necessity for sanity checks at runtime.
    I'm a bit out of my depth here, but maybe there's reason to combine these things. Stateful API design could probably benefit (slightly) from a guarantee that you can safely cast from one state to the other, maybe there's even reason for an explicit language feature, since you could safely do a state change in place.
    impl Something {
    fn changes_state(self) -> Something { ... } // would be kind of nice if this could be Self and happened in place, which shouldn't be a problem if both states are the same (zero) size
    }
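The state-in-the-type-system idea described in this comment might look roughly like the sketch below. `Door`, `Open`, and `Closed` are invented names used purely for illustration.

```rust
use std::marker::PhantomData;

// Hypothetical zero-sized marker types; they exist only at compile time.
struct Closed;
struct Open;

/// A value whose current state is tracked by the type parameter.
struct Door<State> {
    opened_count: u32,
    _state: PhantomData<State>,
}

impl Door<Closed> {
    fn new() -> Self {
        Door { opened_count: 0, _state: PhantomData }
    }

    /// Consumes the closed door and returns an open one.
    fn open(self) -> Door<Open> {
        Door { opened_count: self.opened_count + 1, _state: PhantomData }
    }
}

impl Door<Open> {
    /// Only available while the door is open.
    fn walk_through(&self) -> u32 {
        self.opened_count
    }

    fn close(self) -> Door<Closed> {
        Door { opened_count: self.opened_count, _state: PhantomData }
    }
}

fn main() {
    let door = Door::new().open();
    println!("{}", door.walk_through());
    let door = door.close();
    // door.walk_through(); // would not compile: no such method on Door<Closed>
    let _ = door.open();
}
```

Because each transition takes `self` by value, the old state cannot be used again, and no runtime check is needed to reject calls that do not apply to the current state.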

  • @RitobanRoyChowdhury
    @RitobanRoyChowdhury 3 years ago +3

    Super interesting! Would love to see more of these. Also, I wasn't able to stay for the whole stream, but I was there for the beginning, and I wanted to say that a Wheel of Time poster for the background would be _awesome_ :D.
    This raised some questions about repr(transparent) for me, though, which I wrote up here: www.reddit.com/r/rust/comments/kbyb6z/what_is_the_purpose_of_reprtransparent/?.

  • @DucBanal
    @DucBanal 3 years ago

    Really useful, would watch again !

  • @ricardopieper11
    @ricardopieper11 3 years ago +2

    The thumbnail matches the content.... crazy stuff. I was watching the stream live and programming at the same time, and you caught me in the act of doing a mem::transmute when you said "don't do it" rofl. It's a toy project, so the transmute is still there :)
    (toy project is a python interpreter, the transmute is to willfully ignore mutability rules and let things go wild, because that's how python likes it... it is actually cleaner than adding a ton of refcells and borrows. It is never going to production though.)

    • @jonhoo
      @jonhoo  3 years ago +12

      "it is never going to production" - famous last words.

  • @jeffg4686
    @jeffg4686 3 years ago +1

    Jon, since rust uses llvm currently, I was curious if it's hindering it in taking advantage of a lot of rust specific benefits. One thing I was thinking about yesterday is whether or not there could be significant improvements in cpu cache invalidation if all "non mut" cache data is stored together (same cache line). I just got around to watching Scott Meyers 2014 video on cache invalidation the other day and got me thinking about rust and that perhaps this could be a major benefit. I doubt llvm currently supports such design as other languages don't restrict in same way (well, not that I understand anyways). I wonder if they've figured out a way to do this with llvm (frankly, I know very little about details of llvm, only what it does).

    • @jonhoo
      @jonhoo  3 years ago +2

      LLVM has a lot of annotations that Rust can emit to produce more optimized code. `noalias` is one such example. But yes, a more Rust-specific backend may be able to do even better. That's a _lot_ more work though. I know there's been a lot of work on cranelift, though that too aims to be a re-usable compiler backend to support multiple languages, and may not go _quite_ as far as you're suggesting.

    • @jeffg4686
      @jeffg4686 3 years ago +1

      @@jonhoo - I bet it could be a real big win for Rust someday: keeping muts and non-muts in separate cache lines. I don't know CPU APIs at all, so I don't even know if the CPU allows you to designate such a thing, or if it just does it all itself without giving you any control. Loved Scott's talk. It was very eye-opening.

  • @mrvectorhc7348
    @mrvectorhc7348 3 years ago

    Very interesting! I would like to see more

  • @xoomayose
    @xoomayose 3 years ago

    Very interesting video. Thanks!
    For 1:27:31: Instead of Arc, did you consider epoch-based garbage collectors (like crossbeam::epoch) to detect when it is safe to drop a value?

    • @jonhoo
      @jonhoo  3 years ago +1

      That could be a possibility too, though the observation here is basically that it shouldn't be necessary (assuming the operations are deterministic), since we always know that we should drop the second time around. That said, left-right doesn't mandate any of these approaches, so you totally could use something like crossbeam-epoch with it!

  • @drcx3
    @drcx3 3 years ago

    As always, great content!

  • @jeffg4686
    @jeffg4686 3 years ago +1

    Just wanted to throw out a suggestion for a future video. This is in particular related to those like me that are more used to GC languages. Use cases for the functionality in "std::mem". You know, devs with non-native background like me just don't tend to do these direct mem operations in our code (if even allowed in language of choice), so when I see all these mem copy and mem replace operations, I often don't quite understand what's the benefit versus setting variable values. Is there a step skipped (like a more atomic operation and faster?) I just think there's a lot in that module that many jumping ship to Rust from GC languages get tripped up on, and I think there are more of us out there than you think. Certainly not suggesting that you make such a video just for my request - like for me, lol. But if you are looking for appealing topics, I think this is one.

    • @jonhoo
      @jonhoo  3 years ago +2

      Ah, so, the operations in `mem` are more there to allow you to do things the borrow checker would otherwise stop you from doing, or that you would need temporaries for. For example, `mem::replace` lets you set a variable and also extract its previous value given only a &mut T, which would be tricky to do otherwise.
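A minimal sketch of the `mem::replace` case described in this reply. The `Buffer` type and its `flush` method are hypothetical and exist only to show the pattern.

```rust
use std::mem;

/// Hypothetical type that accumulates lines until they are flushed.
struct Buffer {
    lines: Vec<String>,
}

impl Buffer {
    /// Hands back everything collected so far and leaves an empty buffer.
    fn flush(&mut self) -> Vec<String> {
        // `self.lines` cannot simply be moved out through `&mut self`,
        // because that would leave the field uninitialized.
        // `mem::replace` writes a new value and returns the old one.
        mem::replace(&mut self.lines, Vec::new())
    }
}

fn main() {
    let mut buffer = Buffer { lines: vec!["first".to_string()] };
    let flushed = buffer.flush();
    println!("{flushed:?} {}", buffer.lines.len());
}
```

When the replacement is just the type's default value, `mem::take(&mut self.lines)` does the same thing with less typing.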

    • @jeffg4686
      @jeffg4686 3 years ago +1

      @@jonhoo - thanks, that makes sense

  • @Dorumin
    @Dorumin 3 years ago

    I guess I'm silly for thinking Rc would've been appropriate for a structure that would only ever have two aliases haha

    • @jonhoo
      @jonhoo  3 years ago +2

      I talk about this a little more towards the end, though specifically about Arc. Rc wouldn't work well since you then couldn't share the copies across thread boundaries.
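The thread-boundary point can be sketched as follows. `sum_on_other_thread` is an illustrative helper, not part of left-right.

```rust
use std::sync::Arc;
use std::thread;

/// Sums the shared data on a freshly spawned thread.
fn sum_on_other_thread(data: Arc<Vec<u32>>) -> u32 {
    // `Arc<Vec<u32>>` is `Send`, so it may be moved into the closure.
    // With `Rc<Vec<u32>>` this call would be rejected at compile time,
    // because `Rc` uses a non-atomic reference count and is not `Send`.
    let handle = thread::spawn(move || data.iter().sum::<u32>());
    handle.join().expect("worker thread panicked")
}

fn main() {
    let shared = Arc::new(vec![1, 2, 3]);
    let total = sum_on_other_thread(Arc::clone(&shared));
    println!("{total}");
}
```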

    • @Dorumin
      @Dorumin 3 years ago

      @@jonhoo Ah yes, you're right, it'd be a bit silly to have a non-Send+Sync type in a concurrency primitive :P
      I had left the comment about 40min into the video, I usually wait until the end for comments, but I also don't often watch full streams

  • @MaxLambrecht
    @MaxLambrecht 3 years ago

    Awesome stuff!

  • @Lexikon00
    @Lexikon00 3 years ago

    Hey Jon, can you make a video about variance in combination with unsafe code?

  • @eric23482
    @eric23482 1 year ago

    Is there a special reason to make a raw pointer using Box::into_raw instead of using std::alloc::alloc?

  • @ilyakamenshchikov9056
    @ilyakamenshchikov9056 3 years ago

    So how did you find out about this subtle requirement of Box not being aliased if it's undocumented? From previous experience? hinted by viewers of the prev. stream?

    • @jonhoo
      @jonhoo  3 years ago +1

      Someone filed an issue on the repository after watching the previous stream. Otherwise I definitely wouldn't have caught it. It might be that once the behavior is codified it could have been caught by miri though.

    • @ilyakamenshchikov9056
      @ilyakamenshchikov9056 3 years ago +2

      @@jonhoo one more advantage to working publicly! Thanks

  • @isaactfa
    @isaactfa 3 years ago

    Hi Jon, great video as always. I was wondering, would you have any interest in making a video about pointers in Rust? Specifically, I'd love to learn about the differences between *const T, *mut T, Unique, and NonNull, especially with regards to variance and safety. Because, while I understand what variance is, I don't understand what it actually means. Thanks!

    • @jonhoo
      @jonhoo  3 years ago +1

      That could make a really interesting Crust of Rust I think. The big question is how to make real world examples of these. I've found that it's much easier to make a topic understandable if you have a concrete and real use case to work through. Do you have any in mind?

    • @isaactfa
      @isaactfa 3 years ago

      @@jonhoo The only place I've come across it is when implementing data structures that allocate their own memory. Should your custom Vec implementation hold a raw pointer, a Unique, or a NonNull to its backing array and why? Otherwise, I'm not really sure. Thanks!

    • @ZippyMagician
      @ZippyMagician 3 years ago +1

      @@isaactfa if you’re still interested, the standard library uses Unique and PhantomData in RawVec, the internal for Vec. I believe Unique makes some guarantees and provides a wrapper around *mut T to make it more efficient/safer, but don’t quote me on that or anything
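As a rough illustration of the question in this thread, here is a sketch of an owning pointer built on `NonNull<T>` and `PhantomData<T>`. `MyBox` is a hypothetical type and is not how the standard library implements `Box` or `RawVec`; `Unique` is internal to the standard library, so the sketch sticks to the public `NonNull`.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Hypothetical owning pointer.
/// `NonNull<T>` is covariant in `T` and never null, unlike `*mut T`, which
/// is invariant and may be null. `PhantomData<T>` records that this type
/// logically owns a `T`.
struct MyBox<T> {
    ptr: NonNull<T>,
    _owns: PhantomData<T>,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        let raw = Box::into_raw(Box::new(value));
        // SAFETY: `Box::into_raw` never returns a null pointer.
        let ptr = unsafe { NonNull::new_unchecked(raw) };
        MyBox { ptr, _owns: PhantomData }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` points to a live, initialized `T` owned by `self`,
        // and the returned borrow is tied to `&self`.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is freed
        // exactly once, here.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}

fn main() {
    let boxed = MyBox::new(String::from("hello"));
    println!("{}", boxed.get());
}
```

One practical payoff of `NonNull` is the null niche: `Option<NonNull<T>>` is guaranteed to be the same size as a raw pointer.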

  • @KohuGaly
    @KohuGaly 3 years ago +1

    What I'm getting from this is that unsafe Rust is "basically just C", except the rest of Rust isn't. The Rust compiler basically assumes that unsafe{/*your code*/} contains pointer-casting magic that magically complies with all of Rust's safety protocols. Except it's not entirely clear what the safety protocols actually are.

    • @Anonymouspock
      @Anonymouspock 3 years ago +1

      It's kind of a side effect of the relationship between rust and llvm. More specifically, the rust folks are mostly not the llvm developers, and it's an extremely large system, so we don't really know what exactly is sound in unsafe code until we figure it out. The semantics of llvm are primarily documented in its source code, which is one of the main reasons we don't understand it.

    • @jonhoo
      @jonhoo  3 years ago +1

      Well, sort of. Specifically, what Rust assumes is that your unsafe code does not violate any of Rust's safety invariants. When you're writing unsafe code, you're given access to tools (like transmutes or raw pointer dereferences) that _can_ violate invariants, and it's your job to ensure that you _don't_ violate them.
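A small sketch of what "your job to ensure that you don't violate them" looks like in practice. `first_or_none` is a made-up function; the unsafe call is real, and the check before it is what keeps the invariant intact.

```rust
/// Returns the first element, or `None` for an empty slice.
fn first_or_none(items: &[i32]) -> Option<i32> {
    if items.is_empty() {
        return None;
    }
    // SAFETY: the slice was just checked to be non-empty, so index 0 is in
    // bounds. Without that check, this call could read out of bounds.
    Some(unsafe { *items.get_unchecked(0) })
}

fn main() {
    println!("{:?}", first_or_none(&[4, 5, 6]));
    println!("{:?}", first_or_none(&[]));
}
```

The function is safe to call with any input, because the precondition of the unsafe operation is established inside the function itself rather than left to the caller.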

    • @panstromek
      @panstromek 3 years ago +2

      Yes. Unsafe Rust is fine, except for when it interacts with safe code; then it becomes more unsafe than C. There are all sorts of subtleties that can break it. I sort of wish for some better middle ground, something like more variants of unsafe based on which rules you need to break. The Neglect* API from the safe_transmute RFC is getting close to that.
      Right now unsafe lets you do a lot of stuff you usually don't need, and it's dangerous. I recently painted myself into a corner with this: I realized I broke the aliasing rules for mutable references in a really complicated way when I was porting some C code that used global variables (so everything was unsafe). That's a situation where it's safer to make everything unsafe and use raw pointers for everything, because they don't carry the burden of these strong rules.

  •  3 years ago +3

    Your explanation of MaybeUninit doesn't sound accurate to me. MaybeUninit isn't just about validity but also about initialization.
    For example, MaybeUninit::<u8>::uninit().assume_init() is UB despite u8 allowing any bit pattern. The reason is that the u8 was never written to. You can call assume_init() only if you've written the value AND the value has a correct bit pattern.
    The name of the struct is actually good. 🙂
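The initialization requirement described here can be sketched like this. `make_byte` is an illustrative function name.

```rust
use std::mem::MaybeUninit;

/// Produces a byte by initializing a `MaybeUninit<u8>` slot first.
fn make_byte() -> u8 {
    let mut slot = MaybeUninit::<u8>::uninit();
    // Calling `slot.assume_init()` at this point would be undefined
    // behavior: every bit pattern is a valid `u8`, but the memory has not
    // been written to yet.
    slot.write(42);
    // SAFETY: `slot` was initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", make_byte());
}
```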

    • @jonhoo
      @jonhoo  3 years ago

      Yup, that is true, but that's not an aspect of MaybeUninit that we are using in this case, so I didn't bring it up :)

  • @hugosales8102
    @hugosales8102 3 years ago

    Interesting video :) From a C++ user's perspective, PhantomData seems like a very contrived way of doing things. Also, this is getting dangerously close to template metaprogramming ;)

    • @jonhoo
      @jonhoo  3 years ago +4

      PhantomData is actually closely related to "ghost state" in more formal programming languages. Essentially, it is a way to carry around compile-time-only information, which in turn allows you leverage the type system for zero-runtime-cost safety checking.

    • @hugosales8102
      @hugosales8102 3 years ago +1

      No, I think I understood the concept, it's just that in C++ you'd use a `using` declaration. Something like `template <typename T, typename D> struct foo { using do_drop = D; T bar; };` and that has exactly the same size as T. do_drop doesn't become a special member of the class, it's a completely different thing (not sure what the term is).
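For comparison with the C++ snippet, a Rust sketch of the same shape. `Foo` and `DoDrop` are invented names mirroring the comment, and `#[repr(transparent)]` is added here as an assumption so that the size equality is a documented guarantee rather than an observation.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical compile-time-only marker, playing the role of `D`.
struct DoDrop;

/// Carries the type `D` purely at the type level; only `bar` is stored.
#[repr(transparent)]
struct Foo<T, D> {
    bar: T,
    _do_drop: PhantomData<D>,
}

fn main() {
    let foo: Foo<u64, DoDrop> = Foo { bar: 3, _do_drop: PhantomData };
    // `PhantomData<D>` is zero-sized, so `Foo<u64, DoDrop>` is as big as `u64`.
    assert_eq!(size_of::<Foo<u64, DoDrop>>(), size_of::<u64>());
    println!("{}", foo.bar);
}
```

Unlike a C++ `using` alias, the `PhantomData` field also tells the compiler how `D` participates in variance and drop checking, which is why Rust asks for it as a field.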

  • @nordgaren2358
    @nordgaren2358 6 months ago

    Why use ManuallyDrop? MaybeUninit says you are responsible for dropping the value yourself, so it is already effectively manually dropped.

  • @wojciechrazik
    @wojciechrazik 3 years ago

    Great content! So MaybeUninit is sort of like C++ volatile?

  • @firexgodx980
    @firexgodx980 3 years ago +2

    Why don't you use an IDE? If you did, you'd be able to set a break point and show us the complete state of the program, including the types and values of each of your variables. I think that would go a long way in helping people understand.

    • @jonhoo
      @jonhoo  3 years ago +4

      vim gives me all those things too through rust-analyzer. It just didn't in this case because the code I wrote was not actually valid - I wrote Rust-like pseudocode that illustrates the problem without specifically trying to write code that would compile (and thus that the editor would be able to give me type hints for).

    • @firexgodx980
      @firexgodx980 3 years ago +2

      @@jonhoo correct me if I'm wrong, but i don't think rust analyzer is a debugger. Type hints are cool and all, but that's different from a debugger which shows you the exact state of the program.

    • @jonhoo
      @jonhoo  3 years ago +1

      Correct, but vim also integrates nicely with gdb if you like debuggers, and gdb works just fine with Rust :)

    • @firexgodx980
      @firexgodx980 3 years ago

      @@jonhoo CLion uses gdb to debug Rust and provides a really nice user interface for it. Being able to set breakpoints and debug with just a click is so nice.

    • @jonhoo
      @jonhoo  3 years ago +11

      Each to their own. I personally do not like having to go to the mouse any more than I absolutely need to, and prefer to have the terminal close at hand. Not to mention that no program has vim bindings as good as vim's!

  • @meowsqueak
    @meowsqueak 11 months ago

    At the end you say to avoid writing unsafe code and use something someone else has written instead. So how do we know that someone else’s implementation is actually safe? How do we know they did a better job than what we could do? How do we know they even understood the concerns you allude to in this video? After all, nobody needs a license to write unsafe code and publish a crate containing it.
    I feel like deferring this expertise to “other people” is a cop-out and will eventually result in a few elites and the rest of us just blindly trusting them to get it right. No, we should be capable of auditing all of this unsafe code, and to do that we need properly and completely specified rules, and tools to help enforce them. Unit tests that happen to pass are not enough!

  • @meowsqueak
    @meowsqueak 11 months ago

    Seems like a classic engineering trade-off at play: the cost for having “safe” code is that writing anything under the hood requires a whole bunch of concerns and rules to always play properly with the “safe” world, which ends up being more complex than just writing C, where only specific rules need to be met in any situation. Thus, I found this video interesting but it certainly has not made writing unsafe code feel any more comfortable to me. I’d just rather not.