Note that at 7:47 the same optimization (0 = None) means that the NonZeroU8::new function is actually a no-op even though it returns Option and contains a match. Using it won't cause a branch until/unless you check the result yourself, for example by unwrapping it. If you want a NonZeroU8 the effect is largely the same, but if you're passing to an API that wants Option regardless, then no extra branch is incurred. (This doesn't necessarily apply without optimizations turned on, of course.)
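For anyone who wants to see the layout part concretely, a minimal sketch (nothing beyond std assumed):

use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // The niche optimization: Option<NonZeroU8> needs no extra tag byte,
    // because None can reuse the bit pattern 0 that NonZeroU8 can never hold.
    assert_eq!(size_of::<NonZeroU8>(), 1);
    assert_eq!(size_of::<Option<NonZeroU8>>(), 1);

    // new() returns Option, but with optimizations on it boils down to copying
    // the byte as-is; no branch happens until you actually inspect the Option.
    let maybe = NonZeroU8::new(5);
    assert_eq!(maybe.map(NonZeroU8::get), Some(5));
}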
Interesting. But what's the point of having Option in this case? Can't you just keep a u8 and check that it's not 0 when you need to? Better readability and type safety I guess, because then you can never accidentally divide by 0 if you forgot to check (whereas you can never forget to check Option for None). Is that all?
@@tombenham9458 the NonZero types are often used for optimizations. If your type contains one anywhere in it, Option<T> will have the same shape as T. The other use is semantics. Sure you "could just check that it's not zero" when you use it, but that means putting those checks all over your code and more possibility to forget them. If you use this type it pushes the check to one place in the code (the factory new function) and everywhere else the invariant is guaranteed by the type system without having to put any more checks. So it's both harder to get wrong (correctness) and more performant (fewer checks needed). When working in Rust I find I pretty commonly create these sorts of restricted "Semantic Types" in places where in other languages I might just use a plain int or string.
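A tiny illustration of what I mean by a restricted "semantic type" (Percentage is just a made-up example, not something from the video):

// Made-up example: an integer guaranteed to be in 0..=100.
#[derive(Debug, Clone, Copy)]
pub struct Percentage(u8);

impl Percentage {
    // The single place where the invariant is checked.
    pub fn new(value: u8) -> Option<Percentage> {
        if value <= 100 {
            Some(Percentage(value))
        } else {
            None
        }
    }

    // Everywhere else can rely on the invariant without re-checking it.
    pub fn get(self) -> u8 {
        self.0
    }
}

fn main() {
    assert!(Percentage::new(42).is_some());
    assert!(Percentage::new(150).is_none());
}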
@@tombenham9458 You can also only check it once and then pass around a bare NonZeroU8. The compiler would not let you get a NonZeroU8 without checking, so in all the other places of your code you can be sure you don't need any checks. With a u8 you would have to remember at which exact points of your code it has already been checked and at which it wasn't.
@tombenham9458 With Option, you don't necessarily have to check for None. That can be done via the many helper functions that Option has. 'map', for instance, becomes a no-op if the value is None. There's the 'is_some_and' function too for doing some boolean check on the value, which defaults to false with None. Option is such a useful type.
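For example (just two of the helpers, nothing exhaustive):

use std::num::NonZeroU8;

fn main() {
    let some = NonZeroU8::new(7);
    let none = NonZeroU8::new(0);

    // map is effectively a no-op on None - no explicit check needed.
    assert_eq!(some.map(|n| n.get() * 2), Some(14));
    assert_eq!(none.map(|n| n.get() * 2), None);

    // is_some_and runs a boolean test on the value and defaults to false for None.
    assert!(some.is_some_and(|n| n.get() > 5));
    assert!(!none.is_some_and(|n| n.get() > 5));
}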
@@tombenham9458 encapsulating this in the type system means that even when you pass this object to some other library or get a value passed by another library, you will not need to explicitly check for it to be non-zero. Furthermore, the compiler cannot know if a u8 is nonzero just by looking at it. With a proper type, the compiler can know this however. Type-safety in this case means to represent possible values by your type system, meaning if you pass the correct type, it is valid by design. For a randomly picked u8 value, we do not know if this is nonzero. Thus representing a NonZeroU8 value as u8 is not type-safe.
The realization 'in a factory, you can do a bunch of logic, then construct the object with all its initial values in one fell swoop, so there's never an invalid object' is really clever.
I'm not a C++ dev, but I wrote Java for about 10 years. Java provides you _some_ additional guarantees when it comes to constructing an object, but there are still easy ways to make mistakes and access uninitialized fields. I _never_ understood what the point of constructors (as seen in Java or C++) is, instead of having a way to construct an object with all field values in a single step. You can't name constructors and have to differentiate them by signatures instead, constructors are complicated to make safe (I think Swift has managed to do so), you can't return a type different from the class (e.g. wrapped in an Optional). It just seems like a bad idea that should be avoided in language design.
I don't get it, how is that clever? The invariants will be violated while you are doing the bunch of logic. This is just kicking the problem down the road. The real solution is to stop kicking the can and accept that not all constructors need to establish invariants, especially the default constructor. Partially formed values are ok, and efficient.
@@AlfredoCorrea I don't understand. If an object has an associated invariant and the factory method creates that invariant _and then_ constructs the object, how is the object's invariant ever violated? Either the object exists with the invariant or it doesn't exist. Maybe you're looking at this from a different angle, but in my experience, having invariants associated with types/objects is really nice. Making it impossible/harder to instantiate objects with broken invariants helps a lot. I write a lot of security-critical stuff.
This reminds me of a concept I picked up in one of @NoBoilerplate's videos, which I now repeat like mantra when coding in Rust: "Make invalid state unrepresentable"
You know what that does to you, right? It makes your software useless if you miss your users' actual requirements. It's just trading one bug opportunity against another.
@@lepidoptera9337 No. It makes all invalid states unreachable, meaning if an unanticipated use case lies within the valid states then you can be absolutely sure that it will work without changes. If it lies within the explicitly invalid states then it will not work, just as it shouldn't. And if a new use case lies within what was previously agreed to be "invalid", then it should only become possible through an explicit change.
@@onebacon_ It usually doesn't. People use e.g. negative entries in fields that usually expect positive numbers as flags all the time. Or let's say you make an animal class and you only allow "dog, cat, canary", then only small animal vets can use it. A rural vet who treats "horse, cow, sheep, goat" is done for. All of you guys have been over-trained by architects with OCD into this belief that exercising absolute control over the user makes good software. That's total BS. What makes good software is adaptability and resilience. If somebody can compromise your Windows PC because they are allowed to put an arbitrary string into "animal type" rather than selecting from a pulldown menu, then you didn't do your job correctly. You just showed the entire world that you are poorly educated engineers.
@@lepidoptera9337 having invalid states be unrepresentable doesn't mean that the software cannot meet the requirements, just that we spend less time representing the technical things and more time on the requirements
“You can grep your files for unsafe and find where you might have made a mistake” is the single best way of explaining (to me) why the “unsafe” keyword exists and why the compiler is more strict otherwise. That made so many things just *click* in my mind - great video!!
the thing is that that's quite irrelevant in larger code bases. when using rust to write my game engine with vulkan for example the invariants to control for are so large and expansive that everything has to be unsafe anyway. having an api that is provably safe when your memory isn't cache coherent and asynchronously in flight is virtually impossible. also people don't actually give a shit about it. every other crate i open from crates io leaks undefined behavior into safe rust. and that one compromise is all that it takes to compromise all of your rust code's safety.
@@gideonunger7284 I just don't know enough about GPU/Vulkan programming to meaningfully comment on the first part, but I just thought, isn't the point that's (kinda) made in this video that you need to design your API very carefully, creating very small abstractions and such components, in order to be able to minimise the invariant space around unsafe code? I.e., it's a very big effort, but not necessarily impossible? I don't know if it's _literally_ every other crate (maybe you just have very different usage patterns than me, lol), but yeah, there is a problem of a lot of crates using unsafe while not properly analysing it. It would be great if there was a common expected standard for analysis of unsafe, such that it'd be easier to see if the effort had been made, at the very least. To aid in that, but I don't know if this is feasible, it would be great if the compiler had an "invariant/UB analyser" that could look into and at least recommend invariants to check for and modes of UB that might arise from a particular use of unsafe.
@@MarteenHobbu yes it does in certain cases. i write a lot of rust code and a lot of safe abstractions over unsafe code. and some abstractions, with how rust works, are by definition impossible. also the set of code rust considers safe is not the same as the set of code that is actually safe. so there is a lot of "correct and safe" code that you opt out of by using rust. and some of that code you opt out of is very performant. which if you work on games for example can be a hard requirement for what you are doing so you will have to write it in unsafe rust.
8:30 note that the “new” and “unchecked_new” functions need to be marked as ‘pub’ in order for them to be exposed for outside use - and that the member(s) of the structure must not be marked pub to prevent people from constructing a NonZeroU8 on their own (and from mutating the member(s)).
I think the `pub` keyword is omitted for the purposes of keeping the example code in the video concise and focused on what is being demonstrated. Similarly, in real code, some of the functions should ideally be marked as `const`, but that isn't relevant to the video topic. Anyway, still a useful reminder.
@@shrootskyi815 thanks for your reply. The main point that I was trying to convey is that if one has a struct with invariants, they should take steps to ensure that the struct only gets created using their designated 'factory functions' and its fields won't be manipulated directly. This typically means putting the struct in a module and declaring the factory functions and other potentially mutating functions as 'pub'.
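Roughly like this, as a sketch (the names are made up, this isn't the std implementation):

mod non_zero {
    // The field is private, so code outside this module can only get a value through `new`.
    pub struct NonZeroU8 {
        value: u8,
    }

    impl NonZeroU8 {
        pub fn new(value: u8) -> Option<NonZeroU8> {
            if value != 0 {
                Some(NonZeroU8 { value })
            } else {
                None
            }
        }

        pub fn get(&self) -> u8 {
            self.value
        }
    }
}

fn main() {
    let n = non_zero::NonZeroU8::new(3).unwrap();
    assert_eq!(n.get(), 3);
    // non_zero::NonZeroU8 { value: 0 } would not compile out here: `value` is private.
}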
That moment where the size of the struct changed when you reordered its fields, that was a great setup :) I'd heard the "field order trumps initializer order" rule before, but I hadn't thought about how field order has other consequences too.
incredible video. editing, narration, explanation, the code itself - everything was fantastic and so concise. as a rust programmer who’s never touched c++ i still learned a lot about programming in general. please keep it up.
9:00 "first of all exceptions are incredibly expensive for what's supposed to be a very simple and low-level type " This is only true if exceptions are used as a regular control flow mechanism (I think because of the stack unwinding but I'm not actually familiar). If the exception is not triggered, the cost is insignificant. Everything else you said about them is true though. Might've been worth mentioning that panic! exists for some niche low level uses, and that Result can't replace exceptions in every instance, but in practice for the vast majority high level code it can. Really really high quality video, I am astounded by the amount of misinformation that exists on C++, especially from other young people.
Yes, exceptions - like much of C++'s lunacy - are actually in the language for a good reason. Also there's a great CPPCon talk "expect the expected" that talks about many types of error handling and their issues/strengths. Exceptions in particular provide a lot of strengths. You can do centralised error handling, transport lots of information about the error, and they're very performant for situations where the "unhappy case" is rare - as you would hope it is.
Exceptions should only be used in exceptional circumstances. Even if they were free from a performance standpoint, using them for flow control just makes a maintainability mess.
Unfortunately there's no other (sane) way to signal failure from a constructor, which is why they're mentioned in this video. And for something like NonZeroU8, having the failure case be expensive really reduces the number of places that type can be used (compared to the rust version which is zero-cost)
Another observation: It's refreshing that your videos have no background audio. Pure calm voice is so much better to deliver concise factual information than noisy music. Wish that more educators would rediscover "less is more".
I believe clangd has warnings for every mistake inside a constructor that you mentioned. It will tell you if you're reading from a member that hasn't been initialized yet, it will warn you when you call a virtual function from a constructor (where the override won't be dispatched to), etc. Obviously it would be better to disallow those mistakes in the language standard instead of relying on warnings, but good tooling helps a lot.
Reading about the Superconstructing Super Elider brings up another fun thing about Rust: moves. C++ has all sorts of rules about when moves can and can't be elided, specifically because there's API surface for arbitrary types to be told when they are being moved - move constructors, specifically. In Rust, anything can move anything as many or as few times as it likes, and if you don't like that, you have to stuff it behind a smart pointer (like Pin). This is mainly notable because it results in one of the biggest problems with Rust/C++ interop: everything has to be behind smart pointers. If you put a C++ type on a Rust stack frame, Rust will move it around without calling the move constructor, which is hilariously unsound.
10:38 A potential way to “mark” create_unchecked as unsafe (although I personally think that “grep unsafe” is analogous to “grep unchecked”) is to use “the passkey idiom” as a way of keeping more control of which code might call specific member functions.
Friends over at the pony language call these passkeys "object capabilities"; you create a type such that only code you trust may receive an instance of that type, and you then use this trust to gate away features of your library
The only problem with Rust's style is that in-place initialization on the heap isn't guaranteed. The code: let data = Box::new([0u8; 10_000_000]); is supposed to create 10 megabytes of data on the heap but might overflow the stack in the process. I tested this and it actually crashes in debug mode but works in release mode, pretty gross. I think solutions to guarantee a stack overflow won't happen are being worked on but it's not trivial. The way constructors in C++ work, you can just allocate the space on the heap and then call the constructor on the pointer.
@@kaga2922 actually it's now been replaced by an arcane rustc attribute called #[rustc_box]. this problem does need fixing but it's a much smaller problem than the tarpit C++ finds itself in imo - essentially we need a more ergonomic and safe equivalent of out pointers that doesn't involve using MaybeUninit which is very easy to misuse, whereas some sort of &out reference that *must* be fully initialized by the time the function returns would be much better
Absolutely true that placement new is badly needed in stable Rust. My preference would just be stronger guarantees about move elision, rather than a special attribute or `box` syntax or `&out` parameter type of thing. Since Rust controls its own calling convention, it would be amazing if `Foo::bar([0u8; 10_000_000])` could just initialize the 'prvalue' array directly wherever `Foo::bar` moves it to. In a perfect world this could be done without introducing anything like the incredible amount of complexity C++ has around value categories.
Yeah, that way you can have multiple constructors that init to different values. If NSDMI took precedence then you couldn't use the initializer list to have per-constructor values assigned.
10:28 "Because C++ doesn't have a builtin notion of safety" It does, you just dismissed it at 8:57. With exceptions enabled you can make an "unsafe" factory function by marking it noexcept. In case of invalid input the constructor will throw an exception, terminating your program. You (almost) don't pay for exceptions, don't have to handle them for such a scenario, and it guarantees the invariant. It is a bit 'hacky' and I acknowledge that it is less 'neat' than the rust way, but it's possible. And I don't think your arguments against exceptions hold much ground. 1. Yes, fully exceptionless C++ is faster. But firstly, exceptions incur most of their cost when they're actually thrown and handled. For most use cases the overhead for exceptions being enabled is negligible. Secondly, if we're doing a feature comparison here, safe Rust is also slower (potentially also really slow and expensive) compared to unsafe Rust. I'd go as far as to say that these arguments have the same weight when comparing language performance, cancelling each other out, although it's too much work to prove it properly. 2. "Lots of codebases aren't prepared to handle them". Well, sounds like they should start using the language like it's supposed to be used and actually learn the core safety features. I agree with "invisible code path" argument though, it is the only sound one and actually a really strong one.
It would be reasonable to use the success or failure of Rust's NonZeroU8::new as actual regular control flow in your code (say you have an unknown u8 and it's not necessarily an error if it's 0, you just want to take a different code path), whereas it would not be reasonable or wise to do so with a try/catch and a throwing constructor in C++. That's the insight at the heart of my argument that exceptions are too expensive for this simple low-level task. The noexcept trick actually sounds more like safe Rust than unsafe: it guarantees a crash (safe) instead of forging ahead with UB (unsafe). As for code bases that aren't prepared to handle exceptions, sure, everyone should just git gud, but the fact that the language doesn't force you to contend with them (the invisible code path argument) means this is always going to be an uphill battle. Structuring your code to put failure in the type system with std::optional (or std::expected in the future) gets the type system on your side about requiring you to think about error cases.
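Concretely, something like this (hypothetical usage, not code from the video):

use std::num::NonZeroU8;

fn describe(x: u8) -> String {
    // The "failure" case is just an ordinary branch, not an exceptional event.
    match NonZeroU8::new(x) {
        Some(n) => format!("chunk size {n}"),
        None => String::from("no chunking"),
    }
}

fn main() {
    assert_eq!(describe(8), "chunk size 8");
    assert_eq!(describe(0), "no chunking");
}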
A few notes:
1. I'm a bit disappointed that you didn't go into templates and meta-programming. Especially the NonZeroU8 case is a perfect example where templates can emulate a more complex compile time type system. (even though the syntax makes you want to hurt someone)
2. The private struct solution is a pattern many code bases already use, especially for dynamically linked libraries, to isolate the struct layout from the library user, called "private implementation" (PIMPL) - but it always annoyed me how you either need a pointer indirection or get rid of the C++ type system and resort to basic C-style OOP to make it work without it.
3. A CS teacher once told me "any design pattern is just a work around for the deficiencies of the language of choice". This video perfectly illustrates that statement!
4. Making unchecked_new "safer" in C++ is possible with "friend classes" but that is a giant mess in and of itself.
It's been quite a while since I've worked with C++, but that packing of private implementation made me think of the PImpl pattern. Not for the same application reasons, but certainly still to improve safety/stability, and also similarly with a cost from the indirection that cannot be optimized away.
It definitely has some similarities to Pimpl, and you could definitely set up your class architecture to get the implementation hiding benefits of Pimpl and the initialization benefits of CString::M in one swoop if you chose to. I do want to point out that in the form I showed in the video, CString::M should actually be utterly transparent to the compiler; I doubt there would be any indirection overhead at all, even in unoptimized builds.
I have actually grown to like the nested struct approach for some data types I write.
- If you use this for private members it does not affect the public API.
- It lets you manipulate all the object's data at once, as the video says. If you add new members in the future, there are fewer changes that need to be made to the class implementation.
- All mentions of members are clearly visually distinguished from local variables. (I like to use Self/self instead of M/m, which makes it intuitively similar to languages like Python or Rust)
- The data fields of the class are visually separated from methods in the class definition. You can see at a glance what data fields the object has.
I don't think one should use this approach for all class definitions, but for grouping together private data members of more complex objects this is great.
I really enjoyed this video, I've used a bunch of languages with constructors (C++, Java, C#) and have been using Rust for a little while now. The other day, I had a little C# to write, and when working with constructors I felt a familiar tiny little pang of dread again, and I think you just nailed down what I was feeling. The metaphor that a constructor is a function that returns an initialised object isn't quite true, and it's in the details where some subtle ambiguities and bugs often lie. Rust applies the metaphor much more literally, and I feel a lot more confident in what I'm doing. Funnily enough, this is one of the things I've appreciated the most with Rust. There are much fewer "things" to consider in the language. It's closer to C in many ways in that regard.
The solution you propose at the end (using aggregate data structures to ensure type validity) for the C++ side reminded me of a CppNorth talk from last year, "Writing C++ to Be Read". It touches on the topic of constructor initialization and how aggregate initialization provides advantages for quite a few cases.
It just amazes me time and time again how the great choices that were made at Rust's core make it such a practical and cohesive language to work with and reason about, and have a butterfly-effect-like influence on so many parts of the language itself. Another great example of that is the std::mem::drop implementation, made me chuckle when I found out about it :D
My favourite part is traits. If you're working with a library that works with a specific struct, you often end up awkwardly calling a custom function that takes in the struct and spits out some value or does some crap, several times across your codebase. You can instead just implement a trait for the struct to make your life easier and your code cleaner.
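Something like this - library::Config here is just a made-up stand-in for whatever third-party struct you keep poking at:

// Pretend this module comes from a third-party crate.
mod library {
    pub struct Config {
        pub name: String,
        pub retries: u32,
    }
}

// Your own trait, implemented for the library's type.
trait Summary {
    fn summary(&self) -> String;
}

impl Summary for library::Config {
    fn summary(&self) -> String {
        format!("{} (retries: {})", self.name, self.retries)
    }
}

fn main() {
    let cfg = library::Config { name: "db".to_string(), retries: 3 };
    // Every call site can now just say cfg.summary() instead of repeating the logic.
    println!("{}", cfg.summary());
}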
Here’s another line of code that’s equivalent to “std::mem::drop(x)” - “x;”. Just the variable and a semicolon. Because everything is an expression, “x” on its own evaluates to x, and adding a semicolon discards the value instead of returning it. So you’re just passing x into the void, and the destructor is immediately called.
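And the reason that works is visible in the standard library itself: std::mem::drop's body is (essentially) empty - taking ownership of the argument and letting it go out of scope is what runs the destructor:

// std::mem::drop is essentially just this: an empty generic function that takes by value.
pub fn drop<T>(_x: T) {}

fn main() {
    let v = vec![1, 2, 3];
    drop(v); // v is moved into drop(); its destructor runs when the (empty) body ends

    let w = vec![4, 5, 6];
    w; // same effect: the expression statement discards the value, dropping it
       // (the compiler emits a "path statement drops value" warning here, but it compiles)
}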
This is super helpful for actually learning "the right subset" of C++, thank you! I can't believe a Rust channel is what I needed to learn better Cpp but here we are
This video is so nice! It just resonates so much with the struggle I had using C++ at first (like finding out that the order of class members affects init order, or that factory functions make much more sense in general), and the problems with constructors are so well covered here.
Most languages benefit from this, particularly when incorporating the Result or Option types. Make invalid states unrepresentable. It's only a problem when the frameworks being used fail to support it effectively, and force you into the constructor approach
A lot of legacy code is written in old-style C++, which forces you to use the old semantics for much of the stuff. When you work on a system that has code from the previous millennium, then you would know. And most of the time, it is not an option to rewrite all the code.
@@oysteinsoreide4323 if it ain't broke don't fix it - there's nothing inherently wrong with a constructor based approach when it's working properly. But new code in such a legacy code base certainly should consider the more robust approach - the two styles can coexist just fine
@@orterves Yes, in some instances, using a better invariant on objects is good. But I'm not sure if I would use the factory approach. I would rather have an exception in the constructor, and ensure that the exception is handled. It will be much less code in the classes, and the validity of objects will be equally safe. The most important thing is to have good invariants, and that is often lacking in old programming style, which is much worse - and much more difficult to just write yourself out of when the code is complex. There is a lot of code where there is no clear invariant, and in that case how things are constructed is far less important. You would need to fix the invariants first.
Another great video! I'm always astounded when watching your videos how often you state an idea that I say all the time to my teammates. Avoiding partially initialized objects, making constructor bodies empty, being wary of how the spaces between lines can evolve even if the current code looks good. I need to keep directing my teammates to your videos
C++ classes are a mess in general. Like why the fuck do I have to write a custom destructor, copy-constructor, and a copy assignment operator just to be able to properly handle pointers. At least the only real footgun in Rust (in this context) is the `Drop` trait implementation.
@@oserodal2702 and unless you're managing custom resources (raw files, raw memory allocations, etc) or want drop for your type to have side-effects, you mostly don't need to implement drop either!
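For the rare case where you do manage a raw resource, it's just the one method. A sketch with a made-up handle type:

// Made-up wrapper around some raw resource handle.
struct RawHandle {
    id: u32,
}

impl Drop for RawHandle {
    fn drop(&mut self) {
        // In real code this is where the underlying resource would be released.
        println!("releasing handle {}", self.id);
    }
}

fn main() {
    let _h = RawHandle { id: 7 };
    // drop() runs automatically here, when _h goes out of scope.
}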
I always thought of factories as functions that return new instances of a set of classes based on their arguments or other rules. The classic example is an image factory that can return a jpeg image class or a png image class based on the image file type. Callers need not care about the specialization and have a single method for construction
An exception in the constructor would also solve everything. But as you said, it would mandate a try-catch somewhere to avoid issues with it. std::optional makes it possible to solve it in modern C++.
You could get your constructors back once you aggregated all of your fields. Then the constructor could simply be this: 'CString() : m(CString::create()) {}' or this: 'CString(const char* in) : m(CString::create(in)) {}'. This is actually very efficient thanks to NRVO. You also might want to replace the create() methods with lambdas to have all the code right there.
8:30 "Which is something you can grep your code for" is slightly complicated by the fact that the actual mistake can easily happen outside the unsafe block. We can see that in this very example if you pass a variable to new_unchecked() and somewhere else in the code can accidentally set that variable to 0 before that call. It is still a very good idea to have in the language. It just doesn't make it as easy as it might seem at first glance.
@@anon8510 Yes, it does. But that doesn't mean that all potentially unsafe errors happen there. As in this very example:
let x = 0;
let y = unsafe { NonZeroU8::new_unchecked(x) };
The actual mistake happens on the line above the unsafe block, where x was supposed to be set to 1, not in the unsafe block itself. (It is not difficult to imagine that figuring out the value of x could be much more complicated, for instance dependent on input that isn't sanitised when it should be, or set with some crazy math expression that's not quite correctly coded.) Which is to say, you can't JUST grep for and look at the unsafe blocks. You have to inspect all the surrounding code as well.
The point is that you can always be sure that the breakage of the invariant happened in an unsafe block. But the thing that breaks the invariant might have been initialized outside of it. Safe/unsafe rust does not promise anything else and this is already a lot better than having no safe subset.
Well I gotta keep you on your toes, don't I? That's definitely the other valid approach, but I prefer M because it's less typing (don't have to spam out and maintain that constructor that's pure boilerplate), and you get to use designated initializers when creating it in the factory. If C++ had reflection and we could auto-generate the boilerplate constructor, I'd be much more attracted to that approach. One of these days....
I always lean towards a static factory creation, usually returning a smart pointer for my C++ creation; I can vouch for this working and being absolutely the way to go. Making my constructors private, adding helper utilities, which you can then control access to the constructors via friendship has solved a lot of headaches for several projects my end. Especially when writing code which is to be used by others, preventing willy nilly stack allocation is really rather good, and though you say "factory" and you get glared at, I absolutely agree with the sentiments in this video.
I love stuff like this. Can you give some examples where normal constructor use might be leading me to willy nilly allocations that I'm overlooking/could be avoiding?
Impressive, a single vid and you got a subscriber. That didn't happen for me for a long time. I knew C++ was unsafe but this really makes it sink in. I like the m. approach though. Scary to propose it in production code but I for sure will try that in my day to day programs.
Constructors can be good as an interface (i.e. you know you're instantiating something but don't know what that thing is), though that does require that all the types you handle take the same number of arguments in their constructor. It may or may not apply to C++ in particular, but it definitely applies to higher-level languages. For more complex initialisation such as the non-zero refined type case, I think static factory functions are reasonable if you have support for nullable types. Just return a NonZeroU8? from one, and a NonZeroU8 from the other. I do agree with many of the points you make, I just don't think they're inherent flaws of the constructor method, just of the way constructors work in C++ specifically. For instance, you could make a guarantee that in the constructor, any field of class A that has type T will have type T? instead until you assign it a value. And if by the end there are paths where not all fields are set, that's a compiler error because you didn't instantiate your object correctly. The type checker should be able to similarly reason about how fields are accessed in methods that you call. If it's just a setter, go ahead. You can treat it as though it accepts a type that has nullable fields, of which the class you defined is a refined type. So long as the type-checker determines the method can work with this implicitly defined superset of your class, you're allowed to call it. Most of the other issues come down to a combination of syntax and language semantics, but none of them are flaws with what a constructor method does at its core.
In case of generics (templates in C++) types may have different number of constructor arguments, the particular constructor used is determined when the template is instantiated, e.g. vector.emplace_back(args...); will accept any number of arguments if the handled type has a constructor that accepts that combination of arguments.
Thanks for these videos! An informed opinion, educated discussion, and soothing voice with great visuals make this a great channel. Keep doing what you're doing, man.
Very interesting video, thanks! Note that the first big general-purpose OOP language was very OOP, in that classes were actually object instances (whose supertype was "Class", which was also an object instance), and the only way to allocate an object was to call a factory function on the class object. So "Point.new()" [not the right syntax] was invoking the new function on the object stored in the global variable Point. Anyone who wants to learn more about invariants (and pre- and post-conditions) should check out Meyer's tome "Object-Oriented Software Construction", which you can get as a PDF floating around since he gave away PDF copies with his compiler. It's an interesting delve into how programming languages are designed. Like, what's the mathematical reasoning behind the various higher-level structures. Very handy concepts that have oozed into other programming languages, even if you're not writing OOP programs. For example, in Meyer's language, the constructor would have a precondition that the argument passed in isn't zero, and an invariant that the value inside the object isn't zero. That's part of the type signature, so you know that's the requirement. If the requirement is more complex (e.g., an array of bytes contains valid UTF-8) then a non-failing function has to be provided so you can check it. Then the constructor relies on the precondition being met, and if it isn't an exception is thrown in the caller which you cannot catch and continue on from (but you can retry, in a sense), and the top of the exception stack traceback is the caller and not the non-zero-constructor code. Just as an idea of a different way of handling it. His whole exception thing is so much cleaner than other languages.
@@darrennew8211 Well, you can use Simula for general things. Yes, it has a simulation library, but it still is useful for general things. Well, "useful" is a wide term here. Simula is not much used these days; it has mostly been used in universities. And it was the inspiration for Bjarne Stroustrup when he made C++. Smalltalk is probably more popular. But I still consider Simula the first object oriented language.
@@darrennew8211 The Simula of 67 was made for general purpose. In 62 it was mostly for simulations. The 67 version was the version I used at university back in 1993 to -95.
One advantage with C++ constructors is "emplace_back" for vectors and other in-place construction. For Rust, you typically have to hope the compiler optimises it that way (I believe it's called Return Value Optimisation). However, it's of course trivial to have it in C++ since constructors use pointers instead. I know eventually it's going to be solved, but it's taking a bit unfortunately.
the curly brackets in the C++ constructor are pretty nice to do some quick math with passed in arguments to create a value for another struct member. Instead of creating an instance and then calling an init function you can just create an instance and some base initializing will be done.
I was thinking of another issue, at least in C++. Even if you initialise your NonZeroU8 properly, there's no protection from setting the member “value” to zero afterwards. You would need a getter, or cast operations, since there is no such thing as read-only fields in C++
10:16 if rust enforces the invariant through UB (no one can stop you from passing 0, but nothing is guaranteed if we do so), we could do the same in C++, though? like this:
if (x != 0) {
    NonZeroU8 res;
    res.value = x;
    return res;
}
// Nothing here afterwards. Not even a return statement. If 0 is passed, it's UB since we don't return.
It actually works (everything is optimized as if x is never 0), except the memory layout is not enforced. We can use std::unreachable in C++23 to explicitly cause UB also
i use boost outcome + 3 phase initialization. sure C++ doesn't have an unsafe keyword but that's good enough for me. I don't think most C++ programmers need hand holding while writing a constructor, "spaces" between lines of code in a constructor sound like a trivial problem, because a constructor should be as simple as possible; if someone writes 50 lines of code in a constructor that would be smelly to me. Maybe i haven't written that kind of constructor, or worked in a team like yours, but i think it is fine for me for now. Very nice video. i like it.
@@Evan490BC the argument can go all the way on every little thing. safety is nice but it has a cost, so it becomes a risk assessment and trade-off consideration for the specific situation. all i am saying is i have considered the risks and concluded it is not that beneficial to have further safety guard rails like an unsafe keyword or preventing "spaces" in constructors in my situation/code base. There are no one-size-fits-all gloves.
iirc from a c++ weekly video, constructors are just converters (casts). You can't even take the address of constructors, much like destructors. Hence, without the explicit keyword, the constructor is used implicitly, much like how casts are done implicitly. The (old) constructor syntax (S s = S(42);) also comes from the syntax for C-style casts (float f = float(42);), so it's not designed to be for constructors (well the brace initialization S s{42}; or S s = S{42}; kinda makes sense though, but it's newer). Therefore, constructors in C++ began simply as a glorified custom cast, and since this does not change as the language moves forward, constructors, which could only get some temporary patches and fixes (like the brace initialization syntax), remain broken. btw the pattern/paradigm in 16:42 is kinda genius ngl
8:24 this is a VERY good point. the fact that you can explicitly go out of your way to do something in an "unsafe" manner, but then be able to EASILY track down when you have done this is a VERY powerful tool you can have, that also isnt babying and distrusting of the user / developer, which is a pretty pervasive culture that i highly dislike. with this, though, you not only make safer and more robust code but you also lack this inherent distrust of the developer
9:22 if you know x is not zero you can do `if(!x)__builtin_unreachable();` before calling constructor. This will tell compiler that x=0 is not possible.
I noticed a similar issue in C# in some UTs where two-step initialization is kinda forced upon you, and bam, what you said would happen, did: stuff was accessed before it was initialized and, worse, it didn't crash. It was happily null and creating a really hard to find bug
You don't mention if you were aware of it, but Josh Bloch talks about this same thing (as _static factory methods_ ) in _Effective Java._ He gives a lot of the same reasons for it, even though Java has strong guarantees about not letting you operate on an uninitialized object and cleanup on creation failure.
I dislike sometimes how tightly OOP type things naturally couple lifetime into things. It's nice to use allocators (mainly arena allocators) to get a block and use a factory/c-style init function to fill in a pod struct at some aligned offset into the allocator's big block of memory. People teaching c++ will squawk that it has to be impossible to have invalid data but sometimes you just have to accept the responsibility. By dodging the construction, you can dodge the destruction. Just reset the allocator (that invalidates all structs placed in it) and go again for a new frame or hot loop iteration. I think c++ gets a lot better when you stop forcing everything to tangle data, lifetime, and functions into a class. Data, functions, and memory allocation can be handled separately.
Two-phase initialization was popular in Java so we have a bunch of legacy C++ that has initialize() and finalize() methods. Really ugly. Plus we’re stuck on ‘17 so no designated initializers 😢
Because you already have the pointer to uninitialized this hanging around. C++ officially says that the only way to make an instance of a class (or a struct, for that matter) is to call a constructor. Aggregate initialization is just a default constructor that takes an initializer list.
@emilyyyylime- I really wish you could, but having private data members disables aggregate initialization for your type (even from inside a place with full visibility like a static member function, which is a shame IMHO). Using the nested struct gives us aggregate initialization for M, since M's members are all public even though M itself is private inside CString. So we create M through aggregate initialization, and then just need a proper CString private constructor that moves it into the private `m` member.
A cool thing about Go is that even if you try to do this, anybody can just conjure themselves up a zero-valued instance of your type, either intentionally or by accident, making it somehow even less safe than C++.
I think that there are good reasons to criticize C++ but in the case of NonZeroU8 it's not one of them in my opinion. Example: In C++ you can easily create the same NonZeroU8 using std::optional, a private constructor, and a static factory returning std::optional<NonZeroU8> (it can't literally be named "new" since that's a keyword, so call it make):

#include <optional>

using u8 = unsigned char;
class NonZeroU8;
using opt_u8 = std::optional<NonZeroU8>;

class NonZeroU8 {
    u8 val;
    explicit NonZeroU8(u8 x) : val(x) {}
public:
    static opt_u8 make(u8 x) {
        if (x != 0) {
            return NonZeroU8(x);
        }
        return std::nullopt;
    }
};

It's important to say that there's the doom of backwards compatibility that makes things pretty annoying, but the C++ standard is trying to make sensible changes to improve the safety and simplicity of the language. For example in C++23 there is a strong push to use std::expected instead of exceptions to maintain safety with minimal overhead.
Great video, especially the talk about half-initialized instances, this is something that should have been solved ages ago by the standard. One thing is bugging me about exceptions though. You mention that they are extremely expensive. This used to be the case, but in most modern compilers, the cost of exception handling should be basically null unless an exception is actually thrown, in which case yes, exceptions are more costly than checking the return value (mostly due to the handler being stored in cold memory). I used to rely on exceptions as my sole error-handling mechanism. Since moving to rust as my main programming language, when I go back to C++, I do rely less often on exceptions, but not because of performance concerns, I simply want to force users of my API to reason about error handling with std::expected or std::optional (including explicitly saying "I want an exception if the value is not valid"). I think in some cases exceptions might still be slower than checking the return value because of easier optimizations (including calling std::unreachable in the "failure" branch of the error checking), but I have no data about this.
Something I might not have made clear enough in the video is that it's quite often that I WANT to be able to take the failure path. It's useful to have an unknown u8 value, and use the success or failure of constructing a NonZeroU8 from it as normal control flow. Exceptions are this strange form of control flow mechanism that you hope never actually takes the unhappy path--precisely because it's so expensive. I think that's a strange tool to use when there are simpler, more type-safe, and dirt cheap alternatives. Here's a quick microbenchmark where exceptions are over a thousand times slower for a simple piece of control flow that takes the failure path. quick-bench.com/q/KDEPFXLc7746GdbKRbyIuPghLe0 I stand by my claim that this is unacceptably expensive just for a fallible constructor of a low-level primitive like NonZeroU8.
@@_noisecode I did not expect it to be 1000x slower, the "common knowledge" was that it is about 20x slower than error checking. I guess I'll only use exceptions in cases where errors are truly exceptional, like IO operations. Thanks for the answer.
Around 11:40: Personally, I would use a delegating constructor for CString: add a second constructor taking a pointer and length, and have the first delegate to the second passing the result of calling strlen.
Used to avoid writing factory code whenever I could cause it felt ugly. Though I write rust now, this video gave me a new perspective and I will try to use factories as much as possible in the future.
While I agree in the end, if you do have to have constructors in your language (say, for backward compatibility with a previous language, or familiarity to intended programmer audience), you can do what Swift did and make fallible constructors possible and language-supported.
I don't believe that comment about not having a valid S until the function returns is quite true; the standard guarantees we have a fully-initialised object by the opening brace, and all subobjects must therefore, explicitly or implicitly, be initialised by the opening brace of the constructor implementation. It's fully initialised memory by the time the body of our constructor begins, and then the rest should be more about performing business logic to guarantee everything is how we want it, but it's a fully initialised and valid object at that point, just not the one we want.
At the opening brace, all data members have been constructed. The S object still doesn't technically exist. You can prove it by throwing an exception out of the constructor: all of S's data members' destructors will be called (because their construction finished), but S's destructor won't (because it never fully came into existence so it doesn't need to be destroyed). I would argue that needing to perform extra business logic in the curly braces to ensure everything is "how we want it" strongly implies that S's invariants are not yet fully established by the time of the opening curly brace, requiring additional logic to be executed. This means that S is not yet "valid" per the definition I gave at the beginning, meaning reading from `*this` as though it's an S is potentially still just as dangerous as using it before members' constructors have run.
learning c++ as a grad student I thought it was cool to know these easter eggs and irrational behaviors that could vary depending on the machine and compiler 😂 now I realise it's basically putting up with someone else's 💩
I relate to this. I used to take pride in knowing all the little subtle or esoteric stuff in C++ and thought it was a good use of my brain cycles. I took some time away from C++ and now that I'm using it again, I'm like... man, dealing with all this complexity is mostly just a waste of our precious time on this earth.
To be fair the problems with struct padding and the order/performance of the code is not solved, it's just hidden away or there are compromises made, which you can't control. I think if you need control of such small details, writing code that has to be correct is a decent tradeoff
I'm not sure that constructors of primitive types in C++ do nothing - after all, `int a;` is an uninitialized local variable, and `int a{};` is a local variable initialized to zero. AFAIK the constructors for primitive types and PODs are just not called unless mentioned explicitly in the initializer list. One particular footgun is `std::array<T, N>` for primitive/POD type T - it is such a lightweight wrapper around the old C array that it is considered a POD itself by the language - that means that its elements are not initialized by default. E.g.: `vector<int> xs;` is an empty vector of ints; `array<int, 5> xs;` is an array of 5 uninitialized ints (!) - to make them always be zero, we have to use `array<int, 5> xs{};` aka "Uniform Initialization Syntax".
The "default constructor of int" bit was somewhat of a simplification in service of my not wanting to dive into all the C++ initialization complexity like I said. But to be more precise, yeah--ints don't have "default constructors" per se, but if you don't mention them in the initializer list, they do undergo _default initialization_, which, for ints, is defined as doing precisely nothing. Default initialization is the same type of initialization performed by the syntax `std::array xs;`. In this case, since std::array is a class type, default initialization calls its implicitly declared trivial default constructor, which does nothing, and we end up with an array of uninitialized ints. The syntax `std::array xs{};` on the other hand does _value initialization_, which has the difference that it zero-initializes primitive types (and aggregates thereof), so you end up with an array of all zeroes.
@@_noisecode Thanks for replying! I've never actually read the C++ Standard myself, all I know is from specialized books and from experience working in the language. Your comment clarified things a bit. And it's a great video that explains in a very accessible manner why in modern PLs like Rust there are no constructors as we know them.
until learning all of c++'s initialization rules i didn't realize it was such a horrible nightmare... and for no justifiable reason. most of the complexity comes from backwards compatibility and lack of foresight.
tbf, the language concept of "moving" stuff (instead of just copying all the time) was added to the language after about 30 years (and still before Rust came into being; where do you think they got it from?)
The C++ committee is full of brilliant people doing their absolute best, but yeah backwards compatibility is the real killer. C++11 tried really hard to simplify initialization with uniform initialization, but it wasn't perfect (ahem std::initializer_list), and now we are stuck with its warts forever, and the sum total is that uniform initialization permanently added complexity instead of simplifying.
When C++ was created, people just didn't have decades of experience with this sort of things. I guess technically you can call this lack of foresight, but I think no one can be expected to have that level of foresight.
@@__christopher__ the thing is that C++ tries to be 100% backwards compatible, the lack of foresight back then could've been justifiable if the language were to later undergo some fundamental change; but it didn't, and that's why it can be a pain in the ass to use nowadays. It doesn't seem like it'll ever be any different, and at this point maybe it's better to let it slowly die off and be replaced by Rust or even Go.
Spoiler alert, in 20 years people will say that [xyz] is such a better system than borrow checking and how come the Rust developers had such little foresight to build such a silly language. I mean personally I think that Val's system looks nicer to me than Rust even today. In any case, making programming languages is hard. Maintaining a programming language so that it stays relevant for decades is even harder.
It's a neat idea, but I don't see how you can make it work in practice unfortunately: That static factory function returns an `std::optional`, this is all well and good (I don't see what better type to return without exceptions), but you'd like to get rid of that optional wrapper once you've made sure that optional contains a proper `T` and not `std::nullopt`. That means either copy-constructing or move-constructing a new `T` from the `T` inside that optional. Copying will likely be either inefficient or even incorrect if `T` is noncopyable (e.g. if `T` is a wrapper around a file descriptor). Which means we need to be able to move-construct a `T`, so an object of type `T` can be in an empty moved-from state, that is, one of those unusable states we intended to avoid being representable by using that factory method over the constructor in the first place. It'd have worked with a destructive move, but we don't have that. Bummer. That means you'd be doomed to drag that extra `std::optional` everywhere which kinda defeats the point of having all objects of type `T` always being valid since they're now replaced by `std::optional` (which can be invalid by definition). At best, you now know that if you pass a `T&` or `const T&`, you know it's a valid object (well, unless that reference is somehow dangling, but that would be unrecoverable anyway). Still the caller will have to work with an `std::optional` instead of a `T` and either ignore nullopt checks (i.e. "trust me, I know this optional contains something") or add extra unneeded checks everywhere (slight performance loss, less readable code). Still, interesting comparison.
I believe what you're meant to do is create the optional, but then soon after you need to unwrap it: if it's None, report some error, if it's Some, move the inner value out of the option. If it feels silly because you've already validated the invariant, then you could call an unchecked factory... but prefer to refactor your code so that the checked factory becomes your validation. Then again, it's been a loooong time since I've done any C++. I will say, I'm usually of the opinion that the performance cost of keeping the invariants is worth the code not segfaulting in production. Obviously, tight loops are an exception, but before any optimization: take measurements!
In an older more imperative language such as C++ doing things this way would probably mean overhead as the code dictates that you first declare the variables, reserving memory, then operate on them and then copy them by value to the struct memory when constructing the object. I assume Rust is smart enough to see that a variable will be directly used to initialise an object later in the function and places it in the destination address straight away. Which is what the C++ thing with an output pointer pointing to a partially initialised object is doing very explicitly, as C[++] does. Though I guess probably nothing in the C++ spec (besides the actual current constructor rules) prevents the compiler from doing the same when noticing a function returns an object and sneakily allocating the declared variables exactly where it knows it will place the object? Or maybe I'm wrong and they've been doing that for a long time?
Yep, you're describing [Named] Return Value Optimization (a.k.a. copy/move elision), and it's not only allowed by the C++ standard, but it's actually required in many cases as of C++17.
This all sounds like you actually want to be doing functional programming 🙂 But yeah, this would probably serve as great context to a talk on referential transparency
I do this for a lot of reasons, including not needing to rely on exceptions. Except you cannot RVO at all if you return optional / variant types, which means you most likely will need a move constructor for a lot of your heavier types, and need to delete the copy constructor in order to not get bit. C++ constructor wrangling is soulcrushing sometimes, feels like a circus.
Optional and variant do both support in-place construction which can give you RVO if you do things correctly/carefully. The WithResultOf/“Superconstructing Super Elider” I mention near the end can help with this, I’d recommend checking that out in more depth.
One thing about the CString class for C++: Is a valid way of solving that problem not to push the responsibility of finding out the string length from the constructor to the caller, and pass it as a parameter? The isAscii function could be run on the inputted pointer too, and then you should be able to get the invariant statement straight from valid arguments.
Watching these footguns made me feel caught between a wall and a knife. You can't get a realistic solution in C++ without making serious trade-offs. It really makes Rust feel magical, because the design choices give a feeling of a sturdy foundation for cohesiveness and a "code makes sense" sort of effect.
15:00 There's a much simpler way. Just setup a linter rule that enforces that "this->" must be explicit (Very common already), and then a linter rule that enforces that "this->" can only be used for the last N lines of the constructor, and only ever as the first 6 non-whitespace characters of the line. i.e., you enforce that you first do everything at the top, and then you just do this->a = , this->b = , this->c = , at the bottom of your constructor. You could probably whip up a fairly robust regex in short order, or a tree sitter implementation if you want it to be perfect.
I guess simpler is in the eye of the beholder. :) Could be a neat idea. It still requires that all your members be default constructible (and ideally cheaply) so that they can be constructed before the opening curly brace and then assigned later.
Awesome video, I really like how clearly you explain and compare things. I would love to hear more, especially a Rust tutorial for C++ developers. Some concepts in Rust are just so alien to me I can't grasp this language properly (like how you would manage a global list of objects for some important thing in a project well).
8:11 A non-rust programmer here. How is it possible for a safe function to have non-safe code in it? Wouldn't that just mean that the safe function is not safe? If there is a bug in the safe initializer and zero gets passed to the unsafe initializer, wouldn't that cause an issue?
This is a really good question! So, since the unsafe code can only execute in the branch where we know it's not zero (we checked with the match{} statement), the unsafe block is actually known to be safe. This is the correct way of using unsafe blocks in Rust: carefully check that the preconditions are all satisfied, and then open the unsafe block. (In that way, 'unsafe {}' is a bit paradoxical--the keyword should really be like `i_have_checked_that_this_unsafe_operation_is_safe_in_this_case { foo() }`). This is really the essence of the interplay between safe and unsafe Rust. We have low-level operations that are generally unsafe, and then we write safe APIs on top of them by wrapping them in structures that check or otherwise enforce the unsafe operations' preconditions. If you have _bugs_ in those safe wrappers (like if we screwed up the match{} statement), then yes, you can cause safe code to have UB. This is generally regarded as a very serious bug in Rust, and so you'll usually see a lot of care and pains taken around unsafe blocks that are in safe code (like huge comment blocks explaining the reasoning why it's safe).
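For reference, the shape of that pattern is roughly this (a sketch, not the literal std source):

use std::num::NonZeroU8;

// A safe function containing an unsafe block: the match guarantees the
// precondition (nonzero input) on the only branch where the unsafe operation runs.
fn checked_new(x: u8) -> Option<NonZeroU8> {
    match x {
        0 => None,
        // SAFETY: n was just checked to be nonzero.
        n => Some(unsafe { NonZeroU8::new_unchecked(n) }),
    }
}

fn main() {
    assert!(checked_new(0).is_none());
    assert_eq!(checked_new(9).map(NonZeroU8::get), Some(9));
}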
new_unchecked isn't 'necessary' for performance, you can do Option::unwrap_unchecked and any checks (aka paths returning None) will be dead-code-eliminated. In this case the checked constructor already compiles to nothing, as None is represented as 0.
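A small sketch of that approach, assuming the surrounding code has already established that the byte is non-zero, so the None branch is provably dead:

use std::num::NonZeroU8;

fn first_non_zero(bytes: &[u8]) -> Option<NonZeroU8> {
    let &b = bytes.iter().find(|&&b| b != 0)?;
    // SAFETY: find() above only yields bytes that are non-zero, so the None
    // branch of new() can never be taken here and is eliminated as dead code.
    Some(unsafe { NonZeroU8::new(b).unwrap_unchecked() })
}

fn main() {
    assert_eq!(first_non_zero(&[0, 0, 7]).map(|n| n.get()), Some(7));
}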
One notable issue with factory functions in Rust is the inability to have fine-grained control over how and where an object is allocated, which can result in unnecessary copying.
If I'm not misunderstanding what you're talking about, I think I have to disagree. When "making" a struct by filling in all of its fields, you statically allocate that memory on the stack. You can also choose to dynamically allocate on the heap by using something like Box. Inside a function, once a struct is created, you can move it out of the function which, as far as I know, is always (or at least in 99.9% of cases) basically a zero-cost operation. A type is only copied when explicitly calling `.clone()` on it, or when that type is `Copy` (true for most primitives).
@@potatomaaan1757 I think you are misunderstanding. To clarify, the issues are kinda twofold.

One is that because objects (in the generic sense of "entities", not objects in the C++ sense) are created inside of a function that has to return them, there's no idiomatic way in the language to construct an object off-stack: your struct is created on the stack and then moved off the stack into a Box or wherever else, if that's what you want. You could make a specialized new function that lets the caller pass memory in, but that isn't standard in the ecosystem, so you have to rely on compiler optimizations to eliminate that copying step, and those are not reliable (this is an acknowledged issue inside the Rust project, and there's slow-but-ongoing work to improve things). In C++, by contrast, because allocation and initialization are separated, the caller can decide to allocate the memory itself and use placement new to create the object, or use a custom allocator, even if additional allocations occur during the creation process. Rust additionally doesn't handle allocation failure very well and doesn't have stable support for custom allocators yet. And to be clear, both the all-on-stack construction and the lacking support for custom allocators are seen as limitations in Rust. Custom allocators are in nightly right now, but because Rust uses factory functions to create objects, support will have to be specifically added to any data structure that performs an allocation.

As for the copy-out issue, work is being done to make the optimization consistent and there's been discussion of a language feature to allow functions to explicitly request/demand the optimization, but that's all still WIP. I'm hoping this all does get resolved, but it is definitely a challenge with the design Rust chose here. There are a lot of issues with C++'s design, but the video covers those and they're discussed a lot more generally.
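A minimal sketch of the copy-out behavior described above, with an illustrative Big type; whether the intermediate copies actually get elided is up to the optimizer, which is exactly the reliability problem in question:

struct Big {
    data: [u8; 64 * 1024],
}

impl Big {
    fn new() -> Big {
        Big { data: [0; 64 * 1024] }
    }
}

fn main() {
    // Big::new() conceptually builds the value in its own frame and returns it
    // by move; Box::new() then moves it into the heap allocation.
    let boxed: Box<Big> = Box::new(Big::new());
    assert_eq!(boxed.data.len(), 64 * 1024);
}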
15:50ish, could you just declare a (private) constructor taking every param and doing nothing? It would be slightly annoying, sure, but: private: CString(unique_ptr _buf, int _len, bool _is_ascii) : buf(std::move(_buf)), len(_len), is_ascii(_is_ascii) {} ... create () { ... return CString(std::move(buf), len, is_ascii); }
I'm not sure I would call these associated fns "factories." The term "constructor" seems to apply equally well. My understanding of factories from OOP is that they are objects whose sole responsibility is constructing other objects (especially abstract ones with multiple implementations). I think there's a key distinction in that factories hold onto configuration data while constructors are simply pure functions.
There is an existing design pattern more commonly referred to as a "static factory method" or "static creation method", which is a useful simplification of "factory method" for when a factory interface or constructor arguments aren't necessary. That is essentially what is being called a "factory" for short here.
I misunderstood the idea for the first few minutes of the video because of this as well. I have only heard the factory pattern used to mean "object instance that instantiates other objects", and a (static) "factory" like Vec::new() is often just referred to as a constructor in Rust contexts, because that word doesn't mean anything else within the language. I am more keen on saying "C++ constructors are something else" than "Rust doesn't have constructors, and instead has static factories" as a general rule, unless you are actively talking to a C++ person.
sooo smart, until the moment you realize your type is now not compatible with quite a few std containers. Default initialization DOES MATTER.
@@petarpetrov3591 I agree, but not everything should implement a default initialization, it should be tied to an interface.
I don't get it, how is that clever? The invariants will be violated while you are doing that bunch of logic. This is just kicking the problem down the road. The real solution is to stop kicking the can and accept that not all constructors need to establish invariants, especially the default constructor. Partially formed values are OK, and efficient.
@@AlfredoCorrea I don't understand. If an object has an associated invariant and the factory method creates that invariant _and then_ constructs the object, how is the object's invariant ever violated? Either the object exists with the invariant or it doesn't exist.
Maybe you're looking at this from a different angle, but in my experience, having invariants associated with types/objects is really nice. Making it impossible/harder to instantiate objects with broken invariants helps a lot. I write a lot of security-critical stuff.
This reminds me of a concept I picked up in one of @NoBoilerplate's videos, which I now repeat like mantra when coding in Rust: "Make invalid state unrepresentable"
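For illustration, a tiny sketch of that mantra with made-up types: an enum rules out the "connected but no address" state entirely, instead of relying on a runtime check:

use std::net::SocketAddr;

// Instead of a struct where `addr` could be None while we claim to be connected
// (an invalid combination), the enum makes that state impossible to write down.
enum Connection {
    Disconnected,
    Connected { addr: SocketAddr },
}

fn describe(c: &Connection) -> String {
    match c {
        Connection::Disconnected => "offline".to_string(),
        Connection::Connected { addr } => format!("online at {addr}"),
    }
}

fn main() {
    let c = Connection::Connected { addr: "127.0.0.1:8080".parse().unwrap() };
    println!("{}", describe(&c));
}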
You know what that does to you, right? It makes your software useless if you miss your users' actual requirements. It's just trading one bug opportunity against another.
...what? @@lepidoptera9337
@@lepidoptera9337 No. It makes all invalid states unreachable, meaning that if the unanticipated use case lies within the valid states, you can be absolutely sure it will work without changes. If it lies within the explicitly invalid states, it will not work, just as it shouldn't.
If a new use case lies within what was agreed beforehand to be "invalid", then it should not be able to exist without an explicit change.
@@onebacon_ It usually doesn't. People use e.g. negative entries in fields that usually expect positive numbers as flags all the time. Or let's say you make an animal class and you only allow "dog, cat, canary"; then only small-animal vets can use it. A rural vet who treats "horse, cow, sheep, goat" is done for. All of you guys have been over-trained by architects with OCD into this belief that exercising absolute control over the user makes good software. That's total BS. What makes good software is adaptability and resilience. If somebody can compromise your Windows PC because they are allowed to put an arbitrary string into "animal type" rather than selecting from a pulldown menu, then you didn't do your job correctly. You just showed the entire world that you are poorly educated engineers.
@@lepidoptera9337 Making invalid states unrepresentable doesn't mean that the software cannot meet the requirements, just that we spend less time representing the technical things and more time on the actual requirements.
“You can grep your files for unsafe and find where you might have made a mistake” is the single best way of explaining (to me) why the “unsafe” keyword exists and why the compiler is more strict otherwise. That made so many things just *click* in my mind - great video!!
the thing is that that's quite irrelevant in larger code bases. when using rust to write my game engine with vulkan, for example, the invariants to control for are so large and expansive that everything has to be unsafe anyway. having an api that is provably safe when your memory isn't cache coherent and is asynchronously in flight is virtually impossible.
also people don't actually give a shit about it. every other crate i open from crates.io leaks undefined behavior into safe rust. and that one compromise is all it takes to compromise all of your rust code's safety.
@@gideonunger7284 um sweaty, just spend 3 years writing a safe wrapper for vulkan and use that instead.
@@gideonunger7284 I just don't know enough about GPU/Vulkan programming to meaningfully comment on the first part, but I just thought, isn't the point that's (kinda) made in this video that you need to design your API very carefully, creating very small abstractions and components, in order to minimise the invariant space around unsafe code? I.e., it's a very big effort, but not necessarily impossible?
I don't know if it's _literally_ every other crate (maybe you just have very different usage patterns than me, lol), but yeah, there is a problem of a lot of crates using unsafe while not properly analysing it. It would be great if there was a common expected standard for analysis of unsafe, such that it'd be easier to see if the effort had been made, at the very least.
To aid in that, but I don't know if this is feasible, it would be great if the compiler had an "invariant/UB analyser" that could look into and at least recommend invariants to check for and modes of UB that might arise from a particular use of unsafe.
@@gideonunger7284 the fact that writing memory safe code is hard doesn't mean it is impossible tho, with enough time that is.
@@MarteenHobbu yes it does in certain cases. i write a lot of rust code and a lot of safe abstractions over unsafe code.
and some abstractions, with how rust works, are by definition impossible.
also, the set of code that rust considers safe is not the same as the set of code that is actually safe.
so there is a lot of "correct and safe" code that you opt out of by using rust. and some of that code you opt out of is very performant. which, if you work on games for example, can be a hard requirement for what you are doing, so you will have to write it in unsafe rust.
8:30 note that the “new” and “unchecked_new” functions need to be marked as ‘pub’ in order for them to be exposed for outside use - and that the member(s) of the structure must not be marked pub to prevent people from constructing a NonZeroU8 on their own (and from mutating the member(s)).
I think the `pub` keyword is omitted for the purposes of keeping the example code in the video concise and focused on what is being demonstrated. Similarly, in real code, some of the functions should ideally be marked as `const`, but that isn't relevant to the video topic. Anyway, still a useful reminder.
@@shrootskyi815 thanks for your reply. The main point that I was trying to convey is that if one has a struct with invariants, they should take steps to ensure that the struct only gets created using their designated 'factory functions' and its fields won't be manipulated directly. This typically means putting the struct in a module and declaring the factory functions and other potentially mutating functions as 'pub'.
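A minimal sketch of that setup, with illustrative names (not the real std type): the field is private to the module, so the pub factory is the only way in for outside code.

mod non_zero {
    pub struct NonZeroByte {
        value: u8, // not `pub`: outside the module, no direct construction or mutation
    }

    impl NonZeroByte {
        pub fn new(x: u8) -> Option<NonZeroByte> {
            if x != 0 { Some(NonZeroByte { value: x }) } else { None }
        }

        pub fn get(&self) -> u8 {
            self.value
        }
    }
}

fn main() {
    let n = non_zero::NonZeroByte::new(5).unwrap();
    assert_eq!(n.get(), 5);
    // non_zero::NonZeroByte { value: 0 }  // would not compile: `value` is private
}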
That moment where the size of the struct changed when you reordered its fields, that was a great setup :) I'd heard the "field order trumps initializer order" rule before, but I hadn't thought about how field order has other consequences too.
incredible video. editing, narration, explanation, the code itself - everything was fantastic and so concise. as a rust programmer who’s never touched c++ i still learned a lot about programming in general. please keep it up.
9:00 "first of all exceptions are incredibly expensive for what's supposed to be a very simple and low-level type "
This is only true if exceptions are used as a regular control flow mechanism (I think because of the stack unwinding, but I'm not actually familiar). If the exception is not triggered, the cost is insignificant. Everything else you said about them is true though. Might've been worth mentioning that panic! exists for some niche low-level uses, and that Result can't replace exceptions in every instance, but in practice for the vast majority of high-level code it can. Really really high quality video, I am astounded by the amount of misinformation that exists on C++, especially from other young people.
Yes, exceptions - like much of C++'s lunacy - are actually in the language for a good reason.
Also there's a great CppCon talk, "Expect the Expected", that talks about many types of error handling and their issues/strengths. Exceptions in particular provide a lot of strengths. You can do centralised error handling, transport lots of information about the error, and they're very performant for situations where the "unhappy case" is rare - as you would hope it is.
Exceptions should only be used in exceptional circumstances. Even if they were free from a performance standpoint, using them for flow control just makes a maintainability mess.
Unfortunately there's no other (sane) way to signal failure from a constructor, which is why they're mentioned in this video. And for something like NonZeroU8, having the failure case be expensive really reduces the number of places that type can be used (compared to the rust version which is zero-cost)
Another observation: It's refreshing that your videos have no background audio. Pure calm voice is so much better to deliver concise factual information than noisy music. Wish that more educators would rediscover "less is more".
You really know your stuff when it comes to C++.
I believe clangd has warnings for every mistake inside a constructor that you mentioned. It will tell you if you're reading from a member that hasn't been initialized yet, it will tell you that you aren't calling an overridden virtual function, etc. Obviously it would be better to disallow those mistakes in the language standard instead of relying on warnings, but good tooling helps a lot.
Reading about the Superconstructing Super Elider brings up another fun thing about Rust: moves. C++ has all sorts of rules about when moves can and can't be elided, specifically because there's API surface for arbitrary types to be told when they are being moved - move constructors, specifically. In Rust, anything can move anything as many or as few times as it likes, and if you don't like that, you have to stuff it behind a smart pointer (like Pin).
This is mainly notable because it results in one of the biggest problems with Rust/C++ interop: everything has to be behind smart pointers. If you put a C++ type on a Rust stack frame, Rust will move it around without calling the move constructor, which is hilariously unsound.
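A small sketch of that difference on the Rust side: a move is just a bitwise transfer of ownership, with no hook for user code to run.

struct Widget {
    data: [u8; 32],
}

fn main() {
    let a = Widget { data: [0; 32] };
    // A move in Rust is a bitwise copy plus a compile-time transfer of
    // ownership; no user-defined "move constructor" ever runs.
    let b = a;
    // `a` is statically unusable from here on; only `b` owns the value.
    assert_eq!(b.data.len(), 32);
}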
10:38 A potential way to “mark” create_unchecked as unsafe (although I personally think that “grep unsafe” is analogous to “grep unchecked”) is to use “the passkey idiom” as a way of keeping more control of which code might call specific member functions.
Friends over at the Pony language call these passkeys "object capabilities"; you create a type such that only code you trust may receive an instance of that type, and you then use this trust to gate away features of your library
What is the pony language?
@@LunaDragofelis Just Google it?
The only problem with Rust's style is that in-place initialization on the heap isn't guaranteed.
The code:
let data = Box::new([0u8; 10_000_000]);
is supposed to create 10 megabytes of data on the heap but might overflow the stack in the process. I tested this and it actually crashes in debug mode but works in release mode, pretty gross. I think solutions to guarantee a stack overflow won't happen are being worked on but it's not trivial.
The way constructors in C++ work, you can just allocate the space on the heap and then call the constructor on the pointer.
the std uses a "box" keyword that avoids this problem while preventing stable rust code from using it. very annoying
@@kaga2922 that keyword no longer exists - even within std. Box::new is now implemented with a special attribute.
@@kaga2922 actually it's now been replaced by an arcane rustc attribute called #[rustc_box]. this problem does need fixing but it's a much smaller problem than the tarpit C++ finds itself in imo - essentially we need a more ergonomic and safe equivalent of out pointers that doesn't involve using MaybeUninit, which is very easy to misuse, whereas some sort of &out reference that *must* be fully initialized by the time the function returns would be much better
placement new is wip from what I know. in the meantime you can do: vec![0u8; 10_000_000].into_boxed_slice()
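A usage sketch of that workaround: the buffer is allocated and filled directly on the heap by Vec, then converted into a boxed slice, so no 10 MB temporary ever has to live on the stack.

fn main() {
    let data: Box<[u8]> = vec![0u8; 10_000_000].into_boxed_slice();
    assert_eq!(data.len(), 10_000_000);
}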
Absolutely true that placement new is badly needed in stable Rust. My preference would just be stronger guarantees about move elision, rather than a special attribute or `box` syntax or `&out` parameter type of thing. Since Rust controls its own calling convention, it would be amazing if `Foo::bar([0u8; 10_000_000])` could just initialize the 'prvalue' array directly wherever `Foo::bar` moves it to. In a perfect world this could be done without introducing anything like the incredible amount of complexity C++ has around value categories.
6:10 if I remember correctly 42 is what’s set because NSDMI is basically a default value, could be wrong tho lol
Yeah, that way you can have multiple constructors that init to different values. If NSDMI took precedence then you couldn't use the initializer list to assign per-constructor values.
10:28 "Because C++ doesn't have a builtin notion of safety"
It does, you just dismissed it at 8:57.
With exceptions enabled you can make an "unsafe" factory function by marking it noexcept. In case of invalid input the constructor will throw an exception, terminating your program. You (almost) don't pay for exceptions, don't have to handle them for such a scenario, and it guarantees the invariant.
It is a bit 'hacky' and I acknowledge that it is less 'neat' than the rust way, but it's possible. And I don't think your arguments against exceptions hold much ground.
1. Yes, fully exceptionless C++ is faster. But firstly, exceptions incur most of their cost when they're actually thrown and handled. For most use cases the overhead for exceptions being enabled is negligible. Secondly, if we're doing a feature comparison here, safe Rust is also slower (potentially also really slow and expensive) compared to unsafe Rust. I'd go as far as to say that these arguments have the same weight when comparing language performance, cancelling each other out, although it's too much work to prove it properly.
2. "Lots of codebases aren't prepared to handle them". Well, sounds like they should start using the language like it's supposed to be used and actually learn the core safety features.
I agree with "invisible code path" argument though, it is the only sound one and actually a really strong one.
It would be reasonable to use the success or failure of Rust's NonZeroU8::new as actual regular control flow in your code (say you have an unknown u8 and it's not necessarily an error if it's 0, you just want to take a different code path), whereas it would not be reasonable or wise to do so with a try/catch and a throwing constructor in C++. That's the insight at the heart of my argument that exceptions are too expensive for this simple low-level task.
The noexcept trick actually sounds more like safe Rust than unsafe: it guarantees a crash (safe) instead of forging ahead with UB (unsafe).
As for code bases that aren't prepared to handle exceptions, sure, everyone should just git gud, but the fact that the language doesn't force you to contend with them (the invisible code path argument) means this is always going to be an uphill battle. Structuring your code to put failure in the type system with std::optional (or std::expected in the future) gets the type system on your side about requiring you to think about error cases.
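For illustration, a sketch of using that success/failure as ordinary control flow (the function is made up, not from the video):

use std::num::NonZeroU8;

// Treating "the byte was zero" as an ordinary, cheap branch rather than an
// exceptional event.
fn describe(x: u8) -> String {
    match NonZeroU8::new(x) {
        Some(n) => format!("chunk of size {n}"),
        None => "empty chunk".to_string(),
    }
}

fn main() {
    assert_eq!(describe(0), "empty chunk");
    assert_eq!(describe(3), "chunk of size 3");
}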
Honestly one of the better coding talks I have seen. It is very thoughtful and your solution can be easily reasoned about
A few notes:
1. I'm a bit disappointed that you didn't go into templates and meta-programming. Especially the NonZeroU8 case is a perfect example where templates can emulate a more complex compile-time type system. (even though the syntax makes you want to hurt someone)
2. The private struct solution is a pattern many code bases already use especially for dynamically linked libraries to isolate the struct layout from the library user called "private implementation" (PIMPL) but it always annoyed me how you either need a pointer indirection or get rid of the C++ type system and resort to basic C-style OOP to make it work without it.
3. A CS teacher once told me "any design pattern is just a work around for the deficiencies of the language of choice". This video perfectly illustrates that statement!
4. Making unchecked_new "safer" in C++ is possible with "friend classes" but that is a giant mess in and of itself.
It's been quite a while since I've worked with C++, but that packing of private implementation made me think of the PImpl pattern. Not for the same application reasons, but certainly still to improve safety/stability, and also similarly with a cost from the indirection that cannot be optimized away.
It definitely has some similarities to Pimpl, and you could definitely set up your class architecture to get the implementation-hiding benefits of Pimpl and the initialization benefits of CString::M in one swoop if you chose to. I do want to point out that in the form I showed in the video, CString::M should actually be utterly transparent to the compiler; I doubt there would be any indirection overhead at all, even in unoptimized builds.
@@_noisecode I tend to agree. The main issue I see with your inner-struct suggestion is that I think it doesn't handle inheritance elegantly
I have actually grown to like the nested struct approach for some data types I write.
- If you use this for private members it does not affect the public API.
- It lets you manipulate all the object's data at once, as the video says. If you add new members in the future, there are fewer changes that need to be made to the class implementation.
- All mentions of members are clearly visually distinguished from local variables. (I like to use Self/self instead of M/m, which makes it intuitively similar to languages like Python or Rust)
- The data fields of the class are visually separated from methods in the class definition. You can see at a glance what data fields the object has.
I don't think one should use this approach for all class definitions, but for grouping together the private data members of more complex objects this is great.
I really enjoyed this video, I've used a bunch of languages with constructors (C++, Java, C#) and have been using Rust for a little while now. The other day, I had a little C# to write, and when working with constructors I felt a familiar tiny little pang of dread again, and I think you just nailed down what I was feeling.
The metaphor that a constructor is a function that returns an initialised object isn't quite true, and it's in the details where some subtle ambiguities and bugs often lie. Rust applies the metaphor much more literally, and I feel a lot more confident in what I'm doing.
Funnily enough, this is one of the things I've appreciated the most with Rust. There are much fewer "things" to consider in the language. It's closer to C in many ways in that regard.
The solution you propose at the end (using aggregate data structures to ensure type validity) for the C++ side reminded me of a CppNorth talk from last year, "Writing C++ to Be Read". It touches on the topic of constructor initialization and how aggregate initialization provides advantages for quite a few cases.
It just amazes me time and time again how the great choices that were made at Rust's core make it such a practical and cohesive language to work with and reason about, and have a butterfly-effect-like influence on so many parts of the language itself
Another great example for that is the std::mem::drop implementation, made me chuckle when I found out about it :D
Isn't std::mem::drop just a function with an empty body?
My favourite part is traits. If you're working with a library that works with a specific struct, you often end up awkwardly calling a custom function that takes in the struct and spits out some value or does some crap, several times across your codebase.
You can instead just implement a trait for the struct to make your life easier and code cleaner.
@@kuhluhOG exactly, it just consumes the value )
pub fn drop<T>(_x: T) {}
@@senzmaki The ability to slap a trait in almost anything to extend its functionality is so great.
Here’s another line of code that’s equivalent to “std::mem::drop(x)” - “x;”. Just the variable and a semicolon. Because (almost) everything in Rust is an expression, “x” on its own evaluates to the value, and adding a semicolon discards the result. So you’re just moving x into the void, and it is dropped (its destructor runs) immediately.
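A quick sketch showing both spellings side by side; the compiler's `path_statements` warning for the bare `y;` even reads "path statement drops value":

fn main() {
    let x = String::from("hello");
    let y = String::from("world");

    std::mem::drop(x); // x is moved into drop() and freed when it returns
    y;                 // y is moved into a discarded temporary and freed right here

    // Neither x nor y can be used past this point; the borrow checker enforces it.
}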
This is super helpful for actually learning "the right subset" of C++, thank you! I can't believe a Rust channel is what I needed to learn better Cpp but here we are
This video is so nice! It just resonates so much with the struggle I had when first using C++ (like finding out that the order of class members affects init order, or that factory functions make much more sense in general), and the constructors problem is so well covered here.
Most languages benefit from this, particularly when incorporating the Result or Option types.
Make invalid states unrepresentable.
It's only a problem when the frameworks being used fail to support it effectively, and force you into the constructor approach
Indeed, and it can be annoying with libraries.
A lot of legacy code uses old-style C++, which forces you to use the old semantics for much of the stuff. When you work on a system that has code from the previous millennium, then you would know. And most of the time, it is not an option to rewrite all the code.
@@oysteinsoreide4323 if it ain't broke don't fix it - there's nothing inherently wrong with a constructor based approach when it's working properly. But new code in such a legacy code base certainly should consider the more robust approach - the two styles can coexist just fine
@@orterves Yes, in some instances, using a better invariant on objects is good. But I'm not sure if I would use the factory approach. I would rather have an exception in the constructor, and ensure that the exception is handled. It will be much less code in the classes, and the validity of objects will be equally safe. The most important thing is to have good invariants, and that is often lacking in the old programming style, which is much worse. It's also something that is much more difficult to just write yourself out of when the code is complex. There is a lot of code with no clear invariant at all, and in that case how things are constructed is far less important. You would need to fix the invariants first.
As someone who doesn't know C++ and hasn't learned Rust yet, this was very informative 🎉
Another great video! I'm always astounded when watching your videos how often you state an idea that I say all the time to my teammates. Avoiding partially initialized objects, making constructor bodies empty, Being wary of how the spaces between lines can evolve even if the current code looks good. I need to keep directing my teammates to your videos
C++ initialization rules are a mess man
I'm quoting this on my CS final.
C++ classes are a mess in general.
Like why the fuck do I have to write a custom destructor, copy-constructor, and a copy assignment operator just to be able to properly handle pointers.
At least the only real footgun in Rust (in this context) is the `Drop` trait implementation.
all of c++ is a mess
No not really
@@oserodal2702 and unless you're managing custom resources (raw files, raw memory allocations, etc) or want drop for your type to have side-effects, you mostly don't need to implement drop either!
I always called factories as functions that returned new instances of a set of classes based on their arguments or other rules. The classic example is an image factory that can return a jpeg image class or a png image class based on the image file type. Callers need not care about the specialization and have a single method for construction
I would call this an 'abstract factory', since an 'image' is abstract
@@sus7801But... would you call it an AbstractImageFactoryFactory? 😏
See: OOP Design Patterns. You might not be wrong, but others have decided this well before you.
An exception in the constructor would also solve everything. But as you said, it would mandate a try/catch somewhere to avoid issues with it. std::optional makes it possible to solve it in modern C++.
You could get your constructors back once you aggregated all of your fields. Then the constructor could simply be this: 'CString() : m(CString::create()) {}' or this 'CString(const char* in) : m(CString::create(in)) {}'. This is actually very efficient thanks to NRVO. You also might want to replace the construct() methods with lambdas to have all the code right there.
As a C programmer, I dont use constructors
Exactly.
8:30 "Which is something you can grep your code for" is slightly complicated by the fact that the actual mistake can easily happen outside the unsafe block. We can see that in this very example if you pass a variable to new_unchecked() and somewhere else in the code can accidentally set that variable to 0 before that call.
It is still a very good idea to have in the language. It just doesn't make it as easy as it might seem at first glance.
@@anon8510 Yes, it does. But that doesn't mean that all potentially unsafe errors happen there. As in this very example:
let x = 0;
let y = unsafe { NonZeroU8::new_unchecked(x) };
The actual mistake happens on the line above the unsafe block, where x was supposed to be set to 1, not in the unsafe block itself. (It is not difficult to imagine that figuring out the value of x could be much more complicated, for instance dependent on input that isn't sanitised when it should be, or set with some crazy math expression that's not quite correctly coded.) Which is to say, you can't JUST grep for and look at the unsafe blocks. You have to inspect all the surrounding code as well.
The point is that you can always be sure that the breakage of the invariant happened in an unsafe block. But the thing that breaks the invariant might have been initialized outside of it. Safe/unsafe Rust does not promise anything else, and this is already a lot better than having no safe subset.
what about a private constructor that takes in all fields?
Was about to suggest the same. I thought that's where he was headed...
Well I gotta keep you on your toes, don't I? That's definitely the other valid approach, but I prefer M because it's less typing (don't have to spam out and maintain that constructor that's pure boilerplate), and you get to use designated initializers when creating it in the factory.
If C++ had reflection and we could auto-generate the boilerplate constructor, I'd be much more attracted to that approach. One of these days....
@@_noisecode what about macros or cpp2 @'s
I'm so glad this high-level content exists. Great video. Your expertise and knowledge is clear.
I always lean towards a static factory creation, usually returning a smart pointer for my C++ creation; I can vouch for this working and being absolutely the way to go. Making my constructors private, adding helper utilities, which you can then control access to the constructors via friendship has solved a lot of headaches for several projects my end. Especially when writing code which is to be used by others, preventing willy nilly stack allocation is really rather good, and though you say "factory" and you get glared at, I absolutely agree with the sentiments in this video.
I love stuff like this. Can you give some examples where normal constructor use might be leading me to willy nilly allocations that I'm overlooking/could be avoiding?
Impressive, a single vid and you got a subscriber. That didn't happen for me for a long time. I knew C++ was unsafe but this really makes it sink in. I like the m. approach though. Scary to propose it in production code, but I for sure will try that in my day to day programs.
Constructors can be good as an interface (i.e. you know you're instantiating something but don't know what that thing is), though that does require that all the types you handle take the same number of arguments in their constructor. It may or may not apply to C++ in particular, but it definitely applies to higher-level languages. For more complex initialisation such as the non-zero refined type case, I think static factory functions are reasonable if you have support for nullable types. Just return a NonZeroU8? from one, and a NonZeroU8 from the other.
I do agree with many of the points you make, I just don't think they're inherent flaws of the constructor method, just of the way constructors work in C++ specifically. For instance, you could make a guarantee that in the constructor, any field of class A that has type T will have type T? instead until you assign it a value. And if by the end there are paths where not all fields are set, that's a compiler error because you didn't instantiate your object correctly. The type checker should be able to similarly reason about how fields are accessed in methods that you call. If it's just a setter, go ahead. You can treat it as though it accepts a type that has nullable fields of which the class you defined is a refined type. So long as the type-checker determines the method can work with this implicitly defined superset of your class, you're allowed to call it. Most of the other issues come down to a combination of syntax and language semantics, but none of them are flaws with what a constructor method does at its core.
In the case of generics (templates in C++), types may have a different number of constructor arguments; the particular constructor used is determined when the template is instantiated, e.g. vector.emplace_back(args...); will accept any number of arguments if the handled type has a constructor that accepts that combination of arguments.
Thanks for these videos! An informed opinion, educated discussion, and soothing voice with great visuals make this a great channel. Keep doing what you're doing, man.
Very interesting video, thanks!
Note that the first big general-purpose OOP language was very OOP, in that classes were actually object instances (whose supertype was "Class", which was also an object instance), and the only way to allocate an object was to call a factory function on the class object. So "Point.new()" [not the right syntax] was invoking the new function on the object stored in the global variable Point.
Anyone who wants to learn more about invariants (and pre- and post-conditions) should check out Meyer's tome "Object-Oriented Software Construction", which you can get as a PDF floating around since he gave away PDF copies with his compiler. It's an interesting delve into how programming languages are designed. Like, what's the mathematical reasoning behind the various higher-level structures. Very handy concepts that have oozed into other programming languages, even if you're not writing OOP programs.
For example, in Meyer's language, the constructor would have a precondition that the argument passed in isn't zero, and an invariant that the value inside the object isn't zero. That's part of the type signature, so you know that's the requirement. If the requirement is more complex (e.g., an array of bytes contains valid UTF-8) then a non-failing function has to be provided so you can check it. Then the constructor relies on the precondition being met, and if it isn't an exception is thrown in the caller which you cannot catch and continue on from (but you can retry, in a sense), and the top of the exception stack traceback is the caller and not the non-zero-constructor code. Just as an idea of a different way of handling it. His whole exception thing is so much cleaner than other languages.
What was the first general purpose OOP? I thought Simula was the first one. It maybe never became big though.
It sounds like that is more a use of metaclass programming.
@@oysteinsoreide4323 I would say Smalltalk, as Simula was specifically for simulation.
@@darrennew8211 Well, you can use Simula for general things. Yes, it has a simulation library, but it is still useful for general things. Well, "useful" is a broad term here. Simula is not much used these days; it has mostly been used in universities. And it was the inspiration for Bjarne Stroustrup, who made C++. Smalltalk is probably more popular. But I still consider Simula the first object-oriented language.
@@darrennew8211 The Simula of 67 was made to be general-purpose. In 62 it was mostly for simulations. The 67 version was the version I used at university back in 1993 to '95.
One advantage with C++ constructors is "emplace_back" for vectors and other in-place construction. For Rust, you typically have to hope the compiler optimises it that way (I believe it's called Return Value Optimisation). However, it's of course trivial to have it in C++ since constructors use pointers instead. I know eventually it's going to be solved, but it's taking a while unfortunately.
the curly brackets in the C++ constructor are pretty nice for doing some quick math with passed-in arguments to create a value for another struct variable. Instead of creating an instance and then calling an init function, you can just create an instance and some base initializing will be done.
Thanks for thoughtful videos, always a pleasure to watch them
My favorite channel on YouTube. Another great upload 👍🏻
I was thinking of another issue, at least in C++. Even if you initialise your NonZeroU8 properly, there's no protection from setting the member "value" to zero afterwards. You would need a getter, or cast operators, since there is no real notion of read-only fields in C++.
This is the best video on the subject by far
10:16 if Rust enforces the invariant through UB (no one can stop you from passing 0, but nothing is guaranteed if we do so), we could do the same in C++, though? like this:
if (x != 0) {
    NonZeroU8 res;
    res.value = x;
    return res;
}
// Nothing here afterwards. Not even a return statement. If 0 is passed, it's UB since we don't return.
It actually works (everything is optimized as if the x is never 0), except the memory layout is not enforced.
We can also use std::unreachable() in C++23 to explicitly cause UB.
i use Boost.Outcome + 3-phase initialization. sure, C++ doesn't have an unsafe keyword, but that's good enough for me. I don't think most C++ programmers need hand-holding while writing a constructor; "spaces" between lines of code in a constructor sound like a trivial problem, because a constructor should be as simple as possible. if someone writes 50 lines of code in a constructor, that would be smelly to me.
Maybe I haven't written that kind of constructor, or worked in a team like yours, but I think it is fine for me for now. Very nice video, I like it.
Yes, but the thing is you might do the right thing but your colleague might not...
@@Evan490BC
the argument can go all the way on every little thing. safety is nice but it has a cost, so it becomes a risk assessment and trade-off consideration for the specific situation. all i am saying is i have considered the risks and concluded it is not that beneficial to have further safety guardrails, like an unsafe keyword or preventing "spaces" in constructors, in my situation/code base. There are no one-size-fits-all gloves.
iirc from a C++ Weekly video, constructors are just converters (casts). You can't even take the address of constructors, much like destructors. Hence, without the explicit keyword, the constructor is used implicitly, much like how casts are done implicitly. The (old) constructor syntax (S s = S(42);) also comes from the syntax for C-style casts (float f = float(42);), so it's not designed to be for constructors (well, the brace initialization S s{42}; or S s = S{42}; kinda makes sense though, but it's newer). Therefore, constructors in C++ began simply as a glorified custom cast, and since this does not change as the language moves forward, constructors, which could only get some temporary patches and fixes (like the brace initialization syntax), remain broken.
btw the pattern/paradigm in 16:42 is kinda genius ngl
8:24 this is a VERY good point. the fact that you can explicitly go out of your way to do something in an "unsafe" manner, but then be able to EASILY track down where you have done this, is a VERY powerful tool to have, and it also isn't babying and distrusting of the user / developer, which is a pretty pervasive culture that i highly dislike. with this, though, you not only make safer and more robust code but you also avoid this inherent distrust of the developer
9:22 if you know x is not zero you can do `if (!x) __builtin_unreachable();` before calling the constructor. This will tell the compiler that x == 0 is not possible.
I noticed a similar issue in C# in some unit tests where two-step initialization is kinda forced upon you, and bam, what you said would happen did: stuff was accessed before it was initialized and, worse, it didn't crash. It was happily null and created a really hard-to-find bug
You don't mention if you were aware of it, but Josh Bloch talks about this same thing (as _static factory methods_ ) in _Effective Java._ He gives a lot of the same reasons for it, even though Java has strong guarantees about not letting you operate on an uninitialized object and cleanup on creation failure.
I sometimes dislike how tightly OOP-type things naturally couple lifetime into things. It's nice to use allocators (mainly arena allocators) to get a block and use a factory/C-style init function to fill in a POD struct at some aligned offset into the allocator's big block of memory. People teaching C++ will squawk that it has to be impossible to have invalid data, but sometimes you just have to accept the responsibility.
By dodging the construction, you can dodge the destruction. Just reset the allocator (which invalidates all structs placed in it) and go again for a new frame or hot-loop iteration. I think C++ gets a lot better when you stop forcing everything to tangle data, lifetime, and functions into a class. Data, functions, and memory allocation can be handled separately.
I think this is a really interesting idea, and a valid C++ criticism. The throwing constructor is a huge concern in a lot of places.
Two-phase initialization was popular in Java so we have a bunch of legacy C++ that has initialize() and finalize() methods. Really ugly. Plus we’re stuck on ‘17 so no designated initializers 😢
As always, your videos are absolutely fantastic. I can’t get enough.
Why can't you use an aggregate initialiser without a member struct?
Because you already have the pointer to uninitialized this hanging around. C++ officially says that the only way to make an instance of a class (or a struct, for that matter) is to call a constructor. Aggregate initialization is just a default constructor that takes an initializer list.
@emilyyyylime- I really wish you could, but having private data members disables aggregate initialization for your type (even from inside a place with full visibility like a static member function, which is a shame IMHO). Using the nested struct gives us aggregate initialization for M, since M's members are all public even though M itself is private inside CString. So we create M through aggregate initialization, and then just need a proper CString private constructor that moves it into the private `m` member.
A cool thing about Go is that even if you try to do this, anybody can just conjure themselves up a zero-valued instance of your type, either intentionally or by accident, making it somehow even less safe than C++.
You can do that in C++ with casting.
but at least the lang is simple enough that most everyone reading that understands that that's what's going on.
I think that there are good reasons to criticize C++, but in the case of NonZeroU8 it's not warranted, in my opinion. Example: in C++ you can easily create the same NonZeroU8 using std::optional, a private constructor, and a static factory function returning std::optional<NonZeroU8>.
#include <optional>

using u8 = unsigned char;

class NonZeroU8 {
    u8 val;
    explicit NonZeroU8(u8 x) : val(x) {}
public:
    // Note: the factory can't be named `new` (a keyword) or `NonZeroU8` (the class name).
    static std::optional<NonZeroU8> create(u8 x) {
        if (x != 0) {
            return NonZeroU8(x);
        }
        return std::nullopt;
    }
};
It’s important to say that there’s the doom of backwards compatibility that makes things pretty annoying, but the C++ standard is trying to make sensible changes to improve the safety and simplicity of the language. For example, in C++23 there is a strong notion to use std::expected instead of exceptions to maintain safety with minimal overhead.
Great video, especially the talk about half-initialized instances, this is something that should have been solved ages ago by the standard.
One thing is bugging me about exceptions though. You mention that they are extremely expensive. This used to be the case, but in most modern compilers the cost of exception handling should be basically nil unless an exception is actually thrown, in which case yes, exceptions are more costly than checking the return value (mostly due to the handler being stored in cold memory). I used to rely on exceptions as my sole error-handling mechanism. Since moving to Rust as my main programming language, when I go back to C++ I do rely less often on exceptions, but not because of performance concerns; I simply want to force users of my API to reason about error handling with std::expected or std::optional (including explicitly saying "I want an exception if the value is not valid").
I think in some cases exceptions might still be slower than checking the return value, because return-value checking enables easier optimizations (including calling std::unreachable in the "failure" branch of the error checking), but I have no data about this.
Something I might not have made clear enough in the video is that it's quite often that I WANT to be able to take the failure path. It's useful to have an unknown u8 value, and use the success or failure of constructing a NonZeroU8 from it as normal control flow. Exceptions are this strange form of control flow mechanism that you hope never actually takes the unhappy path--precisely because it's so expensive. I think that's a strange tool to use when there are simpler, more type-safe, and dirt cheap alternatives.
Here's a quick microbenchmark where exceptions are over a thousand times slower for a simple piece of control flow that takes the failure path. quick-bench.com/q/KDEPFXLc7746GdbKRbyIuPghLe0
I stand by my claim that this is unacceptably expensive just for a fallible constructor of a low-level primitive like NonZeroU8.
@@_noisecode I did not expect it to be 1000x slower, the "common knowledge" was that it is about 20x slower than error checking. I guess I'll only use exceptions in cases where errors are truly exceptional, like IO operations. Thanks for the answer.
Around 11:40: Personally, I would use a delegating constructor for CString: add a second constructor taking a pointer and length, and have the first delegate to the second passing the result of calling strlen.
16:10 I don't really understand why the helper struct is needed here.
Because he wants to use designated initialisers, so that he can always initialise data members in the right order.
I used to avoid writing factory code whenever I could because it felt ugly. Though I write Rust now, this video gave me a new perspective and I will try to use factories as much as possible in the future.
Great video as always!
Small correction at 2:38 (and following slides): It is not valid to add a comma after Default::default() (or any value).
Thanks for the correction! You're right.
I'm glad I'm not the only one disillusioned with traditional 'seemingly infallible' constructors. Factory functions are a much better way to do this.
While I agree in the end, if you do have to have constructors in your language (say, for backward compatibility with a previous language, or familiarity to intended programmer audience), you can do what Swift did and make fallible constructors possible and language-supported.
@@PthariensFlame
Yes! That's also a good option.
@@PthariensFlame I dunno... I think I'd cry if C++ became any more convoluted.
What the hell, this is so good!
Moving from C++ to C#, I've never been so confident about initialization as with your Rust code.
So atomic and safe!
Amazingly informative video, as usual. Very well presented
Great video, and I thought your perspective was very interesting l! Keep at it!
Appreciate you watching! Big fan of your channel!
beautiful! thanks for making this!
I don't believe that comment about not having a valid S until the function returns is quite true; the standard guarantees we have a fully-initialised object by the opening brace, and all subobjects must therefore, explicitly or implicitly, be initialised by the opening brace of the constructor implementation. It's fully initialised memory by the time the body of our constructor begins, and then the rest should be more about performing business logic to guarantee everything is how we want it, but it's a fully initialised and valid object at that point, just not the one we want.
At the opening brace, all data members have been constructed. The S object still doesn't technically exist. You can prove it by throwing an exception out of the constructor: all of S's data members' destructors will be called (because their construction finished), but S's destructor won't (because it never fully came into existence so it doesn't need to be destroyed).
I would argue that needing to perform extra business logic in the curly braces to ensure everything is "how we want it" strongly implies that S's invariants are not yet fully established by the time of the opening curly brace, requiring additional logic to be executed. This means that S is not yet "valid" per the definition I gave at the beginning, meaning reading from `*this` as though it's an S is potentially still just as dangerous as using it before members' constructors have run.
learning c++ as a grad student I thought it was cool to know these easter eggs and irrational behaviors that could vary depending on the machine and compiler 😂 now I realise it's basically putting up with someone else's 💩
I relate to this. I used to take pride in knowing all the little subtle or esoteric stuff in C++ and thought it was a good use of my brain cycles. I took some time away from C++ and now that I'm using it again, I'm like... man, dealing with all this complexity is mostly just a waste of our precious time on this earth.
God... I’m feeling upset 😞😞😞 I do love C++. It hurt me to hear that!!
To be fair, the problems with struct padding and the order/performance of the code are not solved; they're just hidden away, or compromises are made which you can't control. I think if you need control of such small details, writing code that has to be correct is a decent tradeoff
Video of golden quality, thanks
Great insight and explanations as always
I'm not sure that constructors of primitive types in C++ do nothing - after all, `int a;` is an uninitialized local variable, and `int a{};` is a local variable initialized to zero.
AFAIK the constructors for primitive types and PODs are just not called unless mentioned explicitly in the initializer list.
One particular footgun is `std::array` for a primitive/POD type T - it is such a lightweight wrapper around the old C array that it is considered a POD itself by the language - that means that its elements are not initialized by default. E.g.: `std::vector<int> xs;` is an empty vector of ints; `std::array<int, 5> xs;` is an array of 5 uninitialized ints (!) - to make them always be zero, we have to use `std::array<int, 5> xs{};` aka "Uniform Initialization Syntax".
The "default constructor of int" bit was somewhat of a simplification in service of my not wanting to dive into all the C++ initialization complexity like I said. But to be more precise, yeah--ints don't have "default constructors" per se, but if you don't mention them in the initializer list, they do undergo _default initialization_, which, for ints, is defined as doing precisely nothing.
Default initialization is the same type of initialization performed by the syntax `std::array<int, 5> xs;`. In this case, since std::array is a class type, default initialization calls its implicitly declared trivial default constructor, which does nothing, and we end up with an array of uninitialized ints. The syntax `std::array<int, 5> xs{};` on the other hand does _value initialization_, which has the difference that it zero-initializes primitive types (and aggregates thereof), so you end up with an array of all zeroes.
@@_noisecode Thanks for replying!
I've never actually read the C++ Standard myself, all I know is from specialized books and from experience working in the language. Your comment clarified things a bit.
And it's a great video that explains in a very accessible manner why in modern PLs like Rust there are no constructors as we know them.
until learning all of C++'s initialization rules I didn't realize it was such a horrible nightmare... and for no justifiable reason. most of the complexity comes from backwards compatibility and lack of foresight.
tbf, the language concept of "moving" stuff (instead of just copying all the time) was added to the language after about 30 years (and still before Rust came into being; where do you think they got it from?)
The C++ committee is full of brilliant people doing their absolute best, but yeah backwards compatibility is the real killer. C++11 tried really hard to simplify initialization with uniform initialization, but it wasn't perfect (ahem std::initializer_list), and now we are stuck with its warts forever, and the sum total is that uniform initialization permanently added complexity instead of simplifying.
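The classic illustration of that initializer_list wart (not something from the video, just the usual example):
```cpp
#include <cassert>
#include <vector>

int main() {
    std::vector<int> a(3, 2);  // parens: three elements, each equal to 2 -> {2, 2, 2}
    std::vector<int> b{3, 2};  // braces: the initializer_list ctor wins -> {3, 2}
    assert(a.size() == 3 && a[0] == 2);
    assert(b.size() == 2 && b[0] == 3);
}
```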
When C++ was created, people just didn't have decades of experience with this sort of thing. I guess technically you can call this a lack of foresight, but I think no one can be expected to have that level of foresight.
@@__christopher__ the thing is that C++ tries to be 100% backwards compatible, the lack of foresight back then could've been justifiable if the language were to later undergo some fundamental change; but it didn't, and that's why it can be a pain in the ass to use nowadays. It doesn't seem like it'll ever be any different, and at this point maybe it's better to let it slowly die off and be replaced by Rust or even Go.
Spoiler alert, in 20 years people will say that [xyz] is such a better system than borrow checking and how come the Rust developers had such little foresight to build such a silly language. I mean personally I think that Val's system looks nicer to me than Rust even today. In any case, making programming languages is hard. Maintaining a programming language so that it stays relevant for decades is even harder.
0:47 thanks for mentioning us - the "neither" ones too (:
It's a neat idea, but I don't see how you can make it work in practice unfortunately:
That static factory function returns an `std::optional<T>`, this is all well and good (I don't see what better type to return without exceptions), but you'd like to get rid of that optional wrapper once you've made sure that optional contains a proper `T` and not `std::nullopt`. That means either copy-constructing or move-constructing a new `T` from the `T` inside that optional. Copying will likely be either inefficient or even incorrect if `T` is noncopyable (e.g. if `T` is a wrapper around a file descriptor). Which means we need to be able to move-construct a `T`, so an object of type `T` can be in an empty moved-from state, that is, one of those unusable states we intended to avoid being representable by using that factory method over the constructor in the first place.
It'd have worked with a destructive move, but we don't have that.
Bummer.
That means you'd be doomed to drag that extra `std::optional<T>` everywhere, which kinda defeats the point of having all objects of type `T` always be valid, since they're now replaced by `std::optional<T>` (which can be invalid by definition). At best, if you pass a `T&` or `const T&`, you know it's a valid object (well, unless that reference is somehow dangling, but that would be unrecoverable anyway). Still, the caller will have to work with an `std::optional<T>` instead of a `T` and either ignore nullopt checks (i.e. "trust me, I know this optional contains something") or add extra unneeded checks everywhere (slight performance loss, less readable code).
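To make the problem concrete, here's roughly the situation I mean (FileHandle is just a made-up move-only type, not anything from the video):
```cpp
#include <optional>
#include <utility>

// Made-up move-only type standing in for "a wrapper around a file descriptor".
struct FileHandle {
    int fd = -1;
    explicit FileHandle(int f) : fd(f) {}
    FileHandle(const FileHandle&) = delete;
    FileHandle(FileHandle&& other) noexcept : fd(std::exchange(other.fd, -1)) {}

    // Hypothetical factory: nullopt on failure, otherwise a valid handle.
    static std::optional<FileHandle> create(int f) {
        if (f < 0) return std::nullopt;
        return FileHandle{f};
    }
};

int main() {
    auto maybe = FileHandle::create(3);
    if (!maybe) return 1;
    // Getting the T out requires a move constructor, which in turn requires a
    // representable "moved-from" (fd == -1) state: exactly the kind of empty
    // state the factory was supposed to make unrepresentable.
    FileHandle handle = std::move(*maybe);
    return handle.fd;
}
```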
Still, interesting comparison.
I believe what you're meant to do is create the optional, but then soon after you need to unwrap it: if it's None, report some error, if it's Some, move the inner value out of the option. If it feels silly because you've already validated the invariant, then you could call an unchecked factory... but prefer to refactor your code so that the checked factory becomes your validation.
Then again, it's been a loooong time since I've done any C++. I will say, I'm usually of the opinion that the performance cost of keeping the invariants is worth the code not segfaulting in production. Obviously, tight loops are an exception, but before any optimization: take measurements!
@@okuno54 That's the issue with the C++ community nowadays. They seem to forget that it doesn't matter how fast something doesn't work.
In an older, more imperative language such as C++, doing things this way would probably mean overhead, since the code dictates that you first declare the variables (reserving memory), then operate on them, and then copy them by value into the struct's memory when constructing the object. I assume Rust is smart enough to see that a variable will be directly used to initialise an object later in the function and places it in the destination address straight away. Which is what the C++ approach of writing through an output pointer to a partially initialised object does very explicitly, as C[++] does.
Though I guess probably nothing in the C++ spec (besides the actual current constructor rules) prevents the compiler from doing the same when noticing a function returns an object and sneakily allocating the declared variables exactly where it knows it will place the object? Or maybe I'm wrong and they've been doing that for a long time?
Yep, you're describing [Named] Return Value Optimization (a.k.a. copy/move elision), and it's not only allowed by the C++ standard, but it's actually required in many cases as of C++17.
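A small sketch of what that looks like in practice (names made up):
```cpp
#include <string>

struct Widget {
    std::string name;
    int id = 0;
};

// NRVO: compilers will typically construct `w` directly in the caller's return
// slot, so no copy or move happens in practice. (The *guaranteed* elision added
// in C++17 applies to returning prvalues, e.g. `return Widget{...};`.)
Widget make_widget(int id) {
    Widget w;
    w.name = "widget-" + std::to_string(id);
    w.id = id;
    return w;
}

int main() {
    Widget w = make_widget(42);
    return w.id;
}
```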
@@_noisecode huh, nice
This all sounds like you actually want to be doing functional programming 🙂
But yeah, this would probably serve as great context to a talk on referential transparency
Why are out pointers bad in cpp?
I do this for a lot of reasons, including not needing to rely on exceptions. Except you cannot RVO at all if you return optional / variant types, which means you most likely will need a move constructor for a lot of your heavier types, and need to delete the copy constructor in order to not get bit.
C++ constructor wrangling is soulcrushing sometimes, feels like a circus.
Optional and variant do both support in-place construction which can give you RVO if you do things correctly/carefully. The WithResultOf/“Superconstructing Super Elider” I mention near the end can help with this, I’d recommend checking that out in more depth.
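For instance, here's a minimal sketch of the in-place idea with a made-up Config type (the WithResultOf trick generalizes this to constructing from the result of a function call):
```cpp
#include <optional>
#include <string>
#include <utility>

// Deliberately immovable type: if this compiles, no copy or move ever happened.
struct Config {
    std::string path;
    int retries;
    Config(std::string p, int r) : path(std::move(p)), retries(r) {}
    Config(const Config&) = delete;
    Config(Config&&) = delete;
};

int main() {
    // std::in_place / emplace forward the constructor arguments, so Config is
    // constructed directly inside the optional's storage.
    std::optional<Config> a(std::in_place, "prod.toml", 3);

    std::optional<Config> b;
    b.emplace("dev.toml", 1);

    return a->retries + b->retries;
}
```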
One thing about the CString class for C++:
Is a valid way of solving that problem not to push the responsibility of finding out the string length from the constructor to the caller, and pass it in as a parameter?
The isAscii function could be run on the input pointer too, and then you should be able to get the invariant established straight from valid arguments.
Watching these footguns made me feel like I was caught between a wall and a knife. You can't get a realistic solution in C++ without making serious trade-offs. It really makes Rust feel magical, because the design choices give a feeling of a sturdy foundation for cohesiveness and a "the code makes sense" sort of effect.
15:00
There's a much simpler way. Just set up a linter rule that enforces that "this->" must be explicit (very common already), and then a linter rule that enforces that "this->" can only be used in the last N lines of the constructor, and only ever as the first 6 non-whitespace characters of the line.
i.e., you enforce that you first do everything at the top, and then you just do this->a = , this->b = , this->c = , at the bottom of your constructor.
You could probably whip up a fairly robust regex in short order, or a tree sitter implementation if you want it to be perfect.
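Roughly what the enforced shape would look like (made-up example):
```cpp
#include <string>

struct Widget {
    std::string name;  // members are still default-constructed before the body runs...
    int id = 0;
    explicit Widget(int n) {
        // ...so all the real work happens at the top...
        std::string computed = "widget-" + std::to_string(n);
        int validated = (n > 0) ? n : 0;
        // ...and only `this->` assignments appear in the last lines.
        this->name = computed;
        this->id = validated;
    }
};

int main() {
    Widget w(7);
    return w.id;
}
```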
I guess simpler is in the eye of the beholder. :) Could be a neat idea. It still requires that all your members be default constructible (and ideally cheaply) so that they can be constructed before the opening curly brace and then assigned later.
C++ and Haskell are the best
Can't type constraints/concepts be used to enforce x != 0?
Best programming video I've seen in a while.
Brilliant video 🦀🔥
Awesome video, I really like how clearly you explain and compare things.
I would love to hear more, especially a Rust tutorial for C++ developers. Some concepts in Rust are just so alien to me that I can't grasp the language properly (like how you would properly manage a global list of objects for some important thing in a project).
8:11 A non-rust programmer here. How is it possible for a safe function to have non-safe code in it? Wouldn't that just mean that the safe function is not safe?
If there is a bug in the safe initializer and zero gets passed to the unsafe initializer. Wouldn't that cause an issue?
This is a really good question!
So, since the unsafe code can only execute in the branch where we know it's not zero (we checked with the match{} statement), the unsafe block is actually known to be safe. This is the correct way of using unsafe blocks in Rust: carefully check that the preconditions are all satisfied, and then open the unsafe block. (In that way, 'unsafe {}' is a bit paradoxical--the keyword should really be like `i_have_checked_that_this_unsafe_operation_is_safe_in_this_case { foo() }`).
This is really the essence of the interplay between safe and unsafe Rust. We have low-level operations that are generally unsafe, and then we write safe APIs on top of them by wrapping them in structures that check or otherwise enforce the unsafe operations' preconditions.
If you have _bugs_ in those safe wrappers (like if we screwed up the match{} statement), then yes, you can cause safe code to have UB. This is generally regarded as a very serious bug in Rust, and so you'll usually see a lot of care and pains taken around unsafe blocks that are in safe code (like huge comment blocks explaining the reasoning why it's safe).
@@_noisecode Thanks!
new_unchecked isn't 'necessary' for performance, you can do Option::unwrap_unchecked and any checks (aka paths returning None) will be dead-code-eliminated. In this case the constructor already did nothing, as None is represented as 0
One notable issue with factory functions in Rust is the inability to have fine-grained control over how and where an object is allocated, which can result in unnecessary copying.
If I'm not misunderstanding what you're talking about, I think I have to disagree.
When "making" a struct by filling in all of its fields, you're allocating that memory on the stack.
You can also choose to dynamically allocate on the heap by using something like Box.
Inside a function, once a struct is created, you can move it out of the function, which, as far as I know, is always (or at least in 99.9% of cases) basically a zero-cost operation.
A type is only copied when explicitly calling `.clone()` on it, or when that type is `Copy` (true for most primitives).
@@potatomaaan1757 I think you are misunderstanding. To clarify, the issues are kind of twofold. One is that because objects (in the generic sense of "entities", not objects in the C++ sense) are created inside of a function that has to return them, there's no idiomatic way in the language to construct an object off-stack: your struct is created on the stack and then moved off the stack into a Box or wherever else, if that's what you want. You could make a specialized new function that lets the caller pass memory in, but that isn't standard in the ecosystem, so you have to rely on compiler optimizations to eliminate that copying step, and those are not reliable (this is an acknowledged issue inside the Rust project, and there's slow-but-ongoing work to improve things). In C++, on the other hand, because allocation and initialization are separated, the caller can decide to allocate the memory itself and use placement new to create the object (sketched below), or use a custom allocator, even if additional allocations occur during the creation process. Rust additionally doesn't handle allocation failure very well and doesn't have support for custom allocators yet.
And to be clear, both the all-on-stack allocations and the lacking support for custom allocators are seen as limitations in Rust. Custom allocators are in nightly right now, but because Rust uses factory functions to create objects, support will have to be specifically added to any data structure that performs an allocation. As for the copy-out issue, work is being done to make the optimization consistent, and there's been discussion of a language feature to allow functions to explicitly request/demand the optimization, but that's all still WIP.
I'm hoping this all does get resolved, but it is definitely a challenge with the design Rust chose here. There are a lot of issues with C++'s design, but the video covers those and they're discussed a lot more generally.
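For the C++ side of that comparison, here's a rough sketch of what "the caller chooses the storage" looks like (Widget is just a placeholder type):
```cpp
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

struct Widget {
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) {}
};

int main() {
    // The caller picks the storage (malloc here, but it could be an arena,
    // shared memory, a custom allocator, ...), then constructs in that storage.
    void* storage = std::malloc(sizeof(Widget));
    if (!storage) return 1;
    Widget* w = new (storage) Widget("hello");  // placement new: initialize in place

    w->~Widget();        // we own the lifetime, so we destroy and free manually
    std::free(storage);
    return 0;
}
```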
I haven't used Rust or C++ but I'm a fan of the way Dart deals with initialization.
15:50ish, could you just declare a (private) constructor taking every param and doing nothing? It would be slightly annoying, sure, but:
private:
CString(std::unique_ptr<char[]> _buf, int _len, bool _is_ascii) : buf(std::move(_buf)), len(_len), is_ascii(_is_ascii) {}
...
static CString create(/* ... */) {
...
return CString(std::move(buf), len, is_ascii);
}
I'm not sure I would call these associated fns "factories." The term "constructor" seems to apply equally well. My understanding of factories from OOP is that they are objects whose sole responsibility is constructing other objects (especially abstract ones with multiple implementations). I think there's a key distinction in that factories hold onto configuration data while constructors are simply pure functions.
There is an existing design pattern more commonly referred to as "static factory method" or "static creation method", which is a useful simplification of "factory method" for when a factory interface or constructor arguments aren't necessary. That is essentially what is being called a "factory" for short here.
I misunderstood the idea for the first few minutes of the video because of this as well. I have only heard the factory pattern used to mean "an object instance that instantiates other objects", and a (static) "factory" like Vec::new() is often just referred to as a constructor in Rust contexts, because that word doesn't mean anything else within the language. As a general rule I am more keen on saying "C++ constructors are something else" than "Rust doesn't have constructors, and instead has static factories", unless you are actively talking to a C++ person.
I mean, yeah, but this is just bikeshedding. Arguing about the naming of such basic things is just a waste of time.
How does the nested structure way of holding member data play with inheritance? So far I've noticed it doesn't really.
I actually read the assembly for all the C++ parts in godbolt, he is right 😄