Love the music selection @13:00! Thumbs up for the music alone!
You're funny. 14 of the 16 minutes go to increasingly abstruse (but totally fascinating) stuff about unions, followed by the lesson that one shouldn't do this. Indeed: optional, variant, any, expected are much nicer.
Sometimes you're still stuck talking to C code and its unions tho...
@reverendragnarok Sure, but those unions don't have constructors.....
I learned a new word (abstruse)
That's how I teach. I show you all the pitfalls of the common way code is written then show you the simple correct way. So now you believe me!
6:29 Sometimes I think that C++ does not want you to program in it.
The dangerous thing about unions is how well the undefined use-cases for them work on any compiler.
As someone who is not a senior developer, I am more interested in learning when to use unions. What problem do they solve, and when is it good practice to use them? Maybe worth a Part 2 episode? 😊
Short answer: when you want a space efficient way of storing an object that might be one of several different types.
Generally speaking, today, you should use std::variant in those cases, not a raw union. As I showed, it's hard to get right.
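To make the std::variant suggestion concrete, here is a minimal sketch. The names Value, describe and variant_demo are made up for illustration and are not from the episode. The variant tracks which alternative is active and runs constructors and destructors for you.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical value type: holds exactly one of these alternatives at a time.
using Value = std::variant<int, double, std::string>;

// Returns a short description of whichever alternative is currently held.
std::string describe(const Value& v) {
    if (std::holds_alternative<int>(v)) {
        return "int: " + std::to_string(std::get<int>(v));
    }
    if (std::holds_alternative<double>(v)) {
        return "double";
    }
    return "string: " + std::get<std::string>(v);
}

// Small usage example: switching alternatives needs no manual lifetime code.
void variant_demo() {
    Value v = 42;
    assert(describe(v) == "int: 42");
    v = std::string("hello");  // the variant destroys the int and constructs the string
    assert(describe(v) == "string: hello");
}
```

Asking for the wrong alternative with std::get throws std::bad_variant_access instead of silently reading an inactive member.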
Unions and bit-fields are a great combo that can replace the use of bitwise operations.
Why not call the constructor of the string inside the constructor of the union, using the : s(s) syntax?
I *think* that if you have a union of, say, two structs with a common initial sequence of members (same types), you are allowed to read the values in this common initial sequence through either struct. Say, union U { struct A { int A_; } an_a; struct B { int B_; } a_b; }; U a_u; then a_u.an_a.A_ and a_u.a_b.B_ are both legal. This is the underlying principle of many C libraries, for example when you manipulate addresses using the typical sockets API, or the Windows bitmap header. I would appreciate input on the subject.
From the standard (section 12.3):
[ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union
contains several standard-layout structs that share a common initial sequence (12.2), and if a non-static data
member of an object of this standard-layout union type is active and is one of the standard-layout structs, it
is permitted to inspect the common initial sequence of any of the standard-layout struct members; see 12.2.
- end note ]
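A minimal sketch of what that note permits, assuming some hypothetical message structs (Header, TextMsg, NumMsg and the helper functions are invented for illustration). All three structs are standard-layout and start with an int, so that int can be inspected through any of them while one of them is active.

```cpp
#include <cassert>

// Hypothetical message types: all standard-layout, all beginning with an int.
struct Header  { int kind; };
struct TextMsg { int kind; char text[16]; };
struct NumMsg  { int kind; double value; };

union Message {
    Header  header;
    TextMsg text;
    NumMsg  num;
};

// Builds a Message whose active member is 'num'.
Message make_num(double v) {
    Message m;
    m.num = NumMsg{2, v};  // assigning a trivial type makes 'num' the active member
    return m;
}

// Permitted by the note: 'kind' is in the common initial sequence of every
// member, so it may be inspected through 'header' even though 'num' is active.
int kind_of(const Message& m) {
    return m.header.kind;
}

void common_sequence_demo() {
    assert(kind_of(make_num(3.5)) == 2);
}
```

The guarantee stops at the common initial sequence: reading m.text.text while num is active would still be an inactive-member read.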
Shouldn't you wait to set is_active until after construct_at, in case of an exception?
Probably. But I'm also pretty sure I told you to not use code like this :D
Anonymous unions used to cause a lot of GCC / clang warnings, is that no longer the case?
When you call the destructor with "s.std::string::~string();", what exactly is this function call and how should I read it? I thought you would call the destructor for a local variable with "s.~string()". What am I missing?
First time I've seen this syntax too. I assume it means the same thing, but it specifies the whole type/signature of the function instead of depending on the context/object it is called on. Not sure though.
Yeah I'm just fully qualifying the name of the destructor. Nothing fancy. In that context I couldn't get it to compile without fully qualifying it. I'm not 100% sure why.
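One likely reason, offered as a guess about the episode's code rather than a statement about it: in "s.~string()" the bare name string has to be found by ordinary lookup, and without a using-declaration for std::string it is not, because the class's own name is basic_string. A minimal sketch, where Holder and held_length are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Note: no 'using namespace std;' and no 'using std::string;' in this file,
// so the unqualified name 'string' is not visible here.
union Holder {
    std::string s;

    // The union does not construct 's' for us, so do it by hand.
    Holder() { ::new (static_cast<void*>(&s)) std::string("hello"); }

    ~Holder() {
        s.std::string::~string();  // fully qualified destructor name
        // s.~basic_string();      // alternative: the class's own name
        // s.~string();            // needs 'string' to be visible by lookup
    }
};

// Constructs a Holder, reads the string, and lets the destructor clean up.
std::size_t held_length() {
    Holder h;
    assert(h.s == "hello");
    return h.s.size();
}
```

std::destroy_at(&s) from <memory> (C++17) sidesteps the spelling question entirely.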
What I want to know is why lParam and wParam in WndProc aren't unions.
Unions are evil, avoid at all cost!
union AllMessageParametersThatExist {...}; History makes it impossible. In 16-bit Windows, some messages actually put two pointer sized items in LPARAM.
So why did he have to manually call the destructor? Shouldn't it be called either way?
Because constructors and destructors aren’t called for objects in unions, as the compiler has no general way of knowing if the object is the active member of the union.
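A minimal sketch of what that means in practice, assuming a hypothetical IntOrString wrapper (not the code from the episode). The wrapper has to remember which member is active, construct the string by hand, and destroy it by hand. The flag is set only after construction succeeds, so an exception thrown while constructing the string leaves it false.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical hand-rolled "int or string" type, for illustration only.
class IntOrString {
public:
    explicit IntOrString(int value) : is_string_(false) {
        storage_.i = value;  // assigning a trivial type makes 'i' active
    }

    explicit IntOrString(std::string value) : is_string_(false) {
        ::new (static_cast<void*>(&storage_.s)) std::string(std::move(value));
        is_string_ = true;  // set only after construction has succeeded
    }

    // Copying would need the same manual care, so it is disabled in this sketch.
    IntOrString(const IntOrString&) = delete;
    IntOrString& operator=(const IntOrString&) = delete;

    ~IntOrString() {
        // The union cannot know which member is active, so we must check.
        if (is_string_) {
            storage_.s.~basic_string();
        }
    }

    bool holds_string() const { return is_string_; }
    int as_int() const { assert(!is_string_); return storage_.i; }
    const std::string& as_string() const { assert(is_string_); return storage_.s; }

private:
    union Storage {
        int i;
        std::string s;
        Storage() {}   // activates no member; the owner decides
        ~Storage() {}  // destroys no member; the owner decides
    };

    Storage storage_;
    bool is_string_;
};
```

Everything in this class is bookkeeping that std::variant<int, std::string> already does.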
I have to wonder why, when literally everyone who has ever used unions uses them with UB, the C and C++ committees don't actually *fix* them.
You can use them without UB in C, and changing them would break backward compatibility.
Love the last point: just use STL.
One thing I don't understand is this idea of the "active member of a union". At runtime I don't think this concept exists, if you modify the int and you access the float, you get the float that is represented by the same bits you set on the int, and that's totally normal behavior and there's nothing UB about it as far as I know. How come at compile time these rules change? Feels like union at compile time just cannot really be sensibly used.
That is absolutely undefined behaviour in C++, and just because it happens to work with your current compiler is no guarantee that it will work in the future. And if the compiler can tell that you’re reading the inactive member, it is quite at liberty to alter that to whatever it wants if it thinks that’ll make more optimised code.
If you want to bitwise transfer a float to an int or vice versa, use memcpy, or std::bit_cast in C++20.
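A minimal sketch of the memcpy route, assuming a 32-bit float (the function names are made up for illustration). Compilers typically turn the memcpy into a plain register move, so this costs nothing compared with the union trick.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float and std::uint32_t have the same size.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copies the object representation of a float into an integer.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Copies the bits back; the round trip preserves the value.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

void round_trip_demo() {
    assert(bits_to_float(float_bits(1.5f)) == 1.5f);
}
```

With C++20 the first function can be written as std::bit_cast<std::uint32_t>(f) from <bit>, which is also usable in constant expressions.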
@StuartDootson Yeah this is my big gripe with c++. That most compilers currently behave "reasonably" indicates to me that reasonableness is possible. But the standard insists on calling this undefined behavior which is really frustrating. Many examples like this and they don't seem to be interested in fixing it.
@sirhenrystalwart8303 it’s not just c++ - it’s c as well. C could have standardised 2s complement arithmetic (which is the de-facto hardware standard for signed integer representation), but for the sake of portability and backwards compatibility, they chose not to define how signed integers work when they overflow, and chose instead to effectively say ‘just don’t do it’, and haven’t changed that stance even though the computing landscape has changed significantly since C was first developed.
See, back then, there were a few 1s complement computers. You couldn’t rely on ASCII, as IBM mainframes used EBCDIC. It made sense for C not to define behaviour in those areas where the platforms fundamentally differed, so as to allow efficient code generation on all platforms. Does that make sense now? Maybe not… unless you still run a PDP-1 or CDC-6600 and you need a C compiler!
What you're describing is extremely common in low level/embedded programming, and it's definitely the nicest solution in a lot of cases, but it's still UB. In my experience, you aren't going to convince people to do something more cumbersome on the off-chance a future compiler disregards common practice.
Yeah, I do think many C and/or C++ codebases depend on UB in some way. I think one pragmatic way to deal with it is to have an automated test that confirms the 'reasonable compiler behavior' the codebase depends on, at least if you know that you are depending on UB. Use UB sanitizers to minimize the issue as much as possible (in combination with fuzz testing if possible). Most embedded code can live in an x86/ARM compilable/runnable lib that can easily be tested.
Is using unions to convert types functionally equivalent to reinterpret casts on local variables? In HLS we make local copies and use unions to convert types because reinterpret casts are unsupported. I understand that reinterpreting pointers with fixed hardware is not trivial, but I don't understand why the compiler can translate conversions of local variables with unions but not with reinterpret casts.
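A rough sketch of what such a check could look like. It only pins down implementation-defined assumptions, because a test cannot prove that real UB is safe. The specific assumptions and function names are examples invented here, not taken from any particular codebase.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Compile-time checks: the build fails on a platform that breaks an assumption.
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");
static_assert(sizeof(float) == 4, "code assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "code assumes IEEE 754 floats");
static_assert((-1 >> 1) == -1,
              "code assumes arithmetic right shift of negative values");

// Run-time check for the same shift assumption, suitable for a unit test.
bool runtime_assumptions_hold() {
    volatile int minus_eight = -8;  // volatile keeps the shift at run time
    return (minus_eight >> 1) == -4;
}

// Hypothetical entry point a test suite could call once at startup.
void check_platform() {
    assert(runtime_assumptions_hold());
}
```

Building the same tests with -fsanitize=undefined on GCC or Clang covers the cases that are actual UB and that checks like these cannot express.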
Yes, it is morally equivalent to a reinterpret_cast.
As I attempted to very clearly state in the episode, you *cannot* access the non-active member of a union without invoking UB.
You gave me the idea to make a struct that represents a union, so we can convert bits from one type to another without making a union.
I always watch you at CppCon and other
Jason, raycasting engine tutorial in modern C++, please!
It's nice that we can't have UB in a constant-expression context, thanks to [expr.const] in the C++ standard. If we can write code that will be executed in a constexpr context, we'll have no UB in that code. I'm starting to think this is safer than the situation in some other languages, in which UB can occur not only inside certain blocks but also, because of those blocks, somewhere that is supposedly safe.
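A minimal sketch of that effect with a union, using hypothetical names. Reading the active member is fine during constant evaluation. Reading the inactive member is not a constant expression, so the commented-out static_assert would be rejected at compile time instead of quietly "working".

```cpp
#include <cassert>

union IntOrFloat {
    int i;
    float f;
};

// Reads the member that is active: allowed in a constant expression.
constexpr int read_active() {
    IntOrFloat u{42};  // aggregate initialization activates the first member, 'i'
    return u.i;
}
static_assert(read_active() == 42, "active member read is a constant expression");

// Reads the inactive member. Uncommenting the static_assert makes the build
// fail, because the compiler must reject this read during constant evaluation.
//
// constexpr float read_inactive() {
//     IntOrFloat u{42};
//     return u.f;
// }
// static_assert(read_inactive() != 0.0f, "never compiles");

void constexpr_demo() {
    assert(read_active() == 42);
}
```

The same read_inactive body called at run time would compile without complaint, which is exactly the difference the comment above is pointing at.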
If only we had variants/safe unions with the interface of unions...
Wait a minute, this video was literally posted 1 minute ago but the comment below was posted 1 day ago, how is this even possible?
never mind, i got it
@nirajandata Welcome to the time traveling community!
That comment must have been created at compile-time.
I didn't know before that C++ unions are allowed to have member functions - thanks Jason.
Wow. Coding in modern C++ is like playing classic Tomb Raider games.
If you use std::construct_at/std::destroy_at, you are not far from using std::addressof too, for completeness' sake.
The short answer is that you use a union-like type when a value can be 1 of several different types, and you want to store that in a memory-efficient way.
You should prefer std::variant in those cases today. Use the tools the standard library gives you.