Here are a few video ideas:
* Hardware bugs and their workarounds (e.g. F00F, FDIV, TAS on Amiga/Genesis, DPCM glitch on NES, etc.)
* How to program in assembly without losing motivation (me irl 😭)
* How emojis are represented in binary
* Unicode errors in general (e.g. rÃ©sumÃ© for "résumé")
* Why it's so easy for a human to recognize spambots in chatrooms but computers can't do it well
* How an EXE file works
You don't even need a switch statement for enums in Java; you can make frankensteinian monsters known as abstract enums. They're usually used for adding extra data to the enum members (with a custom constructor), but you can also add abstract methods to the enum, which you then have to implement for each member. This leaves you with a simple method call on your enum variable, without switching over possible values. 😂😂😂
@@joeystenbeck6697 This is something I would like to see @lowLevelLearning mention in the video. I haven't benchmarked it, but my gut feeling tells me the runtime cost is just pushed down to compile time, i.e. the compiler spends more time compiling the program in order to get better runtime performance. In some situations this is undesirable (or less desirable) and the program would be better off running a bit slower.
@@PrivacyRevocation 99% of the time you'd rather have longer compile times than longer run times. And I'd assume that if statements are more difficult to optimize?
Yes, I was disappointed that he glossed over stuff, referring to things as "magic numbers" and "a bunch of garbage". The whole *point* of the video was to explain the details behind the switch statement.

The valid user inputs are the letters "c", "d", "e", "n", "q" (which he defined earlier in the video). Note that the first letter "c" is ASCII 0x63 and the last letter "q" is ASCII 0x71, so the span from "c" to "q" is 0x0e (14), meaning the range covers 15 characters.

The code at 7:51 shows a transformation and a check:

Line 123b: sub eax, 0x63 ; subtracts letter "c" from the entered letter, so the range of inputs is now indexed from 0
Line 123e: cmp eax, 0xe ; checks whether the entered letter is past letter "q", which is 0xe positions from letter "c"

At this point, the character range from "c" to "q" has been transformed to an index range of 0 to 14. If the entered character is *OUTSIDE* this range, the code bypasses the jump table stuff and goes to the "invalid" handling not shown in the video:

Line 1241: ja 12aa ; exit the jump table code block, as the entered character is outside the range of the defined command characters

If the entered character is *INSIDE* this range, it proceeds to the jump table code:

Line 1245: lea rdx,[rax*4+0x0] ; Load Effective Address: the index value is multiplied by 4, since each jump table entry is a 4-byte offset added to the table's base address

The code at 7:58 is really just the jump table values. (Unfortunately the disassembler produces nonsense code there, but the offsets are valid.)
Rearranged, the offset values are given below and matched against the entered letter:

00: ff ff f2 42 ; "c" jump to Line 1296 Continue
01: ff ff f2 2a ; "d" jump to Line 127e Delete
02: ff ff f2 1e ; "e" jump to Line 1272 Edit
03: ff ff f2 4e ; "f" jump to invalid code block
04: ff ff f2 4e ; "g" invalid
05: ff ff f2 4e ; "h" invalid
06: ff ff f2 4e ; "i" invalid
07: ff ff f2 4e ; "j" invalid
08: ff ff f2 4e ; "k" invalid
09: ff ff f2 4e ; "l" invalid
10: ff ff f2 4e ; "m" invalid
11: ff ff f2 12 ; "n" jump to Line 1266 New
12: ff ff f2 4e ; "o" invalid
13: ff ff f2 4e ; "p" invalid
14: ff ff f2 ?? ; "q" jump to Line 1289 Quit

There is no particular reason that the offsets are negative; it's just that the compiler chose to put the jump table _after_ the code fragments. In fact, referencing the offsets to the JUMP table *is* a bit odd. Usually the base address for a jump table is the FIRST ROUTINE (in this example the "New" routine at address 1266 is the first routine, as shown at 8:52). And because it's first, it would have an offset of simply 0, with the rest of the routines referenced from there. In the given code each entry point is just 12 bytes from the previous routine:

JumpRoutineBaseAddr is 1266 (in the given example at 8:52)
+0 for "New"
+12 for "Edit"
+24 for "Delete"
+36 for "Stop"
+48 for "Continue"

With those simpler and smaller numbers, the example would have been easier to follow, but we're at the mercy of the compiler (and a crappy disassembler). As was pointed out, if the cases are "sparse" then the jump table will have a lot of those "invalid" entries. It still executes very quickly, but takes up some memory. Input letters *OUTSIDE* the range are culled out and terminate quickly. Letters *INSIDE* the range all result in the jump code being executed, even if it just results in an "invalid" stub.
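Putting these comments together, here is a hypothetical C sketch of the range check and table lookup described above. The command names and table layout are reconstructed from the comment, not taken from the video's actual source; the compiler's real jump table holds code addresses, modeled here as strings for simplicity.

```c
/* Hypothetical reconstruction of the dispatch logic described above.
   Valid commands are 'c', 'd', 'e', 'n', 'q'; everything else is invalid. */
const char *dispatch(char ch) {
    unsigned idx = (unsigned)ch - 'c';   /* sub eax, 0x63 */
    if (idx > 0x0e)                      /* cmp eax, 0xe; ja -> invalid */
        return "invalid";
    /* one table entry per letter from 'c' (index 0) to 'q' (index 14) */
    static const char *const table[15] = {
        "continue", "delete",  "edit",      /* 'c', 'd', 'e' */
        "invalid",  "invalid", "invalid",   /* 'f', 'g', 'h' */
        "invalid",  "invalid", "invalid",   /* 'i', 'j', 'k' */
        "invalid",  "invalid", "new",       /* 'l', 'm', 'n' */
        "invalid",  "invalid", "quit"       /* 'o', 'p', 'q' */
    };
    return table[idx];
}
```

Note that the unsigned subtraction also culls letters *below* 'c': for 'a', the wrap-around makes idx huge, so the single `idx > 0x0e` check rejects both ends of the range, exactly as the `ja` does in the disassembly.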
Before everyone decides to switch to switch statements 😊 I think it's good to point out that a switch statement CAN be faster, but it really depends on the use case. Like you mentioned, when you only have a few conditions, if/else might actually be faster. Also, in your example the gaps between the enum values are small, but when very large gaps are used, the compiler might opt for a different strategy, which can again be slower than a few if/else statements. And finally, switch statements might be faster, but they have their own caveats, which can lead to slow development. It's so easy to forget a break, which leads to an unexpected fall-through, which can cause really nasty bugs to trace. It's easy to forget to handle the default case. Also, in C/C++ switch statements cannot be used for strings, which is annoying to say the least. And switch cases are not scoped unless you add curly braces, which again can lead to unexpected behavior if used improperly.
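The missing-`break` fall-through mentioned above can be sketched in a few lines. This is a made-up example (the function and values are illustrative, not from the video):

```c
/* Classic missing-`break` bug: with no break after case 1,
   control falls through into case 2 and runs its code too. */
int classify(int x) {
    int result = 0;
    switch (x) {
    case 1:
        result += 1;        /* missing break: falls through! */
    case 2:
        result += 2;
        break;
    default:
        result = -1;
        break;
    }
    return result;
}
```

Calling `classify(1)` yields 3 rather than the intended 1, and nothing about the code at the call site hints at why, which is what makes these bugs nasty to trace.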
It really depends on the use case; squeezing more than one line under each case can be hard to read and ugly. And often the conditions are not as simple as a case. So I think its use case can be quite narrow.
If you use enums, the gaps should be quite small. Generally I'm happy to let the compiler figure out if it thinks the switch is faster or not, but I prefer switches because I think it looks prettier and I sometimes like to know if I'm missing a value by looking at the warnings when I remove the default statement. (I tend to write java or rust more than C, so the precise warnings may be different.)
Embedded dev here. The compiler impressed me a couple years ago: I had a static constant std::map of enums to functors, and I changed where it was used to a switch calling the functions directly, and it compiled into an identical object file.
Using the switch can also improve code safety by enabling the compiler warning/error for switch statement not handling all values in your enum. So in future, when you add another value to your enum, the compiler will tell you about any switch statements you forgot to update
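The exhaustiveness warning described above can be demonstrated with a small sketch (the enum and names here are hypothetical, not from the video). With GCC or Clang, `-Wall` enables `-Wswitch`, which fires when a switch on an enum has no `default` and misses an enumerator:

```c
/* With -Wall (which includes -Wswitch), GCC and Clang warn if a switch
   on an enum has no default and doesn't cover every enumerator. */
enum command { CMD_NEW, CMD_EDIT, CMD_DELETE };

const char *command_name(enum command c) {
    switch (c) {            /* add CMD_STOP to the enum above and this
                               switch warns until you handle it */
    case CMD_NEW:    return "new";
    case CMD_EDIT:   return "edit";
    case CMD_DELETE: return "delete";
    }
    return "?";             /* unreachable for valid enum values */
}
```

Adding a `default:` label silences the warning, which is exactly why some people recommend leaving it out when switching on enums.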
I would have imagined that modern compilers with optimizations turned on would be smart enough to choose to translate the if/switch to the appropriate assembly if/switch depending on the specific code.
Most of them are. Not sure what compiler he used with what settings. In some cases they can convert if-elses to a jump table, like they do for a switch. In other cases they might decide not to generate a jump table for a switch, because multiple compares would be faster or much more memory efficient (in terms of produced code).
Yep, to many compilers it doesn't matter at all whether you write a switch or an if statement. I guess the switch just happens to be easier to optimize, so it's more likely to be faster if we choose any random compiler. In addition to O(1) jump tables and O(n) if-else chains, some compilers actually use a hardcoded binary search to find the correct case by repeatedly cutting the range in half, which is O(log n). I guess that can be a good middle ground between those solutions. Most compilers change the implementation of if and switch statements depending on the number of items to be compared and the type of data presented (consecutive numbers, or something that translates to them like enums, versus random-looking numbers without any apparent pattern, etc.).
You forget the human here. If I have a long if-else chain that I carefully created with only fixed comparisons against a single variable, the compiler can optimize it to a jump table. Then the next developer working in this file adds an if-else that breaks this pattern, and boom, performance drops for no obvious reason (unless you know of this compiler optimization). By using a switch I tell the next developer my intention, and she/he needs to make a conscious decision to change this concept.
This is the correct answer. People who write modern optimising compilers are generally smarter than the average programmer. It's _relatively_ easy to look at an if-else cascade and analyse whether it's evaluating equality on the same variable and is, therefore, functionally a switch statement. One need only look at Python: Guido explicitly excluded a switch statement, as he _knew_ this was possible.
The trade-off point between if and switch will depend on your target processor. A processor cannot look ahead through a jump table, meaning it cannot keep the pipeline full. A processor may have branch prediction to try and keep the pipeline full. Look up branch prediction and indirect jumps for your target processor. Only benchmarking will tell you the trade-off point. Good luck.
To be fair, if the if-statement can easily be transformed into the switch statement as you did in the video, any reasonable compiler should output the same asm as the switch statement.
@@piisfun What? Modern compilers are perfectly capable of generating a jump table from cascaded if statements, as well as converting a switch statement into a series of comparisons, depending on whichever is faster in a given situation, and they can do it while compiling a much higher-level language like C++.
Yeah, with the example code, yes. The problem comes when it's organized a little differently and the if/else chain isn't always comparing the same variable to a constant (i.e. wouldn't be compatible with a switch). That's an easy thing for someone to do accidentally when coming into old, complex code and adding another case, and suddenly it performs a lot worse. So with modern compilers it's more of a protection against myself than an intentional optimization, imo. Though there are use cases where you need to be intentional about this, like some embedded stuff.
In my first design review at Novell, my team lead said I should use a switch statement for true/false tests rather than if/else. So I coded both and looked at the generated code - they were identical.
Compilers have been smart enough for 20+ years to ignore your preferences as a source code writer and just do the right thing at the machine code level. I used to enjoy trying to outsmart compilers, making 3D renders run faster by studying the machine code output, but I gave that up long ago. We have created tools smarter than ourselves!
If the if-else chain tests something that can be trivially changed to a switch, it's better to do so, simply because it communicates intent better, and gives more well-structured code. Or, put another way, a long, simplistic if-else chain is just a plain eyesore, IMHO. It doesn't necessarily matter if the compiler can magically figure your intent out in this specific case; in other cases, bad code (which this smells like) can actively hinder it from figuring things out and making optimisations. TL;DR: Don't make it any harder for other people to read your code than absolutely necessary, and don't expect your compiler to perform magic on weird code, even if it's _often_ able to.
@@mnxs It's most useful when you have a situation with multiple checks that are simple tests and the actions to take are mutually exclusive. Presumably in this situation those code blocks can be treated like an anonymous function and reduced to something akin to BASIC's IF ... THEN ... of the form:

IF condition1 THEN GOSUB 100
IF condition2 THEN GOSUB 200
IF condition3 THEN GOSUB 300

P.S. That said, you can "optimize" your if-else constructs by ordering the checks in terms of the most common conditions you expect to be true. Switch statements are to be preferred when that ordering/ratio is completely unpredictable. Or, as he puts it in the video, when they are all equally likely.
1. Which compiler? 2. What compiler options? (If you compared with no optimization flags, then good luck; that's not even production code.) 3. What language version? This whole if/switch comparison makes little sense; compilers these days will chop your code up in such a way that you won't make heads or tails of it. I assume adding -O3 will make both comparable, but then again, you won't be able to read through the ASM.
That misses the point. The point is that the main reason switch statements exist is due to this optimization. A compiler turning an if-else tree into a jump table is explicitly a compiler optimization. A compiler turning a switch statement into a jump table is just what the compiler logically should do.
@@vibaj16 I did find some cases where switch cases won't get turned into jump tables, so it's not as logical as it seems. For example, if you have non-equally-spaced case values with very large gaps in between (random-looking numbers like 1, 260, 1070, 24080), the jump table would be gigantic, and gcc and clang (even with -O0) seem to refuse to use jump tables in those cases. It makes sense, but it makes me doubt that this optimization is the main reason for the existence of switch statements, since the compiler obviously still makes its own decisions even at -O0.
PSA: It does not matter. Modern compilers have aggressive optimizers that transform if to switch and switch to bit-tests whenever it is valid && profitable speed/size-wise to do so.
I believe it's a method to prevent nesting. It gets unusable at the depth of 3, so it's motivating to extract and write separate functions/methods/classes.
@@moorscode That is actually the first positive thing I've heard about such a big amount of spacing, and it totally makes sense. Maybe I should try to get my coworkers' machines to look that way xD
tabstops at 8 spaces is.... a very long-standing de-facto standard. I don't care what you set your indent level to, but on a modern computer, there's very little reason to make your editor necessarily use hard tabs of a different width, and plenty of reason (IMHO) to maintain the idea that tabs are 8 spaces (or rather, align you to a column number divisible by 8), for compatibility with existing things.
This is correct. Common subexpression elimination is one of the first optimizations I learned back in the 1970s while in college as it can be applied early while doing lifetime and flow analysis before code generation
This optimization can only be applied to if statements whose conditionals strictly operate on integral types and never check said integral types against a range of other integers (e.g. with > or <).
It is a neat mechanic, but disappointingly rarely relevant. 1. You need a decent number of branches to actually notice a difference; even with 5 options, it's still only a minute difference. 2. If you have a huge number of choices, you will often implement it as some kind of hash map or state machine anyway, rather than if/else or switch. 3. The speed of the branching is only relevant if the code executed in the branches is also fast. If you're doing a printf at the end, then the improvement from using a switch will only make up a fraction of a percent of the time between hitting the control structure and actually seeing the text on screen. That said, the niche cases where the speed gain is relevant of course still exist, just not in many places. I guess I'm kinda traumatised about this topic from the whole YandereDev incident, where everyone and their mother got hung up on the point that the code was bad because it used if/else instead of switch, even though that was the least of that code's problems. Ultimately it turned out that the performance issues came from calling an expensive UI function multiple times per frame and the use of heavily unoptimised graphic assets.
scanf("%4s", optionbuf), from man scanf: An optional decimal integer which specifies the maximum field width. Reading of characters stops either when this maximum is reached or when a nonmatching character is found, whichever happens first. Most conversions discard initial white space characters (the exceptions are noted below), and these discarded characters don't count toward the maximum field width. ***String input conversions store a terminating null byte ('\0') to mark the end of the input; the maximum field width does not include this terminator.*** Also, your while loop condition is UB on the first iteration.
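To make the sizing rule above concrete, here is a small sketch (using `sscanf` instead of `scanf` so it doesn't need stdin; the function name and inputs are illustrative). The field width 4 limits the characters read, but the terminating null byte is stored on top of that, so the buffer needs 5 bytes:

```c
#include <stdio.h>

/* "%4s" reads at most 4 characters and THEN stores a terminating '\0',
   so buf must hold at least 5 bytes; a char[4] would overflow by one. */
int read_option(const char *input, char *buf /* at least 5 bytes */) {
    return sscanf(input, "%4s", buf);
}
```

Feeding it `"delete"` stores `"dele"` plus the terminator in `buf[4]`, which is exactly the one-byte-past-the-end write you get if the buffer was declared `char optionbuf[4]`.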
Actually, an if-else tree might get optimized into a jump table by the compiler if there are many branches, so using an if-else tree or a switch really doesn't matter with an optimizing compiler. Though it's good to know when to use what, as it can make your code much more readable.
Well, perhaps, but people who care about such performance improvements are working with embedded devices like Arduino etc. Everything is going to be quite basic.
This depends on the nature of the if-else tree. It can only perform the jump table optimization for if-else branches if 1) the conditionals only operate on basic integral types, and 2) the conditionals only test for equality or inequality, and don't use range comparisons like >= or <=.
@user-li2yv5je5e I think you missed the point. The ISA on an Arduino is vastly different from the x86-64 ISA, or even the x86-32 ISA. The performance profile is going to be vastly different. This is low-level stuff that I doubt 99% of the people here understand unless they work on compiler development. Joso997 apparently gets it. I've seen plenty of people come from 'modern' development teams into the 'tiny MCU' world and not understand this.
You should use Compiler Explorer if you want to show what's going on in assembly; it has a much nicer view and will even highlight what gets translated into which assembly instructions.
I just assumed it was like: table address = 1200, 'q' = 0x71, 'd' = 0x64. So it just reads the value (0x71 for 'q'), adds it to 1200, jumps to address 1271, and executes the code there. Actually he said this part contains a negative number(?) and uses that in conjunction with the table address to jmp to code. Close enough? I think he went over this part too quickly or something.
@@user-zu1ix3yq2w He kinda explains how it jumps to the exact case that satisfies the condition, but in his explanation there was a magic address that points to the statement. How it calculates the address of that statement was not explained; it was just called "magic". I clicked on the video title to learn exactly about that, but it was dismissed as "magic math" here.
@@aneeshprasobhan Oh well. I think he should make another video explaining it. Obviously we could just look up jump tables and learn about them, but I'd prefer he just explained it in more depth. I've always found magic numbers fascinating. I think a jump table is a natural solution, one we would invent ourselves if we had been approaching these problems in assembly and using the goto statement, instead of using high(er) level languages. I guess what I can say is, this jump table solution for switch statements (goto table[table_address+offset]) is better for (numerically) consecutive cases, and less so for cases that are "sparser," especially for memory management.
Fun fact:
* Write some code in a function, doesn't matter what
* Start a switch statement at the top of the function
* Put a "}" at the end, to finish the switch
* Sprinkle "case" labels in between the lines of your code wherever you want, even inside loops, if statements, etc.
This is legal C/C++ and it even works, although it makes no sense at all.
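The most famous use of this quirk is Duff's device, where case labels are interleaved with a do/while loop to unroll a copy. A minimal sketch (assumes count > 0, as the original does):

```c
/* Duff's device: case labels threaded through a do/while loop.
   The switch jumps into the middle of the loop body to handle the
   remainder (count % 8), then the loop copies 8 bytes per iteration.
   Assumes count > 0. */
void duff_copy(char *to, const char *from, int count) {
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
               } while (--n > 0);
    }
}
```

It compiles and copies correctly, which is a good demonstration that case labels really are just jump targets as far as the language is concerned.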
Please correct me if I'm wrong, but isn't this only true with optimizations turned off? I did a quick bit of playing around on Compiler Explorer before commenting: there was a substantial difference between a switch and an if chain with optimizations turned off, but with -O3 set, both the switch and the if result in the exact same assembly.
This. Decisions between if-else chains and switch statements should be based on what is more readable or what expresses the intent of your code better. Every major compiler is perfectly capable of seeing through switches and if statements and can transform one into the other if it thinks that's going to be faster. And generally the compiler will do a way better job at making this decision than the average programmer. The advice given in this video is simply misleading and will make new programmers rewrite perfectly fine code because they now think it will be faster (which it won't).
@@williamdrum9899 I did a bit more playing around, this time with 9 cases instead of 3. The asm is almost identical. The compiler did switch to a different approach than before: this time it declared basically an array of length 26, one entry per letter, containing the start address of each case, and then used the switch argument as the index into this array to find the correct case code. E.g. 'C' - 'A' = 2, and index 2 in the array is the address for case 'C', which it then jumps to. This is the same for both if and switch. There is one small difference: with the if version, there is an extra comparison at the beginning which checks the first if condition; if it matches, everything I just mentioned is skipped. The switch version doesn't have this comparison, but everything else is the same.
As per Jason Turner, I would also almost always avoid using the default label when switching on an enum since you could forget to adjust the switch statement when a new value is added to that enum. This should always be configured to cause a warning in the compiler, but the warning won't work when a default label is present. And yes, you'll probably have to extract the switch statement to a new function so you can handle the default case outside of the switch if it falls through, but without that label. Also another vote from me to use Compiler Explorer like Turner does when demonstrating assembly. It's just the best.
MISRA, for example, requires an explicit default. The alternative is that, unless intentional, the default should assert()... or just use a better static analysis tool/compiler, which will tell you when you have an unchecked enum value. If this kind of error makes it into the wild, you have more serious process issues.
Perhaps in embedded land where unrecoverable errors are Really Bad™, but for most programs I would argue you should always have a default case that throws or asserts. Of course, I would never recommend a default case that just silently ignores the problem. Even better is when your compiler can catch the problem for you (a la Typescript).
@@clonkex It really depends on the context and situation. When writing code for yourself, it's perfectly fine to have an empty default or just print out a message. Where it becomes a problem is when the valid cases are strictly defined and receiving anything else is Very Bad News. If you write a compiler or assembler, you really want to know if it encounters a non-existent instruction, a math/arithmetic error (division by zero), or some other issue that could cause the program to unexpectedly crash the computer it's running on or at least hang it up badly. Otherwise it might quietly produce an executable that will suddenly crash without an obvious cause or waste a lot of computational resources on recovering from avoidable errors. And God help you if it should take garbage and produce otherwise valid but pointless instructions.
@@jnharton I agree. This is why I say you should have a default case that throws. You want to know _for certain_ that you've covered every case, and if not, crash rather than do something unexpected.
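The "crash loudly rather than continue" default being argued for here can be sketched in C with `abort()` (the enum and state machine are made up for illustration):

```c
#include <stdlib.h>

enum state { STATE_IDLE, STATE_RUN, STATE_DONE };

/* Defensive default: any value outside the enum aborts immediately
   instead of silently doing something unexpected. */
int next_state(enum state s) {
    switch (s) {
    case STATE_IDLE: return STATE_RUN;
    case STATE_RUN:  return STATE_DONE;
    case STATE_DONE: return STATE_DONE;
    default:         abort();  /* unreachable if every case is covered */
    }
}
```

The trade-off discussed above is that this `default` also suppresses the compiler's `-Wswitch` exhaustiveness warning, so the two safety nets (compile-time warning vs runtime abort) pull in opposite directions.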
This (the if statement checks) will very likely be optimized if you tell the compiler to do any sort of optimization. Also, it's relevant to note that the hashing of the switch statement was so simple because each command was distinguished by a single character, and every character was different. If multiple command inputs matched even on the first letter, the switch statement would still have to branch. But you showed a neat topic few know about, kudos for that.
Nice! A very common place to optimize with switch is when parsing command line arguments. The problem is when you need to call strcmp or some function like that for every argument, you can't do that in a switch statement. A nice trick is to separate what can be solved with ifs vs switch and use the switch first, managing to even prevent the strcmp function calls.
I wonder if you could just run a switch statement over the first character and only then do the `strcmp` (and only compare with values that actually start with that character), but that would make the code look a little ugly. :(
@@ThePC007 That's actually a very good trick, for example gperf uses that trick to parse keywords to generate hash maps. I actually use it all the time when parsing command line arguments, I check if the first character is a dash '-', and only then I call strcmp from the second character, as most command line options will start with dashes.
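The first-character dispatch trick described in these replies might look something like this sketch (the command names and return codes are hypothetical):

```c
#include <string.h>

/* Switch on the first character, then strcmp only against commands
   that start with that character. The switch becomes a jump table,
   so most non-matching strings are rejected without any strcmp call. */
int parse_command(const char *arg) {
    switch (arg[0]) {
    case 'd':
        if (strcmp(arg, "delete") == 0) return 1;
        break;
    case 'e':
        if (strcmp(arg, "edit") == 0) return 2;
        break;
    case 'n':
        if (strcmp(arg, "new") == 0) return 3;
        break;
    }
    return 0;  /* unknown command */
}
```

With many commands per first letter you'd add more strcmp calls inside each case, but the switch still prunes the search to one small bucket, which is essentially the first level of the hash that gperf builds.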
To parse argument options you can hash argument values into a fixed size (per value) hash board buffer (long enough to hold your argument options) where you hash the string value and then memcmp() the two memory blocks to make sure the value is the same as you wanted. Then you get an almost constant evaluation time.
@@informagico6331 That sounds extremely efficient and fast, and probably the way to go for big production programs, but maybe overkill if you're making a simple CLI tool. Depending on the number of available command line options, the simpler way could even be faster.
I think it boils down to a feature in the language. Switch statements have to have constants for the case labels; you can't use a variable in a case label (i.e. `case x:` where x is a variable). So the values to be checked are all known at compile time, and the 'hashing' can be pre-calculated. 'If' statements, however, could be comparing two variables whose values are not known at compile time. So even though in this example the value just happens to be a constant, the compiler doesn't have an easy way to go from assembling each 'comparison' to trying to 'hash', just because one of the items being compared happens to be a constant. Also, the 'if'/'else if' clauses could each be using completely different variables and values, whereas in a 'switch' statement the tested expression is the same throughout all the 'case' labels.
so basically the compiler converts the switch statement into a sort of hash table? that way it runs a hash on the condition and immediately jumps to the corresponding branch's address, correct?
Any optimising compiler worth your time will produce identical code for either statement anyway. Usually it is able to transform the code into either lookup tables, trees of if & if/else statements, and often even branchless code. LLVM has been able to do all of this for a while now, as has GCC, and even MSVC has some capabilities here. This guy is really being irresponsible by presenting his performance advice as true when it very rarely is, and not even measuring the difference.
I ran the if-statements version with -O3 optimizations enabled with the g++ compiler, and it got converted to the switch version!! Compiler optimization is pretty dope.
Yes, this code, which reads from the same memory location repeatedly with no other data memory accesses in between, is going to have cache misses. Do they not teach you CS guys anything about locality? Do you just see memory accesses and assume "cache could miss there"? Yes, there are some edge cases, but in general that access should hit, and in many of those edge cases (interrupts, multicore, etc.) the culprit will cause a much larger slowdown than the single cache miss it causes in that code.
Reminds me of the time I used a function pointer table rather than a switch statement (20 years ago); the switch did not do this jump-table trick, but instead searched down the list like the if statements. This was for a state machine with lots of states, and I knew the switch was inefficient. This switch-case example looks kind of like a function pointer. When I have a long-ish list of compares like this, I tend to check the assembly output for efficiency, like was done here.
It might be different with out-of-order execution, but when your CPU is pipelining instructions, it will usually be processing multiple instructions at a time. That means for an instruction like jne, the CPU may end up processing the instructions right after the jump when in reality it needed to be processing the instructions at the location the jump targets. When this happens, the CPU is forced to flush those instructions and continue from the correct location. Branch predictors can reduce this chance, but that is an additional cost of if statements.
Oh hey that's actually really cool! I've never seen someone explain how switches work at a byte level, and when you do it makes it really obvious why switches are so fast.
Nice tutorial! I know it's just a test program, but you should consider clearing optionbuf with zeros before you use it, since there may be arbitrary values in it and it could potentially trigger your stop condition.
He also has a one-byte buffer overflow, since the length specifier in scanf("%4s", optionbuf) doesn't include the null byte, and he completely failed to consider compiler optimizations. This video can basically be deleted.
Because in many cases they translate directly to a single or very few assembly statements. However, given a properly designed compiler and language designed to be optimized, often IF statements can in select cases be translated to be as efficient as switch statements.
Well, what happens for the "default:" case, though? Does it detect if the number is out of a specific range to see it's not one of the wanted actions? Or how does it work that out?
🤓 This is my comp sci degree speaking out, but there is a case where an if/else block might have a marginal performance benefit. If your input data is distributed in a way where one or two values are the overwhelming majority, with tertiary and onward values being exceedingly rare, it might prove just barely optimal to use an if/else tree rather than a switch.
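On GCC and Clang you can even tell the compiler about such a skewed distribution with `__builtin_expect`. A sketch under made-up assumptions (the 90/10 split and the function are illustrative):

```c
/* When one value dominates the inputs, checking it first in an if/else
   chain can beat a jump table. __builtin_expect (a GCC/Clang builtin)
   hints to the compiler which branch is hot so it can lay out the
   common path with no taken branches. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int handle(int code) {
    if (likely(code == 0))       /* say ~90% of inputs in this scenario */
        return 1;
    else if (unlikely(code == 1))
        return 2;
    else
        return -1;
}
```

As the next comment says, though, whether this actually wins over a switch on a given CPU is something only profiling can settle.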
The important point is that you *MUST* profile your code. Modern processors are just so complex that you just can't make any assumptions about performance anymore.
I was afraid of for loops and used nothing but switch statements in my code. Then I finally learned for loops and have began rewriting a lot of my code to make it a whole lot better.
I like that you supported your thesis with a reference to the assembly code. I would be even happier if you ran some test code comparing the time it takes to complete 100K iterations of the switch vs. if versions of the code.
I still remember reading the AMD optimization manual like 20-ish years ago and figuring out how to use that to your advantage in so many ways. Even when you have simple string constants you can still take advantage of it by pre-hashing the strings, and there was another trick in there involving multiple if branches that could be compressed into a single switch by using the compiler's own optimizations to convert a test into a simple bit and then bitwise or them all together.
@@sent4444 I can't directly link it, but just Duck the AMD optimization manual and you should find it. The Intel optimization manual is also a good resource. They both provide some really good tips if you're thinking of getting into writing assembly or a compiler or just improving your code from a higher level.
The indirect unconditional branches shown in the assembly for the switch statement actually don't play well with the branch target predictor hardware in CPUs, so performance could actually be lost by using a switch statement. Also, I couldn't get Compiler Explorer to produce the indirect jump assembly with gcc at O0, O1, O2, or O3.
Thanks for the video, I think it would help a lot of people if you dived deeper into ASM to explain the differences between using ifs and switches. There are some things that really do need to be talked about in terms of what the computer and processor is ACTUALLY doing and switches and ifs are a perfect example. PS. I had no idea people still had tabs as 8 spacing instead of 4 or even smaller. That is wild.
I remember learning about another benefit in my computer architecture class. Modern processors use tricks like out of order processing and branch prediction to improve performance. More branches = more wrong predictions and more operations performed that need to be discarded
Before I even watch the video I gotta say, with my minimal experience switches are always the best I've found. Why have 10 ifs if you can just directly point to what needs to happen when a case matches?
I have used switch statements, and find that they lead to a lot of unintended errors when writing code that is syntactically correct but logically wrong. Like a missing break statement, or a default case that is no longer supposed to be the default after some changes to the program. Ultimately these take more time and cost more.
enum class in C++ is nicer because it doesn't pollute your namespace. std::array in C++ is nicer than char arr[4] because it has built-in bounds checking without overhead. Better still would be std::string, because that's the correct datatype for a string. I could go on. I see no reason to use C over C++ when it's mostly backwards compatible anyway, and you can just use the nice, safe and fast features of C++ over the unsafe trash that you will inexplicably write when writing in pure C. That said, mixing C and C++ will yield terrible code, so might as well just write C++ and dump all the C junk. And yes, C++ works just fine for embedded.
glad to hear THAT, because the program I've been writing has a switch/case block with (so far) 171 cases in it, and it'll probably end up with a lot more than that before I'm done
I learned something new and also realized some illustrations would be very helpful. Especially to explain how an input is used with a table of negative numbers, subtracting that number to find which option to choose, without ever comparing two values 🤨?
Do note, this behavior is specific to languages that only implement switch statements that only accept integral types, and only compare against single integral constants, like C. If you use C#, the behavior is different, according to Microsoft's documentation: "A switch statement evaluates case patterns in text order from top to bottom.", text patterns which can be either a constant list evaluation (in which case, if its an integral type, it can use jump table optimizations) *or* a bunch of comparison evaluations (e.g. value >= 10), which compiles down to IL that is no different than an if-else tree, so understand the behavior of your chosen language and use cases before throwing switch statements everywhere, expecting a performance boost that never manifests!
It's almost like the binary is a hash table and the switch is turned by the compiler into a very simple hash function to index(jump) to the right spot. Cool.
Pretty much. The beauty of this is that no comparisons are actually made. You basically have a big table where most options are "do nothing" so that no matter what the user types, all cases are handled
What? Is there a separate table created where jump addresses for the switch are stored? Where does it jump if no option is true? What do you mean by "we will use our input as an index into a table of negative numbers, we will subtract 'that' number and boom just go to that location"? Which of these numbers is the user input? What is this "table of negative numbers"? I don't get it at all
Very cool video, but I have some questions. First: if I use a switch statement to compare a value against 0 and INT_MAX, the jump table is going to be huge, sacrificing both storage and RAM, which may actually matter for weak embedded systems. Second: compilers are pretty smart today, so if you have a lot of if-else statements in a row comparing the same thing against constants, I'm pretty sure it's going to optimize them into a switch statement anyway.
For your first question, the compiler is smart enough to make it a branch. Chances are it will convert
switch (a) { case 0: break; case INT_MAX: break; default: }
into something like
xor eax, eax
je IsZero
inc eax
je IsMaxInt
default: ret
IsMaxInt: ret
IsZero: ret
(abusing unsigned wraparound, where an all-ones value increments to zero)
Yo this is actually so sick. I saw a comment saying what it basically is... but man, am I glad I stayed around for the actual explanation, this was great
@@cherubin7th break breaks everything, which is why I prefer if. Also in game making, whenever I have a switch I'll need to break it up because exceptions happen, while with an if it's just inserting or changing the test. Though nowadays I just pass a function or an object by reference, much cleaner.
6:54 "Very long branch instructions that may invoke a cache miss on your computer" Do you mean a branch misprediction instead of a cache miss? AFAICT the if statements aren't fetching any previously unused memory so I don't see how that assembly could produce a cache miss (except of course if the thread running those branches got preempted and another thread went on to fill the cache with something else before the original thread got resumed or something like that) Or did you mean it generally? That there *could* be code that fetches uncached memory in those if statements? Also, it is interesting that the assembly code always loads optionbuf[0] again and again even though it would suffice to do only a single time after call to scanf (or at least that is what I understand movzx does there). I wonder if that has anything to do with the compiler being unsure if it can treat optionbuf as *restrict* ?
It can be hard to get a compiler to create a jump table (at least with gcc). When the cases are not consecutive values, and their span can only be represented with more than 8 bits, the compiler considers it a waste of memory and emits an if/else equivalent instead. I once created a dozen fake cases using inline assembly NOPs to tip the scales in favor of a jump table.
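A rough sketch of that trick (the case values and return codes are hypothetical, and the inline NOP is GCC/Clang syntax): padding the switch with filler cases keeps the value range dense, nudging the compiler toward a jump table.

```c
/* Filler cases with NOP bodies make the case range 0..6 dense,
 * tipping the compiler's heuristic toward a jump table.
 * Values and returns are made up for illustration. */
int classify(int v) {
    switch (v) {
    case 0:  return 10;
    case 1:  return 11;
    case 2: case 3: case 4: case 5:  /* fake filler cases */
        __asm__ volatile ("nop");
        return -1;
    case 6:  return 16;
    default: return -1;
    }
}
```

Whether this actually flips the heuristic depends on the compiler version and flags; inspecting the emitted assembly is the only way to know.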
So what you're saying is it boils down to a conditional jump, which can be written as: if val = n then goto line y, else fall through and run a similar test with a different n value, again exiting to y if that is true... Everything can be expressed as an if statement once you understand fall-through. Unfortunately a fair number of languages don't seem to be able to support it, due to not being able to jump to a line or have a label partway through a function. I suspect the switch statement is that implementation, obfuscated away behind a compiler. But when we get down to brass tacks, it's just a state machine.
The issue is mostly about the difference in branch prediction misses. Any time a branch goes differently than predicted, every instruction in the pipeline after the branch gets squashed. For instance, on Zen 2 that's 18 cycles, with potentially 4 instructions decoded per cycle, and all modern PC processors seem to be in that ballpark. The nested if statements can do that multiple times, while the computed jump only does it once. Damn, I just needed this, since I'd completely forgotten to use switch statements. Calling function pointers from an array is quite similar, but should be somewhat faster.
Is this also true if we increase the compiler optimization? I would expect an optimized compiler to handle if statements more efficiently than just comparing each value? 🤔
hi, computer engineering student (hardware lol) here. Switch statements are basically state machines at the hardware level (think of it like a recursive register that updates itself based on some conditions of the current state and whatever inputs/outputs), and they are quite fast compared to a bunch of if statements/nested if statements, as that's just a ton of multiplexers in a row. And as great as a multiplexer is, they can have quite a long bit depth (the longest path a bit of information can take). It's just more efficient to use a state machine, which may only use a couple of multiplexers per clock cycle, and register delay is much less than that of a mux.
One for one, a predicted if is faster than a switch. If you have more important values (such as a success vs. error return) where the success performance is far more necessary, use an if for the important value(s) first, then do whatever with the rest; that allows for branch prediction. Also, if your table isn't dense it will be converted to an if-else tree anyway. Predicted branches mostly just cost the instruction slot, so a single cycle - almost free.
I was going to suggest mapping your commands to different handlers via a hashtable, but it sounds like the switch statement is doing this already. I wonder what other languages implement switch statements like this?
Yeah ideally here we’d make a function pointer table that we could index with the key. That’s sort of what’s happening under the hood in the switch table.
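The function-pointer-table idea above could be sketched like this in C (handler names and return values are made up; the command letters 'n' and 'q' follow the video's editor commands). The indexing mirrors what the compiler's jump table does: subtract the lowest key, bounds-check once, then jump through the table.

```c
/* Hypothetical command handlers for the editor commands. */
static int do_new(void)  { return 1; }
static int do_quit(void) { return 2; }
static int do_bad(void)  { return -1; }

typedef int (*handler_fn)(void);

/* One slot per letter in 'n'..'q'; unfilled slots stay NULL and
 * fall back to the shared "invalid" handler, just like the
 * "invalid" entries in the compiler's jump table. */
static int run_command(char c) {
    static const handler_fn table['q' - 'n' + 1] = {
        ['n' - 'n'] = do_new,
        ['q' - 'n'] = do_quit,
    };
    unsigned idx = (unsigned)(c - 'n');   /* wraps huge if c < 'n' */
    if (idx > 'q' - 'n')                  /* single range check */
        return do_bad();
    handler_fn fn = table[idx] ? table[idx] : do_bad;
    return fn();
}
```

The unsigned subtraction means one compare rejects characters on both sides of the range, the same trick the generated assembly uses.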
MIND = BLOWN! What a great optimization. Having seen that, I can see bunches of ways to employ it to accelerate code flow in scenarios with complex nested comparisons. I would definitely want to wrap it with some sugar so it's readable though.
I began programming in machine code, then assembly. Working down to the bare metal is the best way to understand how to write efficient code. I learned all these tricks from the start.
I've actually run into situations where clang optimized a series of `if` statements to something faster than the equivalent `switch`. I'm not sure why they were different at all, but they were. (This was essentially a finite state machine regex.)
First of all, use "gcc -O2 -S" and then analyze the resulting assembly code. The assembly shown (the first one) most probably resulted from unoptimized compilation; my gcc 13.2.1 generates different code. Second, use non-valued enums (not assigning values to the enumeration identifiers) and the result will be even better. And finally: switch was always meant to be implemented by jump tables, but in some cases (like having only a few choices) the code generator can select a more appropriate (= faster) implementation.
To me personally, for switch statements to be worth it, the value has to be a single discriminator AND there have to be at least 3 or 4 options. If is just more flexible, and switch requires more code to work. But if you have many options depending on your value, I think it's a powerful tool, and when you write big programs, or it gets called a lot, it's worth implementing.
I also consider the size of the table, since I usually program on 16-bit hardware. These kinds of things work best when the choices are consecutive. So instead of using letters, it's better to ask the user to type 0, 1, 2, or 3: you don't need a massive table, you can just use bit twiddling to clamp everything down to four options
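The bit-twiddling clamp above could look like this in C (a sketch; note that masking aliases out-of-range input rather than rejecting it, so it only works when every slot has a safe handler):

```c
/* The four valid menu keys are '0'..'3'. Subtracting '0' and
 * masking with 3 forces the index into a 4-entry table with no
 * compare at all. Caveat: invalid input aliases silently, e.g.
 * '7' lands on slot 3. */
static int option_slot(char c) {
    return (c - '0') & 3;
}
```

That trades a range check for a single AND, which is exactly the kind of saving that matters on small 16-bit parts.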
I found instructions 1243..=1260 (visible at 7:20) confusing and they seemed to be glossed over for some reason, but I eventually figured it out. A major realization was that the jump table values are relative to the jump table's base address (JTBA). Hopefully my explanation of those lines is clear. `1243: mov eax,eax` is filler inserted by gcc either for alignment or hot-patching (the latter might be specific to msvc). `1245: lea rdx,[rax*4+0x0]` multiplies the index (subtracted scanf value) by 4 into rdx because the jump table entries are 4 bytes wide. `124d: lea rax,[rip+0xe00]` stores the JTBA in rax (rip's value is the address of the next instruction). `1254: mov eax,DWORD PTR [rdx+rax*1]` stores into eax the value found at the memory address formed by adding rdx (entry offset) and rax (JTBA); this value is the offset from the JTBA of the address to jump to. `1257: cdqe` sign-extends eax into rax from 32 bits to 64 (the sign of a signed integer is stored in the most significant bit; sign extension essentially copies that bit to all the new bits). `1259: lea rdx,[rip+0xdf4]` stores the JTBA again, but into rdx, because rax was overwritten. `1260: add rax,rdx` adds the JTBA to the jump offset, producing the address to jump to.
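For the curious, GCC's "labels as values" extension lets you write that same table-indexed jump by hand (a sketch; the labels and return values are made up, and this is a GCC/Clang extension, not standard C):

```c
/* Hand-written version of the indexed indirect jump described
 * above, using GCC's computed-goto extension (&&label, goto *). */
static int jump_dispatch(unsigned idx) {
    static void *table[] = { &&lbl_new, &&lbl_edit, &&lbl_quit };
    if (idx >= 3)         /* range check, like the cmp/ja pair */
        return -1;
    goto *table[idx];     /* the indirect jump itself */
lbl_new:  return 0;
lbl_edit: return 1;
lbl_quit: return 2;
}
```

This is essentially what the compiler emits for a dense switch, minus the offset-relative encoding of the table entries.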
I don't know the answer, but what happens if you set your optimization flags higher, or for the switch code if you compile with -fPIC (position independent code)? Also, is there a case for switch in any performance dependent code? Like let's say you are crunching a ton of numbers, I've never really seen a process where switches are applicable because usually you've set up your data to work with fast simple code.
The switch statement can also do a dichotomy (binary search) when the values are dispersed. We can also take advantage of both worlds by doing if (most probable value) and a switch on the other values.
Ah yes the good old jump table. I remember playing ShenZhen IO (it's a game (if you can call it that) where you build digital circuits and code with a very simple RISC assembly) back in 2018. One of the levels was a sandwich making machine or something, and I had the idea to use a hard-coded jump table (using a ROM) to store the recipes and thereby cut down the number of instructions. This is basically just that, but slightly more involved. Slightly.
For a small count of options, a jump table can have a negative impact, because the branch predictor can have difficulty predicting the jump through the table. However, compilers often transform these edge cases into if-else series anyway. If one option appears more often than the others, it is probably better to write an if statement with a likely attribute for that option and then use switch-case for the other options
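The hot-value-first pattern above could be sketched like this (the message codes and return values are made up; `__builtin_expect` is the GCC/Clang primitive behind the likely attribute):

```c
/* Handle the common value with a hinted if, fall back to a switch
 * for the rare ones. __builtin_expect(expr, 1) tells the compiler
 * the branch is likely taken. */
static int handle(int msg) {
    if (__builtin_expect(msg == 0, 1))  /* 0 is the hot case */
        return 100;
    switch (msg) {
    case 1:  return 101;
    case 2:  return 102;
    default: return -1;
    }
}
```

In C++20 the same hint is spelled with the `[[likely]]` attribute on the branch.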
Hi. Can you also compare switch statements with array of function pointers? I understand that switch statements are faster, but how much faster? Because for large cases the array of func ptrs improve readability.
I don't understand: how does it arrange the jump table, and how does it use one computation to get to the location? The enum values we need to compare against are not evenly spaced as far as I can tell.
It looks more like a jump table than a hash table as others have opined. Also, this only works in certain cases, yes? if the range of the switch is too large (say, outside the range of a few alphabetical letters) then I think the switch statement resolves to a series of "if" statements. Great video, I'm loving all this geeky low-level stuff. I'm subscribed and continued success!
@@liningpan7601 sorry, but your explanation does not really make sense in my head. Maybe movsx overwrites EAX partially? Because otherwise we overwrite useful data...
@@NotYourArmy666 The second mov sign-extends the byte to 32 bits, so the actual result of the subtraction is what you would expect if you look at the 32-bit register. This is unnecessary, and it does go away if you turn on optimization.
Cool video! But I didn't quite get how the compiler generates the index at 10:00 , is that related to the variable being an enum? Or would it also work if we just used a char directly?
It should work even without the enum, as enums are just for your convenience really. The compiler basically breaks down what you write into a bunch of gotos so it really isn't hard for it to figure out how to generate the proper index
I can't help but wonder why gcc wouldn't optimize the iferoo version by doing the "movzx eax,BYTE PTR [rbp-0xc]" just the once. Seems unnecessary. Isn't it already in eax for all the subsequent compares?
Support the channel by trying Brilliant at brilliant.org/LowLevelLearning for FREE and get 20% off when you sign up! Thanks again Brilliant for sponsoring this video and your continued support of the channel!
I did not catch the perf difference. What is the perf difference?
Here's a few video ideas:
* Hardware bugs and their workarounds (e.g. F00F, FDIV, TAS on Amiga/Genesis, DPCM glitch on NES, etc)
* How to program in assembly without losing motivation (me irl 😭)
* How emojis are represented in binary
* Unicode errors in general (e.g. rÃ(C)sumÃ(C) (I don't know how to type the copyright symbol on my phone sorry!)
* Why it's so easy for a human to recognize spambots in chatrooms but computers can't do it well
* How an EXE file works
You don't even need a switch statement for enums in Java, you can make frankensteinian monsters known as an abstract enum. They're usually used for adding extra data for the enum members (with a custom constructor), but you can also add abstract methods to the enum, which you have to then implement for each member.
This leaves you at a simple method call on your enum variable without switching for possible values. 😂😂😂
I can't wait for Neuralink so I can learn everything you know lol
@@TheTrackoShow right?
basically a hashmap lookup vs an array search, but at a low level
To add on: it’s like the hash function is computed at compile time! (iiuc)
Edit: Y’all just use Rust check and you don’t needa compile
@@joeystenbeck6697 this is something I would like to see @lowLevelLearning mention in the video. I haven't benchmarked it, but my gut feeling tells me the difference in runtime complexity is just pushed down to compile time, i.e. increasing time needed to compile the program in order to get a better performance in runtime. In some situations this is undesirable (or less desirable) and the program would be better off running a bit slower.
@@PrivacyRevocation 99% of the time you'd rather have longer compile times than longer run times. And I'd assume that if statements are more difficult to optimize?
@@PrivacyRevocation- Like .. when? When would it be better to have the program run slower so the compile would be faster?
Yes, I was disappointed that he glossed over stuff referring to things as "magic numbers" and "a bunch of garbage".
The whole *point* of the video was to explain the details behind the switch statement.
The valid user inputs are the letters: "c", "d", "e", "n", "q" (which he defined earlier in the video).
Note that the first letter "c" is ASCII 0x63 and the last letter "q" is ASCII 0x71. Note that the span from "c" to "q" is 0x0e (i.e. "q" is 14 characters past "c").
The code at 7:51 shows a transformation and a check.
Line 123b sub eax, 0x63 ; this subtracts letter "c" from the entered letter, so the range of inputs is now indexed from 0.
Line 123e cmp eax, 0xe ; this compare checks if the input letter is past letter "q", since letter "q" is 0x0e (14) characters past letter "c"
At this point, the character range from "c" to "q" has been transformed to an index range of 0 to 14.
If the entered character is *OUTSIDE* this range, the code bypasses the jump table stuff and proceeds to the "invalid" code not shown in the video.
Line 1241 ja 12aa ;exit the jump table code block as the entered character is outside the range of the defined command characters.
If the entered character is *INSIDE* this range, it proceeds to the jump table code.
Line 1245 lea rdx,[rax*4+0x0] ;Load Effective Address. The index value is multiplied by 4, since each jump table location is a 4-byte offset added to the table's base location.
The code at 7:58 is really just the jump table values. (Unfortunately the disassembler produces nonsense code, but the offsets are valid.)
Rearranged, the offset values are given below and matched against the entered letter:
𝟶𝟶: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝟸 ; "𝚌" 𝚓𝚞𝚖𝚙 𝚝𝚘 𝙻𝚒𝚗𝚎 𝟷𝟸𝟿𝟼 𝙲𝚘𝚗𝚝𝚒𝚗𝚞𝚎
𝟶𝟷: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟸𝚊 ; "𝚍" 𝚓𝚞𝚖𝚙 𝚝𝚘 𝙻𝚒𝚗𝚎 𝟷𝟸𝟽𝚎 𝙳𝚎𝚕𝚎𝚝𝚎
𝟶𝟸: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟷𝚎 ; "𝚎" 𝚓𝚞𝚖𝚙 𝚝𝚘 𝙻𝚒𝚗𝚎 𝟷𝟸𝟽𝟸 𝙴𝚍𝚒𝚝
𝟶𝟹: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚏" 𝚓𝚞𝚖𝚙 𝚝𝚘 𝚒𝚗𝚟𝚊𝚕𝚒𝚍 𝚌𝚘𝚍𝚎 𝚋𝚕𝚘𝚌𝚔
𝟶𝟺: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚐" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟶𝟻: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚑" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟶𝟼: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚒" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟶𝟽: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚓" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟶𝟾: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚔" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟶𝟿: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚕" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟷𝟶: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚖" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟷𝟷: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟷𝟸 ; "𝚗" 𝚓𝚞𝚖𝚙 𝚝𝚘 𝙻𝚒𝚗𝚎 𝟷𝟸𝟼𝟼 𝙽𝚎𝚠
𝟷𝟸: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚘" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟷𝟹: 𝚏𝚏 𝚏𝚏 𝚏𝟸 𝟺𝚎 ; "𝚙" 𝚒𝚗𝚟𝚊𝚕𝚒𝚍
𝟷𝟺: 𝚏𝚏 𝚏𝚏 𝚏𝟸 ?? ; "𝚚" 𝚓𝚞𝚖𝚙 𝚝𝚘 𝙻𝚒𝚗𝚎 𝟷𝟸𝟾𝟿 𝚀𝚞𝚒𝚝
There is no particular reason that the offsets are negative, it's just that the compiler chose to put the jump table _after_ the code fragments.
In fact, referencing the offsets to the JUMP table *is* a bit odd.
Usually the base address for the jump table is the FIRST ROUTINE (in this example the "New" routine at address 1266 is the first routine... as shown at 8:52).
And because it's first, it would have an offset of simply 0.
The rest of the routines are referenced from there (the FIRST routine).
In the given code each entry point is just 12 bytes from the previous routine.
𝙹𝚞𝚖𝚙𝚁𝚘𝚞𝚝𝚒𝚗𝚎𝙱𝚊𝚜𝚎𝙰𝚍𝚍𝚛 𝚒𝚜 𝟷𝟸𝟼𝟼 (𝚒𝚗 𝚝𝚑𝚎 𝚐𝚒𝚟𝚎𝚗 𝚎𝚡𝚊𝚖𝚙𝚕𝚎 𝚊𝚝 𝟾:𝟻𝟸)
+𝟶 𝚏𝚘𝚛 "𝙽𝚎𝚠"
+𝟷𝟸 𝚏𝚘𝚛 "𝙴𝚍𝚒𝚝"
+𝟸𝟺 𝚏𝚘𝚛 "𝙳𝚎𝚕𝚎𝚝𝚎"
+𝟹𝟼 𝚏𝚘𝚛 "𝚂𝚝𝚘𝚙"
+𝟺𝟾 𝚏𝚘𝚛 "𝙲𝚘𝚗𝚝𝚒𝚗𝚞𝚎"
With those simpler and smaller numbers, the example would have been easier to follow, but we're at the mercy of the compiler (and crappy disassembler).
As was pointed out, if the cases are "sparse" then the jump table will have a lot of those "invalid" entries. It still executes very quickly, but just takes up some memory.
Input letters *OUTSIDE* the range are culled out and terminate quickly. Letters *INSIDE* the range all result in the jump code being executed, even if it just results in an "invalid" stub.
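The sub/cmp/ja sequence walked through above has a direct C-level equivalent (a sketch; the function name is made up, the letters and offsets come from the walkthrough):

```c
/* Subtract the lowest command letter 'c', then a single unsigned
 * compare rejects everything outside 'c'..'q': differences below
 * 'c' wrap to huge unsigned values, so `> 14` covers both ends.
 * This mirrors `sub eax,0x63` / `cmp eax,0xe` / `ja invalid`. */
static int command_index(char c) {
    unsigned idx = (unsigned char)c - (unsigned)'c';
    if (idx > 'q' - 'c')   /* 0x71 - 0x63 == 0x0e == 14 */
        return -1;         /* outside the table: invalid */
    return (int)idx;       /* 0..14, used to index the jump table */
}
```

The returned 0..14 index is exactly what gets scaled by 4 and used to fetch the jump table entry.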
This makes it all even more interesting. Thank you for the addition.
How did you do that monospace-ish font in a youtube comment?
`mono`
how did you do that bro
this is what i came here for, thanks
I'm convinced, I'm switching to switch.
If only i could
You should in any case
@@PingSharp 😆
If I use it, I won't break. That's if I don't default.
Do not get convinced so easily; test and measure times, that's the only way to know which strategy is best
Before everyone decides to switch to switch statements😊I think it’s good to point out that switch statement CAN be faster, but it really depends on the use case. Like you mentioned, when you only have a few conditions, if/else might actually be faster. Also in your example the gaps between the enum values are small, but when very large gaps are used, the compiler might opt for a different strategy, which can again be slower than a few if/else statements.
And finally, switch statements might be faster, but have their own caveats. Which can lead to slow development. It’s so easy to forget a break, which leads to an unexpected fall through, which can be real nasty bugs to trace. It’s easy to forget to handle the default case. Also in C/C++ switch statements cannot be used for strings, which is annoying to say the least. And the switch cases are not scoped, unless you add curly braces; which again can lead to unexpected behavior if used improperly.
Usually the compiler issues a warning about fall-through. At least gcc does.
Also, the video explains why strings cannot be used in switch cases: you need a number for a switch.
It really depends on the use case; squeezing more than one line under each case can be hard to read and ugly. And often the conditions are not as simple as a case, so I think its use case can be quite narrow.
Seriously, I don't understand why break statements are not the default in switches in all the languages I have used. It blows my mind.
If you use enums, the gaps should be quite small. Generally I'm happy to let the compiler figure out if it thinks the switch is faster or not, but I prefer switches because I think it looks prettier and I sometimes like to know if I'm missing a value by looking at the warnings when I remove the default statement. (I tend to write java or rust more than C, so the precise warnings may be different.)
Embedded dev here. The compiler impressed me a couple of years ago. I had a static constant std::map of enums to functors, and I changed where it was used to a switch calling the functions directly, and it compiled into an identical object file.
That’s nuts
Using the switch can also improve code safety by enabling the compiler warning/error for switch statement not handling all values in your enum.
So in future, when you add another value to your enum, the compiler will tell you about any switch statements you forgot to update
C99?
@@desertranger82, std::map implies C++.
Doubtful. That means the compiler would have to elide the map entirely. Maybe in a very simple case, but highly doubtful.
I would have imagined that modern compilers with optimizations turned on would be smart enough to choose to translate the if/switch to the appropriate assembly if/switch depending on the specific code.
Most of them are. Not sure what compiler he used with what settings. In some cases they can convert if-elses to a jump table, like they do for a switch. In other cases they might decide not to generate a jump table for a switch, because multiple compares would be faster or much more memory efficient (in terms of produced code).
Yep, to many compilers it doesn't matter at all whether you write a switch or an if statement. I guess the switch just happens to be easier to optimize, so it's more likely to be faster with any random compiler. In addition to O(1) jump tables and O(n) if-else chains, some compilers actually use a hardcoded binary search to find the correct case by repeatedly cutting the range in half, which is O(log n) on average. I guess that can be a good middle ground between those solutions. Most compilers change the implementation of if and switch statements depending on the number of items to be compared and the type of data being presented (consecutive numbers, or something that translates to them like enums, versus random-looking numbers without any apparent pattern, etc.)
You forget the human here.
If I have a long if-else chain that I carefully created with only fixed comparisons to a single variable,
the compiler can optimize that to a jump table.
The next developer working in this file adds an if-else that breaks this pattern, and boom, performance drops for no obvious reason (unless you know of this compiler optimization).
By using a switch I tell the next developer my intention, and she/he needs to make a conscious decision to change this concept.
They are, the if code is not optimized in any way
This is the correct answer.
People who write modern optimising compilers are generally smarter than the average programmer.
It's _relatively_ easy to look at an if-else cascade and analyse whether it's evaluating equality on the same variable and is, therefore, functionally a switch statement.
One need only look at Python: Guido explicitly excluded a switch statement, as he _knew_ this was possible.
The trade off point between if and switch will depend on your target processor. A processor cannot look ahead with a jump table. Meaning it cannot keep the pipeline full. A processor may have branch prediction to try and keep the pipeline full. Lookup branch prediction and jump for your target processor. Only benchmarking will tell you the tradeoff point. Good luck.
To be fair, if the if-statement can easily be transformed into the switch statement as you did in the video, any reasonable compiler should output the same asm as the switch statement.
@@piisfun What? Modern compilers are perfectly capable of generating a jump table from cascaded if statements, as well as converting a switch statement into a series of comparisons, depending on whichever is faster in a given situation, and they are able to do it while compiling a much higher-level language like C++.
Yeah, just put -O3
yeah, with the example code, yes. The problem comes when it's organized a little differently and the if/else chain isn't always comparing the same variable to the same value (i.e. wouldn't be compatible with a switch). It's an easy thing to do accidentally when coming into old, complex code and adding another case, and suddenly it performs a lot worse. So with modern compilers it's more of a protection against myself than intentional optimization, imo. Though there are use cases where you need to be intentional about this, like some embedded stuff.
In my first design review at Novell, my team lead said I should use a switch statement for true/false tests rather than if/else. So I coded both and looked at the generated code - they were identical.
Compilers since 20+ years ago have been smart enough to ignore your preferences as a source code writer and just do the right thing at the machine code level. I used to enjoy trying to outsmart compilers, to make 3D renders run faster by studying the machine code output, but gave that up long ago. We have created tools smarter than ourselves!
If the if-else chain tests something that can be trivially changed to a switch, it's better to do so, simply because it communicates intent better, and gives more well-structured code. Or, put another way, a long, simplistic if-else chain is just a plain eyesore, IMHO.
It doesn't necessarily matter if the compiler can magically figure your intent out in this specific case; in other cases, bad code (which this smells like) can actively hinder it from figuring things out and making optimisations.
TL;DR: Don't make it any harder for other people to read your code than absolutely necessary, and don't expect your compiler to perform magic on weird code, even if it's _often_ able to.
@@mnxs It's most useful when you have a situation with multiple checks that are simple tests and the actions to take are mutually exclusive.
Presumably in this situation those code blocks can be treated like an anonymous function and reduced to something akin to BASIC's IF ... THEN ... of the form
IF condition1 is true THEN gosub 100
IF condition2 is true THEN gosub 200
IF condition3 is true THEN gosub 300
P.S.
That said you can "optimize" your If-Else constructs by ordering the checks in terms of the most common conditions you expect to be true.
Switch statements are to be preferred when that ordering/ratio is completely unpredictable. Or, as he puts it in the video, they are all equally likely.
@@DrunkenUFOPilot That takes all the fun out of it. :P --- Still it does save your butt if you do something stupid in the code.
1. Which compiler?
2. What compiler options? (If you compared with no optimization flags, then good luck; that's not even production code.)
3. What language version?
This whole if/switch comparison makes little sense; compilers these days will chop your code up in such a way that you won't make heads or tails of it. I assume adding -O3 will make both comparable, but then again, you won't be able to read through the ASM.
That misses the point. The point is that the main reason switch statements exist is due to this optimization. A compiler turning an if-else tree into a jump table is explicitly a compiler optimization. A compiler turning a switch statement into a jump table is just what the compiler logically should do.
@@vibaj16 I did find cases where switch statements won't get turned into jump tables, so it's not as clear-cut as it seems. For example, if you have non-equally-spaced case values with very large gaps in between (random numbers like 1, 260, 1070, 24080), the jump table would be gigantic, and even with no optimization gcc and clang (at -O0) seem to refuse to use jump tables in those cases.
It makes sense, but it makes me doubt that this optimization is the main reason for the existence of switch statements, since the compiler obviously still applies its own decisions even at -O0.
PSA: It does not matter. Modern compilers have aggressive optimizers that transform if to switch, and switch to bit-tests, whenever it is valid and profitable speed/size-wise to do so.
Yeah I was going to ask about that in the comments, optimising compilers were my first thought
A good compiler will let you set flags to indicate what you want it to do.
I don't really care about tabs vs spaces, but 8 spaces tab width hurts my soul 😭
the Linux kernel codebase actually uses 8-wide tabs (iirc)
I believe it's a method to prevent nesting. It gets unusable at the depth of 3, so it's motivating to extract and write separate functions/methods/classes.
@@moorscode that is actually the first positive thing I heard about such a big amount of spacing and it totally makes sense. Maybe I should try to get my coworkers machines to look that way xD
tabstops at 8 spaces is.... a very long-standing de-facto standard. I don't care what you set your indent level to, but on a modern computer, there's very little reason to make your editor necessarily use hard tabs of a different width, and plenty of reason (IMHO) to maintain the idea that tabs are 8 spaces (or rather, align you to a column number divisible by 8), for compatibility with existing things.
I know right.. i only use 2 spaces.
I'm pretty sure same optimization applies to a sequence of IF statements on modern compilers. What optimization level is used here?
Seeing the repeated mov of the value into eax, it's obviously a debug build; no compiler would output such a thing outside -O0.
This is what I was also thinking
When I was accelerating code by eliminating branches, gcc was already eliminating the lowest level if by using an optimal assembler instruction.
This is correct. Common subexpression elimination is one of the first optimizations I learned back in the 1970s while in college as it can be applied early while doing lifetime and flow analysis before code generation
This optimization can only be applied to if statements whose conditionals strictly operate on integral types and never check said integral types against a range of other integers (e.g. > or <).
It is a neat mechanic, but disappointingly rarely actually relevant.
1. You need to have a decent number of branches to actually notice a difference. Even with 5 options, it's still only a minute difference.
2. If you have a huge number of choices, you will often implement it as some kind of hash map or state machine anyway, rather than if/else or switch.
3. The speed of the branching is only relevant if the code that is executed in the branches is also fast. If you're doing a printf at the end, then your improvement by using a switch will only make up a fraction of a percent between hitting the control structure and actually seeing the text on screen.
That said, these niche cases where the speed gain is relevant of course still exist, just not in many places.
I guess I'm kinda traumatised about this topic after the whole YandereDev incident, where everyone and their mother got hung up on the point that the code was bad because it used if/else instead of switch, even though that was the least of that code's problems.
Ultimately it turned out that the performance issues came from calling an expensive UI function multiple times per frame and the use of heavily unoptimised graphics assets.
This looks like a compiler's job to me
It really isn't. It's all about how you want your code to run; that's why there are so many ways to do the same thing.
scanf("%4s", optionbuf), from man scanf:
An optional decimal integer which specifies the maximum field width. Reading of characters stops either when this maximum is reached or when a nonmatching character is found, whichever happens first. Most conversions discard initial white space characters (the exceptions are noted below), and these discarded characters don't count toward the maximum field width. ***String input conversions store a terminating null byte ('\0') to mark the end of the input; the maximum field width does not include this terminator.***
Also, your while loop condition is UB on the first iteration.
Actually, an if-else tree might get optimized into a jump table by the compiler if there are many branches, so using an if-else tree or a switch really doesn't matter with an optimizing compiler. Though it's good to know when to use what, as it can make your code much more readable.
Well, perhaps, but people who care about such performance improvements are working with embedded devices like Arduino etc. Everything is going to be quite basic.
keyword “might”.
@@Skyaidrules that's also the case with switches though
This depends on the nature of the if-else tree. It can only perform the jump table optimization for if-else branches if 1) the conditionals only operate on basic integral types, and 2) the conditionals only test for equality or inequality, and don't use >= or <=.
@user-li2yv5je5e I think you missed the point. ISA on an arduino is vastly different than x86-64 ISA.. or even x86-32 ISA. The performance profile is going to be vastly different. This is low level stuff that I doubt 99% of the people here understand, unless you work on compiler development. Joso997 apparently gets it. I've seen plenty of people come from 'modern' development teams, into 'tiny MCU' world and not understand this.
you should use compiler explorer if you want to show whats going on in assembly, it has a much nicer view and will even highlight what gets translated into what assembly instructions.
Who else is interested in how the "magical value math" works ? The whole point of me watching this video was to understand this lol.
Jump/Hash tables are things I'm not sure I quite get, so me.
I just assumed it was like:
Table address = 1200
q = 0x71
d = 0x64
So it just reads the value (q) and adds it to 1200 and jumps to address 1271 and executes the code there. Actually he said this part contains a negative number(?) and uses that in conjunction with the table address to jmp to code. Close enough?
I think he went over this part too quickly or something.
@@user-zu1ix3yq2w he kinda explains how they jump to the exact switch statement that satisfies the condition. But there was a magic address that points to the statement in his explanation.
How it calculates the address to that statement was not explained and was mentioned as "magic". I clicked on the video title to know exactly about that, but it was mentioned as "magic math" here.
@@aneeshprasobhan Oh well. I think he should make another video explaining it. Obviously we could just look up jump tables and learn about them, but I'd prefer he just explained it in more depth. I've always found magic numbers fascinating. I think a jump table is a natural solution, one we would invent ourselves if we had been approaching these problems in assembly and using the goto statement, instead of using high(er) level languages.
I guess what I can say is, this jump table solution for switch statements (goto table[table_address+offset]) is better for (numerically) consecutive cases, and less so for cases that are "sparser," especially for memory management.
Seems to be like branchless programming. There is an Aussie guy on here that is really in to it.
Fun fact:
* Write some code in a function, doesn't matter what
* start a switch statement at the top of the function
* put a "}" at the end, to finish the switch
* sprinkle "case" statements in between the lines of your code wherever you want. Even inside loops, if statements, other switches etc.
This is legal C/C++ and it even works, although it makes no sense at all.
Please correct me if I'm wrong, but isn't this only true with optimizations turned off? I did a quick bit of playing around on Compiler Explorer before commenting. It's true there was a substantial difference between a switch and an if chain with optimizations turned off, but with -O3 set, both switch and if result in the exact same assembly.
It depends on how many cases you have
This. Decisions between if-else chains and switch statements should be based on what is more readable or what expresses the intent of your code better. Every major compiler is perfectly capable of seeing through switches and if statements and can transform one into the other if it thinks that's going to be faster. And generally the compiler will do a way better job at making this decision than the average programmer. The advice given out in this video is simply misleading and will make new programmers rewrite perfectly fine code because they now think it will be faster (which it won't be).
@@williamdrum9899 I did a bit more playing around, this time with 9 cases instead of 3. The asm is almost identical. The compiler did switch to a different approach than before: this time it declared basically an array of length 26, one entry per letter, containing the start address of each case, and then used the switch argument as the index into this array to find the correct case code, e.g. 'C' - 'A' = 2, and index 2 in the array is the address for case 'C', which it then jumps to. This is the same for both if and switch. There is one small difference: with the if version, there is an extra comparison at the beginning which checks the first if condition; if it matches, then it skips everything I just mentioned. The switch version doesn't have this comparison, but everything else is the same.
Not if your if statements greatly differ from one another
As per Jason Turner, I would also almost always avoid using the default label when switching on an enum since you could forget to adjust the switch statement when a new value is added to that enum. This should always be configured to cause a warning in the compiler, but the warning won't work when a default label is present. And yes, you'll probably have to extract the switch statement to a new function so you can handle the default case outside of the switch if it falls through, but without that label.
Also another vote from me to use Compiler Explorer like Turner does when demonstrating assembly. It's just the best.
MISRA, for example, requires an explicit default. The alternative is that, unless intentional, the default should assert()... or just use a better static analysis tool/compiler, which will tell you when you have an unchecked enum value.
If this kind of error makes it into the wild, you have more serious process issues.
Perhaps in embedded land where unrecoverable errors are Really Bad™, but for most programs I would argue you should always have a default case that throws or asserts. Of course, I would never recommend a default case that just silently ignores the problem. Even better is when your compiler can catch the problem for you (a la Typescript).
@@clonkex It really depends on the context and situation. When writing code for yourself, it's perfectly fine to have an empty default or just print out a message.
Where it becomes a problem is when the valid cases are strictly defined and receiving anything else is Very Bad News.
If you write a compiler or assembler, you really want to know if it encounters a non-existent instruction, a math/arithmetic error (division by zero), or some other issue that could cause the program to unexpectedly crash the computer it's running on or at least hang it up badly.
Otherwise it might quietly produce an executable that will suddenly crash without an obvious cause or waste a lot of computational resources on recovering from avoidable errors.
And God help you if it should take garbage and produce otherwise valid but pointless instructions.
@@jnharton I agree. This is why I say you should have a default case that throws. You want to know _for certain_ that you've covered every case, and if not, crash rather than do something unexpected.
No measurements? The first step in optimization is measurement.
The second step is to turn on your compiler's optimization.
Great video as always. I understand the difference but only in abstract level so it's pretty cool to see how exactly it work under assembly.
“Brilliant,a *free* and easy way, …, 20% off subscription” sure
This will very likely be optimized out (for the if statement checks) if you tell the compiler to do any sort of optimization. Also, it's relevant to note that the hashing of the switch statement was so simple because you used a series of character comparisons of which each character was different. If you had multiple command inputs that matched even the first letter of the command, the switch statement would still branch.
But you showed a neat topic few know about, kudos to that
Nice! A very common place to optimize with switch is when parsing command line arguments. The problem is when you need to call strcmp or some function like that for every argument, you can't do that in a switch statement. A nice trick is to separate what can be solved with ifs vs switch and use the switch first, managing to even prevent the strcmp function calls.
I wonder if you could just run a switch statement over the first character and only then do the `strcmp` (and only compare with values that actually start with that character), but that would make the code look a little ugly. :(
@@ThePC007 That's actually a very good trick, for example gperf uses that trick to parse keywords to generate hash maps. I actually use it all the time when parsing command line arguments, I check if the first character is a dash '-', and only then I call strcmp from the second character, as most command line options will start with dashes.
To parse argument options you can hash each argument value into a fixed-size (per value) buffer (long enough to hold your argument options), then memcmp() the two memory blocks to make sure the value is the same as you wanted.
That way you get almost constant evaluation time.
@@informagico6331 That sounds extremely efficient and fast, and probably the way to go for big production programs, but maybe overkill if you're making a simple CLI tool. Depending on the number of available command line options, the simpler way could even be faster.
Yacc + lex. It's been a long time since I last used them, but if I remember correctly, lex can generate exactly such tables.
I think it boils down to a feature in the language. Switch statements have to have constants for the case statements. You can't use a variable in a case statement (i.e. CASE: ). So the values to be checked are all known at compile time, and the 'hashing' can be pre-calculated.
'If' statements however, could be comparing two variables whose values are not known at compile time. So even though in this example the value just happens to be a constant, the compiler doesn't have an easy way to go from assembling in each 'comparison' to trying to 'hash' if it just happens to be that one of the items being compared is a constant. Also, all the 'if' 'else if' clauses could be using completely different variables and values, whereas 'switch' statements the test parameter is always the same throughout all the 'case' statements.
so basically the compiler converts the switch statement into a sort of hash table? that way it runs a hash on the condition and immediately jumps to the corresponding branch's address, correct?
yes, correct.
I think the like button glowed when you said "like" at 10:57 o-o
Some benchmarks would be nice to check if the performance really changes that much (probably not).
Any optimising compiler worth your time will produce identical code for either statement anyway. Usually it is able to transform the code into either lookup tables, trees of if & if/else statements, and often even branchless code. LLVM has been able to do all of this for a while now, as has GCC, and even MSVC has some capabilities here.
This guy is really being irresponsible by presenting his performance advice as true when it very rarely is, and not even measuring the difference.
@@codaaaaaaaaa and even if the compiler doesn't optimize, in the vast majority of cases you don't even care, as processors are extremely fast
Very good. We learn today that switch statements in c is highly optimized. Thanks!
I ran the if-statement version with -O3 optimizations enabled using the g++ compiler, and it got converted to the switch version!!
Compiler optimization is pretty dope
shouldn't the compiler be able to optimize the if tree into a switch-like statement? (with -O3 or -Ofast)
I agree, but I'm not sure you can always rely on the compiler to do that on all platforms though
Trust, but check
So much this. At the end of the day if you didn't look at the compiler's assembly output you have no idea if your program is optimized
Yes, this code, which reads from the same memory location repeatedly with no other data memory accesses in between, is going to have cache misses. Do they not teach you CS guys anything about locality? Do you just see memory accesses and assume "cache could miss there"? There are some edge cases, but in general that access should hit, and in many of those edge cases (interrupts, multicore, etc.) the cause itself will incur a much larger slowdown than the single cache miss in that code.
I've only ever worked with 16-bit MS-DOS (no cache) and even I knew this.
Love your videos man, I know some of this stuff but it's a great refresher or intro for others keep them coming
Reminds me of the time I used function pointers rather than a switch statement. The compiler (20 years ago) did not do this optimization, but instead searched down the list like the if statements. This was for a state machine with lots of states, and I knew it was inefficient.
This switch-case example looks kind of like a function pointer. When I have a long-ish list of compares like this, I tend to check the assembly output for efficiency like was done here.
Going to -O1 or -O2 could enable jump optimization and/or inlining and most of the code disappears in this example.
Might be different for out of order execution, but when your cpu is piping in instructions, it will usually be processing multiple instructions at a time. But that means for an instruction like jne, your cpu may end up processing instructions right after the jump when in reality it needed to be processing instructions at the location specified by the jump instruction. When this happens, the cpu is forced to flush out the instructions and then continue from the correct location. Branch predictors can reduce this chance but that is an additional cost of if statements.
Exactly my thoughts 😊
I would imagine that a switch statement makes it easier for a compiler to optimize for branch prediction.
Oh hey that's actually really cool! I've never seen someone explain how switches work at a byte level, and when you do it makes it really obvious why switches are so fast.
There are 3 ways to do this and option 3 is the fastest and cleanest. That’s having an array with function pointers for each of the enum values.
Nice tutorial! I know it's just a test program, but you should consider clearing optionbuf to 0 before you use it, since there may be arbitrary values in it and it can potentially trigger your stop condition.
He also has a 1-byte buffer overflow, since the length specifier in scanf("%4s", optionbuf) doesn't include the null byte, and he completely failed to consider compiler optimizations. This video can basically be deleted.
Because in many cases they translate directly to a single or very few assembly statements. However, given a properly designed compiler and language designed to be optimized, often IF statements can in select cases be translated to be as efficient as switch statements.
Well, what happens for the "default:" case, though? Does it detect if the number is out of a specific range to see it's not one of the wanted actions? Or how does it work that out?
🤓 This is my comp sci degree speaking out, but there is a case where an if/else block might have a marginal performance benefit. If your input data is distributed in a way where one or two values are the overwhelming majority, with tertiary and onward values being exceedingly rare, it might prove just barely optimal to use an if/else tree rather than a switch.
The important point is that you *MUST* profile your code. Modern processors are just so complex that you just can't make any assumptions about performance anymore.
would that be a buffer overflow at 2:00? as in, scanf may write a null terminator as the fifth character, right?
I was afraid of for loops and used nothing but switch statements in my code. Then I finally learned for loops and have begun rewriting a lot of my code to make it a whole lot better.
I like that you supported your thesis with a reference to the assembly code. I would be even happier if you ran some test code to compare the time it takes to complete 100K of switch vs. if versions of code.
I still remember reading the AMD optimization manual like 20-ish years ago and figuring out how to use that to your advantage in so many ways. Even when you have simple string constants you can still take advantage of it by pre-hashing the strings, and there was another trick in there involving multiple if branches that could be compressed into a single switch by using the compiler's own optimizations to convert a test into a simple bit and then bitwise or them all together.
can you give me that resource to read?
@@sent4444 I can't directly link it, but just Duck the AMD optimization manual and you should find it. The Intel optimization manual is also a good resource. They both provide some really good tips if you're thinking of getting into writing assembly or a compiler or just improving your code from a higher level.
The indirect unconditional branches shown in the assembly for the switch statement actually don't play well with the branch target predictor hardware in CPUs, so performance could actually potentially be lost by using a switch statement. Also, I couldn't get Compiler Explorer to produce the indirect jump assembly with gcc at -O0, -O1, -O2, or -O3.
I have the same issue with gcc... clang does as the video though.
Thanks for the video, I think it would help a lot of people if you dived deeper into ASM to explain the differences between using ifs and switches. There are some things that really do need to be talked about in terms of what the computer and processor is ACTUALLY doing and switches and ifs are a perfect example.
PS. I had no idea people still had tabs as 8 spacing instead of 4 or even smaller. That is wild.
I remember learning about another benefit in my computer architecture class. Modern processors use tricks like out of order processing and branch prediction to improve performance. More branches = more wrong predictions and more operations performed that need to be discarded
Before I even watch the video, I gotta say, with my minimal experience switches are always the best I've found. Why have 10 ifs if you can just directly point to what needs to happen when a case matches?
I have used switch statements, and find that they lead to a lot of unintended errors in the code which are syntactically correct but logically wrong. Like a missing break statement, or a default case that is no longer supposed to be the default after some changes to the program.
Ultimately these take more time and cost more.
enum class in c++ is nicer because it doesn't pollute your namespace
std::array in c++ is nicer than char arr[4] because it offers bounds-checked access (via .at()) without overhead elsewhere. Better still would be std::string, because that's the correct datatype for a string.
I could go on. I see no reason to use C over C++ when it's mostly backwards compatible anyway, and you can just use the nice, safe and fast features of C++ over the unsafe trash that you will inexplicably write when writing in pure C. That said, mixing C and C++ will yield terrible code, so might as well just write C++ and dump all the C junk.
And yes, c++ works just fine for embedded.
glad to hear THAT, because the program I've been writing has a switch/case block with (so far) 171 cases in it, and it'll probably end up with a lot more than that before I'm done
I learned something new and also realized some illustrations would be very helpful. Especially to explain how an input is used as an index into a table of negative numbers, which get subtracted to find which option to jump to, without ever comparing two values 🤨?
Do note, this behavior is specific to languages that only implement switch statements that only accept integral types, and only compare against single integral constants, like C.
If you use C#, the behavior is different, according to Microsoft's documentation: "A switch statement evaluates case patterns in text order from top to bottom.", text patterns which can be either a constant list evaluation (in which case, if its an integral type, it can use jump table optimizations) *or* a bunch of comparison evaluations (e.g. value >= 10), which compiles down to IL that is no different than an if-else tree, so understand the behavior of your chosen language and use cases before throwing switch statements everywhere, expecting a performance boost that never manifests!
It's almost like the binary is a hash table and the switch is turned by the compiler into a very simple hash function to index(jump) to the right spot. Cool.
Pretty much. The beauty of this is that no comparisons are actually made. You basically have a big table where most options are "do nothing" so that no matter what the user types, all cases are handled
Since we subtract from a large number to get the address: what if our switch statement has more cases than that large number?
What? Is there a separate table created where jump adresses for the switch are stored? Where does it jump if no option is true? What do you mean by "we will use our input as an index into a table of negative numbers, we will subtract 'that' number and boom just go to that location?"
Which of these numbers is the user input? What is this "table of negative numbers"? I don't get it at all
What compiler and optimisation level did you use? I'm surprised that compiler didn't optimise it.
Very cool video, but I have some questions,
first of all, what if I use a switch statement to compare a value against 0 and INT_MAX? Then the jump table is gonna be pretty big, sacrificing both storage and RAM, which may actually matter for weak embedded systems
second, compilers are pretty smart today, so if you have a lot of if-else statements after each other comparing the same thing, I'm pretty sure the compiler is gonna optimize them into a switch statement
For your first question, the compiler is smart enough to make it a branch. Chances are it will convert
switch (a) {
case 0:
    break;
case INT_MAX:
    break;
default:
    break;
}
into
test eax, eax       ; sets ZF if a == 0
je IsZero
cmp eax, 0x7fffffff ; compare directly against INT_MAX
je IsMaxInt
; default:
ret
IsMaxInt:
ret
IsZero:
ret
(no jump table at all -- just two compares, since only two values matter)
Yo this is actually so sick. I saw a comment saying what is basically is... But man am I glad I stayed around for the actual explanation this was great
Switch statements are also much cleaner when you read code
It's more verbose
@@timmygilbert4102 well, I suppose we can't have everything. If I have to write more than two else statements, I prefer to go for a switch
@@peyop5262In principle I agree it is cleaner, but I don't like the need for break.
@@cherubin7th break breaks everything, which is why I prefer if. Also in game making, whenever I have a switch I'll need to break it up because exceptions happen; with an if it's just inserting or changing the test. Though nowadays I just pass a function or an object by reference, much cleaner.
6:54 "Very long branch instructions that may invoke a cache miss on your computer"
Do you mean a branch misprediction instead of a cache miss? AFAICT the if statements aren't fetching any previously unused memory so I don't see how that assembly could produce a cache miss (except of course if the thread running those branches got preempted and another thread went on to fill the cache with something else before the original thread got resumed or something like that)
Or did you mean it generally? That there *could* be code that fetches uncached memory in those if statements?
Also, it is interesting that the assembly code always loads optionbuf[0] again and again even though it would suffice to do only a single time after call to scanf (or at least that is what I understand movzx does there). I wonder if that has anything to do with the compiler being unsure if it can treat optionbuf as *restrict* ?
It can be hard to get a compiler to create a jump table (at least in gcc). When the cases are not consecutive values, and their span can only be represented with more than 8-bits, then the compiler considers it a waste of memory and makes an if/else equivalent. I once created a dozen fake cases using inline assembly NOPs to tip the scales in favor of a jump table.
Awesome video, thanks! :3
I wonder whether the same holds true for rust if statements vs switch statements
It does -- play around with rust's match with compiler explorer, it's very smart
6:56 how can comparing a register against a constant value invoke a cache-miss? It's.... a register....
So what you're saying is it boils down to a conditional jump, which can be written as: if val = n then goto line y, else fall through and run a similar test with a different n value, again exiting to y if that is true... everything can be expressed as an if statement once you understand fall-through. Unfortunately, a fair number of languages don't seem to be able to support it, due to not being able to jump to a line or have a label partway through a function. I suspect the switch statement is that implementation, obfuscated away behind a compiler. But when we get down to brass tacks, it's just a state machine.
The issue is mostly about the difference in branch prediction misses. Anytime a branch goes differently than predicted, every instruction in the pipeline after the branch gets squashed. For instance, on Zen 2 that's 18 cycles, with potentially 4 instructions decoded per cycle, and all modern PC processors seem to be in that ballpark.
The nested if statements can do that multiple times, while the computed jump only does it once.
Damn. I just needed this since I've completely forgotten to use the switch statements.
Calling function pointers from an array is quite similar, but should be somewhat faster.
Is this also true if we increase the compiler optimization?
I would expect an optimizing compiler to handle if statements more efficiently than just comparing each value? 🤔
hi, computer engineering student (hardware lol) here. Switch statements are basically state machines at the hardware level (think of it like a recursive register that updates itself based on some conditions of the current state and whatever inputs/outputs), and they are quite fast. A bunch of if statements / nested if statements is just a ton of multiplexers in a row, and as great as a multiplexer is, they can have quite a long bit depth (the longest path a bit of information can take). It's just more efficient to use a state machine, which may only use a couple of multiplexers per clock cycle, and register delay is much less than that of a mux.
One for one, a predicted if is faster than a switch. If you have more important values (such as a success vs error return) where the success-path performance is far more necessary, use an if for the important value(s) first, then do whatever with the rest; that allows for branch prediction. Also, if your table isn't dense it will be converted to an if-else tree anyway. Predicted branches are mostly just the instruction slot expense, so a single cycle - almost free.
I was going to suggest mapping your commands to different handlers via a hashtable, but it sounds like the switch statement is doing this already. I wonder what other languages implement switch statements like this?
For a menu example like you did in the first part, isn't it a good idea to use a pointer to the functions that we use in each case of the switch?
Yeah ideally here we’d make a function pointer table that we could index with the key. That’s sort of what’s happening under the hood in the switch table.
I always knew switch was the faster option after I learned Java but had no idea why until now. Thanks for this!
But how is the index of the jump table calculated? How do you reliably go from a character in a scan buffer to a predictable address?
MIND = BLOWN! What a great optimization, having seen that I can see bunches of ways to employ that to accelerate code flow in scenarios with complex nested comparisons. I would definitely want to wrap it with some sugar so its readable though.
I began programming in machine code, then assembly. Working down to the bare metal is the best way to understand how to write efficient code. I learned all these tricks from the start.
@6:00 Why does it "movzx eax,BYTE PTR" for every if case? ...seems redundant
I've actually run into situations where clang optimized a series of `if` statements to something faster than the equivalent `switch`. I'm not sure why they were different at all, but they were. (This was essentially a finite state machine regex.)
First of all, use "gcc -O2 -S" and then analyze the resulting assembly code. The assembly code you have shown (the first one) most probably resulted from unoptimized compilation. My gcc 13.2.1 generates different code.
Second, use non-valued enums (not assigning values to the enumeration identifiers), and the result will be even better.
And finally: switch was always supposed to be implemented with jump tables, but in some cases (like having only a few choices) the code generator can select a more appropriate (= faster) implementation.
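A quick illustration of why unvalued enums help (the command enum here is invented for the example): the members number themselves 0, 1, 2, ... so the case range is dense, and the compiler can index a jump table with no gaps and no offset subtraction beyond the range check.

```c
/* Unvalued enum: members are 0, 1, 2, 3 automatically,
 * giving the switch a dense, zero-based case range. */
enum cmd { CMD_CREATE, CMD_DELETE, CMD_EDIT, CMD_QUIT };

int cost(enum cmd c) {
    switch (c) {
        case CMD_CREATE: return 5;   /* hypothetical per-command costs */
        case CMD_DELETE: return 3;
        case CMD_EDIT:   return 4;
        case CMD_QUIT:   return 0;
    }
    return -1;   /* unreachable for valid enum values */
}
```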
to me personally, to use switch statements the value has to be a single discriminator AND there have to be at least more than 3 or 4 options. If is just more flexible, and switch requires more code to work. But if you have many options depending on your value, I think it's a powerful tool, and when you write big programs or it gets called a lot it's worth implementing
I also consider the size of the table, since I usually program on 16-bit hardware. These kinds of things work best when the choices are consecutive. So instead of using letters, it's better to ask the user to type 0, 1, 2, or 3, since you don't need a massive table; you can just use bit twiddling to clamp everything down to four options
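For what it's worth, the clamping trick might look like this in C (a sketch, not a full input validator: out-of-range digits silently wrap, so non-digit input should still be rejected first).

```c
/* If the menu options are '0'..'3', masking with 0x3 clamps any digit
 * to a valid 0..3 table index without a compare-and-branch range check.
 * Note the wrap: '4' maps back to 0, '5' to 1, and so on. */
int option_index(char c) {
    return (c - '0') & 0x3;
}
```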
My friend told me that ifs are faster or the same when you have the compiler optimize the code
I found instructions 1243..=1260 (visible at 7:20) confusing and they seemed to be glossed over for some reason, but I eventually figured it out. A major realization was that the jump table values are relative to the jump table's base address (JTBA). Hopefully my explanation of those lines is clear.
`1243: mov eax,eax` is filler inserted by gcc either for alignment or hot-patching (the latter might be specific to msvc).
`1245: lea rdx,[rax*4+0x0]` multiplies the index (subtracted scanf value) by 4 into rdx because the jump table entries are 4 bytes wide.
`124d: lea rax,[rip+0xe00]` stores the JTBA in rax (rip's value is the address of the next instruction).
`1254: mov eax,DWORD PTR [rdx+rax*1]` stores into eax the value found at the memory address formed by adding rdx (entry offset) and rax (JTBA); this value is the offset from the JTBA of the address to jump to.
`1257: cdqe` sign-extends eax into rax, from 32 bits to 64 (the sign of a signed integer is stored in the most significant bit; sign extension essentially copies that bit into all the new bits).
`1259: lea rdx,[rip+0xdf4]` is storing the JTBA again, but into rdx, because rax was overwritten.
`1260: add rax,rdx` adds the JTBA to the jump offset, producing the address to jump to.
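The whole sequence above can be mimicked in C using GCC's labels-as-values extension (a sketch with invented label names; real compilers emit this pattern for you when they lower a dense switch): index into a table of addresses, then jump.

```c
/* Requires GCC or Clang: &&label takes a label's address, and
 * "goto *expr" jumps through it, just like the indirect jump
 * the compiled switch performs after indexing the jump table. */
int run(int cmd) {   /* cmd must already be range-checked to 0..2 */
    static void *table[] = { &&case_a, &&case_b, &&case_c };
    goto *table[cmd];
case_a: return 10;   /* hypothetical per-case results */
case_b: return 20;
case_c: return 30;
}
```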
I don't know the answer, but what happens if you set your optimization flags higher, or, for the switch code, compile with -fPIC (position-independent code)? Also, is there a case for switch in any performance-dependent code? Say you are crunching a ton of numbers; I've never really seen a process where switches are applicable, because usually you've set up your data to work with fast, simple code.
A switch statement can also compile to a dichotomy (a binary search over the cases) when the values are dispersed.
We can also take advantage of both worlds, by doing if (most probable value) and switch on the other values.
Ah yes the good old jump table. I remember playing ShenZhen IO (it's a game (if you can call it that) where you build digital circuits and code with a very simple RISC assembly) back in 2018. One of the levels was a sandwich making machine or something, and I had the idea to use a hard-coded jump table (using a ROM) to store the recipes and thereby cut down the number of instructions.
This is basically just that, but slightly more involved. Slightly.
interesting. Is this also true for other languages like C++ and C#, or just for C ?
For a small number of options, a jump table can have a negative impact, because the branch predictor can run into difficulties predicting the jump through the table. However, compilers often transform these special edge cases into an if-else series anyway.
If one option appears more often than the others, it is probably better to write an if statement with the likely attribute for that option and then use a switch-case for the other options
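A sketch of that "hot case first" layout, using GCC/Clang's `__builtin_expect` as the likely hint (the status codes and return values here are hypothetical):

```c
/* Handle the common value with a well-predicted if, then fall back
 * to a switch for the rarer cases. __builtin_expect is a GCC/Clang
 * hint; the logic is identical without it. */
int classify(int status) {
    if (__builtin_expect(status == 0, 1))   /* success is the hot path */
        return 0;
    switch (status) {
        case 1:  return -1;   /* hypothetical error codes */
        case 2:  return -2;
        default: return -99;
    }
}
```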
What do you mean by prediction? Isn't it a number that gets added to the program counter value to jump to that specific branch location?
This was super cool. I got out of the habit of using switch statements but I'll start doing it more now.
Hi. Can you also compare switch statements with array of function pointers? I understand that switch statements are faster, but how much faster? Because for large cases the array of func ptrs improve readability.
I don't understand: how does it arrange the jump table, and how does it use one computation to get to the location? The enum values we need to compare against are not evenly spaced, as far as I can tell.
The values in the jump table between the valid enum values can just be junk.
The location is a relative offset based on where in memory each case begins. Which "line" of code you're on is a variable that the program modifies
Is this only in C the case or in other programming languages as well ?
It looks more like a jump table than a hash table as others have opined. Also, this only works in certain cases, yes? if the range of the switch is too large (say, outside the range of a few alphabetical letters) then I think the switch statement resolves to a series of "if" statements. Great video, I'm loving all this geeky low-level stuff. I'm subscribed and continued success!
7:39
movzx eax,
movsx eax,al
I don't get the point of copying something into eax and then overwriting it?
Because optimization is not on
@@liningpan7601 sorry, but your explanation does not really make sense in my head. Maybe movsx only overwrites EAX partially? Because otherwise we overwrite useful data...
@@NotYourArmy666 The second mov sign-extends the byte to 32 bits, so the actual result of the subtraction is what you would expect if you look at the 32-bit register. This is unnecessary, and it does go away if you turn on optimization.
@@liningpan7601, ah, I've got what I was being silly about. Really basic stuff... It's even shameful...
Cool video! But I didn't quite get how the compiler generates the index at 10:00 , is that related to the variable being an enum? Or would it also work if we just used a char directly?
It should work even without the enum, as enums are really just for your convenience. The compiler basically breaks down what you write into a bunch of gotos, so it really isn't hard for it to figure out how to generate the proper index
I can't help but wonder why gcc wouldn't optimize the iferoo version by doing the "movzx eax,BYTE PTR [rbp-0xc]" just the once. Seems unnecessary. Isn't it already in eax for all the subsequent compares?