Well that was maybe when someone had to do refactoring thinking: "This 1.5 is something I can refactor to reduce the number of magic numbers by 50%". That other magic number I cannot refactor because I would have to name it magic_number because I don't know where it comes from. Of course turning 1.5 into a variable named threehalfs also does not improve the readability and introduces a spelling error but fine. The code metrics have improved as was the objective. Maybe I'm wrong.
@@richardbloemenkamp8532 I think the answer is HOW C compiles code. The compiler adjust what is and what is not a constant value based on the code (and the compiler of course), and most importantly, WHERE to keep that value. In certains cases, constant are stored elsewhere in memory and does exists (or is available) at all times (it must be accessed first). Then you have constants that are stored IN the method itself, sort of like an "inlining" in JIT compilers like Java or C#. I think it is highly probable that the compiler the devs used THEN gave very slightly better results when it compiled the code when written this way, rather than otherwise. Of course, it would be the same algorithm, but the data access speed would be different (storing values in registers, in less CPU cycles), hence the choice to leave it like this. When the goal is speed at any cost, those additional CPU cycle to gather values and store them in registers can add-up quite quickly.
People have said it here in different ways but to say it clearly, a constant is not stored in memory and is just a macro replaced with its value upon compilation.
@@DaMonster This is not true, since on most platforms you cannot write 1.5F as an immediate operand in assembly, so the compiler has to put 1.5F in memory somewhere. It then loads the constant into a float register at runtime.
It has been explained many times. Even on youtube, if you search for "0x5f3759df" or "fast inverse square root" you can find many videos, some dating back to 8 years ago... Or you can look at the wikipedia page for more informations.
Performing a division AND a inverse root using only multiplications is nothing short of real world black magic. I am sincerely impressed.
Рік тому+219
@@kalpeshpatil8912 If you go to assembly, multiplication is just… multiplication. CPUs have had multiplication opcodes for decades, even for floating point.
Oh man, I remember taking a CS class that taught floating point representation with the exponents and mantissas, and I thought to myself "ok this is cool but I'll be programming higher-level than this, so I probably won't use this knowledge in practice." Well, this is a great example of how you might use it!
you just explained IEEE 754 better than any tutorial i could find and better than any university teacher i listened to. the animations helped a lot with the understanding process.
@@traida111 The exponent is how big the number is, the mantissa is specifically what number that is. It's the equivalent of saying "37 billion" instead of "37,000,000,000"
It's even scarier than "only 1% error"; using the default magic number of 0x5F3759DF actually gives about 0.175% error maximally, dropping thereafter. It is now believed the original author of the function (not Carmack, who never claimed he was), found the constant through trial and error rather than derivation. If you use Chris Lomont's updated magic number of 0x5F375A86 (which he derived), the error drops to a staggering 0.09% after just 1 iteration. BTW, the original author is unknown, but speculated to be Canadian Velvel Kahan, Turing Award winner and mathematician, circa 1985.
I thought it was implemented by John Carmac, but according to Wikipedia: "The algorithm was often misattributed to John Carmack, but in fact the code is based on an unpublished paper by William Kahan and K.C. Ng circulated in May 1986. The original constant was produced from a collaboration between Cleve Moler and Gregory Walsh, while they worked for Ardent Computing in the late 1980s." Thanks for educating and for your great work.
@@RedSntDK for the original guys definitely but quake 3 came out when the internet had been in serious use for years so I suspect not too difficult to come upon.
@@patty109109 I assumed they ripped it out of some existing math library without really understanding what it was doing, hence the "wtf" comment. These tricks are rarely used now days because CPU/GPU has gotten so fast in doing divisions and square roots.
Рік тому+9
@@patty109109 Please define "serious use"… because to me (and many many people), serious use is what the Internet (the actual Internet, and not timbl's Word Wide Web) was originally made for.
FYI for anyone learning C/C++: the "i = *(long*)x" approach is not standards compliant. It obviously worked for Quake III, but for a modern compiler under high optimization, this could produce bugs. The reason is complicated (search "strict aliasing" if you want to know more), but the correct alternative would be "memcpy(&i, &x, 4)". It's also a good idea to use "uint32_t" instead of "long" to ensure "i" has 4 bytes. Modern systems often have "long"s that are 8 bytes.
Also FYI for learners: there is no reason to write any code like this ever today. A lot has happened since the 90s and modern compilers are smarter than 99% of programmers will ever be.
@@Debrugger and here you wrong, compilers bound by standard, so it can't do most of tricks used in the code, as each of trick which makes calculation faster also makes it less precise and there is no standard way to say for compiler how much accuracy it needs to preserve. p.s. gcc has option --ffast-math to makes some optimization which do lose precision (compared to standard), but again option itself is not part of c standard, nor it allows to say precise how much precision can be lowered. (i.e. it will never go to such low precision as used in this code, even with such option), and which part of code it should be applied to.
@@SuperMarkusparkus while Thomas is wrong, answer to your question depends on which cpu you want to use for example for intel cpu there is instruction rsqrt14_ps obviously its still your problem to organize your data so you do 16 calculation of normales at once, otherwise no 16x profit.
It's one of the easier languages to understand the syntax from though, as there isn't a lot of syntax. And what is done here is a workaround for something the language wasn't designed for, so it might seem somewhat cryptic, and I'm pretty sure it's undefined behavior (meaning not defined in the standard, meaning platform/compiler dependent, as float doesn't need to be in IEEE 754 + the platform might have different alignment requirement for float and unsigned int + unsigned int doesn't need to be 32bit (all not common things though, but exist)). Maybe a clearer way to achieve the same is using a union of unsigned int and float, though you'd have to copy the value (which the compiler might optimize away, but probably not when this was written) In other words you shouldn't be doing something like this, unless you very well know what you are doing, and have testing around it to make sure it does what you expect for every target platform and compiler. To me this is what makes C and other languages like it, powerful and fun :)
This is just how every language works, but in most of them you don't have access to pointers (address of the value), you can just access the value the pointers point to (value in that address).
@@frydac well, it's not really any different than expecting an INT to be 32bits, when C does NOT define INT to specifically be 32bits. But it's almost always a safe assumption (or was).
You should try modern C++. I think the original C K&R was less than 300 pages (if I remember correctly) that describes the entire C language. I just bought a book that tries to clarify just one feature of C++, move semantics, and it is almost 300 pages long. I weep for humanity.
I'm reminded of the single greatest code comment I've ever read: "I do not understand the following line of code, but I've verified it is required for this function to work correctly." It was in a bit of Perl code, a language with several peculiar syntactic quirks.
The best coding that I ever did in C and in which I got a big smile and patted myself on the back for the effort, was using an array of pointers to functions where the correct function would execute based on the array index. Man, was I ever impressed with the poetic beauty of such a simple and short series of characters that commanded so much! I ran my code, it worked fine, but it was much slower than using a simple switch/case construct. LOTS slower. So, I went back to what I had before. That was back when I was using a 16-bit MS-DOS C complier, so no doubt all the crazy addressing shenanigans going on behind the scenes really mucked things up. I wonder how well it would run today as a native 64-bit compile, but the working version that I still use is from (with one, semi-recent and tiny data update) twenty years ago and written as a 32-bit Console app. Still, pointers-to-functions are really cool to invoke.
I don't think I understood this well enough to explain to my friends how cool it is... but I definitely understood enough to have my mind blown, and to tell them they need to see it
It's also one of the best way to explode something in your code (or better yet, corrupt your data in tiny, nearly invisible ways). Some people learned that the hard way.
Either that, or the dev that wrote it got that line from an online forum and had no freaking idea why it was subtracting from 0x5f3759df. As a dev I understood the rest of what this video covered by just reading through the code for a few seconds. The math behind using the constant 0x5f3759df still just made my brain freeze up. Either the dev had a degree in math (totally possible), or he was just accepting that the line worked.
@@normalmighty I think we all end up occasionally taking a solution from the Web without fully understanding how it works, but when I do that I include a link to the webpage in the comments for future reference.
@@normalmighty "Either that, or the dev that wrote it got that line from an online forum and had no freaking idea why it was subtracting from 0x5f3759df. " Yeah, an online forum before 1998... sure.
Frankly, i prefered this German dude, he just shows you what is going on from first principles, didnt try to 'blow your mind', and i learned something.
I remember when I started programming professionally in the 90's. Id's performance tuning was something that us programmers would nerd out about, and I'd often write my code to similar standards. Curious coworkers would sometimes encounter some of my "performance-tuned code" and be like "uhh, why did you do this like that?" I'd explain all about efficiencies and they'd just kind of roll their eyes and be like "okaaayy." We were, after all, just writing client-server apps for small businesses, for which performance tuning was totally pointless. But I wanted to be a game developer, damnit!
That sounds like my life being a graphics artist in the military simulation industry. The bar is set so low, even in 2022, that people we hire on that have been in the industry since the 2000s are either A, are asking why on earth we are wasting time experimenting with dynamic lighting and PBR materials, or B, blown away by how awesome our graphics are (which, considering we have hired GFX artists that think A, we do have a mixed bag of high quality art, and people who think blockouts are good enough). I'm in talks with a programmer friend about making an indie game just to take the edge off.
Lol this is me right now. I’m working on some iterative/list code that I’ve managed to get ~30% faster than the leading industry library. But even over 1M iterations it’s only a difference in milliseconds lol
@@revimfadli4666 for example for those who want ultimate cryptocurrency mining nothing beats custom made boards that implement the algorithm in hardware. The so called ASIC miner
@@Sorcerer86pt agreed, getting hardware-baked performance, without taking away gpus from people who need them & resigning them to that single use, is a win-win Although I was talking about materials scientists/engineers who formulate the semiconductor material itself & its manufacturing:P
It's not just being close to the metal. Math helps a lot any time you have a very hard problem to solve -- for example, the folks at OpenAI for example are 90% math PhDs and they don't spend much time programming bare metal I would guess. In the old days every problem was hard so math was valuable for everyone. Now there are lots of problems that programmers can solve that are not "hard" in a traditional sense; making things pretty for customers, implementing simple business models for yet-another-market, taking someone's prototype idea and making it into an app or a website, etc. But if you try to solve cutting-edge problems you will get much further if you understand mathematics deeply, especially linear algebra and multi-variable calculus which come up over and over if you know how to recognize them.
The video goes kind of light on the Newton iteration part so I wanted to clarify some things: f(y) = 1/(y^2) - x y is our estimated inverse square root, and we’re trying to get a better estimate. To do that we want to get f(y) close to zero, which happens when y is correct and x equals the number that we’re taking the inverse-square-root of, ie the input named ‘number’. The x from the above formula isn’t being updated, we know it exactly already. The derivative of f(y) is -2/(y^3). This is because the x is constant, 1/y^2 is the same as y^-2, so multiply by -2 and subtract 1 from the exponent. I was confused about why it looked like we were solving for a new value of x but we aren’t we’re getting a new value of y so the line is really ynew = y - f(y)/f’(y), which after we enter the function and its derivative is ynew = y - (1/(y^2) -x)/(-2/(y^3)) = y + y/2 - x*(y^3)/2 = y * (3/2 - y * y * x/2) …and then it matches the code
Maybe obvious, but can you explain to me how you know f(y)? Everything else I follow, but as I see it f(x) = y and y ~ (1/x^2), so f(×) ~ 1/x^2. What am I missing?
@@Gnefitisis before i say anything, i would first like to clarify i am no CS student and I also do not 100% follow this, but i think i understand where this part comes from. i think the answer to your question comes from the sequence of steps revolving around why you want to use newtons method and the rearanging of the formula to aquire f(y) instead of f(x). Since you origionally have the function f(x) from the initial steps of this code, but you want to find a more accurate approximation of what "y" would be equal to after the "wtf" line, you need to rearange the formula to make it become a function of "y". This is because you are no longer using "x" in the function you are now using "y" to find "ynew". So technically they are both using the "y" variable, but after performing the newtons approximation it reassignes the calculated value to the variable "y" again, meaning that it was a "function of y to calculate a new y value". I'm not sure if I answered your question, but if you want me to try and explain anything better i can try. But from how i interpretted your question, it seemed like more of a misunderstanding of the use of the applications of the variables in the video, not that you don't understanding the actual pure math being used. I hope this helped, if not I can try again if you'd like lol ✌
Some of my computer science classmates wonder what good learning the IEEE 754 standard and Newton's approximation could possibly do, and the answer is that it helps us to understand a cool video about a quake III algorithm
He will come back based on his comment in his community tab: *"I plan on covering algorithms in general. But if there's some obscure piece of code accompanying it, that's just more bonus points for the algorithm. - Nemean"*
@@sbravoo This is not the only impressive thing John Carmack has done, he's a "famous coder" if you will so you can bet i took notice when he left ID Software and went to Oculus as CTO.
I can't believe someone explained this piece of ancient code so eloquently. I remember reading about this routine years back and not having a single clue how this slice of magic worked. Thank you good sir and have a fantabulous day!
Eh, it hints a little. “Evil Bit Hack” = Exploiting the data in a way you’re not supposed to. “What The Fuck” = Black magic sorcery. “Newton Iteration” = Applying a mathematical law.
@@SomeFreakingCactus That's what I saw too! It does seem pretty straightforward as far as what it does, but tbh the actual process still hurts my brain to try to figure out
The misunderstanding there comes from not knowing why a length of a vector is calculated in such a way. First, you should keep in mind that this vector exists in 3 dimensions, not 2. Second, the math - ua-cam.com/video/51vgIfdBlAk/v-deo.html&ab_channel=rootmath
I have long heard of this magic, but never understood it. After watching your presentation, I held manager level understanding just for a moment. It’s gone, but that felt good. Thanks for making this understandable, even if only for a moment while you held our hands.
well funny thing is, they actually used engineering methods to solve their computer problem, this is the things that if i did use them, would not make me question myself why am i studying software engineering
@@Roald94 Well, software engineering today vs the one I studied 30 years ago are way different. It is also the reason why I quit developing and went to the infrastructure side of computing and can now yell at developers that they are not creative anymore and are not unable to optimized anything (my major was algorythmics) : they just copy code and use design patterns because they is no creativity anymore. Also Java...
If you're wondering why μ ≈ 0.0430: We know that log_2(x+1) ≈ x. If we want to see the relative error between them, we can use the function f(x)=log_2(1+x)-x. We want to find the maximum of this function, and we know that it only has one maximum. Since the function is continuous, we can find its maximum by finding when its derivative equals zero. Using a bunch of rules, we have f'(x)=1/((1+x) ln(2)) - 1. Setting this derivative equal to zero, we see that the maximum of the function occurs at x=(1/ln(2))+1. Substituting this number back into our first function, the maximum value is log_2(1/ln(2)) - (1/ln(2)) + 1 ≈ 0.860. To minimize the error between this maximum and our minimum, we want μ to be halfway between them. Taking the average of (0, 0.0860) yields our constant, 0.0430.
So like, I'm curious: The way we find the average value of the difference function is through finding the integral and dividing by our integrating interval length (1 minus 0 in this case). Wouldn't it be better to subtract by this amount to reduce the error on average? I realize that the 0.430 mu term is used to make sure the absolute value of the error is always less than or equal to 0.430, which means we can say a statement like "our error is always less than x%". But is there a good reason to prefer this over subtracting by the average value? Avg value seems a lot more relevant if you want error to be closest to 0.
@@horleckshelmos8472 The _maximum_ error is actually more important. If most cases give a close enough result, but there are outliers, your graphics will look weird when you run into those outliers. System math libraries typically want to minimize the maximum error when calculating functions; that's where the term "minimax polynomial" comes from.
Greetings from France. This video kicks a thousand arses. This Quake III algorithm is a mix between mathematical analysis, computer science and wizardry. Thank you.
John Carmack is my spirit animal, even just for releasing all of his games' source codes after a couple of years. An absolute genius of a programmer, and a blessing to the gaming world since 1990.
Just to be clear: he isn't the inventor of the algorithm. It was in use at SGI. In fact, he didn't fully understand how it worked, hence the "WTF" comment. Nevertheless, he is a genius programmer.
I remember reading that John Carmack hired someone to optimize Quake 1 code back in the days. So, it doesn't mean that all of Quake source code was written by him.
@@jonnyj. How do you know ? Any source to back up your claim ? Let's assume the author of the comment wasn't Carmack. Surely he could talk to him since the comment was on the original source code. So it's rather unlikely that the "WTF ?" and the magic constant would have stayed if Carmack knew where they came from. And even then, the algorithm was not invented by John Carmack, that is a well known fact. As soon as the Quake source code was made public, this algorithm was singled out as a source of interest and its origins were traced back to SGI.
You've explained this legendary piece of code and the concepts super well! One small nitpick (from an HW engineer working on FPUs, not a programmer). The "that's just how C works" part, where pointer aliasing is used for type punning, is actually undefined behavior (in two ways): 1. Using two pointers to the same memory location independently is not guaranteed to work like that in the C standard, but compilers mostly are nice enough to allow it and it works in this case. Using `union` would be better, though I think it's also not 100% clean. The "proper" way of doing it would probably involve explicit `memcpy` calls which would be much less performant. 2. Interpreting IEEE 754 values as anything other than floats is undefined behavior, as there are no guarantees how the bit patterns actually look in the processor. The formats in question are the "interchange formats" for when you need to transfer float values as binary strings. Implementations are allowed to store the values however they like as long as the operations are numerically equivalent, but most CPUs just use the format you've shown, so it works in this case. PS: Nowadays, processors have built-in instructions for inverse square root, so this fast inverse square root is much slower and less accurate than what a modern CPU can provide.
So if pointer aliasing would be generally undefined behavior, basically every HTTP, RTP, etc. parser would be illegal, since it takes a chunk of bytes dropping out of a socket and re-interprets it as e.g. an http-header struct. What you are referring to is breaking the 'strict aliasing rule', which does have some exceptions for legal and well-defined pointer aliasing.
@@dennis8636 Actually, not *pointers* alias, but memory locations. You are always allowed to address any memory with byte (char) pointers, but memory containing a float may not be accessed by a int pointer (may not meaning it has undefined behavior). Modern gcc will actually optimize the memory access away... been there, got my whole function optimized away. unions are the way to access bit patterns of floats.
@@Naburgondux I think it's perhaps true everywhere-- and this case illustrates it. You could say that id used a very tiny library, in principle . That's being a good programmer:don't reinvent wheels if you can use wheels made by geniuses. Carmack is a genius who invented many of his own wheela. As such he had no ego limits against learning from other geniuses.
I'd say both programming and math can be branches of logic. Specifically, the formalization of logic as well as the expansion/generalization of logic. They're two pathways to accomplish essentially the same goal. One doesn't necessarily follow from the other. But being good at one implies you're either good at, or it would not take long to become good at the other.
@@HueghMungus I kind of grasped it because it combines what I was taught during the time I spent chasing a degree in CS that I probably won't end up finishing due to other stuff and mental health issues. So yeah, approximately 2 years spent living and breathing CS. (At one point a professor told us we should be spending 8 hours a day, 5 days a week, working with C outside of school, with his class being one of 3 equally difficult classes out of the 6 classes in that semester alone. I wonder where the mental health issues came from.)
The explanaition among the smooth animations and the genius of Quake's III engineers makes this one of the best explainatory videos I ever seen in this platform.
Absolutely true. And if the code were meant to be implemented on several platforms, it would fail horribly (e.g. most modern processors have 64 bit longs, but that compiler on that processor had 32). But when you have to eke out the last few cycles and you don't need portability, it can work wonderfully. I have a lot of code flying around in space that uses UB because we needed the memory space or cycles to succeed. Pragmatism can be a virtue.
@@michaelterrazas1325 Yep, I agree, there are even situations where you cannot get the performance you want if you were to follow the standard to the letter. And you shouldn't; all you *need* to do is solve your problem and that's it. This case, however, you can solve with modern compilers without breaking the standard by just using memcpy. This wasn't the case back in the day of Q3.
@@RushilKasetty 1) Casting a pointer of one type into another and then accessing the object stored there. 2) Looking at the bit patterns in a float. 3) Assuming a congruence between the size of a float and a long. 4) Performing integer arithmetic on a float object.
@@michaelterrazas1325 OK. But the algorithm can be made portable by using a union instead of type-punning, using uint32_t instead of long, and testing for IEEE 754, for which there is a macro in C99. Then you know that a float is an IEEE 754 single prec number (so 32 bits), and the union takes care of the rest.
One of the most inspirational techniques for me. When I learned about this, I began to look for ways of optimizing things that had been taken for granted, no matter for how long.
I'm sure this is well explained and a mindblowin concept. I'm unsure however why this popped up in my feed and further why I watched the whole thing without understanding even half of it. Now I know how my dog feels when it talk to him
@@igorstasenko9183 Someone's got an EGO lol. Surround yourself with people at or above your level and you can talk about programming all day all night.
@@davidwest6019 I get spoiled (a bit) with cloth, houses, electricity and computational algorithms every day. I rate it 9/10, but I'd still recommend it to anyone.
Half is the noun. Halve is both the verb and the plural. (I only know as I looked it up last year). Personally, I prefer the nouns for variable names and verbs for methods. Even though three halves is the correct English. A bigger issue is that they couldn't be bothered to capitalise the leading character of the second word.🙄
2 years ago when I watched this, I had no idea what I was looking. After majoring for comp sci for the next 2 years and took Calc 2, Data Structures, and Computer Organization and Architecture, I have come back to video and finally understood the video.
Well, real numbers can't be represented in memory anyway since they have an infinite number of digits. So everything in the end is an approximation. Usually after 8 digits it's "good enough". For me usually 2 digits is good, but I work with milisecond. So for me 3.27 miliseconds is a good approximation, having 3.2698 is just useless.
This video is made so well, that I actually managed to understand it without a C programming experience. Far better than any other video on this topic.
I worked between 2016 and 2019 on the latest revision of the standard, ieee754-2019. You should probably note that the "evil bit hack" which I've used myself many times over the years, is no longer canonical C: You should instead use either a union, or memcpy() to move the bit pattern between the types. In reality, modern compilers will convert all three into the exact same code, i.e. either nothing (if integer and floating point use the same registers), or a move from one type to the other.
For anyone wanting to see the final derivation at 19:10: f(y) = 1/y^2 - x f'(y) = -2/y^3 y_new = y - f(y)/f'(y) = y + (1/y^2 - x)*y^3/2 = 1.5*y - 0.5*xy^3 = y*(1.5 - 0.5*xy^2)
Been a while since I have done derivatives and partial derivatives, but for this case, dx/dy is ignored or removed? Unless I’m misinterpreting the math here, which is possible since these calculations were never easy for me.
@@metalpachuramon What do you mean "miss the point" ? First point for every code is being readable and maintainable. We are sacrificing this only if the performance is important. And as I wrote a lot of perf-critical code, I can say that on review colleagues of mine always asked to improve readability without penalizing a performance. And if in your company reviews are equal to "check for code style" than you have a problems with expertise and/or review culture. It should be the experience sharing first of the place.
Your 20 minute video explained things better than most of my college courses. The explanation and the way you showed the bit values was really nice. C is the language that keeps on giving. I’ve dabbled in C for many years and I still feel like I haven’t mastered the basics.
I love examples like this. They aren't "Wow, what a mathematical genius" but instead "The person who thought this up clearly has a genius level mastery of their craft". I know about Newtonian iteration, logs, the physics related to using normed vectors etc but I would never. EVER. think of this. A friend of mine does coding where to get extreme speeds out of the computer they'll do compiler tricks specific to the architecture of the CPU so they can do similar stuff to this. I don't understand what he does.
A warning to fellow programmers: DO NOT use this anymore! This is a super outdated method that made sense in 90's. CPU native sqrt since pentium 4 is FASTER.
I enjoys math and and it's been years since I've opened a calculus book, a professor like this would have made my advancement much more quicker. super easy to understand him and get the full concept.
I've seen a paper previously with this covered in depth. Well done relaying the content in your UA-cam video, very easy to grasp numerical methods intro :)
He came dangerously close to the most hated words for eng students. "It can be easily shown that...." and he ACTUALLY used the corollary of the above hated words "It's left as an exercise for the reader...." LOL. Thanks for the video.
"The algorithm was originally attributed to John Carmack, but an investigation showed that the code had deeper roots in both the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with Gary Tarolli's implementation for the SGI Indigo as the earliest known use. It is not known how the constant was originally derived, though investigation has shed some light on possible methods." /Wikipedia
"You simply _______ then obviously you ______, it's super easy!" Nearly everything in this video is so far over my head it's insane and fantastic, maybe one day I'll understand what you're talking about.
What an insane piece of code This just goes to show how much coding is about thought as it is about actual code Amazing explanation, this is almost 3Blue1Brown level quality, you've earned yourself a subscriber
Sadly, golden nuggets like these are few and far between. Even in good codebases, there is a lot of poorly structured code that tends to build up when bugfix after bugfix after bugfix gets applied without looking at the bigger picture again. And this is not even the fault of the developer, because you also don't want to redo large portions of code every time a bug fix or small feature is inconvenient. Planning and thought are definitely important, but from my experience: Most of the time you'll spend coding is being reactive to the code that already exists. You try to apply minimal changes to it while still making it do what you want to do. Everyone likes small commits!
I'd actually consider this 3Blue1Brown quality, this is insane! Is this algorithm faster than the normal square root algorithm if you inverse it? I think I'll actually test that.
@Qimodis the rules of physics had to be applied in the game as well for realistic lighting, so yes, this is really physics. Anytime you are dealing with animation and lighting you have to give objects in the program properties and rules that resemble real life physics behavior, in that way only it will look realistic.
@@NomadUrpagi Remember that it was not a single person behind this engine and there was some level of separation, who deals with physics, who is modeling, level designing... So we a probably forgetting how important teamwork and knowledge sharing between those teammates was. IMO, it is possible that people who coded this were not that level of Gods as we now think, but their teamwork was so synergetic, that they were able to come up with such brilliances.
As a teenager I spent countless hours playing Quake III and I am just amazed about Carmacks trickery. I would love to see a performance comparison between his code and the usual non-optimized approach.
@@kurzackd It predates Q3A, the Q3A source code is just where it became mainstream knowledge. Apparently one of the earliest known usages was in the SGI Indy but it predates that as well.
@@kurzackd Dave Plummer basically told us in his last video. But you can also just see it in the code (Abrash's books and timing at iD also give it away)
As a millennial programmer, I thank the gods every day that these OGs paved the way with stuff like this. I probably couldn't even recreate pong from scratch...
When you used packaged math fnctions from popular game engines, they are not optimized like this. they give you accurate data instead of fast data. This is why people who don't know low level programming turn out slow games because they don't even realize why it's slow.
@@roberte2945 Unity is wasted potential in my eyes. They should have just stuck with C++ like unreal engine. Using C# as a scripting language is great, but NOT for fast 3D graphics.
no shame in not being able to recreate original Pong as a programmer since it wasn't software in the first place but discrete logic chips. first 4bit CPU had only just been invented (building computer in a game machine out of discrete chips would've been cost-prohibitive before that)
If you can’t do pong, I question the fact that you are actually a programmer. It’s very important to understand the basics. Try and do it! You will learn a tone !
In case anyone is curious, John said in a recent podcast that he was not the one who coded this and is surprised he gets credited for it. It might have come about due to his prolific reputation as one of the best programmers ever. ua-cam.com/video/I845O57ZSy4/v-deo.html
Correct. It's the reason for the comment 'what the fuck?' in the code. Id Software didn't fully understand the code at the time when they used it either.
11:15 This is the difference between secant and tangent linearisation, both very useful and picking the right one can definitely make your life easier. Excellent video, thank you for the added info instead of a 2min video on just the core of the algorithm, it helped my understanding.
Man I’ve had this video saved in my watch later for like a year now and chose the perfect time to watch it. I’m taking computer architecture right now and my teacher is not the greatest at explaining things. Your explanation of IEEE 754 just saved me a ton of trouble 😂 thanks
I first learned about this algorithm years ago, and had a vague idea of where the magic number came from by reading from it, but never fully understood. I think your video explains it very clearly. Thanks!
@Michael: the most important word in the quote is "premature". I don't think that Knuth was against optimization in general. And I agree with you in the sense that one shouldn't cluelessly waste cycles (or memory, or both). But it doesn't make sense to fiddle around with bits at every place in the code from the beginning, unless you find some really hot spots that lead to a huge performance gain when optimized later. Many other early "optimizations" might turn out as nearly useless.
@@minastaros The first problem is that shitty programmers solely focus on "premature" and use it as an _excuse_ to _never_ optimize. Their "solution" is to just throw more money (hardware) at the problem. **FACEPALM** The second problem is that the **entire sentence** _"Premature optimization is the root of all evil_" itself **is complete bullshit.** Optimization is NOT the problem, let alone premature optimization. Shitty class designs such as mis-using OOP with deep hierarches, ignoring performance, ignoring memory management, shitty function and variable names, never looking at the assembly output of the compiler, etc. are MUCH more important then optimizing the "wrong" thing. Third, there is no such thing as "premature optimization". You are gathering a data point about potential performance! However it needs to be done in the CONTEXT of the larger problem. Are you being _efficient_ at WHAT you are optimizing? i.e. The most bang for the buck? Software engineering is all about managing decreasing returns. Optimization doesn't magically make that go away. First, you CALCULATE the maximum theoretical speed "best case". You will NEVER be faster then this! How do you KNOW your implementation is slow? By comparing to the best case! Next, you write a REFERENCE implementation because it doesn't matter how fast you are if you return the wrong results. Then once you have a working algorithm you can then focus on the what _exactly_ is happening to the data. You then write a DATA ORIENTED DESIGN version because it is simple to write, and fast. So yeah, the entire phrase is entirely misleading, inaccurate, and generally treated as dogma by programmers who are generally clueless.
@@MichaelPohoreski I think the point of that somewhat glib saying is that a lot of inexperienced programmers focus way too much on optimizing their code before they have a clear idea of what level of optimization is appropriate for the task at hand. In my experience there very much *is* such a thing as a premature optimization. It happens when you spend days or weeks making something perform like, 10% faster or smaller when that 10% doesn’t add up to more than what that engineering time was worth to you in the first place. Sometimes, like with Quake 3, coming up with something like this is necessary due to hardware limitations. Very often, it’s really not, and the not-clever solution is fine. I also find it really annoying when people hold up this sort of programming math trick as the line that separates good programmers from “shitty programmers”. It’s kind of elitist.
Also, I’ve read Carmack’s writings on programming, and I can’t recommend him enough. He’s a genius, but I also find him to be a very practically minded engineer as well, and his approach to design and software architecture is really insightful.
@@nicholaswood3250 Optimization starts _before_ you write code -- not after. If you aren't thinking of cache lines and HOW the data is being accessed you are a shit programmer. There is **a reason** Fred Brooks stated: _Show me your flowcharts and conceal your tables and I shall continue to be mystified. Show me your tables and I won't usually need your flowcharts; they'll be obvious._ The modern vernacular is: _Show me your code and I'll have to guess at your data structures. Show me your data and I won't have to see your code; I'll already know it._ Bad programmers think about code / algorithms first. Good programmers think about data first. Optimization **begins** at day zero. _Not after the fact._ This is why that ignorant statement by Knuth is complete bullshit. The people who don't know how to optimize constantly make excuses for why they remain stupid.
13:05 this is the point where the magic happens. granting the program the lenience to be less accurate rather than forcing it to calculate unnecessarily exact values is what reduces the time and energy demand on the computer
Why don't they show us this kind of stuff in my college courses? Part of what we learned in CS was the flag-exponent-mantissa calculation and we had to write and interpret numbers using it, and although I thought it was very fascinating I only thought of it as a fun fact and nothing more. Never knew it - along with derivations - could be used like this, this is simply awesome!
Because there is virtually zero practical application of this technique. It is so niche it is ridiculous. If anything it would be irresponsible of a professor to waste time on such a thing.
@@jyutzler I'm not saying they need to show this one specifically - rather, I'm arguing that real-world examples of its usage would make the topic far more interesting and make otherwise seemingly pointless material purposeful.
Computer graphics books are full of weird hacks.. for example, the algorithm for efficiently drawing a straight line in the legendary Foley, Van Dam, et al book. CG is all about hacks like this. The Q3 developers just continued this tradition.
You should know how IEEE floating point _works_ because it’s fundamental to just about any system that’s not ancient or using a decimal library. (For example, understanding why when creating a list of numbers in NumPy, incremented by 0.1, you get …interesting… results) But learning bithacks is pretty useless and not generalizable except as an interesting study.
UA-cam recommended this to me, seemingly at random. I also note that it's the only video you have made at this time. I'm absolutely impressed with the content and encourage you to make more videos like these.
@@botlarry4437 ooh that's fine as far as you get a taste for low-level prog you are good better than never touching something and staying only in py/js world
@@nopegok318 "we don't want this conversion to mess with our bits" Converting to Judaism will mess with your "bits". Meaning circumcision of the penis. Clear enough? :)
This code is so amazing. It's a superb balance between math and cpu instructions speed. The lines have more inside knowledge than meet the eye. Not sure what compiler/linker id used, but when transcoded to assembly, this may well be the best optimized code speed/precision wise. I watched a documentary some time ago that said that Mr. Carmack programmed endless hours on Wolfenstein 3D, in order to be able to achieve better frame rates with the petty hardware available back then - when many many devs were merely dealing with sprites in their products.
While I was going to school for my B.S. in EE, one of my favorite classes was on assembly and machine code. Learning how to write assembly code requires an intimate understanding of the internal hardware that comprises a computer. Plus the resultant programs were very efficient, small size and usually faster than the same program written in a higher level language like basic or c\c++. Plus it allowed me to write code for my TI-89
@@seanriopel3132 As a mere hobbyist in programming, I learned assembly (for some early processors) because I thought it was interesting, but have never bothered to learn C. I thought it was amusing that they had to use a trick to get C to use bitwise operations on the float -- with no data typing in machine language, it's a problem you don't have in the first place. As convenient as C is, it also created this extra problem. In any case, I'd really like to see what the assembly for this ended up looking like.
@@seanriopel3132 We only had assembly as an option back then. However you got into the habit of writing efficient and well documented code (because you would have to maintain it and no matter how efficient you made it you needed memory joggers to explain it to your future self!!).
@@CruisingDog_inCA damn straight. Even when you were an entire assembly program, and tried doing back and fixing some thing, it was like looking at hiroglyphics
This is great; I have way more experience with math than with coding, so it was very interesting seeing how people use math tricks to get around coding quirks! Like, I personally would never need to care about the speed of dividing something, but small bits of time add up really quickly in code! Super neat.
Well at least there is a point in knowing it to understand why float is imprecise. I've just recently had a bug in business application that works with money (calculates exchange rates), and I spotted the bug instantly in a wall of numbers (1.2/1.5 was equal to 0.80001)
There is no point, actually. it's obsolete for two decades, dude. And remembering things just for use it once and never again... In programming you don't do it, you study such things on demamd., ok?
@@Acid31337 How do you know then what to study when you need it and don't know that it is a solution to your problem because you don't know how it works? Chicken and egg
Additional commentary (as it wasn't super-touched on, but I feel is worth mentioning): the need for this algorithm was because at the time Quake III came out, very few 3D graphics cards had dedicated hardware for calculating lighting and geometry transformations (T&L, it's called), requiring the CPU to do the heavy lifting of doing it advance before pushing the results to the GPU for drawing. Therefore, CPU efficiency was paramount, and integer math is typically much faster than floating-point math in most applications-in most CPUs, you have to send the floating-point values to a separate computing unit (the FPU) and wait to get the results back. And even in the integer math world, multiplication and division by factors other than 2 could be costly operations where every cycle counted. A year before this game launched, AMD came out with the "3DNow!" instruction set extensions, and Intel followed a year later with "SSE," both of which could do the inverse square root in one instruction on multiple data points simultaneously. However, since most people buying the game wouldn't have these new processors immediately to hand, it made the software algorithm a necessity all the same. The first consumer PC graphics card to have hardware T&L support was the Nvidia GeForce 256 in 2001.
I remember when I started working with Calculus and I only had an Apple IIgs to work with. The ToolBox had a really nifty built-in set of functions called SANE (Standard Apple Numeric Engine). I had to buy a book, which I still own, that describes SANE and the IEEE 754 standard in detail. I'm so glad someone else did all the hard work. Thanks for sharing this tidbit of information.
I watched this video a few months after it came out iirc, it only took me 3 semesters of computer science college, calculus 1 and numeric calculus classes to understand it. Nice!
It's not out of your league man, the last bit is a little confusing, but you can understand with a bit of precalc. Now coming up with something like this....yeah, that's way out of 99.9% of our leagues lol
Daaaamn... And you look at what kind of coders call themselves "seniors" nowadays... This code is brilliant, fun and inspiring. Thanks a lot. To you, but especially to the so very talented dev who came up with this beautiful elegancy.
@@Kokurorokuko this is not about being genius. This is about being professional. So yes, when an incompetent person calls themselves a professional, I do make fun of them regardless of the field they're in.
Well, I wouldn't say it's "simple". It's short. It's like the first time you learn about recursion and binary trees and find out that you can do insane things with just three lines of code but the thought process behind it can be daunting for beginners. And then you wonder why it's so inefficient and then dive into dynamic programming. It's beautiful really.
I love how this video has brought several branches of engineering, where doing things _"just by chance, otherwise it won't work bro"_ is eerily common, and unbelievably, very functional.
As a non-mathematician and non-coder, I have watched this and referenced this dozens of times to attain the joy of comprehension. The task is complete, and I am smarter and I am happier. Thank you for this video and restoring my faith in humanity.
I enjoy greatly that the creator of this video just blew my mind with IEEE 754 structuring, number modification through bit manipulation, and just concludes the video once everything was said. This video was so cool!
The math and tricks are in fact very common and regularly used, long before Quake. See e.g. the book "C64 Intern" by Data Becker, 1984 (!), section 5.3.4, it shows how to compute the normal square root using the exact same type of methods, just as an arbitrary example to speed up the Commodore 64 computer's BASIC language which was rather slow. They halved the float's exponent which gives a very fast order-accurate approximation to the square root and then apply Newton iterations (four instead of one, as this algorithm needed to be precise to the last digit). Those old 8 bit computers did not have math coprocessors so you would have to think yourself about handling all the tiny details of the exponent and several mantisse components. The miracle in Quake was common practice twenty years earlier.
@@chonchjohnchyou are absolutely correct. I spent most of highschool decompiling games, making code improvements, and sending it back to the authors, I can assure you, common did not exist. sloppy and slow. C-64 had a really good user base of teck's that shared code, I never saw this implementation. I wish I had seen it back then.
@@MichaelRaschhuh, I am new to the world of "hacking" you can say, how did decompiling go? Did you modify the pure assembly? I have absolutely no idea about this stuff.
@@hodayfa000h back then 36-40 years ago, you could Peek and Poke a memory space ( that's how you made adjustments ). I had back then a $ 260 compiler. there were lot's of disassembler's on the scene back then. basically how I made some money was. Buy a game at the store. play it, document where it was slow. dump the code with care and print it out. then start inserting my code and re-compile it. before I knew it, I had 5% to 20% increases across the board on refreshes. then when I was done fixing all the problems, I would mail it to them. easy money. some presidents were not happy with my updates and I would get a C&D, but who cared, I was young and stupid with a boat load of cash I made and paid taxes on.
I love the fact that he felt the need to define a constant for "threehalfs", but just threw that 0x5F3759DF in there no problem LOL
well he used it twice...
until it got commented out
Well that was maybe when someone had to do refactoring thinking: "This 1.5 is something I can refactor to reduce the number of magic numbers by 50%". That other magic number I cannot refactor because I would have to name it magic_number because I don't know where it comes from. Of course turning 1.5 into a variable named threehalfs also does not improve the readability and introduces a spelling error but fine. The code metrics have improved as was the objective. Maybe I'm wrong.
@@richardbloemenkamp8532 I think the answer is HOW C compiles code. The compiler adjust what is and what is not a constant value based on the code (and the compiler of course), and most importantly, WHERE to keep that value. In certains cases, constant are stored elsewhere in memory and does exists (or is available) at all times (it must be accessed first). Then you have constants that are stored IN the method itself, sort of like an "inlining" in JIT compilers like Java or C#.
I think it is highly probable that the compiler the devs used THEN gave very slightly better results when it compiled the code when written this way, rather than otherwise. Of course, it would be the same algorithm, but the data access speed would be different (storing values in registers, in less CPU cycles), hence the choice to leave it like this.
When the goal is speed at any cost, those additional CPU cycle to gather values and store them in registers can add-up quite quickly.
People have said it here in different ways but to say it clearly, a constant is not stored in memory and is just a macro replaced with its value upon compilation.
@@DaMonster This is not true, since on most platforms you cannot write 1.5F as an immediate operand in assembly, so the compiler has to put 1.5F in memory somewhere. It then loads the constant into a float register at runtime.
10:23 "for no obvious reason at all, let's take the logarithm of that expression" you basically just summed up the Electrical Engineering experience
And Statistics. And pretty much any other field where approximating confusing maths becomes neccesary.
necessary* damnit I hate English sometimes
And information theory
@@martiensventer9191 *Math
A joke among mathematicians is "What sounds does a number theorist make when they drown? Log(log(log(log(log(log(log..."
I found out about that inverse square root in 2010 (about) and only now, a decade later, has it been explained well. Thank you!
It has been explained many times. Even on youtube, if you search for "0x5f3759df" or "fast inverse square root" you can find many videos, some dating back to 8 years ago... Or you can look at the wikipedia page for more informations.
@@fredg8328 I think this explanation is better than those explanations.
This video takes far too long and too many unnecessary steps :/
Oh, and one of them is wrong, see if you can spot it.
@@Kyrelel I disagree. Every step is necessary. Anything less won't give a sufficently complete understanding of the subject.
@@Kyrelel seems correct to me? Where do you spot a mistake?
Performing a division AND a inverse root using only multiplications is nothing short of real world black magic. I am sincerely impressed.
@@kalpeshpatil8912 If you go to assembly, multiplication is just… multiplication. CPUs have had multiplication opcodes for decades, even for floating point.
@@kalpeshpatil8912You seem confused
Newton method does this! The zeros of any function are just a bunch of sums and multiplications.
@@MrRenanwill not all functions but yeah some of them
@ sometimes multiplicaation is still written as add and bitshift, when that gets too hard, yes you can use imul
Oh man, I remember taking a CS class that taught floating point representation with the exponents and mantissas, and I thought to myself "ok this is cool but I'll be programming higher-level than this, so I probably won't use this knowledge in practice." Well, this is a great example of how you might use it!
Wtf cary kill hitler holy shit
Its cary kicks homes!
It's just like fact families in math. You think it's useless right up until you need to do something.
lol that's actually funny
Ok. I'm 5th
you just explained IEEE 754 better than any tutorial i could find and better than any university teacher i listened to. the animations helped a lot with the understanding process.
And in just five minutes...
Im glad someone understands it. I just did a motorboat and thought ''glad I didnt study maths''
@@traida111 The exponent is how big the number is, the mantissa is specifically what number that is. It's the equivalent of saying "37 billion" instead of "37,000,000,000"
My best understanding of it came from another video game video by Pannenkoek about Super Mario 64 and floats lol
I totally agree
It's even scarier than "only 1% error"; using the default magic number of 0x5F3759DF actually gives about 0.175% error maximally, dropping thereafter. It is now believed the original author of the function (not Carmack, who never claimed he was), found the constant through trial and error rather than derivation.
If you use Chris Lomont's updated magic number of 0x5F375A86 (which he derived), the error drops to a staggering 0.09% after just 1 iteration.
BTW, the original author is unknown, but speculated to be Canadian Velvel Kahan, Turing Award winner and mathematician, circa 1985.
it's not Gary T. as mentioned at 10:41?
@@IkethRacing no it's Gary v. He discovered it when his entire NFT stock portfolio plummeted.
@@jcsc2001 Hahahahaha
What is this methods name
I think we still call it John Carmack's Function since he was the first guy to talk about it
I thought it was implemented by John Carmac, but according to Wikipedia: "The algorithm was often misattributed to John Carmack, but in fact the code is based on an unpublished paper by William Kahan and K.C. Ng circulated in May 1986. The original constant was produced from a collaboration between Cleve Moler and Gregory Walsh, while they worked for Ardent Computing in the late 1980s." Thanks for educating and for your great work.
It's still quite a feat to acquire that knowledge back in the day.
@@RedSntDK for the original guys definitely but quake 3 came out when the internet had been in serious use for years so I suspect not too difficult to come upon.
Isn't that what implemented means?
@@patty109109 I assumed they ripped it out of some existing math library without really understanding what it was doing, hence the "wtf" comment. These tricks are rarely used now days because CPU/GPU has gotten so fast in doing divisions and square roots.
@@patty109109 Please define "serious use"… because to me (and many many people), serious use is what the Internet (the actual Internet, and not timbl's Word Wide Web) was originally made for.
FYI for anyone learning C/C++: the "i = *(long*)x" approach is not standards compliant. It obviously worked for Quake III, but for a modern compiler under high optimization, this could produce bugs. The reason is complicated (search "strict aliasing" if you want to know more), but the correct alternative would be "memcpy(&i, &x, 4)".
It's also a good idea to use "uint32_t" instead of "long" to ensure "i" has 4 bytes. Modern systems often have "long"s that are 8 bytes.
Also FYI for learners: there is no reason to write any code like this ever today. A lot has happened since the 90s and modern compilers are smarter than 99% of programmers will ever be.
@@Debrugger So how would you go about this today?
@@Debrugger and here you wrong, compilers bound by standard, so it can't do most of tricks used in the code, as each of trick which makes calculation faster also makes it less precise and there is no standard way to say for compiler how much accuracy it needs to preserve.
p.s.
gcc has option --ffast-math to makes some optimization which do lose precision (compared to standard), but again option itself is not part of c standard, nor it allows to say precise how much precision can be lowered. (i.e. it will never go to such low precision as used in this code, even with such option), and which part of code it should be applied to.
@@SuperMarkusparkus Literally just write 1/sqrt(x) and let the compiler optimise
@@SuperMarkusparkus while Thomas is wrong, answer to your question depends on which cpu you want to use for example for intel cpu there is instruction rsqrt14_ps obviously its still your problem to organize your data so you do 16 calculation of normales at once, otherwise no 16x profit.
"I don't know what else to say, that's just how C works"
An excellent explanation of all of C
It's one of the easier languages to understand the syntax from though, as there isn't a lot of syntax.
And what is done here is a workaround for something the language wasn't designed for, so it might seem somewhat cryptic, and I'm pretty sure it's undefined behavior (meaning not defined in the standard, meaning platform/compiler dependent, as float doesn't need to be in IEEE 754 + the platform might have different alignment requirement for float and unsigned int + unsigned int doesn't need to be 32bit (all not common things though, but exist)). Maybe a clearer way to achieve the same is using a union of unsigned int and float, though you'd have to copy the value (which the compiler might optimize away, but probably not when this was written)
In other words you shouldn't be doing something like this, unless you very well know what you are doing, and have testing around it to make sure it does what you expect for every target platform and compiler.
To me this is what makes C and other languages like it, powerful and fun :)
This is just how every language works, but in most of them you don't have access to pointers (address of the value), you can just access the value the pointers point to (value in that address).
@@frydac well, it's not really any different than expecting an INT to be 32bits, when C does NOT define INT to specifically be 32bits. But it's almost always a safe assumption (or was).
You should try modern C++. I think the original C K&R was less than 300 pages (if I remember correctly) that describes the entire C language. I just bought a book that tries to clarify just one feature of C++, move semantics, and it is almost 300 pages long. I weep for humanity.
@@vincei4252 C++ move semantics are overengineered in a stupid and useless way.
I'm reminded of the single greatest code comment I've ever read: "I do not understand the following line of code, but I've verified it is required for this function to work correctly." It was in a bit of Perl code, a language with several peculiar syntactic quirks.
Saying Perl has a few peculiar syntactic quirks is like saying a porcupine has a few sharp points.
Perl : a write-only language.
The best coding that I ever did in C and in which I got a big smile and patted myself on the back for the effort, was using an array of pointers to functions where the correct function would execute based on the array index. Man, was I ever impressed with the poetic beauty of such a simple and short series of characters that commanded so much!
I ran my code, it worked fine, but it was much slower than using a simple switch/case construct. LOTS slower. So, I went back to what I had before.
That was back when I was using a 16-bit MS-DOS C complier, so no doubt all the crazy addressing shenanigans going on behind the scenes really mucked things up. I wonder how well it would run today as a native 64-bit compile, but the working version that I still use is from (with one, semi-recent and tiny data update) twenty years ago and written as a 32-bit Console app. Still, pointers-to-functions are really cool to invoke.
Officially "Practical Extraction and Reporting Language". Unofficially "Pathologically Eclectic Rubbish Lister".
I once saw a version of Hangman encoded in five characters long Perl script. Won some award in crypto coding I think.
I don't think I understood this well enough to explain to my friends how cool it is... but I definitely understood enough to have my mind blown, and to tell them they need to see it
Funny seeing you here!
Same here! ;-)
You have just described my thoughts regarding most of the things I like lol, specially in the CS and programming field.
Who is going to qoute Einstien first
I understood it well enough to paint a picture of a dinosaur with my own feces
You know an algorithm is good when one of the steps includes "C doesn't think we should ever do this, so we have to trick it into letting us"
It's also one of the best way to explode something in your code (or better yet, corrupt your data in tiny, nearly invisible ways). Some people learned that the hard way.
pointers, references and type casting is the best part of c/c++
@@WXKFA ... or the worst.
@@Evan490BC they are the worst for the same poeple who are afraid of goto
@@geiger21 You basically have no idea what you're talking about, do you? No one is afraid of horse carts but we have cars in the 21st century.
the "what the fuck" comment was probably added by another engineer reading the code. The creator was a chad.
Either that, or the dev that wrote it got that line from an online forum and had no freaking idea why it was subtracting from 0x5f3759df.
As a dev I understood the rest of what this video covered by just reading through the code for a few seconds. The math behind using the constant 0x5f3759df still just made my brain freeze up. Either the dev had a degree in math (totally possible), or he was just accepting that the line worked.
it's another way of saying "dont ever touch this, just hope it never breaks"
@@normalmighty I think we all end up occasionally taking a solution from the Web without fully understanding how it works, but when I do that I include a link to the webpage in the comments for future reference.
@@normalmighty "Either that, or the dev that wrote it got that line from an online forum and had no freaking idea why it was subtracting from 0x5f3759df.
"
Yeah, an online forum before 1998... sure.
@@thewaver8 Well IRC and BBS existed back then, and there were newsgroups too I think, so maybe they got it from one of those places 🤷♀️
honestly one of the best made videos I've seen on UA-cam
Wszędzie bym się spodziewał.. ale nie tutaj
@@Seagateee Was gonna say the same thing. Even the animation style is similar
Frankly, i prefered this German dude, he just shows you what is going on from first principles, didnt try to 'blow your mind', and i learned something.
@@cannaroe1213 What's wrong with "blowing your mind"?
This one is 2nd best? ua-cam.com/video/4nShTeUEJIQ/v-deo.html
I remember when I started programming professionally in the 90's. Id's performance tuning was something that us programmers would nerd out about, and I'd often write my code to similar standards. Curious coworkers would sometimes encounter some of my "performance-tuned code" and be like "uhh, why did you do this like that?" I'd explain all about efficiencies and they'd just kind of roll their eyes and be like "okaaayy." We were, after all, just writing client-server apps for small businesses, for which performance tuning was totally pointless. But I wanted to be a game developer, damnit!
Did you ever actually get into game development?
@@yusuf_kizilkaya Nope. And now I'm old and burned-out.
@@LarsSveen never too late, and being a game developer now is easier than ever. Maybe you could still give it a shot
That sounds like my life being a graphics artist in the military simulation industry. The bar is set so low, even in 2022, that people we hire on that have been in the industry since the 2000s are either A, are asking why on earth we are wasting time experimenting with dynamic lighting and PBR materials, or B, blown away by how awesome our graphics are (which, considering we have hired GFX artists that think A, we do have a mixed bag of high quality art, and people who think blockouts are good enough). I'm in talks with a programmer friend about making an indie game just to take the edge off.
Lol this is me right now. I’m working on some iterative/list code that I’ve managed to get ~30% faster than the leading industry library. But even over 1M iterations it’s only a difference in milliseconds lol
Wait. This is his only video?
Holy shit, what a start
he is not fucking around
wow, didn't realize. 20k subs in one video, Holy shit.
probably hacked youtube and made himself trending LOL
reminds me of that channel that uploaded those gameboy cpu videos, went viral, then never uploaded again.
Guess he won the UA-cam lottery
The original programmers were just maths majors. Now we see why. The closer to the metal you work, the more helpful high level math can be.
Exactly
Especially when you design the metal itself. As a semiconductor engineer :P
@@revimfadli4666 for example for those who want ultimate cryptocurrency mining nothing beats custom made boards that implement the algorithm in hardware. The so called ASIC miner
@@Sorcerer86pt agreed, getting hardware-baked performance, without taking away gpus from people who need them & resigning them to that single use, is a win-win
Although I was talking about materials scientists/engineers who formulate the semiconductor material itself & its manufacturing:P
It's not just being close to the metal. Math helps a lot any time you have a very hard problem to solve -- for example, the folks at OpenAI for example are 90% math PhDs and they don't spend much time programming bare metal I would guess.
In the old days every problem was hard so math was valuable for everyone. Now there are lots of problems that programmers can solve that are not "hard" in a traditional sense; making things pretty for customers, implementing simple business models for yet-another-market, taking someone's prototype idea and making it into an app or a website, etc. But if you try to solve cutting-edge problems you will get much further if you understand mathematics deeply, especially linear algebra and multi-variable calculus which come up over and over if you know how to recognize them.
The video goes kind of light on the Newton iteration part so I wanted to clarify some things:
f(y) = 1/(y^2) - x
y is our estimated inverse square root, and we’re trying to get a better estimate. To do that we want to get f(y) close to zero, which happens when y is correct and x equals the number that we’re taking the inverse-square-root of, ie the input named ‘number’. The x from the above formula isn’t being updated, we know it exactly already.
The derivative of f(y) is -2/(y^3). This is because the x is constant, 1/y^2 is the same as y^-2, so multiply by -2 and subtract 1 from the exponent.
I was confused about why it looked like we were solving for a new value of x but we aren’t we’re getting a new value of y so the line is really ynew = y - f(y)/f’(y), which after we enter the function and its derivative is
ynew = y - (1/(y^2) -x)/(-2/(y^3))
= y + y/2 - x*(y^3)/2
= y * (3/2 - y * y * x/2)
…and then it matches the code
Thanks, I was confused by that part as well!
Yeah, I think he could have said that x and y have different roles in the graph and the code.
Incredible
Maybe obvious, but can you explain to me how you know f(y)? Everything else I follow, but as I see it f(x) = y and y ~ (1/x^2), so f(×) ~ 1/x^2. What am I missing?
@@Gnefitisis before i say anything, i would first like to clarify i am no CS student and I also do not 100% follow this, but i think i understand where this part comes from. i think the answer to your question comes from the sequence of steps revolving around why you want to use newtons method and the rearanging of the formula to aquire f(y) instead of f(x). Since you origionally have the function f(x) from the initial steps of this code, but you want to find a more accurate approximation of what "y" would be equal to after the "wtf" line, you need to rearange the formula to make it become a function of "y". This is because you are no longer using "x" in the function you are now using "y" to find "ynew". So technically they are both using the "y" variable, but after performing the newtons approximation it reassignes the calculated value to the variable "y" again, meaning that it was a "function of y to calculate a new y value". I'm not sure if I answered your question, but if you want me to try and explain anything better i can try. But from how i interpretted your question, it seemed like more of a misunderstanding of the use of the applications of the variables in the video, not that you don't understanding the actual pure math being used. I hope this helped, if not I can try again if you'd like lol ✌
Some of my computer science classmates wonder what good learning the IEEE 754 standard and Newton's approximation could possibly do, and the answer is that it helps us to understand a cool video about a quake III algorithm
This is how legends are born: *man appears out of nowhere, disintegrates the fuck out of ancient code and vanishes into silence*
He will come back based on his comment in his community tab:
*"I plan on covering algorithms in general. But if there's some obscure piece of code accompanying it, that's just more bonus points for the algorithm. - Nemean"*
Well no that made him become the CTO for Oculus
@@Korbert how do you know that ?
@@sbravoo This is not the only impressive thing John Carmack has done, he's a "famous coder" if you will so you can bet i took notice when he left ID Software and went to Oculus as CTO.
@@Korbert yeah i know carnamak i misunderstood i thought he was talking about Nemean cause he uploaded this video and then disappeared
I can't believe someone explained this piece of ancient code so eloquently. I remember reading about this routine years back and not having a single clue how this slice of magic worked. Thank you good sir and have a fantabulous day!
In the immortal words of my CS professor: "I'm sure that this approach is somewhere in between sloppy mathematics and black magic".
That's how all of my code ends up. I write on intuition alone, so when something eventually works even I don't know why.
@@uwufemboy5683 a slow subconscious warning giving me anxiety that this code won’t work and ends up working anyway
Far from sloppy. This is old Amiga Assembly tricks.
Hilarious 🤣
Why sloppy?
The breakdown of IEEE 754 is... mind-blowing. That's so supremely clever, both the standard and the explanation.
"But it does hint that there are three steps to this algorithm."
Step 1: *Evil Bit Hack*
Step 2: *What The Fuck*
Step 3: *Newton Iteration*
Step 4: Profit!
@@shurik121 Step 5: Open source it!
Eh, it hints a little.
“Evil Bit Hack” = Exploiting the data in a way you’re not supposed to.
“What The Fuck” = Black magic sorcery.
“Newton Iteration” = Applying a mathematical law.
@@shurik121 My man!
@@SomeFreakingCactus That's what I saw too! It does seem pretty straightforward as far as what it does, but tbh the actual process still hurts my brain to try to figure out
"You might already see where this is going". No, no I do not.
lol same feeling
The misunderstanding there comes from not knowing why a length of a vector is calculated in such a way. First, you should keep in mind that this vector exists in 3 dimensions, not 2. Second, the math - ua-cam.com/video/51vgIfdBlAk/v-deo.html&ab_channel=rootmath
No, I don't think I will.
> "You might already see where this is going"
"Yeah, same direction as the original vector, but at a different speed"
I have a master's degree in CS and I was not seeing where he was going.
I have long heard of this magic, but never understood it. After watching your presentation, I held manager level understanding just for a moment. It’s gone, but that felt good. Thanks for making this understandable, even if only for a moment while you held our hands.
Lol manager level understanding. Thanks, gonna start using that XD
AD
well funny thing is, they actually used engineering methods to solve their computer problem, this is the things that if i did use them, would not make me question myself why am i studying software engineering
@@Roald94 Well, software engineering today vs the one I studied 30 years ago are way different. It is also the reason why I quit developing and went to the infrastructure side of computing and can now yell at developers that they are not creative anymore and are not unable to optimized anything (my major was algorythmics) : they just copy code and use design patterns because they is no creativity anymore. Also Java...
@@ryzenforce Ah Knuth!
If you're wondering why μ ≈ 0.0430:
We know that log_2(x+1) ≈ x. If we want to see the relative error between them, we can use the function f(x)=log_2(1+x)-x. We want to find the maximum of this function, and we know that it only has one maximum.
Since the function is continuous, we can find its maximum by finding when its derivative equals zero. Using a bunch of rules, we have f'(x)=1/((1+x) ln(2)) - 1. Setting this derivative equal to zero, we see that the maximum of the function occurs at x=(1/ln(2))+1.
Substituting this number back into our first function, the maximum value is log_2(1/ln(2)) - (1/ln(2)) + 1 ≈ 0.860. To minimize the error between this maximum and our minimum, we want μ to be halfway between them. Taking the average of (0, 0.0860) yields our constant, 0.0430.
Thanks for the nice explanation, but at the second paragraph you have an error: the maximum of the function occurs at *x=(1/ln(2))-1*
So like, I'm curious: The way we find the average value of the difference function is through finding the integral and dividing by our integrating interval length (1 minus 0 in this case). Wouldn't it be better to subtract by this amount to reduce the error on average?
I realize that the 0.430 mu term is used to make sure the absolute value of the error is always less than or equal to 0.430, which means we can say a statement like "our error is always less than x%". But is there a good reason to prefer this over subtracting by the average value? Avg value seems a lot more relevant if you want error to be closest to 0.
I ain't reading all that man
@@ObtecularPkshort attention span?
@@horleckshelmos8472 The _maximum_ error is actually more important. If most cases give a close enough result, but there are outliers, your graphics will look weird when you run into those outliers.
System math libraries typically want to minimize the maximum error when calculating functions; that's where the term "minimax polynomial" comes from.
Greetings from France.
This video kicks a thousand arses.
This Quake III algorithm is a mix between mathematical analysis, computer science and wizardry.
Thank you.
John Carmack is my spirit animal, even just for releasing all of his games' source codes after a couple of years.
An absolute genius of a programmer, and a blessing to the gaming world since 1990.
Just to be clear: he isn't the inventor of the algorithm. It was in use at SGI. In fact, he didn't fully understand how it worked, hence the "WTF" comment. Nevertheless, he is a genius programmer.
@@lolilollolilol7773 Um... how does that relate to the original comment? The comment was written by someone else, not john carmack
I remember reading that John Carmack hired someone to optimize Quake 1 code back in the days. So, it doesn't mean that all of Quake source code was written by him.
@@jonnyj. How do you know ? Any source to back up your claim ? Let's assume the author of the comment wasn't Carmack. Surely he could talk to him since the comment was on the original source code. So it's rather unlikely that the "WTF ?" and the magic constant would have stayed if Carmack knew where they came from.
And even then, the algorithm was not invented by John Carmack, that is a well known fact. As soon as the Quake source code was made public, this algorithm was singled out as a source of interest and its origins were traced back to SGI.
Personally if I wrote code that devious I'd write What The Fuck next to it myself
🤓🤜🤛😎
You've explained this legendary piece of code and the concepts super well!
One small nitpick (from an HW engineer working on FPUs, not a programmer). The "that's just how C works" part, where pointer aliasing is used for type punning, is actually undefined behavior (in two ways):
1. Using two pointers to the same memory location independently is not guaranteed to work like that in the C standard, but compilers mostly are nice enough to allow it and it works in this case. Using `union` would be better, though I think it's also not 100% clean. The "proper" way of doing it would probably involve explicit `memcpy` calls which would be much less performant.
2. Interpreting IEEE 754 values as anything other than floats is undefined behavior, as there are no guarantees how the bit patterns actually look in the processor. The formats in question are the "interchange formats" for when you need to transfer float values as binary strings. Implementations are allowed to store the values however they like as long as the operations are numerically equivalent, but most CPUs just use the format you've shown, so it works in this case.
PS: Nowadays, processors have built-in instructions for inverse square root, so this fast inverse square root is much slower and less accurate than what a modern CPU can provide.
Haha yes, I really hope this video didn't make anyone actually use this code.
So if pointer aliasing would be generally undefined behavior, basically every HTTP, RTP, etc. parser would be illegal, since it takes a chunk of bytes dropping out of a socket and re-interprets it as e.g. an http-header struct. What you are referring to is breaking the 'strict aliasing rule', which does have some exceptions for legal and well-defined pointer aliasing.
In C you can use unions I think (but not in c++ apparently)
@@tomtravis858 :( just threw it into my own little tinkering library....
@@dennis8636 Actually, not *pointers* alias, but memory locations. You are always allowed to address any memory with byte (char) pointers, but memory containing a float may not be accessed by a int pointer (may not meaning it has undefined behavior). Modern gcc will actually optimize the memory access away... been there, got my whole function optimized away.
unions are the way to access bit patterns of floats.
"you don't need to be good at maths to do programming"
The Code:
That's true in web dev xD
@@Naburgondux Yeah luckily loll
@@Naburgondux I think it's perhaps true everywhere-- and this case illustrates it. You could say that id used a very tiny library, in principle . That's being a good programmer:don't reinvent wheels if you can use wheels made by geniuses. Carmack is a genius who invented many of his own wheela. As such he had no ego limits against learning from other geniuses.
you need to be good at maths to be godly at programming
I'd say both programming and math can be branches of logic. Specifically, the formalization of logic as well as the expansion/generalization of logic. They're two pathways to accomplish essentially the same goal. One doesn't necessarily follow from the other. But being good at one implies you're either good at, or it would not take long to become good at the other.
19:46 "Now we finally understand the fast inverse square root."
Hey, speak for yourself, buddy! XD
IDK why I watched this, I don't even code, I'm still confused as when I started and when I left... What The Fuck!
I love that I can understand 99% of the words but 20% of the meaning.
@@maxtadeu181 XD, nice one bud'
@@HueghMungus i work as a programmer, and i didnt understand also lol, too much mathematics
@@HueghMungus I kind of grasped it because it combines what I was taught during the time I spent chasing a degree in CS that I probably won't end up finishing due to other stuff and mental health issues. So yeah, approximately 2 years spent living and breathing CS. (At one point a professor told us we should be spending 8 hours a day, 5 days a week, working with C outside of school, with his class being one of 3 equally difficult classes out of the 6 classes in that semester alone. I wonder where the mental health issues came from.)
The explanaition among the smooth animations and the genius of Quake's III engineers makes this one of the best explainatory videos I ever seen in this platform.
The reason why "The Evil Bit Hack" is indeed Evil, is because according to the C standard, this is actually Undefined Behavior.
Absolutely true. And if the code were meant to be implemented on several platforms, it would fail horribly (e.g. most modern processors have 64 bit longs, but that compiler on that processor had 32). But when you have to eke out the last few cycles and you don't need portability, it can work wonderfully. I have a lot of code flying around in space that uses UB because we needed the memory space or cycles to succeed. Pragmatism can be a virtue.
@@michaelterrazas1325 Yep, I agree, there are even situations where you cannot get the performance you want if you were to follow the standard to the letter. And you shouldn't; all you *need* to do is solve your problem and that's it.
This case, however, you can solve with modern compilers without breaking the standard by just using memcpy. This wasn't the case back in the day of Q3.
Really? Which part is undefined? Is it the last part where you dereference the long pointer?
@@RushilKasetty 1) Casting a pointer of one type into another and then accessing the object stored there. 2) Looking at the bit patterns in a float. 3) Assuming a congruence between the size of a float and a long. 4) Performing integer arithmetic on a float object.
@@michaelterrazas1325 OK. But the algorithm can be made portable by using a union instead of type-punning, using uint32_t instead of long, and testing for IEEE 754, for which there is a macro in C99. Then you know that a float is an IEEE 754 single prec number (so 32 bits), and the union takes care of the rest.
One of the most inspirational techniques for me. When I learned about this, I began to look for ways of optimizing things that had been taken for granted, no matter for how long.
I'm sure this is well explained and a mindblowin concept. I'm unsure however why this popped up in my feed and further why I watched the whole thing without understanding even half of it. Now I know how my dog feels when it talk to him
This is why i refrain talking about programming to most of people i meet in my life. Don't wanna humiliate them with my dog-speech down to them :)
but you were a very good boy in doing so
@Sum Ting Wong yes i can. i can also fix your pc, reload paper in your printer, and write a smartphone app which will make you rich overnight :)
@@igorstasenko9183 Can you do all that by tomorrow and also teach me so I don't gave to hire you again? I have $5...
@@igorstasenko9183 Someone's got an EGO lol. Surround yourself with people at or above your level and you can talk about programming all day all night.
Quake developers: spend months coming up with an algorithm for fast inverse square root
All FPS gamers: **turns off all shading**
I guess you've forgotten to disable dynamic lighting too.
At least FPS gamers in 1998.
In 2021, we are all a bit spoiled with compute shaders and all...
@@davidwest6019 I get spoiled (a bit) with cloth, houses, electricity and computational algorithms every day. I rate it 9/10, but I'd still recommend it to anyone.
All graphics off playing competitive Medal of Honor (which ran on Q3E) was the way to go back in the day.
/picmip :)
I found a bug...."threehalfs" should actually be spelt "threehalves"
Commit accepted. You are now a contributor! Congratulations!
"halfs" is a nonstandard plural of "half", but that doesn't necessarily make it _incorrect_... After all, it's an acceptable Scrabble word 🤔
Half is the noun.
Halve is both the verb and the plural. (I only know as I looked it up last year).
Personally, I prefer the nouns for variable names and verbs for methods. Even though three halves is the correct English.
A bigger issue is that they couldn't be bothered to capitalise the leading character of the second word.🙄
@@garethevans9789 Agree, camel case
@@akhilesh6072 *CAML
2 years ago when I watched this, I had no idea what I was looking.
After majoring for comp sci for the next 2 years and took Calc 2, Data Structures, and Computer Organization and Architecture, I have come back to video and finally understood the video.
The WTF step made my day
The approximation of the log of a float to its own binary representation is amazing
Well, real numbers can't be represented in memory anyway since they have an infinite number of digits. So everything in the end is an approximation.
Usually after 8 digits it's "good enough". For me usually 2 digits is good, but I work with milisecond. So for me 3.27 miliseconds is a good approximation, having 3.2698 is just useless.
@@redguard128 Measures twice cuts once is one's process !
If it saves a milli-second, done a million times saves lots of computation time any time !
the IEEE float explanation is so damn good
This video is made so well, that I actually managed to understand it without a C programming experience. Far better than any other video on this topic.
I worked between 2016 and 2019 on the latest revision of the standard, ieee754-2019. You should probably note that the "evil bit hack" which I've used myself many times over the years, is no longer canonical C: You should instead use either a union, or memcpy() to move the bit pattern between the types. In reality, modern compilers will convert all three into the exact same code, i.e. either nothing (if integer and floating point use the same registers), or a move from one type to the other.
If the result is still same, I prefer to use the "evil" hack. As it is more clear what is being done and also it has some old-school vibe to it.
The evil bit hack has much more style
For anyone wanting to see the final derivation at 19:10:
f(y) = 1/y^2 - x
f'(y) = -2/y^3
y_new = y - f(y)/f'(y)
= y + (1/y^2 - x)*y^3/2
= 1.5*y - 0.5*xy^3
= y*(1.5 - 0.5*xy^2)
Been a while since I have done derivatives and partial derivatives, but for this case, dx/dy is ignored or removed? Unless I’m misinterpreting the math here, which is possible since these calculations were never easy for me.
@@Ruben_M. x is not a function of y here, so it just drops out of the equation there
@@Bhuyakasha Ah that makes sense I suppose, thanks for the clarification!
Excuse me, maybe I’m lost; But what the fuck is this?
I’m joking
Code review: Missing about a dozen comments.
Please re-write, not readable code. And use math.h, and please, add faster processors to get the same speed. PR sent back. :)
@@farhanyousaf5616 lmao it hurts
@@farhanyousaf5616 hahaha exactly my thoughts, they might've not gotten away with it nowadays.
I feel like code reviews sometimes miss the point
@@metalpachuramon What do you mean "miss the point" ? First point for every code is being readable and maintainable. We are sacrificing this only if the performance is important. And as I wrote a lot of perf-critical code, I can say that on review colleagues of mine always asked to improve readability without penalizing a performance.
And if in your company reviews are equal to "check for code style" than you have a problems with expertise and/or review culture. It should be the experience sharing first of the place.
@@mapron1 this is why no one likes you
This video is actually incredible. I've seen this piece of code before and this explanation made it super clear and understandable. Amazing!
Your 20 minute video explained things better than most of my college courses. The explanation and the way you showed the bit values was really nice. C is the language that keeps on giving. I’ve dabbled in C for many years and I still feel like I haven’t mastered the basics.
The almighty youtube recommendation algorithm brought me to this video. Nemean : I cannot overstate how good this video is. Please make more of this!
I love examples like this. They aren't "Wow, what a mathematical genius" but instead "The person who thought this up clearly has a genius level mastery of their craft". I know about Newtonian iteration, logs, the physics related to using normed vectors etc but I would never. EVER. think of this.
A friend of mine does coding where to get extreme speeds out of the computer they'll do compiler tricks specific to the architecture of the CPU so they can do similar stuff to this. I don't understand what he does.
oh boi, do they mess with assembly?
Most of the computer scientists and game developers are trained in algorithm making and mathematical methods, so it is no surprise they made this
They know how to get into the “Flow”
A warning to fellow programmers: DO NOT use this anymore! This is a super outdated method that made sense in 90's. CPU native sqrt since pentium 4 is FASTER.
@@andykhaigoumate3171 just make a benchmark yourself same as i did, just run sqrt and this hackish method in a bunch of loops.
Pentium guys just stole this
But it would be fun to try and see if it works if one ever would be to make some kind of game.
Well, Quake works on both modern and old CPUs. So this code makes it backwards compatible, so what's a disadvantage?
@@Mr.Leeroy The disadvantage is that unless you are targeting 25 year old cpus, the only thing this'll do is make your game slower.
being someone who never passed any of my calculus classes, this was actually surprisingly easy to follow.
If you couldn't pass calculus, you weren't being taught correctly. Calculus is incredibly intuitive once you're exposed to its principles.
calc mad simple but this shit is intense
Calculus is honestly a glorified symbol pushing course imo.
I enjoys math and and it's been years since I've opened a calculus book, a professor like this would have made my advancement much more quicker. super easy to understand him and get the full concept.
wait what there was no calc im here tho they said newton iteratiom but didnt go into it
I've seen a paper previously with this covered in depth. Well done relaying the content in your UA-cam video, very easy to grasp numerical methods intro :)
Just curious, do you have a link to this paper?
He came dangerously close to the most hated words for eng students. "It can be easily shown that...." and he ACTUALLY used the corollary of the above hated words "It's left as an exercise for the reader...." LOL. Thanks for the video.
"The algorithm was originally attributed to John Carmack, but an investigation showed that the code had deeper roots in both the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and 3dfx Interactive, with Gary Tarolli's implementation for the SGI Indigo as the earliest known use. It is not known how the constant was originally derived, though investigation has shed some light on possible methods." /Wikipedia
"You simply _______ then obviously you ______, it's super easy!"
Nearly everything in this video is so far over my head it's insane and fantastic, maybe one day I'll understand what you're talking about.
What an insane piece of code
This just goes to show how much coding is about thought as it is about actual code
Amazing explanation, this is almost 3Blue1Brown level quality, you've earned yourself a subscriber
same here
He's probably using the animation code made by 3b1b
great voiceover too :)
Sadly, golden nuggets like these are few and far between. Even in good codebases, there is a lot of poorly structured code that tends to build up when bugfix after bugfix after bugfix gets applied without looking at the bigger picture again. And this is not even the fault of the developer, because you also don't want to redo large portions of code every time a bug fix or small feature is inconvenient.
Planning and thought are definitely important, but from my experience: Most of the time you'll spend coding is being reactive to the code that already exists. You try to apply minimal changes to it while still making it do what you want to do. Everyone likes small commits!
I'd actually consider this 3Blue1Brown quality, this is insane!
Is this algorithm faster than the normal square root algorithm if you inverse it? I think I'll actually test that.
This guy explained the standard ieee 754 better than my professor
Sure. Professor just did not understand it himself.
Even a random indian girl on yt, giving example of converting decimal to IEEE 754 explain it better than my professor.
I am an PLC engineer myself but man, these IT guys always seemed too smart for me. God. That guy knows coding, math AND physics.
So like a robotics guy?
@@revimfadli4666 general industrial process automation. That can include robotics, but rarely. I only look after 1 robot.
@Qimodis the rules of physics had to be applied in the game as well for realistic lighting, so yes, this is really physics. Anytime you are dealing with animation and lighting you have to give objects in the program properties and rules that resemble real life physics behavior, in that way only it will look realistic.
@@NomadUrpagi Remember that it was not a single person behind this engine and there was some level of separation, who deals with physics, who is modeling, level designing... So we a probably forgetting how important teamwork and knowledge sharing between those teammates was. IMO, it is possible that people who coded this were not that level of Gods as we now think, but their teamwork was so synergetic, that they were able to come up with such brilliances.
@@revimfadli4666 if he wanted to say robotics he would've said that.
As a teenager I spent countless hours playing Quake III and I am just amazed about Carmacks trickery. I would love to see a performance comparison between his code and the usual non-optimized approach.
The fact that no one knows exactly who wrote the code is my favorite part of the whole situation.
Where is this said?
It was John Carmack who wrote the code.
@@kurzackd It predates Q3A, the Q3A source code is just where it became mainstream knowledge. Apparently one of the earliest known usages was in the SGI Indy but it predates that as well.
Almost certainly Michael Abrash. This is not Carmack-esque code.
@@macaw2000 ...source?
@@kurzackd Dave Plummer basically told us in his last video. But you can also just see it in the code (Abrash's books and timing at iD also give it away)
Nemean: "Take a moment to look at it again"
Also Nemean after 3 seconds: "Well, the longer you look at it the less it makes sense"
As a millennial programmer, I thank the gods every day that these OGs paved the way with stuff like this. I probably couldn't even recreate pong from scratch...
When you used packaged math fnctions from popular game engines, they are not optimized like this. they give you accurate data instead of fast data.
This is why people who don't know low level programming turn out slow games because they don't even realize why it's slow.
@@khaoscero That explains a lot of Unity trash that looks terrible and makes your graphics card hot enough to fry eggs.
@@roberte2945 Unity is wasted potential in my eyes. They should have just stuck with C++ like unreal engine. Using C# as a scripting language is great, but NOT for fast 3D graphics.
no shame in not being able to recreate original Pong as a programmer since it wasn't software in the first place but discrete logic chips. first 4bit CPU had only just been invented (building computer in a game machine out of discrete chips would've been cost-prohibitive before that)
If you can’t do pong, I question the fact that you are actually a programmer. It’s very important to understand the basics. Try and do it! You will learn a tone !
In case anyone is curious, John said in a recent podcast that he was not the one who coded this and is surprised he gets credited for it. It might have come about due to his prolific reputation as one of the best programmers ever.
ua-cam.com/video/I845O57ZSy4/v-deo.html
Correct. It's the reason for the comment 'what the fuck?' in the code. Id Software didn't fully understand the code at the time when they used it either.
Started watching this video expecting to not understand anything but ended up getting most of it. Great job, really well explained.
Everytime you said " now you see what's happening" you had me looking over my shoulder.
Still watched to the end 👍
lmao
11:15 This is the difference between secant and tangent linearisation, both very useful and picking the right one can definitely make your life easier.
Excellent video, thank you for the added info instead of a 2min video on just the core of the algorithm, it helped my understanding.
Maybe without the error term the approximation would still be good but may have needed another iteration of newton's method
Man I’ve had this video saved in my watch later for like a year now and chose the perfect time to watch it. I’m taking computer architecture right now and my teacher is not the greatest at explaining things. Your explanation of IEEE 754 just saved me a ton of trouble 😂 thanks
I first learned about this algorithm years ago, and had a vague idea of where the magic number came from by reading from it, but never fully understood. I think your video explains it very clearly. Thanks!
"Premature optimisation is the fast inverse square root of all evil" - Donald Knuth, probably
@Michael: the most important word in the quote is "premature". I don't think that Knuth was against optimization in general. And I agree with you in the sense that one shouldn't cluelessly waste cycles (or memory, or both).
But it doesn't make sense to fiddle around with bits at every place in the code from the beginning, unless you find some really hot spots that lead to a huge performance gain when optimized later. Many other early "optimizations" might turn out as nearly useless.
@@minastaros The first problem is that shitty programmers solely focus on "premature" and use it as an _excuse_ to _never_ optimize. Their "solution" is to just throw more money (hardware) at the problem. **FACEPALM**
The second problem is that the **entire sentence** _"Premature optimization is the root of all evil_" itself **is complete bullshit.** Optimization is NOT the problem, let alone premature optimization. Shitty class designs such as mis-using OOP with deep hierarches, ignoring performance, ignoring memory management, shitty function and variable names, never looking at the assembly output of the compiler, etc. are MUCH more important then optimizing the "wrong" thing.
Third, there is no such thing as "premature optimization". You are gathering a data point about potential performance! However it needs to be done in the CONTEXT of the larger problem. Are you being _efficient_ at WHAT you are optimizing? i.e. The most bang for the buck? Software engineering is all about managing decreasing returns. Optimization doesn't magically make that go away.
First, you CALCULATE the maximum theoretical speed "best case". You will NEVER be faster then this! How do you KNOW your implementation is slow? By comparing to the best case!
Next, you write a REFERENCE implementation because it doesn't matter how fast you are if you return the wrong results.
Then once you have a working algorithm you can then focus on the what _exactly_ is happening to the data. You then write a DATA ORIENTED DESIGN version because it is simple to write, and fast.
So yeah, the entire phrase is entirely misleading, inaccurate, and generally treated as dogma by programmers who are generally clueless.
@@MichaelPohoreski I think the point of that somewhat glib saying is that a lot of inexperienced programmers focus way too much on optimizing their code before they have a clear idea of what level of optimization is appropriate for the task at hand. In my experience there very much *is* such a thing as a premature optimization. It happens when you spend days or weeks making something perform like, 10% faster or smaller when that 10% doesn’t add up to more than what that engineering time was worth to you in the first place. Sometimes, like with Quake 3, coming up with something like this is necessary due to hardware limitations. Very often, it’s really not, and the not-clever solution is fine. I also find it really annoying when people hold up this sort of programming math trick as the line that separates good programmers from “shitty programmers”. It’s kind of elitist.
Also, I’ve read Carmack’s writings on programming, and I can’t recommend him enough. He’s a genius, but I also find him to be a very practically minded engineer as well, and his approach to design and software architecture is really insightful.
@@nicholaswood3250 Optimization starts _before_ you write code -- not after.
If you aren't thinking of cache lines and HOW the data is being accessed you are a shit programmer.
There is **a reason** Fred Brooks stated:
_Show me your flowcharts and conceal your tables and I shall continue to be mystified. Show me your tables and I won't usually need your flowcharts; they'll be obvious._
The modern vernacular is:
_Show me your code and I'll have to guess at your data structures. Show me your data and I won't have to see your code; I'll already know it._
Bad programmers think about code / algorithms first.
Good programmers think about data first.
Optimization **begins** at day zero. _Not after the fact._
This is why that ignorant statement by Knuth is complete bullshit. The people who don't know how to optimize constantly make excuses for why they remain stupid.
13:05 this is the point where the magic happens. granting the program the lenience to be less accurate rather than forcing it to calculate unnecessarily exact values is what reduces the time and energy demand on the computer
Why don't they show us this kind of stuff in my college courses? Part of what we learned in CS was the flag-exponent-mantissa calculation and we had to write and interpret numbers using it, and although I thought it was very fascinating I only thought of it as a fun fact and nothing more. Never knew it - along with derivations - could be used like this, this is simply awesome!
Because there is virtually zero practical application of this technique. It is so niche it is ridiculous. If anything it would be irresponsible of a professor to waste time on such a thing.
@@jyutzler I'm not saying they need to show this one specifically - rather, I'm arguing that real-world examples of its usage would make the topic far more interesting and make otherwise seemingly pointless material purposeful.
@@MisterShnig That's just it, there is hardly any practical real-world application.
Computer graphics books are full of weird hacks.. for example, the algorithm for efficiently drawing a straight line in the legendary Foley, Van Dam, et al book. CG is all about hacks like this. The Q3 developers just continued this tradition.
You should know how IEEE floating point _works_ because it’s fundamental to just about any system that’s not ancient or using a decimal library.
(For example, understanding why when creating a list of numbers in NumPy, incremented by 0.1, you get …interesting… results)
But learning bithacks is pretty useless and not generalizable except as an interesting study.
UA-cam recommended this to me, seemingly at random. I also note that it's the only video you have made at this time. I'm absolutely impressed with the content and encourage you to make more videos like these.
It's unreal how clearly you break down a complex topic. Bravo.
underrated comment
I’m sure it’s not legal to mention Unreal in a Quake video.
@@KusacUK underrated response
Lmao this guy is so confident in me 😭
He believes in your ability to understand. Do you?
@Victor Joseph Yeah me too. Snek language is fun :) Variable type? Whats that?
@@botlarry4437 wait you guys only did python never touched C or Assembly?
@@ko-Daegu I've played around with it for Arduino projects but in no way efficient at it. C is not really suited for most of the work I do.
@@botlarry4437
ooh that's fine as far as you get a taste for low-level prog you are good
better than never touching something and staying only in py/js world
The right way to watch it is to watch it two times in a row. Absolutely amazing!!
13:45 - That's exactly the reason why I refuse to convert to Judaism.
Underrated comment 😂
how did you even think of that lol
i don't get it.. can someone explain
@@nopegok318 "we don't want this conversion to mess with our bits" Converting to Judaism will mess with your "bits". Meaning circumcision of the penis. Clear enough? :)
You magnificent bastard
This code is so amazing. It's a superb balance between math and cpu instructions speed. The lines have more inside knowledge than meet the eye. Not sure what compiler/linker id used, but when transcoded to assembly, this may well be the best optimized code speed/precision wise. I watched a documentary some time ago that said that Mr. Carmack programmed endless hours on Wolfenstein 3D, in order to be able to achieve better frame rates with the petty hardware available back then - when many many devs were merely dealing with sprites in their products.
While I was going to school for my B.S. in EE, one of my favorite classes was on assembly and machine code. Learning how to write assembly code requires an intimate understanding of the internal hardware that comprises a computer. Plus the resultant programs were very efficient, small size and usually faster than the same program written in a higher level language like basic or c\c++. Plus it allowed me to write code for my TI-89
@@seanriopel3132 As a mere hobbyist in programming, I learned assembly (for some early processors) because I thought it was interesting, but have never bothered to learn C. I thought it was amusing that they had to use a trick to get C to use bitwise operations on the float -- with no data typing in machine language, it's a problem you don't have in the first place. As convenient as C is, it also created this extra problem.
In any case, I'd really like to see what the assembly for this ended up looking like.
@@seanriopel3132 We only had assembly as an option back then. However you got into the habit of writing efficient and well documented code (because you would have to maintain it and no matter how efficient you made it you needed memory joggers to explain it to your future self!!).
@@CruisingDog_inCA damn straight. Even when you were an entire assembly program, and tried doing back and fixing some thing, it was like looking at hiroglyphics
@@seanriopel3132 The good old days !!
This is great; I have way more experience with math than with coding, so it was very interesting seeing how people use math tricks to get around coding quirks! Like, I personally would never need to care about the speed of dividing something, but small bits of time add up really quickly in code! Super neat.
This is what true content is explaining the reasons behind most of my unanswered questions
Me learning the IEEE float format at school: There is no point to know what the bits are about when coding.
After this video:
oh......
Well at least there is a point in knowing it to understand why float is imprecise.
I've just recently had a bug in business application that works with money (calculates exchange rates), and I spotted the bug instantly in a wall of numbers (1.2/1.5 was equal to 0.80001)
Hey! I know what your name means, *_Forest Grove_*
@@Ubeogesh funny. One of the first things I learned is that you can't use floats when dealing with money.
There is no point, actually. it's obsolete for two decades, dude.
And remembering things just for use it once and never again...
In programming you don't do it, you study such things on demamd., ok?
@@Acid31337 How do you know then what to study when you need it and don't know that it is a solution to your problem because you don't know how it works? Chicken and egg
This solution is like doing chemistry in math.
Im bad at both.
Waaaaait....
Is solution a chem pun? O.o
Or doing machine learning in computational physics. Wait...
Well, even there were Taylor expansions. Physics xD
I literally wasn't expecting things to start looking like differential equations.
This is, as Charles Stross would put it, computational demonology.
well then that fits the games running on the idTech engines (quake, doom, etc)
Nice profile picture
@@hydraslair4723 Thanks! It's from a game called Primordia.
Additional commentary (as it wasn't super-touched on, but I feel is worth mentioning): the need for this algorithm was because at the time Quake III came out, very few 3D graphics cards had dedicated hardware for calculating lighting and geometry transformations (T&L, it's called), requiring the CPU to do the heavy lifting of doing it advance before pushing the results to the GPU for drawing. Therefore, CPU efficiency was paramount, and integer math is typically much faster than floating-point math in most applications-in most CPUs, you have to send the floating-point values to a separate computing unit (the FPU) and wait to get the results back. And even in the integer math world, multiplication and division by factors other than 2 could be costly operations where every cycle counted.
A year before this game launched, AMD came out with the "3DNow!" instruction set extensions, and Intel followed a year later with "SSE," both of which could do the inverse square root in one instruction on multiple data points simultaneously. However, since most people buying the game wouldn't have these new processors immediately to hand, it made the software algorithm a necessity all the same. The first consumer PC graphics card to have hardware T&L support was the Nvidia GeForce 256 in 2001.
I remember when I started working with Calculus and I only had an Apple IIgs to work with. The ToolBox had a really nifty built-in set of functions called SANE (Standard Apple Numeric Engine). I had to buy a book, which I still own, that describes SANE and the IEEE 754 standard in detail. I'm so glad someone else did all the hard work. Thanks for sharing this tidbit of information.
Having seen this code several times, understanding it just makes it all the more impressive. Great explanation!
1.8 million views for a hardcore binary math algorithm video. The author is very talented.
I watched this video a few months after it came out iirc, it only took me 3 semesters of computer science college, calculus 1 and numeric calculus classes to understand it. Nice!
Boa sorte na sua formação fren :3
@@DsiakMondala obrigado, amigo, boa sorte com tudo também!
"// What the fuck?" Killed me
I too like to cuss in code comments
That's what I said at 06:09. Why does every math youtuber display at least ONE wrong example???
@@achtsekundenfurz7876 because you learn from mistakes. Seeing mistakes is a part of it.
As a programmer this is just insane, out of my league and some next level algorithm.
It's not out of your league man, the last bit is a little confusing, but you can understand with a bit of precalc. Now coming up with something like this....yeah, that's way out of 99.9% of our leagues lol
@@sergeantsausage2609 Ah found "that" guy.
@@mugiwara_no_ace you're both that guy. ya redditor, you.
I watched it all, and enjoyed it all. I put a lot of hours into playing Doom and Quake back in the day. Clever stuff.
Daaaamn... And you look at what kind of coders call themselves "seniors" nowadays... This code is brilliant, fun and inspiring. Thanks a lot. To you, but especially to the so very talented dev who came up with this beautiful elegancy.
Do you make fun of medical professionals simply because they aren't geniuses too?
@@Kokurorokuko this is not about being genius. This is about being professional. So yes, when an incompetent person calls themselves a professional, I do make fun of them regardless of the field they're in.
This is by far the best explanation of that piece of code I've ever seen or heard, thank you!
I love how absolutely simple this is, too often algorithms for seemingly simple tasks become unwieldy but this is so compact
Well, I wouldn't say it's "simple". It's short. It's like the first time you learn about recursion and binary trees and find out that you can do insane things with just three lines of code but the thought process behind it can be daunting for beginners. And then you wonder why it's so inefficient and then dive into dynamic programming. It's beautiful really.
Thats optimized coding for ya
I love how this video has brought several branches of engineering, where doing things _"just by chance, otherwise it won't work bro"_ is eerily common, and unbelievably, very functional.
Her: wanna come over for Netflix&chill?
Me: Can't do, watching some legacy source code hacks.
@@wahabfiles6260 ?
You deserve my reply.
You just made me cry by calling C legacy
I had to watch it on toilet in short break. I wish to learn something instead of watching again tv show with gf. No free time.
that's how a nerds life is working xD
I watched this video 8 times in the past year, just love the fact i cannot wrap this around my head completely
As a non-mathematician and non-coder, I have watched this and referenced this dozens of times to attain the joy of comprehension. The task is complete, and I am smarter and I am happier.
Thank you for this video and restoring my faith in humanity.
You are more disciplined than I, for sure.
I enjoy greatly that the creator of this video just blew my mind with IEEE 754 structuring, number modification through bit manipulation, and just concludes the video once everything was said.
This video was so cool!
The math and tricks are in fact very common and regularly used, long before Quake. See e.g. the book "C64 Intern" by Data Becker, 1984 (!), section 5.3.4, it shows how to compute the normal square root using the exact same type of methods, just as an arbitrary example to speed up the Commodore 64 computer's BASIC language which was rather slow. They halved the float's exponent which gives a very fast order-accurate approximation to the square root and then apply Newton iterations (four instead of one, as this algorithm needed to be precise to the last digit). Those old 8 bit computers did not have math coprocessors so you would have to think yourself about handling all the tiny details of the exponent and several mantisse components. The miracle in Quake was common practice twenty years earlier.
not all tricks, but some of them yes obviously, since this algorithm is a tweaked implementation of the Newton method to calculate a root
@@chonchjohnchyou are absolutely correct. I spent most of highschool decompiling games, making code improvements, and sending it back to the authors, I can assure you, common did not exist. sloppy and slow. C-64 had a really good user base of teck's that shared code, I never saw this implementation. I wish I had seen it back then.
@@MichaelRaschhuh, I am new to the world of "hacking" you can say, how did decompiling go? Did you modify the pure assembly?
I have absolutely no idea about this stuff.
@@hodayfa000h back then 36-40 years ago, you could Peek and Poke a memory space ( that's how you made adjustments ). I had back then a $ 260 compiler. there were lot's of disassembler's on the scene back then. basically how I made some money was. Buy a game at the store. play it, document where it was slow. dump the code with care and print it out. then start inserting my code and re-compile it. before I knew it, I had 5% to 20% increases across the board on refreshes. then when I was done fixing all the problems, I would mail it to them. easy money. some presidents were not happy with my updates and I would get a C&D, but who cared, I was young and stupid with a boat load of cash I made and paid taxes on.