Not only is having twice as many mantissa bits pretty nice, it is so useful that the last time the IEEE 754 FP standard was updated (2016-2019), we added two new operations:
AugmentedAddition() and AugmentedMultiplication(). Both of them take the usual two inputs, but they deliver two results, the high and the low part, so that together the operation is always exact. This can typically be implemented in FPUs with no more than one extra cycle of latency, and they are also very good building blocks for even more precision, e.g. using arrays of doubles to implement arbitrary precision.
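Until hardware support for these lands, the same high/low behaviour can be emulated in software with an error-free transformation. A minimal sketch (augmented_add is a made-up name, and this uses the branch-free Knuth formulation quoted further down the thread rather than the exact IEEE 754-2019 semantics, which prescribe a slightly different tie-breaking rule for the high part):

    #include <cstdio>
    #include <utility>

    // Emulated augmented addition: returns {hi, lo} with hi + lo == a + b exactly
    // (assuming round-to-nearest and no overflow), where hi is the rounded sum.
    std::pair<double, double> augmented_add(double a, double b) {
        double hi = a + b;
        double bv = hi - a;                      // the part of b that made it into hi
        double lo = (a - (hi - bv)) + (b - bv);  // everything that was rounded away
        return {hi, lo};
    }

    int main() {
        auto [hi, lo] = augmented_add(1.0, 1e-20);
        std::printf("hi = %.17g, lo = %.17g\n", hi, lo);  // hi = 1, lo = 1e-20
    }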
Commenting to bookmark this. I enjoy reading about IEEE754 standard for purposes of deterministic lockstep networked applications and somehow I missed this. Will check out later.
Note that `AugmentedMultiplication` isn't really necessary because you can compute it with a single fused multiply-add instruction (fma). Once you have computed c = a*b, you can compute c_lo = fma(a, b, -c), which will produce the exactly correct low bits of the multiplication (so that c + c_lo == a*b).
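That FMA trick in code, as a small sketch (assuming std::fma compiles to a real fused multiply-add and that nothing overflows or underflows along the way):

    #include <cmath>
    #include <cstdio>

    // Error-free product: hi + lo == a * b exactly, with hi the rounded product.
    void two_prod(double a, double b, double &hi, double &lo) {
        hi = a * b;
        lo = std::fma(a, b, -hi);  // the rounding error of a*b, recovered exactly
    }

    int main() {
        double hi, lo;
        two_prod(1.0 + 1e-10, 1.0 - 1e-10, hi, lo);
        std::printf("hi = %.17g, lo = %.17g\n", hi, lo);
    }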
@@oscarsmith3942 You are of course correct. It is in fact the existence of a fused multiply-add which makes the Dekker add/sub algorithm really useful, even though it typically takes at least three times as long as a single FADD: FMUL + FMA is "only" twice as slow as a single FMUL, while both Augmented operations can realistically run in zero or one cycle more than the plain ops.
In the end, the Augmented operations would make extended-precision arithmetic at least twice as fast as what is possible today.
One place these "old tricks" are very useful is when programming GPUs. Some GPUs are significantly faster with floats than with doubles (Nvidia) and some don't support doubles at all (Intel Xe) - identifying where your precision bottlenecks are and applying these methods gives you much better memory use and performance while keeping precision... As long as you're not using a device compiler that doesn't let you turn off unsafe floating point optimizations that throw them away (looking at you, Intel).
For most Nvidia GPUs you are right that the peak double precision performance (i.e. FLOP/s) is worse by a factor of 64. Just two refinements of that statement: 1. High-end server accelerators (P100, V100, A30, A100, H100) have good double precision performance (i.e. only a factor of 2 worse than single precision). 2. That performance difference only has practical implications for algorithms that are compute-bound; anything memory-bound (low arithmetic intensity) won't notice it anyway. E.g. for properly implemented dense matrix multiplication (DGEMM) you should be able to see it, but not for vector addition (DAXPY).
It's also useful for microcontroller programming. The Cortex-M4 has 32-bit floating point instructions (float), but no 64-bit ones (double).
I studied chemical engineering at the University of British Columbia in the late 70s and early 80s. It was a demanding bachelor's program with many labs, including an 8-hour chem lab, and you had to do a bachelor's thesis (very unusual in itself). In addition, you had to do a summer project (before your final year). In the project I chose, I took over the bachelor's thesis work of two previous students. Their efforts... had not worked, and Dr. Janis Lielmezs wondered if I could get it to work. Long story short: the program needed double-double precision instead of just double precision. Dr. Lielmezs was most pleased with the result. He passed away in 2021.
When you think he couldn't become any cooler and he suddenly plays the piano as an intro. Legend!
The legend returns. He comes bearing his usual gifts of entertaining madness sprinkled with useful knowledge. And we are grateful.
Incredible! With twice the memory, we can store twice the information!
LOL, that was my reaction too
Love the skills on the piano my dude. Also, thanks for explaining the topic so well.
With those skills I hope he can get the piano tuned.
3:56 *The death of a 0.3 floating point is a tragedy, the death of millions of people is a statistic*
cit
I've used compensated arithmetic on a number of occasions. It's very handy on embedded devices that have hardware float support but don't support doubles.
Never quite got my head around Dekker's double-double computation, but you made it significantly easier, particularly the carry over or under. Thank YOU Very Very Much, my Aussie friend, I always look out for your videos.
Curiously enough, a while ago, when I was wandering around the C++ std lib, I found some functions using this method, and even got to understand some of it, but now I know how ingenious it really is. Great video man!
This is pretty cool. I can see how it extends to quad double but with significantly more bookkeeping.
Yep, it just keeps going! Cheers for watching mate :)
Can't BELIEVE that I missed a new Creel vid for 2 weeks!
For shame! Thanks for the cool info as always, matey!
P.S. In the UK, we honour Dekker's algos all over the country with our DoubleDekker buses. ^_^
Interesting! I've implemented 256 bit fixed point before (64.196), and figuring out an efficient-ish assembly implementation for multiplication and reciprocal was tricky! I figured it was possible to chain multiple floating point values together as well, and it's nice to have it spelled out. Thanks! :)
Ty Creel.... love how precise and in depth your videos are. Piano skills are killer as well
That intro is brilliant, thanks for brightening my day, now on to the actual content.
Man I wish your upload schedule was more frequent 😅 all your content is so interesting and insightful.
I first encountered the concept of double precision when reading about programming the DEC PDP8E. That was nearly 50 years ago. 😊
Neat! I had always wondered how this was done, but never took the time to understand it. A lot of GPU based computational physics codes use this method given a limited number of FP64 units (depending on the problem scale, the exponent in FP32 is sufficient, but we really want higher decimal precision).
That does lead to an obvious question though: does it need to be a "round to nearest" operation? Or will this work with any rounding mode assuming it's consistent, such as round to 0 (truncation)?
Anyway, looking forward to the multiplication and division!
the first equation reminds me of the equations used to encode and decode video or audio signals that have multiple channels... It always looked weird to me, but it is making more sense now. Thank you for this video!
babe wakeup, Creel vid dropped
This was thoroughly entertaining! Your delivery is clear and super fun. Love your humor. Nerd out
I didn't watch the video yet, but that is the best intro ever
Canadian programmers of almost every skill level know all about double-double. Many employ it literally daily today. And for us, the OG is A Cruller. Tim Horton's got most of the credit for popularizing this, but it was around for yonks before that.
Now I need to order a Tim Horton's "Quad precision float". I wonder if I'll end up with a milkshake.
Your videos are absolutely fantastic!
Even without the extra precision, being able to check so easily how far away you are from the precise answer is great. It can be a sanity check for code modifications, or let you know when long-term drift starts becoming huge in long-term accumulated values.
This was a fascinating topic you brought up! Really appreciate all the concepts you find worthy enough to discuss!!
Nice to see you back!
I learned about this when I had way too much error and wanted to reduce it. There was a lot of summation, so Kahan sums were very helpful (they basically do the same thing, but the compensator is carried over instead of being stored as part of the value).
I wasn't sure what the result would be initially, but I quickly remembered Posit documentation on a use of the proposed "quire" to round the value, subtract that value from the quire, and then round again and repeat for however much precision the quire can store. Here we don't have a higher precision register to use, but the core principle seems to be the same: representing a higher precision value as the sum of multiple values with differing levels of precision.
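For a concrete feel of that round/subtract/repeat idea, here is a tiny sketch that peels a wider value into a sum of doubles. It assumes long double is actually wider than double (true with x87 80-bit long double, not on every platform), standing in for the quire/higher-precision register:

    #include <cstdio>

    int main() {
        // A value whose low bits don't fit in one double: peel it into hi + lo.
        long double wide = 1.0L + 1e-18L;            // stand-in for a wider accumulator

        double hi = static_cast<double>(wide);       // round once
        double lo = static_cast<double>(wide - hi);  // subtract, then round again

        std::printf("hi = %.17g\nlo = %.17g\n", hi, lo);
        // hi + lo recovers more of the wide value than hi alone does.
    }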
I bumped into this method when I got different results for double additions on the CPU vs using Nvidia CUDA. Now I know the history of this. Cool.
Welcome back man! You are the best!
Hmmm. Well, I guessed correctly about Dekker, and would not have had to ask my earlier question if I had bothered to watch the whole thing first. He was indeed Dijkstra’s colleague, another one of those early titans of computing science…
It has been a while since I have seen your videos! I am delighted.
Hahaha ... enjoyed the piano ... checked out your music channel too ... good stuff. Great video on this ... back in the days of early microcontrollers and the like, I worked out or used all sorts of methods to compute various precisions. Thanks much for the video. Cheers from sunny Florida, USA.
I haven't worked with this, but I have used Kahan summation, which is based on the same core idea - you're adding together a bunch of regular doubles, and your output is a regular double, but you keep double-double precision on the running sum (and just forget about the lower part once you're done, that's your un-representable roundoff error). In this special case, you can skip about half of the floating point operations compared to doing full Dekker addition each time.
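A minimal sketch of that running-sum idea, i.e. plain Kahan summation (note that aggressive fast-math flags can optimize the compensation term away):

    #include <cstdio>
    #include <vector>

    // Kahan (compensated) summation: the roundoff error of each add is kept in c
    // and fed back into the next addition instead of being stored as a second value.
    double kahan_sum(const std::vector<double> &xs) {
        double sum = 0.0, c = 0.0;      // c holds the running roundoff error
        for (double x : xs) {
            double y = x - c;           // re-inject the error lost so far
            double t = sum + y;         // big + small: low bits of y get dropped...
            c = (t - sum) - y;          // ...and recovered here
            sum = t;
        }
        return sum;
    }

    int main() {
        std::vector<double> xs(1000000, 0.1);
        std::printf("kahan sum: %.17g\n", kahan_sum(xs));  // much closer to 100000 than a naive loop
    }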
Just what I needed, your videos are pure gold
The font on his slides is terrible. But this video is very helpful! One of the best videos I've watched on floats so far
Never heard of this technique, thanks!
The idea of changing the order of calculations to reduce the error works for integer and floating point arithmetic too.
Very interesting, this video came out right after I was playing with adding or subtracting numbers and dividing them by 2. I was trying to figure out a naive way to efficiently store the result. The reason is that odd numbers have a .5 in the result, so they need an extra bit of data to store, but modern computers use 8 bits as 1 byte, which means I'd be wasting 7 bits at best and 8 bits at worst. I ended up just using 1 byte to store the .5 flags for the next 8 bytes, and that's the best way I've found so far, but I was also considering investigating variable length encoding.
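If I read that right, the scheme is one flag byte whose bits mark which of the next 8 values carry a .5. A tiny sketch of that packing, purely as I understand the comment (names are made up):

    #include <cstdint>
    #include <cstdio>

    // Eight halved values: one byte of whole parts each, plus one flag byte
    // whose bit i says "value i also has a .5".
    struct HalvedBlock {
        uint8_t whole[8];
        uint8_t half_flags;
    };

    HalvedBlock halve_all(const uint8_t (&v)[8]) {
        HalvedBlock b{};
        for (int i = 0; i < 8; ++i) {
            b.whole[i] = v[i] / 2;              // integer half
            b.half_flags |= (v[i] & 1u) << i;   // remember the .5 for odd inputs
        }
        return b;
    }

    int main() {
        uint8_t v[8] = {7, 4, 9, 2, 1, 0, 255, 10};
        HalvedBlock b = halve_all(v);
        for (int i = 0; i < 8; ++i)
            std::printf("%g ", b.whole[i] + ((b.half_flags >> i) & 1u) * 0.5);
        std::printf("\n");  // 3.5 2 4.5 1 0.5 0 127.5 5
    }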
This structure reminds me a bit of the "lifting scheme" of reversible wavelet transforms. Noyce !
I’d like to propose doubly-imprecise floating point. Use one floating point number to hold the mantissa and one to hold the exponent
Nice video 🙂
Woulda been nice to have a quick run through with overflow.
Very interesting, and well presented. I know of Dekker from the mutex work but had not seen this before. Fun fact: up here in Canada, a double-double usually refers to a coffee with double sugar and double cream, usually from a coffee-and-donuts store like Tim Hortons.
You spel't "Dunkin Donuts" wrong. ;^)
❤ please upload more often
It would have been interesting to see what happens if the decimal point locations in the two operands were farther apart. E.g., adding 470 + 0.0022. I'm sure this method works, but it would be interesting to see how the smaller number completely disappears in the result.
Sure, but I also think that he's provided enough info for us to experiment with Excel or pen and paper.
Just gave it a go! Your example actually yields C = 470, c = 0.0022 -> no loss of precision: the lower half just isn't adjacent to the upper.
However, if you use 4709 and 0.002216, you get 4700 and 9.0, so all the lower digits disappear.
Yessss ur back what took so long
Nothing in particular mate, cheers for watching :)
My goodness! Finally some good f***ing content
The bucket of fish is back! I don't use this technique... I use two 64-bit integers. One of them covers all integral values between -2^63 and +2^63-1. The second integer covers all fractional values from 1/1e17 to 1-(1/1e17). Much easier and faster!
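Something like this, I imagine; a rough sketch of that split representation (my own guess at the layout, with the fractional part counted in units of 1e-17):

    #include <cstdint>
    #include <cstdio>

    // Fixed point split into a signed whole part and a fractional part
    // counted in units of 1e-17, so frac is always in [0, 1e17).
    struct Fixed {
        int64_t  whole;
        uint64_t frac;   // value = whole + frac * 1e-17
    };

    constexpr uint64_t FRAC_ONE = 100000000000000000ULL;  // 1e17

    Fixed add(Fixed a, Fixed b) {
        Fixed r{a.whole + b.whole, a.frac + b.frac};
        if (r.frac >= FRAC_ONE) {   // carry out of the fractional half
            r.frac -= FRAC_ONE;
            r.whole += 1;
        }
        return r;
    }

    int main() {
        Fixed x{3, 70000000000000000ULL};   // 3.7
        Fixed y{1, 60000000000000000ULL};   // 1.6
        Fixed s = add(x, y);
        std::printf("%lld + %llu * 1e-17\n", (long long)s.whole, (unsigned long long)s.frac);
        // prints 5 + 30000000000000000 * 1e-17, i.e. 5.3
    }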
This is a doubly good video
now i need this extended to infinite (arbitrary) precision with some clever loops and arrays
Early in the 1970's, one of the first computer languages, IBM BASIC, required that you set your variables to Binary, Single Precision, or Double Precision. Binary only used one bit, Single used eight bits, and Double used sixteen bits. This was when our school's computer only used an eight-bit chip and only had 4K of memory.
Fascinating !
Double Dekker arithmetic.
So famous they named a type of British bus after it.
I really really liked this video
I've done this for years - just never knew what it was called - especially in multiply-accumulate expressions like polynomial evaluation or FIR filters.
Gotta get eyes in the sky so that we can high in the sky, get a birds eye view.
Hi, great video as always. You present very well!
Can you do one on QUATERNIONS please, in your inimitable style!!!
5:26 That ligature with fl at the baseline, while YouTube makes the fl ligature at their heads. Woah.
I play the piano too. I love being explained things that have nothing to do with the piano.
This brings me back to 2006 when I ported dhbailey's doubledouble in Java and found and fixed a lot of bugs, like corner cases in exp() and found some very specific "Devil's values" as I'd like to call them, that really stress the algorithm.
After watching the video, the algorithm I have (and verified on paper) does not use IF statements, and is more complicated. It has as many operations as your C++ code, if the if and else branches are both counted, and then some.
public DoubleDouble add(DoubleDouble y) {
    double a, b, c, d, e, f;
    e = this.hi + y.hi;                                       // sum of the high parts
    d = this.hi - e;
    a = this.lo + y.lo;                                       // sum of the low parts
    f = this.lo - a;
    d = ((this.hi - (d + e)) + (d + y.hi)) + a;               // error of the high sum, plus the low sum
    b = e + d;
    c = ((this.lo - (f + a)) + (f + y.lo)) + (d + (e - b));   // error of the low sum, plus error of e + d
    a = b + c;
    return new DoubleDouble(a, c + (b - a));                  // final renormalization into (hi, lo)
}
mad piano playing my friend
Why did they not call it "double Dekker" arithmetic?!
Nice :)
Came to say the same thing!
You should upload videos on solving crackme(s). That would be awesome!
When you rounded when calculating A+B, do you mean an actual round or a ceil?
E.g. if the result was 154, would you go to 160 or 150?
Thanks for your videos, you explain complicated stuff in a really understandable way.
I think it was more of a ceil or truncation - not sure how it'd deal with negative numbers, but your example would be 150 (and 4 remainder in the other half of the computation) I think
No but it does have something to do with playing piano, because... like, if you... well you see, the... it's like when you... the tone of the string contains harmonics of various factors of the fundamental frequency, which is analogous to the variable containing subvariables of various factors of precision of the number. Boom. Nailed it.
6:43 So, it's like a tuple, right? In the same way a complex number, and also a rational number, is. It's nice to think about how to do operations on two of these dd's: the result of the op on the error has to be added to the main part, then the leftover propagated back.
Feels like how carry cascading in electronic full adders works.
Question. At one point we made sure to have the bigger number first for the subtraction of R. But your C++ code did not show any parentheses to make sure the order is carried out that way? Because if I am not mistaken, either clang or gcc does its operations from right to left... Or won't this matter?
Now I'm thinking, how do I go about outputting this nicely formatted?
7:43 I was thinking Jemaine Clement from Flight of the Conchords
@Creel
Could you make a comparison video between AMD's 3DNow! and Intel's SSE (1st SSE Version)? Which one was better? What were the pros and cons of each? Which was easier to use from a programming point of view, more powerful, more versatile etc.?
A practical problem that I have is in using a GPU. It has double precision numbers - but in 3D graphics - we have to use the available hardware interpolators that can only cope with single precision. I'm trying to get my head around whether a similar Dekker trick would allow me to use hardware interpolators to interpolate double precision using single precision hardware.
1:48 I'm convinced I'm watching a legitimate madman, and this is him showing his true colors. If you haven't found it already, you've got to find yourself the "real instructions ||| mental disorders" meme image for x86. I think you've personified it 😂
Hmmm. Many years ago I wrote a floating point emulator for the 8087 family of Intel co-processors. Since it was a full blown but granular co-processor emulator, I had to implement the chip family's 80-bit registers. So it was just as accurate as common ISO 64-bit doubles. Since then I've also had to code numerous 'fixed point' math systems as well to process hard real-time data for ECU systems.
If total flexibility was the goal, then I'd simply extend my co-processor library out to arbitrarily many bits in the registers, increasing both the absolute magnitude and the bits of precision, whether that'd be 128 bits (double-double), 160 bits (double-80), 240 bits (triple-80), 256 bits (quad-double) or more, perhaps 512 bits (64 bytes, or octo-double). The code is almost unchanging, and with early-out detectors for rational data the speed penalty for such super precision is quite minimal.
What's even better, assuming the C compiler still supports the '/Emulator' switch, the original (simple) compiled code would run on any computer without additional modification, or could make use of the co-processor for faster first pass estimations.
Anyway, thanks for the trip down memory lane -- no pun intended
In Canada, a double-double is a coffee with 2 cream and 2 sugar.
Surely there are op codes to handle this quickly. Just like how if you wanted multiplication (or floating point) for the 6502, it could be done, but slowly. So CPU designers added op codes to later chips to greatly speed these operations up.
Your videos are goated
great piano.
What a super interesting video, thanks Creel. I wonder if some game engines include this as an option in order to increase e.g. 3D precision?
I'll watch the vid. It's interesting to compare with the Kahan-Babuška-Neumaier and Shewchuk algorithms.
P.S. Watched already. Nice. You actually use the KBN algo for adding the higher and lower parts of the double double.
Soo... All aboard the double Dekker bus? 🚌😂
Does this addition algorithm always produce the most accurate double-double result? One edge case that wasn't shown here is when the overflow in r happens: it will lose information about the least significant digit of the sum due to the overlap between r and R. This will not be recovered by the computation of c, and we will be left with a c which will definitely have a zero in the lowest digit, losing one bit of precision.
Does that second number have a name? If it were integer division, it would be called the "remainder". This value in floating point rounding plays a similar role. I could see it being called a "remainder" too, as maths doesn't shy away from reusing terms as long as it's clear from context what the meaning is.
Sorry I’m so late to the party! Just discovered your marvelous AVX512 introduction. Is TJ Dekker the same Dekker who derived the correct mutual exclusion algorithm at the Dutch Mathematical Center in the 1950’s, when EW Dijkstra was building the THE operating system?
A “double” precision solution? Devised by a Mr. “Dekker”? If God exists, he must absolutely love puns.
This should be called Double Dekker arithmetic.
There has to be a display/print/string conversion algorithm too, right?
Is this a way to better the result when subtracting two numbers that are very close?
There is a better formula by Knuth where we do not need to order the values by magnitude:
R = a + b
c = R - a
r = (a - (R - c)) + (b - c)
now you can add stuff like 1 + 1e100
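For comparison, here are the two-branch Dekker version and this branch-free Knuth version side by side, as a sketch (plain C++, assuming fast-math is off so the compiler doesn't optimize the "pointless" subtractions away):

    #include <cmath>
    #include <utility>

    // Dekker / fast-two-sum: the error term is only exact if |a| >= |b|, hence the swap.
    std::pair<double, double> dekker_sum(double a, double b) {
        if (std::fabs(a) < std::fabs(b)) std::swap(a, b);
        double R = a + b;
        double r = (a - R) + b;   // what b lost when it was added to the bigger a
        return {R, r};
    }

    // Knuth / two-sum: no ordering or branch needed, three extra flops instead.
    std::pair<double, double> knuth_sum(double a, double b) {
        double R = a + b;
        double c = R - a;
        double r = (a - (R - c)) + (b - c);
        return {R, r};
    }

    int main() {
        auto [R1, r1] = dekker_sum(1.0, 1e100);   // works thanks to the swap
        auto [R2, r2] = knuth_sum(1.0, 1e100);    // works with no swap at all
        return (R1 == R2 && r1 == r2) ? 0 : 1;
    }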
in my experience, Dekker's formula is faster on modern CPUs. I'd guess it's because branch prediction is good enough for this case so it's faster than adding extra floating point operations.
@@michaeldeakin9492 I doubt that, but maybe I will check when I have time. I don't think branch prediction actually works well in this case, since the relative size of the values is unknown and basically impossible to predict.
It all comes down to what the compiler actually does. The Intel compiler, for instance, uses logical masks to avoid branching (fp-model=precise needs to be used, otherwise the error value is optimized to zero).
For SIMD, branching is out of the question anyway.
@@INT41O I have actually tried it somewhat recently; for my use case (exact predicate evaluation) it was faster. I'd link my project, but youtube doesn't seem to let any post through pointing to it at all.
And while branches won't work with simd, blending will work. I don't know if that will be faster than Knuth's formula.
BTW, Chandler Carruth's talk isn't quite the same but seems relevant here: CppCon 2017: Chandler Carruth “Going Nowhere Faster”
Skip to the cmov section for the relevant part.
@@INT41O Okay, I reran some scalar benchmarks I had with 3 different methods, and implemented vector versions: scalar Dekker summation, scalar branchless Dekker summation via array indexing, scalar Knuth summation, vector (branchless) Dekker summation, and vector Knuth summation.
Scalar performance ranking from best to very much the worst (as in absolutely awful) was as follows: Dekker summation, Knuth summation, branchless Dekker summation
Vector performance was similar, Knuth summation was much better than branchless Dekker summation. I'm unsure why at this point
Very nice video. It was a bit frustrating that you didn’t include the information that X has to be the ‘larger’ number, or more specifically the number with digits in the same places as R.
SORRY! You explained that in minute 14. I just started thinking about it on my own b4 I got there. Yea me and you!
Would double double work in a case like 5700.018 + 1800.053 ?
Because then it would actually have really weird, specific positions where it is much more than double the precision of a normal double.
I wonder how double dekker doubles would compare to a single floating point number with double the bits? Are there any advantages perhaps? At the very least the method allows double the precision on hardware not built for it, and the two groups of bits don't have to be consecutive
I do wonder how it compares performance wise with emulated floating point using integer arithmetic to get the same precision.....
Great video, thanks! 🙂 Just one tiny nitpick: why is your metronome obviously 17 times as expensive as your camera? 😎
How does this compare to using 128-bit floating point numbers instead?
Wait, if _r_ overflows, don't we lose precision already? How are we going to get it back if we don't split _r_ itself into an upper and a lower part? I'm sure I missed something but I can't see it right now.
The idea is that if r ended up being 15.3 in the example, we’d need to round it to 15, losing the 0.3, but the 1 in the tens place can also be stored in R by adding it, and then finding that we’d be left with 5. Applying the final clean up, you’d recover the .3 and end up with R = 170 and r = 5.3.
Nice trick, trading speed and space for precision. Weird choice of example values that result in no carry happening.
15:48 - Well, if the real answer is 157 and the computed answer is 160, I'd say the error is +3, not -3. So I'd call r the "correction" or something like that rather than the error. Or perhaps the "residual" - maybe that's why R and r are used for it in the first place.
Would truncation work instead of rounding?
I think this could be done recursively to get some 'arbitrary-precision' floats. (Obviously memory will limit how many bits of precision is possible)
Though I can see this having a large impact on performance
It certainly can! Tends to be too much computation after about 4 doubles.
Any language with generics makes this pretty easy: if you can overload your maths operators, you can define Compensated<T> with the algorithms in the video, and then you can use Compensated<Compensated<double>> for quad precision etc. The calculation does get slow very… fast.
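A rough C++ sketch of what such a wrapper could look like (Compensated is just the name from the comment above; only addition is shown, using the branch-free two-sum so it also compiles when T is itself a Compensated type, though this naive nesting is not as carefully renormalized as a real quad-double library):

    #include <cstdio>

    // T is double for double-double, Compensated<double> for (roughly) quad-double, etc.
    template <typename T>
    struct Compensated {
        T hi{}, lo{};
        Compensated() = default;
        Compensated(T h) : hi(h), lo(T{}) {}
        Compensated(T h, T l) : hi(h), lo(l) {}
    };

    template <typename T>
    Compensated<T> operator-(const Compensated<T> &x) { return {-x.hi, -x.lo}; }

    template <typename T>
    Compensated<T> operator+(const Compensated<T> &x, const Compensated<T> &y) {
        T R = x.hi + y.hi;                    // two-sum of the high parts
        T c = R - x.hi;
        T r = (x.hi - (R - c)) + (y.hi - c);  // error of the high-part addition
        r = r + x.lo + y.lo;                  // fold in the low parts
        T hi = R + r;                         // renormalize into (hi, lo)
        T lo = r - (hi - R);
        return {hi, lo};
    }

    template <typename T>
    Compensated<T> operator-(const Compensated<T> &x, const Compensated<T> &y) { return x + (-y); }

    int main() {
        Compensated<double> s = Compensated<double>(1.0) + Compensated<double>(1e-20);
        std::printf("hi = %g, lo = %g\n", s.hi, s.lo);   // hi = 1, lo = 1e-20

        using Quad = Compensated<Compensated<double>>;   // nesting = more precision, more work
        Quad q = Quad(Compensated<double>(1.0)) + Quad(Compensated<double>(1e-20));
        std::printf("quad hi.hi = %g\n", q.hi.hi);
        return 0;
    }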