Not only is having twice as many mantissa bits pretty nice, it is so useful that the last time the IEEE 754 FP standard was updated (2016-2019), we added two new operations:
AugmentedAddition() and AugmentedMultiplication(). Both of them take the usual two inputs, but they deliver two results, the high and the low part, so that together the operation is always exact. This can typically be implemented in FPUs with no more than one extra cycle of latency, and they are also very good building blocks for even more precision, e.g. using arrays of doubles to implement arbitrary precision.
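Until hardware support for these lands, the same high/low behaviour can be emulated in software with an error-free transformation. A minimal sketch (augmented_add is a made-up name, and this uses the branch-free Knuth formulation quoted further down the thread rather than the exact IEEE 754-2019 semantics, which prescribe a slightly different tie-breaking rule for the high part):

    #include <cstdio>
    #include <utility>

    // Emulated augmented addition: returns {hi, lo} with hi + lo == a + b exactly
    // (assuming round-to-nearest and no overflow), where hi is the rounded sum.
    std::pair<double, double> augmented_add(double a, double b) {
        double hi = a + b;
        double bv = hi - a;                      // the part of b that made it into hi
        double lo = (a - (hi - bv)) + (b - bv);  // everything that was rounded away
        return {hi, lo};
    }

    int main() {
        auto [hi, lo] = augmented_add(1.0, 1e-20);
        std::printf("hi = %.17g, lo = %.17g\n", hi, lo);  // hi = 1, lo = 1e-20
    }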
Commenting to bookmark this. I enjoy reading about IEEE754 standard for purposes of deterministic lockstep networked applications and somehow I missed this. Will check out later.
Note that `AugmentedMultiplication` isn't really necessary because you can compute it with a single fused multiply-add instruction (fma). Once you have computed c = a*b, you can compute c_lo = fma(a, b, -c), which will produce the exactly correct low bits of the multiplication (so that c + c_lo == a*b).
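That FMA trick in code, as a small sketch (assuming std::fma compiles to a real fused multiply-add and that nothing overflows or underflows along the way):

    #include <cmath>
    #include <cstdio>

    // Error-free product: hi + lo == a * b exactly, with hi the rounded product.
    void two_prod(double a, double b, double &hi, double &lo) {
        hi = a * b;
        lo = std::fma(a, b, -hi);  // the rounding error of a*b, recovered exactly
    }

    int main() {
        double hi, lo;
        two_prod(1.0 + 1e-10, 1.0 - 1e-10, hi, lo);
        std::printf("hi = %.17g, lo = %.17g\n", hi, lo);
    }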
@@oscarsmith3942 You are of course correct. It is in fact the existence of a fused multiply-add which makes the Dekker add/sub algorithm really useful, even though it typically takes at least three times as long as a single FADD: FMUL + FMA is "only" twice as slow as a single FMUL, while both Augmented operations can realistically run in zero or one cycle more than the plain ops.
In the end, the Augmented operations would make extended-precision arithmetic at least twice as fast as what is possible today.
One place these "old tricks" are very useful is when programming GPUs. Some GPUs are significantly faster with floats than with doubles (Nvidia) and some don't support doubles at all (Intel Xe) - identifying where your precision bottlenecks are and applying these methods gives you much better memory use and performance while keeping precision... As long as you're not using a device compiler that doesn't let you turn off unsafe floating point optimizations that throw them away (looking at you, Intel).
For most Nvidia GPUs you are right that the peak double precision performance (i.e. FLOP/s) is worse by a factor of 64. Just two refinements of that statement: 1. High-end server accelerators (P100, V100, A30, A100, H100) have good double precision performance (i.e. only a factor of 2 worse than single precision). 2. That performance difference only has practical implications for algorithms that are compute-bound; anything memory-bound (low arithmetic intensity) won't notice it anyway. E.g. for properly implemented dense matrix multiplication (DGEMM) you should be able to see it, but not for vector addition (DAXPY).
It's also useful for microcontroller programming. The Cortex-M4 has 32-bit floating point instructions (float), but no 64-bit ones (double).
I studied chemical engineering at the University of British Columbia in the late 70s and early 80s. It was a demanding bachelor's program with many labs, including an 8-hour chem lab, and you had to do a bachelor's thesis (very unusual in itself). In addition, you had to do a summer project (before your final year). In the project I chose, I took over the bachelor's thesis work of two previous students. Their efforts... had not worked, and Dr. Janis Lielmezs wondered if I could get it to work. Long story short: the program needed double-double precision instead of just double precision. Dr. Lielmezs was most pleased with the result. He passed away in 2021.
When you think he couldn't become any cooler and he suddenly plays the piano as an intro. Legend!
The legend returns. He comes bearing his usual gifts of entertaining madness sprinkled with useful knowledge. And we are grateful.
Incredible! With twice the memory, we can store twice the information!
LOL, that was my reaction too
Love the skills on the piano my dude. Also, thanks for explaining the topic so well.
With those skills I hope he can get the piano tuned.
3:56 *The death of a 0.3 floating point is a tragedy, the death of millions of people is a statistic*
cit
I've used compensated arithmetic on a number of occasions. It's very handy on embedded devices that have hardware float support but don't support doubles.
Never quite got my head around Dekker's double-double computation, but you made it significantly easier, particularly the carry over or under. Thank YOU Very Very Much, my Aussie friend, I always look out for your videos.
Curiously enough, a while ago, when I was wandering around the C++ std lib, I found some functions using this method, and even got to understand some of it, but now I know how ingenious it really is. Great video man!
This is pretty cool. I can see how it extends to quad double but with significantly more bookkeeping.
Yep, it just keeps going! Cheers for watching mate :)
Can't BELIEVE that I missed a new Creel vid for 2 weeks!
For shame! Thanks for the cool info as always, matey!
P.S. In the UK, we honour Dekker's algos all over the country with our DoubleDekker buses. ^_^
Interesting! I've implemented 256 bit fixed point before (64.196), and figuring out an efficient-ish assembly implementation for multiplication and reciprocal was tricky! I figured it was possible to chain multiple floating point values together as well, and it's nice to have it spelled out. Thanks! :)
Ty Creel.... love how precise and in depth your videos are. Piano skills are killer as well
That intro is brilliant, thanks for brightening my day, now on to the actual content.
Man I wish your upload schedule was more frequent 😅 all your content is so interesting and insightful.
I first encountered the concept of double precision when reading about programming the DEC PDP8E. That was nearly 50 years ago. 😊
Neat! I had always wondered how this was done, but never took the time to understand it. A lot of GPU based computational physics codes use this method given a limited number of FP64 units (depending on the problem scale, the exponent in FP32 is sufficient, but we really want higher decimal precision).
That does lead to an obvious question though: does it need to be a "round to nearest" operation? Or will this work with any rounding mode assuming it's consistent, such as round to 0 (truncation)?
Anyway, looking forward to the multiplication and division!
the first equation reminds me of the equations used to encode and decode video or audio signals that have multiple channels... It always looked weird to me, but it is making more sense now. Thank you for this video!
babe wakeup, Creel vid dropped
This was thoroughly entertaining! Your delivery is clear and super fun. Love your humor. Nerd out
I didn't watch the video yet, but that is the best intro ever
Canadian programmers of almost every skill level know all about double-double. Many employ it literally daily today. And for us, the OG is A Cruller. Tim Horton's got most of the credit for popularizing this, but it was around for yonks before that.
Now I need to order a Tim Horton's "Quad precision float". I wonder if I'll end up with a milkshake.
Your videos are absolutely fantastic!
Even without the extra precision, being able to check so easily how far away you are from the precise answer is great. It can be a sanity check for code modifications, or let you know when long-term drift starts becoming huge in long-term accumulated values.
This was a fascinating topic you brought up! Really appreciate all the concepts you find worthy enough to discuss!!
Nice to see you back!
I learned about this when I had way too much error and wanted to reduce it. There was a lot of summation, so Kahan sums were very helpful (they basically do the same thing, but the compensator is carried over instead of being stored as part of the value).
I wasn't sure what the result would be initially, but I quickly remembered Posit documentation on a use of the proposed "quire" to round the value, subtract that value from the quire, and then round again and repeat for however much precision the quire can store. Here we don't have a higher precision register to use, but the core principle seems to be the same: representing a higher precision value as the sum of multiple values with differing levels of precision.
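For a concrete feel of that round/subtract/repeat idea, here is a tiny sketch that peels a wider value into a sum of doubles. It assumes long double is actually wider than double (true with x87 80-bit long double, not on every platform), standing in for the quire/higher-precision register:

    #include <cstdio>

    int main() {
        // A value whose low bits don't fit in one double: peel it into hi + lo.
        long double wide = 1.0L + 1e-18L;            // stand-in for a wider accumulator

        double hi = static_cast<double>(wide);       // round once
        double lo = static_cast<double>(wide - hi);  // subtract, then round again

        std::printf("hi = %.17g\nlo = %.17g\n", hi, lo);
        // hi + lo recovers more of the wide value than hi alone does.
    }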
I bumped into this method when I got different results for double additions on the CPU vs using Nvidia CUDA. Now I know the history of this. Cool.
Welcome back man! You are the best!
Hmmm. Well, I guessed correctly about Dekker, and would not have had to ask my earlier question if I had bothered to watch the whole thing first. He was indeed Dijkstra’s colleague, another one of those early titans of computing science…
It has been a while since I have seen your videos! I am delighted.
Hahaha ... enjoyed the piano ... checked out your music channel too ... good stuff. Great video on this ... back in the days of early microcontrollers and the like, I worked out or used all sorts of methods to compute various precisions. Thanks much for the video. Cheers from sunny Florida, USA.
I haven't worked with this, but I have used Kahan summation, which is based on the same core idea - you're adding together a bunch of regular doubles, and your output is a regular double, but you keep double-double precision on the running sum (and just forget about the lower part once you're done, that's your un-representable roundoff error). In this special case, you can skip about half of the floating point operations compared to doing full Dekker addition each time.
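A minimal sketch of that running-sum idea, i.e. plain Kahan summation (note that aggressive fast-math flags can optimize the compensation term away):

    #include <cstdio>
    #include <vector>

    // Kahan (compensated) summation: the roundoff error of each add is kept in c
    // and fed back into the next addition instead of being stored as a second value.
    double kahan_sum(const std::vector<double> &xs) {
        double sum = 0.0, c = 0.0;      // c holds the running roundoff error
        for (double x : xs) {
            double y = x - c;           // re-inject the error lost so far
            double t = sum + y;         // big + small: low bits of y get dropped...
            c = (t - sum) - y;          // ...and recovered here
            sum = t;
        }
        return sum;
    }

    int main() {
        std::vector<double> xs(1000000, 0.1);
        std::printf("kahan sum: %.17g\n", kahan_sum(xs));  // much closer to 100000 than a naive loop
    }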
Just what I needed, your videos are pure gold
The font on his slides is terrible. But this video is very helpful! One of the best videos I've watched on floats so far
Never heard of this technique, thanks!
The idea of changing the order of calculations to reduce the error works for integer and floating point arithmetic too.
Very interesting, this video came out right after I was playing with adding or subtracting numbers and dividing them by 2. I was trying to figure out a naive way to efficiently store the result. The reason is that odd numbers have a .5 in the result, so they need an extra bit of data to store, but modern computers use 8 bits as 1 byte, which means I'd be wasting 7 bits at best and 8 bits at worst. I ended up just using 1 byte to store the .5 flags for the next 8 bytes, and that's the best way I've found so far, but I was also considering investigating variable length encoding.
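If I read that right, the scheme is one flag byte whose bits mark which of the next 8 values carry a .5. A tiny sketch of that packing, purely as I understand the comment (names are made up):

    #include <cstdint>
    #include <cstdio>

    // Eight halved values: one byte of whole parts each, plus one flag byte
    // whose bit i says "value i also has a .5".
    struct HalvedBlock {
        uint8_t whole[8];
        uint8_t half_flags;
    };

    HalvedBlock halve_all(const uint8_t (&v)[8]) {
        HalvedBlock b{};
        for (int i = 0; i < 8; ++i) {
            b.whole[i] = v[i] / 2;              // integer half
            b.half_flags |= (v[i] & 1u) << i;   // remember the .5 for odd inputs
        }
        return b;
    }

    int main() {
        uint8_t v[8] = {7, 4, 9, 2, 1, 0, 255, 10};
        HalvedBlock b = halve_all(v);
        for (int i = 0; i < 8; ++i)
            std::printf("%g ", b.whole[i] + ((b.half_flags >> i) & 1u) * 0.5);
        std::printf("\n");  // 3.5 2 4.5 1 0.5 0 127.5 5
    }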
This structure reminds me a bit of the "lifting scheme" of reversible wavelet transforms. Noyce !
I’d like to propose doubly-imprecise floating point. Use one floating point number to hold the mantissa and one to hold the exponent
Nice video 🙂
Woulda been nice to have a quick run through with overflow.
Very interesting, and well presented. I know of Dekker from the mutex work but had not seen this before. Fun fact: up here in Canada, a double-double usually refers to a coffee with double sugar and double cream, usually from a coffee-and-donuts store like Tim Hortons.
You spel't "Dunkin Donuts" wrong. ;^)
❤ please upload more often
It would have been interesting to see what happens if the decimal point locations in the two operands were farther apart. E.g., adding 470 + 0.0022. I'm sure this method works, but it would be interesting to see how the smaller number completely disappears in the result.
Sure, but I also think that he's provided enough info for us to experiment with Excel or pen and paper.
Just gave it a go! Your example actually yields C = 470, c = 0.0022 -> no loss of precision: the lower half just isn't adjacent to the upper.
However, if you use 4709 and 0.002216, you get 4700 and 9.0, so all the lower digits disappear.
Yessss ur back what took so long
Nothing in particular mate, cheers for watching :)
My goodness! Finally some good f***ing content
The bucket of fish is back! I don't use this technique... I use two 64-bit integers. One of them covers all integral values between -2^63 and +2^63-1. The second integer covers all fractional values from 1/1e17 to 1-(1/1e17). Much easier and faster!
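Something like this, I imagine; a rough sketch of that split representation (my own guess at the layout, with the fractional part counted in units of 1e-17):

    #include <cstdint>
    #include <cstdio>

    // Fixed point split into a signed whole part and a fractional part
    // counted in units of 1e-17, so frac is always in [0, 1e17).
    struct Fixed {
        int64_t  whole;
        uint64_t frac;   // value = whole + frac * 1e-17
    };

    constexpr uint64_t FRAC_ONE = 100000000000000000ULL;  // 1e17

    Fixed add(Fixed a, Fixed b) {
        Fixed r{a.whole + b.whole, a.frac + b.frac};
        if (r.frac >= FRAC_ONE) {   // carry out of the fractional half
            r.frac -= FRAC_ONE;
            r.whole += 1;
        }
        return r;
    }

    int main() {
        Fixed x{3, 70000000000000000ULL};   // 3.7
        Fixed y{1, 60000000000000000ULL};   // 1.6
        Fixed s = add(x, y);
        std::printf("%lld + %llu * 1e-17\n", (long long)s.whole, (unsigned long long)s.frac);
        // prints 5 + 30000000000000000 * 1e-17, i.e. 5.3
    }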
This is a doubly good video
now i need this extended to infinite (arbitrary) precision with some clever loops and arrays
Early in the 1970's, one of the first computer languages, IBM BASIC, required that you set your variables to Binary, Single Precision, or Double Precision. Binary only used one bit, Single used eight bits, and Double used sixteen bits. This was when our school's computer only used an eight-bit chip and only had 4K of memory.
Fascinating !
Double Dekker arithmetic.
So famous they named a type of British bus after it.
I really really liked this video
I've done this for years - just never knew what it was called - especially in multiply-accumulate expressions like polynomial evaluation or FIR filters.
Gotta get eyes in the sky so that we can high in the sky, get a birds eye view.
Hi, great video as always. You present very well!
Can you do one on QUATERNIONS please, in your inimitable style!!!
5:26 That ligature with fl at the baseline, while YouTube makes the fl ligature at their heads. Woah.
I play the piano too. I love being explained things that have nothing to do with the piano.
This brings me back to 2006 when I ported dhbailey's doubledouble in Java and found and fixed a lot of bugs, like corner cases in exp() and found some very specific "Devil's values" as I'd like to call them, that really stress the algorithm.
After watching the video, the algorithm I have (and verified on paper) does not use IF statements, and is more complicated. It has as many operations as your C++ code, if the if and else branches are both counted, and then some.
public DoubleDouble add(DoubleDouble y) {
    double a, b, c, d, e, f;
    e = this.hi + y.hi;                                       // sum of the high parts
    d = this.hi - e;
    a = this.lo + y.lo;                                       // sum of the low parts
    f = this.lo - a;
    d = ((this.hi - (d + e)) + (d + y.hi)) + a;               // error of the high sum, plus the low sum
    b = e + d;
    c = ((this.lo - (f + a)) + (f + y.lo)) + (d + (e - b));   // error of the low sum, plus error of e + d
    a = b + c;
    return new DoubleDouble(a, c + (b - a));                  // final renormalization into (hi, lo)
}
mad piano playing my friend
Why did they not call it "double Dekker" arithmetic?!
Nice :)
Came to say the same thing!
You should upload videos on solving crackme(s). That would be awesome!
When you rounded when calculating A+B, do you mean an actual round or a ceil?
E.g. if the result was 154, would you go to 160 or 150?
Thanks for your videos, you explain complicated stuff in a really understandable way.
I think it was more of a ceil or truncation - not sure how it'd deal with negative numbers, but your example would be 150 (and 4 remainder in the other half of the computation) I think
No but it does have something to do with playing piano, because... like, if you... well you see, the... it's like when you... the tone of the string contains harmonics of various factors of the fundamental frequency, which is analogous to the variable containing subvariables of various factors of precision of the number. Boom. Nailed it.
6:43 So, it's like a tuple, right? In the same way a complex number, and also a rational number, is. It's nice to think about how to do operations on two of these dd's: the result of the op on the error has to be added to the main part, then the leftover propagated back.
Feels like how carry cascading in electronic full adders works.
Question. At one point we made sure to have the bigger number first for the subtraction of R. But your C++ code did not show any parentheses to make sure the order is carried out that way? Because if I am not mistaken, either clang or gcc does its operations from right to left... Or won't this matter?
Now I'm thinking, how do I go about outputting this nicely formatted?
7:43 I was thinking Jemaine Clement from Flight of the Conchords
@Creel
Could you make a comparison video between AMD's 3DNow! and Intel's SSE (1st SSE Version)? Which one was better? What were the pros and cons of each? Which was easier to use from a programming point of view, more powerful, more versatile etc.?
A practical problem that I have is in using a GPU. It has double precision numbers - but in 3D graphics - we have to use the available hardware interpolators that can only cope with single precision. I'm trying to get my head around whether a similar Dekker trick would allow me to use hardware interpolators to interpolate double precision using single precision hardware.
1:48 I'm convinced I'm watching a legitimate madman, and this is him showing his true colors. If you haven't found it already, you've got to find yourself the "real instructions ||| mental disorders" meme image for x86. I think you've personified it 😂
Hmmm. Many years ago I wrote a floating point emulator for the 8087 family of Intel co-processors. Since it was a full blown but granular co-processor emulator, I had to implement the chip family's 80-bit registers. So it was just as accurate as common ISO 64-bit doubles. Since then I've also had to code numerous 'fixed point' math systems as well to process hard real-time data for ECU systems.
If total flexibility was the goal, then I'd simply extend my co-processor library out to arbitrarily many bits in the registers, increasing both the absolute magnitude and the bits of precision, whether that'd be 128 bits (double-double), 160 bits (double-80), 240 bits (triple-80), 256 bits (quad-double) or more, perhaps 512 bits (64 bytes, or octo-double). The code is almost unchanging, and with early-out detectors for rational data the speed penalty for such super precision is quite minimal.
What's even better, assuming the C compiler still supports the '/Emulator' switch, the original (simple) compiled code would run on any computer without additional modification, or could make use of the co-processor for faster first pass estimations.
Anyway, thanks for the trip down memory lane -- no pun intended
In Canada, a double-double is a coffee with 2 cream and 2 sugar.
Surely there are op codes to handle this quickly. Just like how if you wanted multiplication (or floating point) for the 6502, it could be done, but slowly. So CPU designers added op codes to later chips to greatly speed these operations up.
Your videos are goated
great piano.
What a super interesting video, thanks Creel. I wonder if some game engines include this as an option in order to increase e.g. 3D precision?
I'll watch the vid. It's interesting to compare with the Kahan-Babuška-Neumaier and Shewchuk algorithms.
P.S. Watched already. Nice. You actually use the KBN algo for adding the higher and lower parts of the double double.
Soo... All aboard the double Dekker bus? 🚌😂
Does this addition algorithm always produce the most accurate double-double result? One edge case that wasn't shown here is when the overflow in r happens: it will lose information about the least significant digit of the sum due to the overlap between r and R. This will not be recovered by the computation of c, and we will be left with a c which will definitely have a zero in the lowest digit, losing one bit of precision.
Does that second number have a name? If it were integer division, it would be called the "remainder". This value in floating point rounding plays a similar role. I could see it being called a "remainder" too, as maths doesn't shy away from reusing terms as long as it's clear from context what the meaning is.
Sorry I’m so late to the party! Just discovered your marvelous AVX512 introduction. Is TJ Dekker the same Dekker who derived the correct mutual exclusion algorithm at the Dutch Mathematical Center in the 1950’s, when EW Dijkstra was building the THE operating system?
A “double” precision solution? Devised by a Mr. “Dekker”? If God exists, he must absolutely love puns.
This should be called Double Dekker arithmetic.
There has to be a display/print/string conversion algorithm too, right?
Is this a way to better the result when subtracting two numbers that are very close?
There is a better formula by Knuth where we do not need to order the values by magnitude:
R = a + b
c = R - a
r = (a - (R - c)) + (b - c)
now you can add stuff like 1 + 1e100
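For comparison, here are the two-branch Dekker version and this branch-free Knuth version side by side, as a sketch (plain C++, assuming fast-math is off so the compiler doesn't optimize the "pointless" subtractions away):

    #include <cmath>
    #include <utility>

    // Dekker / fast-two-sum: the error term is only exact if |a| >= |b|, hence the swap.
    std::pair<double, double> dekker_sum(double a, double b) {
        if (std::fabs(a) < std::fabs(b)) std::swap(a, b);
        double R = a + b;
        double r = (a - R) + b;   // what b lost when it was added to the bigger a
        return {R, r};
    }

    // Knuth / two-sum: no ordering or branch needed, three extra flops instead.
    std::pair<double, double> knuth_sum(double a, double b) {
        double R = a + b;
        double c = R - a;
        double r = (a - (R - c)) + (b - c);
        return {R, r};
    }

    int main() {
        auto [R1, r1] = dekker_sum(1.0, 1e100);   // works thanks to the swap
        auto [R2, r2] = knuth_sum(1.0, 1e100);    // works with no swap at all
        return (R1 == R2 && r1 == r2) ? 0 : 1;
    }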
in my experience, Dekker's formula is faster on modern CPUs. I'd guess it's because branch prediction is good enough for this case so it's faster than adding extra floating point operations.
@@michaeldeakin9492 I doubt that, but maybe I will check when I have time. I don't think branch prediction actually works well in this case, since the relative size of the values is unknown and basically impossible to predict.
It all comes down to what the compiler actually does. The Intel compiler, for instance, uses logical masks to avoid branching (fp-model=precise needs to be used, otherwise the error value is optimized to zero).
For SIMD, branching is out of the question anyway.
@@INT41O I have actually tried it somewhat recently; for my use case (exact predicate evaluation) it was faster. I'd link my project, but youtube doesn't seem to let any post through pointing to it at all.
And while branches won't work with simd, blending will work. I don't know if that will be faster than Knuth's formula.
BTW, Chandler Carruth's talk isn't quite the same but seems relevant here: CppCon 2017: Chandler Carruth “Going Nowhere Faster”
Skip to the cmov section for the relevant part.
@@INT41O Okay, I reran some scalar benchmarks I had with 3 different methods, and implemented vector versions: scalar Dekker summation, scalar branchless Dekker summation via array indexing, scalar Knuth summation, vector (branchless) Dekker summation, and vector Knuth summation.
Scalar performance ranking from best to very much the worst (as in absolutely awful) was as follows: Dekker summation, Knuth summation, branchless Dekker summation
Vector performance was similar, Knuth summation was much better than branchless Dekker summation. I'm unsure why at this point
Very nice video. It was a bit frustrating that you didn’t include the information that X has to be the ‘larger’ number, or more specifically the number with digits in the same places as R.
SORRY! You explained that in minute 14. I just started thinking about it on my own b4 I got there. Yea me and you!
Would double double work in a case like 5700.018 + 1800.053 ?
Because then it would actually have really weird, specific positions where it is much more than double the precision of a normal double.
I wonder how double dekker doubles would compare to a single floating point number with double the bits? Are there any advantages perhaps? At the very least the method allows double the precision on hardware not built for it, and the two groups of bits don't have to be consecutive
I do wonder how it compares performance wise with emulated floating point using integer arithmetic to get the same precision.....
Great video, thanks! 🙂 Just one tiny nitpick: why is your metronome obviously 17 times as expensive as your camera? 😎
How does this compare to using 128-bit floating point numbers instead?
Wait, if _r_ overflows, don't we lose precision already? How are we going to get it back if we don't split _r_ itself into an upper and a lower part? I'm sure I missed something but I can't see it right now.
The idea is that if r ended up being 15.3 in the example, we’d need to round it to 15, losing the 0.3, but the 1 in the tens place can also be stored in R by adding it, and then finding that we’d be left with 5. Applying the final clean up, you’d recover the .3 and end up with R = 170 and r = 5.3.
Nice trick, trading speed and space for precision. Weird choice of example values that result in no carry happening.
15:48 - Well, if the real answer is 157 and the computed answer is 160, I'd say the error is +3, not -3. So I'd call r the "correction" or something like that rather than the error. Or perhaps the "residual" - maybe that's why R and r are used for it in the first place.
Would truncation work instead of rounding?
I think this could be done recursively to get some 'arbitrary-precision' floats. (Obviously memory will limit how many bits of precision is possible)
Though I can see this having a large impact on performance
It certainly can! Tends to be too much computation after about 4 doubles.
Any language with generics makes this pretty easy: if you can overload your maths operators, you can define Compensated<T> with the algorithms in the video, and then you can use Compensated<Compensated<double>> for quad precision etc. The calculation does get slow very… fast.
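A rough C++ sketch of what such a wrapper could look like (Compensated is just the name from the comment above; only addition is shown, using the branch-free two-sum so it also compiles when T is itself a Compensated type, though this naive nesting is not as carefully renormalized as a real quad-double library):

    #include <cstdio>

    // T is double for double-double, Compensated<double> for (roughly) quad-double, etc.
    template <typename T>
    struct Compensated {
        T hi{}, lo{};
        Compensated() = default;
        Compensated(T h) : hi(h), lo(T{}) {}
        Compensated(T h, T l) : hi(h), lo(l) {}
    };

    template <typename T>
    Compensated<T> operator-(const Compensated<T> &x) { return {-x.hi, -x.lo}; }

    template <typename T>
    Compensated<T> operator+(const Compensated<T> &x, const Compensated<T> &y) {
        T R = x.hi + y.hi;                    // two-sum of the high parts
        T c = R - x.hi;
        T r = (x.hi - (R - c)) + (y.hi - c);  // error of the high-part addition
        r = r + x.lo + y.lo;                  // fold in the low parts
        T hi = R + r;                         // renormalize into (hi, lo)
        T lo = r - (hi - R);
        return {hi, lo};
    }

    template <typename T>
    Compensated<T> operator-(const Compensated<T> &x, const Compensated<T> &y) { return x + (-y); }

    int main() {
        Compensated<double> s = Compensated<double>(1.0) + Compensated<double>(1e-20);
        std::printf("hi = %g, lo = %g\n", s.hi, s.lo);   // hi = 1, lo = 1e-20

        using Quad = Compensated<Compensated<double>>;   // nesting = more precision, more work
        Quad q = Quad(Compensated<double>(1.0)) + Quad(Compensated<double>(1e-20));
        std::printf("quad hi.hi = %g\n", q.hi.hi);
        return 0;
    }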