Double it Like Dekker: A Remarkable Technique to Double Floating Point Precision (Part 1)

  • Published Jun 20, 2023
  • This video is an introduction to double double arithmetic, a technique used to greatly improve the precision of floating point computations in programming. It was first proposed by T. J. (Dirk) Dekker in his paper from 1971.
    Link to Dekker's original paper: csclub.uwaterloo.ca/~pbarfuss...
    Support What's a Creel? on Patreon: / whatsacreel
    FaceBook: / whatsacreel
    Software used to make this vid: Visual Studio 2019 Community: www.visualstudio.com/downloads/
    Audacity: www.audacityteam.org/
    Davinci Resolve 16: www.blackmagicdesign.com/prod...
    OpenOffice: www.openoffice.org/
    Gimp: www.gimp.org/

COMMENTS • 188

  • @FloydMaxwell
    @FloydMaxwell 11 months ago +17

    I studied chemical engineering at the University of British Columbia in the late 70s and early 80s. It was a demanding bachelor's program with many labs, including an 8-hour chem lab, and you had to do a bachelor's thesis (very unusual in itself). In addition, you had to do a summer project (before your final year). In the project I chose, I took over the bachelor's thesis work of two previous students. Their efforts had not worked, and Dr. Janis Lielmezs wondered if I could get it to work. Long story short: the program needed double double precision instead of just double precision. Dr. Lielmezs was most pleased with the result. He passed away in 2021.

  • @TerjeMathisen
    @TerjeMathisen 11 months ago +140

    Not only is having twice as many mantissa bits pretty nice, it is so useful that the last time the IEEE 754 FP standard was updated (2016-2019), we added two new instructions:
    AugmentedAddition() and AugmentedMultiplication(). Both take the usual two inputs, but they deliver two results, the high and the low part, so that together the operation is always exact. This can typically be implemented in FPUs with no more than one extra cycle of latency, and they are also very good building blocks for even more precision, i.e. using arrays of doubles to implement arbitrary precision.
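
    For anyone who wants to play with this: the classic branch-free 2Sum of Møller and Knuth computes the same high/low pair that AugmentedAddition is meant to deliver (modulo the 2019 standard's rules on rounding and signed zeros). A minimal C++ sketch; the function name is mine, not from the video:

    #include <cstdio>

    // 2Sum: s = fl(a + b), e = the exact rounding error, so that
    // a + b == s + e exactly. Branch-free: no |a| >= |b| requirement.
    void two_sum(double a, double b, double& s, double& e) {
        s = a + b;                 // rounded sum
        double bv = s - a;         // the part of b that made it into s
        double av = s - bv;        // the part of a that made it into s
        e = (a - av) + (b - bv);   // what rounding threw away
    }

    int main() {
        double s, e;
        two_sum(0.1, 0.3, s, e);
        std::printf("s = %.17g, e = %.17g\n", s, e); // e holds the lost bits
    }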

    • @marcomoreno6748
      @marcomoreno6748 11 months ago +5

      Commenting to bookmark this. I enjoy reading about the IEEE 754 standard for purposes of deterministic lockstep networked applications, and somehow I missed this. Will check out later.

    • @oscarsmith3942
      @oscarsmith3942 11 months ago +4

      Note that `AugmentedMultiplication` isn't really necessary, because you can compute it with a single fused multiply-add instruction (FMA). Once you have computed c = a*b, you can compute c_lo = fma(a, b, -c), which will produce the exactly correct result for the multiplication's low bits.
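
      In code, assuming a target with hardware FMA (std::fma from <cmath>; the function name here is illustrative, and the sign convention makes hi + lo equal a*b exactly):

      #include <cmath>

      // Exact product of two doubles as a high/low pair using one FMA:
      // hi = fl(a * b), and the FMA evaluates a*b - hi with a single
      // rounding, which is exact here, so a*b == hi + lo.
      void two_prod(double a, double b, double& hi, double& lo) {
          hi = a * b;
          lo = std::fma(a, b, -hi); // exact residual of the multiplication
      }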

    • @TerjeMathisen
      @TerjeMathisen 11 months ago +10

      @@oscarsmith3942 You are of course correct. It is in fact the existence of FMA which makes the Dekker add/sub algorithm really useful, even though that algorithm typically takes at least three times as long as a single FADD: FMUL + FMA is "only" twice as slow as a single FMUL, while both Augmented operations can realistically run in zero or one cycle more than the plain ops.
      In the end, this would make extended-precision operations at least twice as fast as what is possible today.

  • @friendlyoctopus9391
    @friendlyoctopus9391 11 months ago +81

    One place these "old tricks" are very useful is when programming GPUs. Some GPUs are significantly faster with floats than with doubles (Nvidia), and some don't support doubles at all (Intel Xe). Identifying where your precision bottlenecks are and applying these methods gives you much better memory use and performance while keeping precision... as long as you're not using a device compiler that doesn't let you turn off unsafe floating point optimizations that throw them away (looking at you, Intel).

    • @Spielix
      @Spielix 11 months ago +13

      For most Nvidia GPUs you are right that the peak double precision performance (i.e. FLOP/s) is worse by a factor of 64. Just two refinements of that statement: 1. High-end server accelerators (P100, V100, A30, A100, H100) have good double precision performance (i.e. only a factor of 2 worse than single precision). 2. That performance difference only has practical implications for algorithms that are not memory-bound anyway (i.e. not ones with low arithmetic intensity). E.g. for properly implemented dense matrix multiplication (DGEMM) you should be able to see it, but not for vector addition (DAXPY).

    • @lpi3
      @lpi3 11 months ago +10

      It's also useful for microcontroller programming. The Cortex-M4 has 32-bit floating point instructions (float), but no 64-bit ones (double).

  • @Florian-sh9nf
    @Florian-sh9nf 11 months ago +25

    Just when you think he couldn't get any cooler, he suddenly plays the piano as an intro. Legend!

  • @Intermernet
    @Intermernet 11 months ago +61

    Love the skills on the piano my dude. Also, thanks for explaining the topic so well.

    • @FadkinsDiet
      @FadkinsDiet 11 months ago +2

      With those skills I hope he can get the piano tuned.

    • @lorenzodiambra5210
      @lorenzodiambra5210 11 months ago +1

      3:56 *The death of a 0.3 floating point is a tragedy, the death of millions of people is a statistic*
      cit.

  • @jacob_90s
    @jacob_90s 11 months ago +20

    I've used compensated arithmetic on a number of occasions. It's very handy on embedded devices that have hardware float support but don't support doubles.

  • @andersjjensen
    @andersjjensen 11 months ago +3

    The legend returns. He comes bearing his usual gifts of entertaining madness sprinkled with useful knowledge. And we are grateful.

  • @dainxor
    @dainxor 11 months ago +3

    Curiously enough, a while ago, when I was wandering around the C++ std lib, I found some functions using this method, and even got to understand some of it, but now I know how ingenious it really is. Great video man!

  • @warmCabin
    @warmCabin 11 months ago +2

    Incredible! With twice the memory, we can store twice the information!

    • @MrRobertX70
      @MrRobertX70 10 months ago

      LOL, that was my reaction too

  • @oresteszoupanos
    @oresteszoupanos 11 months ago

    Can't BELIEVE that I missed a new Creel vid for 2 weeks!
    For shame! Thanks for the cool info as always, matey!
    P.S. In the UK, we honour Dekker's algos all over the country with our DoubleDekker buses. ^_^

  • @imrank340
    @imrank340 11 months ago +3

    Never quite got my head around Dekker's double double computation, but you made it significantly easier, particularly the carry over or under. Thank you very very much, my Aussie friend, I always look out for your videos.

  • @johnteran8889
    @johnteran8889 11 months ago +13

    This is pretty cool. I can see how it extends to quad double, but with significantly more bookkeeping.

    • @WhatsACreel
      @WhatsACreel 11 months ago +3

      Yep, it just keeps going! Cheers for watching mate :)

  • @Jhat
    @Jhat 11 months ago +4

    Man, I wish your upload schedule was more frequent 😅 All your content is so interesting and insightful.

  • @hjups
    @hjups 11 months ago +23

    Neat! I had always wondered how this was done, but never took the time to understand it. A lot of GPU-based computational physics codes use this method given a limited number of FP64 units (depending on the problem scale, the exponent in FP32 is sufficient, but we really want higher decimal precision).
    That does lead to an obvious question though: does it need to be a "round to nearest" operation? Or will this work with any rounding mode, assuming it's consistent, such as round to 0 (truncation)?
    Anyway, looking forward to the multiplication and division!

  • @Rottingflare
    @Rottingflare 11 months ago

    This was a fascinating topic you brought up! Really appreciate all the concepts you find worthy enough to discuss!!

  • @slow_learner
    @slow_learner 11 months ago +2

    Ty Creel... love how precise and in-depth your videos are. Piano skills are killer as well.

  • @MartinPHellwig
    @MartinPHellwig 11 months ago

    That intro is brilliant, thanks for brightening my day, now on to the actual content.

  • @AT-zr9tv
    @AT-zr9tv 11 months ago

    This was thoroughly entertaining! Your delivery is clear and super fun. Love your humor. Nerd out

  • @codemonkey6173
    @codemonkey6173 11 months ago

    Nice to see you back!

  • @Antonio-yy2ec
    @Antonio-yy2ec 11 months ago

    Just what I needed, your videos are pure gold

  • @WyrdieBeardie
    @WyrdieBeardie 7 months ago

    Your videos are absolutely fantastic!

  • @MAli-wu4rx
    @MAli-wu4rx 11 months ago +1

    Welcome back man! You are the best!

  • @dankillinger
    @dankillinger 11 months ago +1

    babe wakeup, Creel vid dropped

  • @barakap2372
    @barakap2372 11 months ago +2

    Yessss ur back what took so long

    • @WhatsACreel
      @WhatsACreel 11 months ago +1

      Nothing in particular mate, cheers for watching :)

  • @osakaandrew
    @osakaandrew 11 months ago +18

    Canadian programmers of almost every skill level know all about double-double. Many employ it literally every day. And for us, the OG is A Cruller. Tim Horton's got most of the credit for popularizing this, but it was around for yonks before that.

    • @bytesandbikes
      @bytesandbikes 11 months ago +10

      Now I need to order a Tim Horton's "Quad precision float". I wonder if I'll end up with a milkshake.

  • @slembcke
    @slembcke 11 months ago +3

    Interesting! I've implemented 256-bit fixed point before (64.192), and figuring out an efficient-ish assembly implementation for multiplication and reciprocal was tricky! I figured it was possible to chain multiple floating point values together as well, and it's nice to have it spelled out. Thanks! :)

  • @mhcbon4606
    @mhcbon4606 11 months ago

    It has been a while since I've seen one of your videos! I am delighted.

  • @kenhaley4
    @kenhaley4 11 months ago +6

    It would have been interesting to see what happens if the decimal point locations in the two operands were farther apart. E.g., adding 470 + 0.0022. I'm sure this method works, but it would be interesting to see how the smaller number completely disappears in the result.

    • @damian_smith
      @damian_smith 10 months ago +2

      Sure, but I also think he's provided enough info for us to experiment with Excel or pen and paper.
      Just gave it a go! Your example actually yields C = 470, c = 0.0022 -> no loss of precision: the lower half just isn't adjacent to the upper.
      However, if you use 4709 and 0.002216, you get 4700 and 9.0, so all the lower digits disappear.
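
      (The digits come out differently in binary doubles than in pen-and-paper decimal, but the same split can be watched numerically. A quick C++ check of the idea, reusing the branch-free 2Sum construction; the names are illustrative, not from the video:)

      #include <cstdio>

      // Branch-free 2Sum: a + b == s + e exactly.
      void two_sum(double a, double b, double& s, double& e) {
          s = a + b;
          double bv = s - a;
          double av = s - bv;
          e = (a - av) + (b - bv);
      }

      int main() {
          double s, e;
          two_sum(4709.0, 0.002216, s, e);
          // s is the rounded double sum; e is exactly what rounding
          // discarded, so no information is lost across the pair.
          std::printf("s = %.17g\ne = %.17g\n", s, e);
      }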

  • @VioletGiraffe
    @VioletGiraffe 11 months ago

    Never heard of this technique, thanks!

  • @rstewart2702
    @rstewart2702 1 month ago

    Hmmm. Well, I guessed correctly about Dekker, and would not have had to ask my earlier question if I had bothered to watch the whole thing first. He was indeed Dijkstra's colleague, another one of those early titans of computing science…

  • @ciberman
    @ciberman 11 months ago

    I didn't watch the video yet, but that is the best intro ever

  • @timpalmer4939
    @timpalmer4939 3 months ago

    The font on his slides is terrible, but this video is very helpful! One of the best videos I've watched on floats so far.

  • @ktownjunkie
    @ktownjunkie 11 months ago

    The first equation reminds me of the equations used to encode and decode video or audio signals that have multiple channels... It always looked weird to me, but it's making more sense now. Thank you for this video!

  • @electrolyteorb
    @electrolyteorb 11 months ago

    ❤ please upload more often

  • @criticalpoint2779
    @criticalpoint2779 11 months ago

    My goodness! Finally some good f***ing content

  • @esra_erimez
    @esra_erimez 11 months ago

    I really really liked this video

  • @angeldude101
    @angeldude101 11 months ago +1

    I wasn't sure what the result would be initially, but I quickly remembered the Posit documentation on a use of the proposed "quire": round the value, subtract that rounded value from the quire, then round again, and repeat for however much precision the quire can store. Here we don't have a higher precision register to use, but the core principle seems to be the same: representing a higher precision value as the sum of multiple values with differing levels of precision.

  • @romancharak3675
    @romancharak3675 11 months ago

    Fascinating !

  • @esra_erimez
    @esra_erimez 11 months ago

    This is a doubly good video

  • @rogerfroud300
    @rogerfroud300 11 months ago

    The idea of changing the order of calculations to reduce the error works for integer and floating point arithmetic too.

  • @tbird81
    @tbird81 11 months ago +6

    Why did they not call it "double Dekker" arithmetic?!

    • @unvergebeneid
      @unvergebeneid 11 months ago

      Nice :)

    • @wrc1210
      @wrc1210 11 months ago +1

      Came to say the same thing!

  • @rjones6219
    @rjones6219 11 months ago

    I first encountered the concept of double precision when reading about programming the DEC PDP-8/E. That was nearly 50 years ago. 😊

  • @algorithminc.8850
    @algorithminc.8850 11 months ago

    Hahaha... enjoyed the piano... checked out your music channel too... good stuff. Great video on this. Back in the days of early microcontrollers and the like, I worked out or used all sorts of methods to compute various precisions. Thanks much for the video. Cheers from sunny Florida, USA.

  • @child_of_god_
    @child_of_god_ 11 months ago

    I bumped into this method when I got different results for double additions on the CPU vs using Nvidia CUDA. Now I know the history of this. Cool.

  • @alecgamer420
    @alecgamer420 10 months ago

    Your videos are goated

  • @steubens7
    @steubens7 11 months ago

    Even without the extra precision, being able to check so easily how far you are from the precise answer is great. It can be a sanity check for code modifications, or let you know when long-term drift starts being huge in long-term accumulated values.

  • @stuartmcconnachie
    @stuartmcconnachie 11 months ago

    Double Dekker arithmetic.
    So famous they named a type of British bus after it.

  • @hobbified
    @hobbified 11 months ago +1

    I haven't worked with this, but I have used Kahan summation, which is based on the same core idea - you're adding together a bunch of regular doubles, and your output is a regular double, but you keep double-double precision on the running sum (and just forget about the lower part once you're done; that's your unrepresentable roundoff error). In this special case, you can skip about half of the floating point operations compared to doing a full Dekker addition each time.
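
    For reference, a minimal C++ sketch of the Kahan loop being described (plain Kahan, not the Neumaier variant; the function name is mine):

    #include <vector>

    // Kahan (compensated) summation: c carries the low-order bits that
    // a plain += would round away at each step.
    double kahan_sum(const std::vector<double>& xs) {
        double sum = 0.0, c = 0.0;
        for (double x : xs) {
            double y = x - c;   // feed back the compensation so far
            double t = sum + y; // big + small: low bits of y are lost...
            c = (t - sum) - y;  // ...and recovered here algebraically
            sum = t;
        }
        return sum;
    }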

  • @MaxTSanches
    @MaxTSanches 11 months ago

    Early in the 1970s, one of the first computer languages, IBM BASIC, required that you set your variables to Binary, Single Precision, or Double Precision. Binary only used one bit, Single used eight bits, and Double used sixteen bits. This was when our school's computer only used an eight-bit chip and only had 4K of memory.

  • @IanBLacy
    @IanBLacy 10 months ago

    I’d like to propose doubly-imprecise floating point. Use one floating point number to hold the mantissa and one to hold the exponent

  • @wumi2419
    @wumi2419 11 months ago +1

    I learned about this when I had way too much error and wanted to reduce it. There was a lot of summation, so Kahan sums were very helpful (they basically do the same, but the compensator is carried over instead of being stored as part of the value).

  • @leyasep5919
    @leyasep5919 11 months ago

    This structure reminds me a bit of the "lifting scheme" of reversible wavelet transforms. Noyce!

  • @asandax6
    @asandax6 11 months ago

    Very interesting that this video came out right after I was playing with adding or subtracting and dividing numbers by 2. I was trying to figure out a naive way to efficiently store the result. The reason: odd numbers have a .5 in the result, so they need an extra bit of data to store, but modern computers use 8 bits as 1 byte, so I'd be wasting 7 bits at best and 8 bits at worst. I ended up just using 1 byte to store the .5 for the next 8 bytes, and that's the best way I've found so far, but I was also considering investigating variable-length encoding.

  • @ChrisM541
    @ChrisM541 11 months ago

    What a super interesting video, thanks Creel. I wonder if some game engines include this as an option in order to increase e.g. 3D precision?

  • @IcyMidnight
    @IcyMidnight 9 months ago

    Nice video 🙂
    Woulda been nice to have a quick run through with overflow.

  • @Czeckie
    @Czeckie 11 months ago

    mad piano playing my friend

  • @activelow9297
    @activelow9297 11 months ago

    The bucket of fish is back! I don't use this technique... I use two 64-bit integers. One of them covers all integral values between -2^63 and +2^63-1. The second integer covers all fractional values from 1/1e17 to 1-(1/1e17). Much easier and faster!

  • @ruadeil_zabelin
    @ruadeil_zabelin 11 months ago +2

    I play the piano too. I love being explained things that have nothing to do with the piano.

  • @Shifticek
    @Shifticek 10 months ago +1

    When you rounded when calculating A+B, did you mean an actual round or a ceil?
    E.g. if the result was 154, would you go to 160 or 150?
    Thanks for your videos, you explain complicated stuff in a really understandable way.

    • @Erhannis
      @Erhannis 10 months ago +1

      I think it was more of a floor or truncation - not sure how it'd deal with negative numbers, but your example would be 150 (and 4 remainder in the other half of the computation), I think.

  • @CubicSpline7713
    @CubicSpline7713 10 months ago

    Hi, great video as always. You present very well!
    Can you do one on QUATERNIONS please, in your inimitable style!!!

  • @rstewart2702
    @rstewart2702 1 month ago

    Sorry I'm so late to the party! Just discovered your marvelous AVX-512 introduction. Is T. J. Dekker the same Dekker who derived the correct mutual exclusion algorithm at the Dutch Mathematical Centre in the 1950s, when E. W. Dijkstra was building the THE operating system?

  • @ianboard544
    @ianboard544 11 months ago

    I've done this for years - just never knew what it was called - especially in multiply-accumulate expressions like polynomial evaluation or FIR filters.

  • @GMHarpov
    @GMHarpov 27 days ago

    great piano.

  • @sb_dunk
    @sb_dunk 11 months ago +1

    Gotta get eyes in the sky so that we can high in the sky, get a birds eye view.

  • @yash1152
    @yash1152 8 months ago

    5:26 that ligature with fl at the baseline, while YouTube makes the fl ligature at their heads. Woah.

  • @Crush_Netics
    @Crush_Netics 10 months ago

    You should upload videos on solving crackme(s). That would be awesome!

  • @zxborg9681
    @zxborg9681 11 months ago +2

    Very interesting, and well presented. I know of Dekker from the mutex work but had not seen this before. Fun fact: up here in Canada, a double double usually refers to a coffee with double sugar and double cream, usually from a coffee-and-donuts store like Tim Hortons.

    • @PaulPassarelli
      @PaulPassarelli 11 months ago +1

      You spel't "Dunkin Donuts" wrong. ;^)

  • @aonodensetsu
    @aonodensetsu 11 months ago

    now i need this extended to infinite (arbitrary) precision with some clever loops and arrays

  • @SteveBakerIsHere
    @SteveBakerIsHere 11 months ago

    A practical problem that I have is in using a GPU. It has double precision numbers, but in 3D graphics we have to use the available hardware interpolators, which can only cope with single precision. I'm trying to get my head around whether a similar Dekker trick would allow me to use the hardware interpolators to interpolate double precision values using single precision hardware.

  • @bbbl67
    @bbbl67 8 months ago

    In Canada, a double-double is a coffee with 2 cream and 2 sugar.

  • @yash1152
    @yash1152 8 months ago

    6:43 So it's like a tuple, right? In the same way a complex number, and also a rational number, is. It's nice to think about how to do operations on two of these dd's: the result of the op on the error parts has to be added to the main part, then the leftover propagated back.
    Feels like how cascading works in electronic full adders.

  • @OpenGL4ever
    @OpenGL4ever 10 months ago

    @Creel
    Could you make a comparison video between AMD's 3DNow! and Intel's SSE (the first SSE version)? Which one was better? What were the pros and cons of each? Which was easier to use from a programming point of view, more powerful, more versatile, etc.?

  • @macsmola
    @macsmola 11 months ago

    Now I'm thinking: how do I go about outputting this nicely formatted?

  • @ZomB1986
    @ZomB1986 11 months ago

    This brings me back to 2006, when I ported dhbailey's doubledouble to Java and found and fixed a lot of bugs, like corner cases in exp(), and found some very specific "Devil's values", as I like to call them, that really stress the algorithm.

    • @ZomB1986
      @ZomB1986 11 months ago

      After watching the video: the algorithm I have (and verified on paper) does not use IF statements, and is more complicated. It has as many operations as your C++ code, if the if and else branches are both counted, and then some.
      public DoubleDouble add(DoubleDouble y) {
          double a, b, c, d, e, f;
          e = this.hi + y.hi;  // rounded sum of the high parts
          d = this.hi - e;     // begin recovering the rounding error of e
          a = this.lo + y.lo;  // rounded sum of the low parts
          f = this.lo - a;     // begin recovering the rounding error of a
          d = ((this.hi - (d + e)) + (d + y.hi)) + a; // error of e, plus the low sum
          b = e + d;
          c = ((this.lo - (f + a)) + (f + y.lo)) + (d + (e - b)); // error of a, plus error of b
          a = b + c;           // renormalize into a non-overlapping pair
          return new DoubleDouble(a, c + (b - a));
      }

  • @user-xe8oi5oq6c
    @user-xe8oi5oq6c 11 months ago

    I'll watch the vid. It's interesting to compare with the Kahan-Babuška-Neumaier and Shewchuk algorithms.
    P.S. Watched already. Nice. You actually use the KBN algorithm for adding the higher and lower parts of the double double.
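
    For comparison, a C++ sketch of the Kahan-Babuška-Neumaier loop mentioned above (my paraphrase of the published algorithm, not code from the video):

    #include <cmath>
    #include <vector>

    // Neumaier's improvement on Kahan: branch on which operand is larger,
    // so the case where the new term exceeds the running sum is handled too.
    double neumaier_sum(const std::vector<double>& xs) {
        double sum = 0.0, c = 0.0; // c accumulates the rounding errors
        for (double x : xs) {
            double t = sum + x;
            if (std::fabs(sum) >= std::fabs(x))
                c += (sum - t) + x; // low bits of x were lost
            else
                c += (x - t) + sum; // low bits of sum were lost
            sum = t;
        }
        return sum + c; // fold the accumulated error back in once
    }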

  • @Sonnentau1
    @Sonnentau1 11 months ago

    Question: at one point we made sure to have the bigger number first for the subtraction of R. But your C++ code did not show any parentheses to make sure the operation is carried out that way? Because if I am not mistaken, either clang or gcc does its operations from right to left... Or won't this matter?

  • @andrewporter1868
    @andrewporter1868 11 months ago

    1:48 I'm convinced I'm watching a legitimate madman, and this is him showing his true colors. If you haven't found it already, you've got to find yourself the "real instructions ||| mental disorders" meme image for x86. I think you've personified it 😂

  • @chickenduckhappy
    @chickenduckhappy 11 months ago

    Great video, thanks! 🙂 Just one tiny bit pick: why is your metronome obviously 17 times as expensive as your camera? 😎

  • @fredrikbergquist5734
    @fredrikbergquist5734 11 months ago

    Is this a way to improve the result when subtracting two numbers that are very close?

  • @MaxCareyPlus
    @MaxCareyPlus 11 months ago +1

    Soo... All aboard the double Dekker bus? 🚌😂

  • @uplink-on-yt
    @uplink-on-yt 11 months ago

    Does that second number have a name? If it were integer division, it would be called the "remainder". This value in floating point rounding plays a similar role. I could see it being called a "remainder" too, as maths doesn't shy away from reusing terms as long as the meaning is clear from context.

  • @c4ashley
    @c4ashley 11 months ago

    No but it does have something to do with playing piano, because... like, if you... well you see, the... it's like when you... the tone of the string contains harmonics of various factors of the fundamental frequency, which is analogous to the variable containing subvariables of various factors of precision of the number. Boom. Nailed it.

  • @aleksandersabak
    @aleksandersabak 11 months ago

    Does this addition algorithm always produce the most accurate double double result? One edge case that wasn't shown here is when the overflow in r happens: it loses information about the least significant digit of the sum due to the overlap between r and R. This will not be recovered by the computation of c, and we will be left with a c that definitely has a zero in the lowest digit, losing one bit of precision.

  • @unvergebeneid
    @unvergebeneid 11 months ago +1

    Wait, if _r_ overflows, don't we lose precision already? How are we going to get it back if we don't split _r_ itself into an upper and a lower part? I'm sure I missed something, but I can't see it right now.

    • @Axman6
      @Axman6 11 months ago

      The idea is that if r ended up being 15.3 in the example, we'd need to round it to 15, losing the 0.3, but the 1 in the tens place can also be stored in R by adding it, and then we'd find that we're left with 5. Applying the final clean-up, you'd recover the .3 and end up with R = 170 and r = 5.3

  • @GrayBlood1331
    @GrayBlood1331 11 months ago

    Surely there are opcodes to handle this quickly. Just like how, if you wanted multiplication (or floating point) on the 6502, it could be done, but slowly. So CPU designers added opcodes to later chips to greatly speed these operations up.

  • @X_Baron
    @X_Baron 11 months ago

    There has to be a display/print/string conversion algorithm too, right?

  • @codenamelambda
    @codenamelambda 11 months ago

    I do wonder how it compares performance-wise with emulated floating point using integer arithmetic to get the same precision...

  •  11 months ago

    How does this compare to using 128-bit floating point numbers instead?

  • @Datamining101
    @Datamining101 9 months ago

    7:43 I was thinking Jemaine Clement from Flight of the Conchords

  • @dunda563
    @dunda563 11 months ago

    I wonder how double Dekker doubles would compare to a single floating point number with double the bits? Are there any advantages, perhaps? At the very least, the method allows double the precision on hardware not built for it, and the two groups of bits don't have to be consecutive.

  • @ktownjunkie
    @ktownjunkie 11 months ago

    Would truncation work instead of rounding?

  • @siquod
    @siquod 11 months ago

    The funniest part was the super subtle sponsor segue spoof.

  • @1verstapp
    @1verstapp 11 months ago +2

    >picture
    Obviously fake - no pocket protector. 😀 Your channel, you post when you want to.

  • @Lexxaro
    @Lexxaro 11 months ago

    Would double double work in a case like 5700.018 + 1800.053?
    Because then it would actually have really weird, specific positions where it is much more than double the precision of a normal double.

  • @FavoritoHJS
    @FavoritoHJS 11 months ago

    Hmm, what happens if, while accumulating all the lower-half variables, you get overflow? Yes, that's handled... but you could still get some error that you can't resolve afterwards.
    I'm thinking of something like 5.5+4.9=10(.4): if the 1 could be stored in the upper half, you could have stored the .4 in the lower half, but now you can't.
    EDIT: is that why this only works in binary? To trigger this edge case would require something like 11|5.5 + 20|4.9, and while that might not cause overflow of the upper half in decimal, it sure will in binary.

    • @WhatsACreel
      @WhatsACreel 11 months ago

      Hmmm… there’s definitely a lot of edge cases to try. As far as I can tell, it seems to be accurate to about 104 bits or so (or whatever twice the mantissa is). There is a paper called something like ‘testing the robustness of 2 sum and fast 2 sum’, which looks into how accurate the algorithm is, along with a simplified version. I have to be honest, I’ve not tested any more than a few examples in base 10, but I would agree, there seems to be an extra correction missing.
      I’m not sure how it all plays out in the grand scheme, but it would certainly be an interesting topic for future study. Cheers for watching :)
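
      For anyone chasing that paper: the Fast2Sum under test is Dekker's three-operation version, which is only exact when |a| >= |b|, hence the magnitude checks in the video's code. A C++ sketch with the comparison folded in (names mine, a sketch rather than the video's exact code):

      #include <cmath>
      #include <utility>

      // Fast2Sum (Dekker): exact in binary only when |a| >= |b|, so we
      // swap first. Returns {s, e} with a + b == s + e exactly.
      std::pair<double, double> fast_two_sum(double a, double b) {
          if (std::fabs(a) < std::fabs(b)) std::swap(a, b);
          double s = a + b;
          double e = b - (s - a); // the bits of b that didn't fit into s
          return {s, e};
      }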

  • @herrbonk3635
    @herrbonk3635 11 months ago

    For what problems do you need higher precision than a 53 or 64 bit mantissa?

    • @adiaphoros6842
      @adiaphoros6842 11 months ago

      Simulations in research-level physics, maybe GR or fluid dynamics.

  • @dejaphoenix
    @dejaphoenix 11 months ago +9

    Sounds like a lot of toil and trouble.

    • @WhatsACreel
      @WhatsACreel 11 months ago

      Hahaha, nice! :)

    • @SameAsAnyOtherStranger
      @SameAsAnyOtherStranger 11 months ago

      Not really. It just seems like that when something so simple is dragged out into a 22 minute video.

  • @PaulPassarelli
    @PaulPassarelli 11 months ago

    Hmmm. Many years ago I wrote a floating point emulator for the 8087 family of Intel co-processors. Since it was a full-blown but granular co-processor emulator, I had to implement the chip family's 80-bit registers, so it was just as accurate as common ISO 64-bit doubles. Since then I've also had to code numerous fixed-point math systems to process hard real-time data for ECU systems.
    If total flexibility were the goal, then I'd simply extend my co-processor library out to arbitrarily many bits in the registers, increasing both the absolute magnitude and the bits of precision, whether that be 128 bits (double-double), 160 bits (double-80), 240 bits (triple-80), 256 bits (quad-double) or more, perhaps 512 bits (64 bytes, or octo-double). The code is almost unchanging, and with early-out detectors for rational data the speed penalty for such super precision is quite minimal.
    What's even better: assuming the C compiler still supports the '/Emulator' switch, the original (simple) compiled code would run on any computer without additional modification, or could make use of the co-processor for faster first-pass estimations.
    Anyway, thanks for the trip down memory lane -- no pun intended.

  • @KipIngram
    @KipIngram 7 days ago

    15:48 - Well, if the real answer is 157 and the computed answer is 160, I'd say the error is +3, not -3. So I'd call r the "correction" or something like that rather than the error. Or perhaps the "residual" - maybe that's why R and r are used for it in the first place.

  • @cmyk8964
    @cmyk8964 11 months ago

    A “double” precision solution? Devised by a Mr. “Dekker”? If God exists, he must absolutely love puns.