Double it Like Dekker 2: Dekker Multiplication and Division

  • Published 1 Jun 2024
  • Second part of an introduction to Double-Double Arithmetic. In this video, we look at Dekker multiplication and division, as well as the two little helper functions that are often used to compute those operations.
    Support What's a Creel? on Patreon: / whatsacreel
    Facebook: / whatsacreel
    Software used to make this vid: Visual Studio 2019 Community: www.visualstudio.com/downloads/
    Audacity: www.audacityteam.org/
    DaVinci Resolve 16: www.blackmagicdesign.com/prod...
    OpenOffice: www.openoffice.org/
    Gimp: www.gimp.org/

COMMENTS • 73

  • @j1952d
    @j1952d 9 months ago +9

    At the end of the division, surely 6.161759... should round to 6.162 to get a result accurate to 4 digits?

  • @andersjjensen
    @andersjjensen 9 months ago +13

    This is not just important for people who NEED to understand double-double arithmetic. It's also quite important for people who just want to be inspired to write better code. I already have double-double available in C, but getting into the habit of truly thinking about not doing "fool's work", trying to squeeze things into bit fields instead of full variables, packing registers, etc., is a lifelong journey that these deep dives really help with.
    Someone much wiser than me once said: "You don't become a great programmer by writing lots of code. You become a great programmer by READING great code". And it really stuck with me. The book "Expert C Programming: Deep C Secrets" by Peter van der Linden was the first book I read on the subject, and I was instantly hooked.

    • @WhatsACreel
      @WhatsACreel  9 months ago +6

      Well said! That sounds like a great book! I love Robert Sedgewick's Algorithms in C and Knuth. Sedgewick's book is like a small, condensed version of Knuth's magnum opus. Those two have had easily the biggest impact on my own code. Cheers for watching mate :)

    • @andersjjensen
      @andersjjensen 9 months ago +3

      @@WhatsACreel Thanks for the suggestion! It would appear that "the internet" provides it simply by searching "format:PDF Robert Sedgewick Algorithms in C". How convenient :P

    • @dennisrkb
      @dennisrkb 9 months ago

      Also, you can maximize the precision of your existing doubles by carefully re-arranging your calculations.

  • @larsfinlay7325
    @larsfinlay7325 9 months ago +5

    Stuff like this is still a little out of my depth, but way less out of my depth than it was when I started watching your x64 tutorial. Thank you for all that you do.

    • @WhatsACreel
      @WhatsACreel  9 months ago +2

      Thanks for the kind words mate! And cheers for watching :)

  • @CubicSpline7713
    @CubicSpline7713 9 months ago +12

    Like your style! Please do a video on quaternions - that would be great!!

    • @WhatsACreel
      @WhatsACreel  9 months ago +4

      That's a great idea! Cheers for the suggestion, and cheers for watching :)

    • @lapatatadelplato6520
      @lapatatadelplato6520 9 months ago +2

      I don’t know how far you’re willing to take the quaternion rabbit hole, but I would highly suggest looking into Clifford algebras. You can do all sorts of things with them, more so than with quaternions.

  • @klikkolee
    @klikkolee 9 months ago +19

    A guess at the naming of mul12: the inputs are 1-machine-float representations and the output is a 2-machine-float representation

    • @WhatsACreel
      @WhatsACreel  9 months ago +8

      You might be right there!

    • @landspide
      @landspide 9 months ago

      arxiv.org/pdf/cs/0603115
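
      For reference, below is a minimal sketch of Split and Mul12 as they are usually presented (a C++ reconstruction assuming IEEE-754 doubles, round-to-nearest, and a compiler that does not contract the expressions into FMAs; the struct and names follow the video's convention, so treat the details as illustrative):

      struct DD { double upper, lower; };

      // Veltkamp split: x == h + t exactly, each half small enough that
      // products of halves are exact. 134217729.0 is 2^27 + 1 for binary64.
      void Split(double x, double& h, double& t) {
          double p = x * 134217729.0;
          h = p - (p - x);   // high ~26 bits of x
          t = x - h;         // remaining low bits, exact
      }

      // Mul12: the exact product of two doubles as an upper/lower pair.
      DD Mul12(double x, double y) {
          double hx, tx, hy, ty;
          Split(x, hx, tx);
          Split(y, hy, ty);
          double p = hx * hy;              // head product (exact)
          double q = hx * ty + tx * hy;    // cross terms
          DD r;
          r.upper = p + q;
          r.lower = ((p - r.upper) + q) + tx * ty;  // recovered rounding error
          return r;
      }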

  • @usuyus
    @usuyus 9 months ago +4

    I'm confused about why we can omit the lower half of B while computing A/B. The brief argument in the video (it appears on screen at 18:37) claims this is the case because A/B.lower won't fit into the result. However, I don't think it's valid to split A / (B.upper + B.lower) into A / B.upper and A / B.lower... What's the logic there?

    • @MrRyanroberson1
      @MrRyanroberson1 8 months ago

      Find e where A/(B+b) = A/B + e. Multiplying through by B(B+b) gives AB = A(B+b) + eB(B+b), so e = -Ab/(B(B+b)), i.e. e = -(A/B) * (b/(B+b)). Due to the relationship between b and B, the factor b/(B+b) is always between 0 and 99/1099, which means you're scaling A/B by something between 0 and roughly 0.09. Because A is only 2 digits in size, you catch this with little t and Ub, a rounding correction which matches what you find with the formula provided.
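
      Restated compactly (a hedged reconstruction of the argument above, with B the upper and b the lower half of the divisor):

      e \;=\; \frac{A}{B+b} - \frac{A}{B}
        \;=\; \frac{AB - A(B+b)}{B(B+b)}
        \;=\; -\frac{Ab}{B(B+b)}
        \;=\; -\frac{A}{B}\cdot\frac{b}{B+b}

      Since |b| is at most half a unit in the last place of B, the factor b/(B+b) is on the order of the unit roundoff, so the whole correction is roughly the quotient scaled by one rounding error - small enough to fold into the correction term rather than requiring a full doublelength divide.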

  • @gameofpj3286
    @gameofpj3286 9 months ago

    Very cool, I dreamt that I looked you up, and now you're in my recommended 👍 Good stuff

  • @scorch855
    @scorch855 9 months ago

    Thank you for making such amazing content. Your enthusiasm for programming always brings a smile to my face and makes these informative videos extremely engaging.
    Hope you have a great day and I'm keen to see what topics you cover next!

  • @franzlyonheart4362
    @franzlyonheart4362 9 months ago +1

    Finally more from you, please do keep it up!

  • @graptemys4186
    @graptemys4186 9 months ago +1

    In Mul12 I prefer to compute R.upper = p + q + A.lower * B.lower. This does require one extra addition, but means that R.upper is a better approximation to the product, and then R.lower is smaller.
    As an example, take a_original as the 64-bit approximation to 53 * Ln (53) and b_original as the 64-bit approximation to 59 * Ln (59). So we have a_original ≈ 210.42547142, b_original ≈ 240.57470919.
    Then R.upper becomes 50623.0465927092,0092239975,9292602539... rather than 50623.0465927091,9364644214,5109176636... , and R.lower becomes −0.0000000000,0320696677,5577911444 rather than 0.0000000000,0406899083,8605514460.
    Either way, the sum of R.upper + R.lower is EXACTLY a_original × b_original.
    I'm looking forward to part 3 where you explain Dekker's square root algorithm.
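
    In code, the suggested change looks something like this (a hedged sketch: p, q and the split tails tx, ty are named as in the video's Mul12, and the tail recovery shown is one plausible arrangement using the usual error-recovery trick):

    double s  = p + q;                   // standard Dekker stops here for R.upper
    double e1 = (p - s) + q;             // rounding error of that first sum
    R.upper   = s + tx * ty;             // fold the tail product into upper
    double e2 = (s - R.upper) + tx * ty; // rounding error of the second sum
    R.lower   = e1 + e2;                 // typically a smaller tail than before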

  • @Antonio-yy2ec
    @Antonio-yy2ec 9 months ago

    Pure gold, thank you creel

  • @MiseryKira
    @MiseryKira 9 months ago +1

    Why was this kind of stuff forgotten? This is AMAZING! Ty for bringing it back to life (Y)

    • @miran248
      @miran248 9 months ago +1

      Nobody seems to care about quality nowadays. It's all about features and time to market.

  • @OrangeRauy
    @OrangeRauy 9 months ago +2

    At the end you show that the exact result of the division would have been 6.161759489. Wouldn't this, however, round to 6.162 in 4 digits? So does that mean that, while achieving a 4-digit representation, DD division is not quite as accurate as a properly nearest-rounded exact result? Could this have been alleviated by respecting lower-b in the division too? Or is this a general shortcoming of the representation?

    • @SiyuJiang
      @SiyuJiang 9 months ago +2

      There's no guarantee that the doublelength quotient will be accurate to doublelength precision. You can check pp. 238-239 of the linked reference for the error bounds. (If using base-10 instead of base-2, the polynomial in mu is the same, just replace 2^(-t) by 10^(-t).)

  • @klikkolee
    @klikkolee 9 months ago +4

    In the result of the Dekker division, the last digit is rounded in the wrong direction. Are Dekker operations only expected to be within 1 least-significant digit of the real result (as opposed to the 0.5 least-significant digits of normal rounding)? Do they truncate instead of rounding? Does the discrepancy not occur in base-2 floats?

    • @WhatsACreel
      @WhatsACreel  9 months ago +3

      Might be a mistake in my arithmetic, or my taking liberties with Split. Or it might be due to the floating-point rounding. Usually, the FPU rounds to nearest, with exact halves rounded up half the time and down the other half (round-half-to-even, sometimes called banker's rounding). You can set that in the MXCSR register of a real FPU. Though I think that default rounding is the one recommended for double-double arithmetic.
      Long story short, I’m not sure why that happened. I actually checked my arithmetic a few times, as I was annoyed by that little rounding error. I ended up just hoping nobody would notice…
      Hahaha, well spied! :)

    • @WhatsACreel
      @WhatsACreel  9 months ago +2

      Just to be clear: To the best of my knowledge, it should round to nearest! I am not sure why it did not, and I was a little confused too.

    • @michaeldamolsen
      @michaeldamolsen 9 months ago +2

      In the Dekker paper (linked in description of part 1), it is required that the underlying floating point number system is binary, not decimal. To me that seems like the most reasonable cause of the rounding discrepancy here. The rounding rule is the same for decimal and binary, but the digit where the rounding occurs will be different. E.g. 1/16 has 3 mantissa digits in decimal, but only 1 mantissa digit in binary.
      I'll admit I didn't have my morning coffee yet, so I could be missing something and thus be completely wrong.

  • @NewLondonMarshall
    @NewLondonMarshall 4 months ago

    Great video as always! Can you do a video for noobs like us about quantum computing, and how assembly instructions and code would be different?

  • @zxuiji
    @zxuiji 9 months ago +2

    6:02, you could also just make a union for it and then mask out the bits via the alternate type, such as this gcc-tailored variant (I skipped the math I'm not interested in doing for an off-hand comment):
    double floor( double n )
    {
        union { long double dbl; unsigned __int128 llu; } val = { n };
        ...
        val.llu &= ~(maskout);
        return val.dbl;
    }

  • @Alex-jb5tb
    @Alex-jb5tb 2 months ago

    I implemented your C procedures, but declared the structures and doubles volatile to prevent GCC from optimising the code, multiplied 0.1 by itself 50 times to get ~1e-50, and compared it to a primitive multiplication (p *= 0.1). The error with DekkerMul is 2.8486703237279e-065; the one with the primitive approach is the slightly different margin 2.4925865332619e-065. What did I do wrong? I summed up R.upper and R.lower before starting the next call to DekkerMul. Thank you.
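
    One thing worth checking against the description above (a hedged sketch; it assumes the video's DekkerMul takes and returns DD values): collapsing R.upper + R.lower into one double between iterations rounds back to ordinary double precision, so the halves need to be carried forward intact.

    DD p     = { 0.1, 0.0 };      // note: 0.1 itself is already rounded once
    DD tenth = { 0.1, 0.0 };
    for (int i = 1; i < 50; ++i)
        p = DekkerMul(p, tenth);  // keep both halves between iterations
    // p.upper + p.lower together carry roughly twice the significand bits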

  • @jaborl
    @jaborl 9 months ago

    nice video

  • @mig7287
    @mig7287 9 months ago +1

    👍👍👍

  • @uplink-on-yt
    @uplink-on-yt 9 months ago +1

    They even named a type of bus after this operation.

  • @JSorngard
    @JSorngard 9 months ago

    Excellent video! I am implementing a Dekker float myself for fun and was wondering how to compare two different values. Could we theoretically end up in a situation where e.g. the upper parts are slightly different, but the lower parts are such that the numbers are actually the same?

    • @graptemys4186
      @graptemys4186 9 months ago +2

      In my (limited) experience, no - provided you do what I suggest for Mul12. After each operation I call a "Check" routine which adds upper to lower and checks that this sum equals upper, which is evidence that upper is the unique closest value. (There is some small print regarding the phrase "unique closest", as the sum might be midway between two representable values. But we regard the even value as closer.)
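
      A sketch of such a check (assuming the video's DD struct with upper and lower doubles; the routine name is the commenter's, so treat it as illustrative):

      // A normalized DD: upper must absorb lower entirely, so the
      // rounded sum of the two halves reproduces upper exactly.
      bool Check(const DD& r) {
          return r.upper + r.lower == r.upper;
      }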

  • @SiyuJiang
    @SiyuJiang 9 months ago +1

    It's not surprising that the division result is a little off. On pp. 238-239 of Dekker's paper, he mentions that "the doublelength operations defined by our algorithms... are not faithful" (the term "faithful" defined as producing the smallest rounded number larger than the result or the largest rounded number smaller than the result).
    The relative error bound for division is given as (x^2 + 6x + 5 + y) multiples of 2^(-2t) for a base-2 machine, where x < 1 for perfect singlelength rounding and y ≈ 0.1 when t > 10. This comes out to at most 12.1 such multiples, showing that the lowest 3-4 bits of the tail are not guaranteed to be accurate.

    • @SiyuJiang
      @SiyuJiang 9 months ago +1

      I did the error calculation for a base-10 system with t = 2 digits and found that the relative error for doublelength divisions can be as high as 0.0012.

    • @WhatsACreel
      @WhatsACreel  9 months ago +1

      Well done mate! That's really great info :)

    • @SiyuJiang
      @SiyuJiang 9 months ago

      @@WhatsACreel Thanks, really enjoyed this video and your content in general!

  • @user-pl5qc2du8n
    @user-pl5qc2du8n 5 months ago

    This is the only video about double-double arithmetic that I have found so far! I've watched them all in a single shot! Can this "double Dekker" get me to the true "quadruple precision" station? I am trying to divide integers by a constant in a fast way (i.e. multiply by a reciprocal and shift right). To do so with a 64-bit integer I need to compute a "magic number", which is exactly the first 64 bits of a quadruple-precision mantissa. Switching to doubles would make the code faster and more compact. But the problem is that the two double-double mantissas do not always match the real quad one. Does anyone know anything to read about this?

  • @thomas_mertes
    @thomas_mertes 9 months ago

    Some things I found:
    - In Split_Fast the definition of MASK probably should end with ;
    - In DekkerMul the first parameter probably should be: const DD& a
    - In DekkerDiv the first parameter probably should be: const DD& a
    - The definition of operator

  • @oresteszoupanos
    @oresteszoupanos 9 months ago

    Some of the fancy fonts in this video are creely... Dekkerdent.

  • @djyotta
    @djyotta 9 months ago +1

    Do modern languages such as Java and C# do arithmetic on doubles as double-double and then cast back to double?
    It seems like without casting to double-double first, code like...
    double a = ((b*c)+d-e)/f
    ...will produce different results depending on the order of b,c and d,e - even though order shouldn't matter mathematically.
    For integers, I've always used the rule of thumb "rewrite the equation to multiply and add before doing division" to maximise precision when casting back to int.
    Thankfully, I never bothered to apply this style to floats, knowing that they work completely differently... But now I wonder if that rule of thumb is potentially dangerous (if you don't know whether you have an int or a float *cough* Python), given that some weird stuff happens when adding...
    It makes me afraid to use floats at all, given you don't really have any idea how precise the result is... I was stung by this recently in C# dealing with time deltas on the nanosecond scale. I didn't need that much precision, so I just worked around it by considering only milliseconds... but I was not happy that some unknown number of low bits are basically garbage...

    •  9 months ago

      If you are using Python and want precision, you can consider using rational numbers. They are built into the standard library.
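
      A quick illustration of the order dependence described in the comment above (hedged; arbitrary values in C++, but any language with binary64 doubles behaves the same way):

      #include <cstdio>

      int main() {
          double big = 1e16, small = 1.0;
          double x = (big + small) - big;   // 1e16 + 1 rounds back to 1e16: small is lost
          double y = (big - big) + small;   // reordered: small survives
          std::printf("%g vs %g\n", x, y);  // prints: 0 vs 1
          return 0;
      }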

  • @7KeHek
    @7KeHek 9 months ago

    Speaking seriously: long double == double in Visual Studio. So if you want to raise accuracy, you should use the Dekker approach or write asm code to access the internal 80-bit float format.
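
    A quick way to confirm this on a given toolchain (hedged: the assertion holds for MSVC on x64, where long double uses the same 64-bit format as double, but it will fail on e.g. gcc targets where long double is 80-bit extended):

    static_assert(sizeof(long double) == sizeof(double),
                  "long double is just double on this compiler");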

  • @7KeHek
    @7KeHek 9 months ago +1

    Double Dekker sounds like a bus from London.

  • @KrakonosovoBabka
    @KrakonosovoBabka 5 months ago

    Hello, I am sorry this doesn't have anything to do with this video, but have you considered making your merch available in Europe? I have wanted to buy some stuff from you for a long time, but it always says they don't deliver to my country.

  • @zxuiji
    @zxuiji 9 months ago

    There's one thing I have to question: what's the value of these methods for floating-point numbers? From what I remember, multiplication and division are achieved by just doing normal integer multiplication/division on the mantissas and adding/subtracting the exponents, which is surely faster than any of this stuff.

    • @SiyuJiang
      @SiyuJiang 9 months ago

      If your machine only handles 64-bit precision, say, then how are you going to do a 128-bit multiplication? These methods enable you to do FP arithmetic at higher precision than the machine precision.

    • @zxuiji
      @zxuiji 9 months ago

      @@SiyuJiang My response hasn't changed: you just transplant the mantissa to a bigger integer - possibly emulated, which is easier than emulating floats - and tack your exponent onto the front of that in a struct. Since the exponent of a 64-bit float is what, 11 bits? That'll fit into a proper 32-bit integer just fine, bias 'n' all, and still have plenty of room for the sign bit of the emulated float. Standard float binary math applies at that point, which is mostly easy when you don't need to mask out the exponent and sign bit before working with the mantissa.

    • @SiyuJiang
      @SiyuJiang 9 months ago

      @@zxuiji True, you could do something like that, but practically speaking it might not be that efficient. Even though it's generally true that floating-point ops are slower than integer ops, using the FPU, which only supports certain FP types, is (probably) a lot faster than emulating a wider FP format, even when you consider the additional FP operations needed to implement Dekker's doublewidth algorithms. (At a certain point, though, I do think emulating the FP operations through the integer pipeline becomes better than applying doublewidth multiple times.)

    • @michaeldeakin9492
      @michaeldeakin9492 9 months ago

      @@zxuiji TL;DR: just increasing the mantissa size won't always help you if you have catastrophic cancellation from subtraction; this type of representation does.
      Some algorithms depend on the sign of floating-point computations. If those signs are computed incorrectly, or at least inconsistently, you can come up with pathological cases which prevent some algorithms from terminating. Delaunay triangulation is one algorithm that has this problem. Extending the mantissa doesn't solve this on its own, due to the range of absolute precision FP can represent; to make it work you also need to limit the precision of your inputs, which may or may not be reasonable.
      By arbitrarily increasing the mantissa size with additional floats, this method allows you to get a result with the exact sign without limiting the precision of the inputs. This still isn't fast - cases where it's required increase the latency by over an order of magnitude - but you can also filter out the cases that need it.

    • @zxuiji
      @zxuiji 9 months ago

      @@SiyuJiang Well, they COULD be made equally fast if, instead of letting the exponent represent where the dot is, you let it represent how far from the dot the smallest unit is. Then the mantissa could remain a normal integer and be passed through simple integer operations that use double the width of the mantissa (for detecting shifts up/down from the smallest unit size). It would also mean that 0.1 etc. could be represented exactly, instead of treating the last bit as a rounding bit.
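
      On the cancellation point raised above, here is a small self-contained demonstration (a hedged sketch): the classic TwoSum step recovers the exact rounding error of one addition, and it is the primitive these multi-float representations are built from.

      #include <cstdio>

      // TwoSum (Knuth): s + e == a + b exactly, where s = fl(a + b).
      void TwoSum(double a, double b, double& s, double& e) {
          s = a + b;
          double bv = s - a;                // the portion of b that reached s
          e = (a - (s - bv)) + (b - bv);    // what rounding discarded
      }

      int main() {
          double s, e;
          TwoSum(1e16, 1.0, s, e);          // the naive sum alone loses the 1.0
          std::printf("s = %g, e = %g\n", s, e);   // s = 1e16, e = 1
          return 0;
      }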

  • @suncrafterspielt9479
    @suncrafterspielt9479 9 months ago +1

    for the algorithm

  • @TheRedPython
    @TheRedPython 9 months ago

    Can you double a double-double?

    • @WhatsACreel
      @WhatsACreel  9 months ago +1

      You certainly can! That's called Quad-Double :)

  • @richardfarmbrough2335
    @richardfarmbrough2335 9 months ago

    The Dekker division answer should have been 6.162. So something that was discarded should have been kept, unless this is acceptable for your use case.

  • @typecasto
    @typecasto 9 months ago +1

    The title has a minor typo, "is" vs "it"

    • @WhatsACreel
      @WhatsACreel  9 months ago +1

      Thanks for the heads up! I've fixed it now :)

  • @Neuroaktivo
    @Neuroaktivo 9 months ago +1

    hello world!

    • @WhatsACreel
      @WhatsACreel  9 months ago

      Gosh, you're quick!! Thanks for watching :)

    • @andersjjensen
      @andersjjensen 9 months ago +1

      Segmentation fault.

  • @felchs
    @felchs 9 months ago

    Are there jobs for assembly language yet?

  • @prezadent1
    @prezadent1 9 months ago

    You have no amounts on your Patreon page.

  • @davidhand9721
    @davidhand9721 9 months ago

    In science, we have a concept called significant figures, or sig figs. This is the number of digits that a number intrinsically has. Translated into binary, this would be significant bits. When you have some source of data, it reports its numbers to some fixed precision. That means the actual value is the reported value plus an error below the last significant digit. When you multiply two measurements, you can only keep as many digits as the smaller number of sig figs. The lower digits have a significant contribution from the error term, so including them in your result is a lie.
    What Dekker is doing here is pretty much _exactly_ ignoring the concept of significant figures. The number in your variable came from somewhere in the real world. Unless the device that produced that number has more precision than a double-precision number, the number contains bits that did not come from the measurement itself, but from some combination of device error and conversion to floating point. Even if every bit of the double is significant, multiplying them results only in the same number of sig figs. In other words, the lower half of the double-double returned from mul12 is almost entirely error and artifact.
    Please do not use this concept in scientific applications. It might be okay to use in an intermediate calculation, but it really might not, depending on the other operations in the calculation, because every one of them has an effect on the result's sig figs, and multiplication/division is the only operation that simply conserves them. Addition/subtraction, for example, can lose a whole lot of sig figs depending on the scale of the operands. If you don't understand these rules and can't track the changes as they occur, just don't use double-doubles if there are numbers from the real world anywhere in your code. You might think you're helping make things more accurate, but you are most certainly not. You're just making it impossible to tell _how_ accurate scientific results are, and that's no good to anyone.

  • @theexplosionist2019
    @theexplosionist2019 9 months ago

    1.0 / 3.0 does not give full precision.
    Is the lower value supposed to be zero for division and multiplication? No good.

  • @AK-vx4dy
    @AK-vx4dy 9 months ago

    I hope you haven't called out any mathematical demons 😅