Floating Point Numbers - This is Where Things Get Weird!

  • Published 28 May 2024
  • Following on from my video about Fixed Point Numbers, now it is time to look at Floating Point Numbers. On the way we will look at binary fractions and why in C you can't increment a float above 16777216! Strap in, things are about to get weird!
    ---
    Let Me Explain T-shirt: teespring.com/gary-explains-l...
    Twitter: / garyexplains
    Instagram: / garyexplains
    #garyexplains
  • Science & Technology
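
  The description's claim that in C you can't increment a float above 16777216 is easy to check for yourself. A minimal C sketch (illustrative, not from the video):

    #include <stdio.h>

    int main(void) {
        /* A 32-bit float has a 24-bit significand (23 stored bits plus an
           implicit leading 1), so above 2^24 = 16777216 the gap between
           adjacent floats is 2, and adding 1 rounds back to the same value. */
        float f = 16777216.0f;
        f = f + 1.0f;
        printf("%.1f\n", f);   /* still prints 16777216.0 */
        return 0;
    }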

COMMENTS • 45

  • @TheEulerID
    @TheEulerID 27 days ago

    For transactional financial calculations, decimal fixed point representation is generally preferred because of rounding errors, which accumulate in things like running totals and balances. Sometimes these can involve tens or hundreds of thousands of accumulations. That's one of the reasons that many older computers had decimal number representations, usually packed decimal, along with the instructions to support it. Typically those could support up to 31 decimal digits of precision.
    Also, IEEE 754 does allow for base 10 as well as base 2 exponents.
    I'd also add that cumulative FP rounding errors can quickly become apparent, even in something like an Excel spreadsheet, and you have to be very careful to avoid those sorts of problems.
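
    To see how quickly a running total drifts, here is a minimal C sketch (the one-cent loop is an illustration, not from the comment):

      #include <stdio.h>

      int main(void) {
          /* Post a one-cent transaction 100,000 times. */
          float running = 0.0f;   /* binary floating point         */
          long  cents   = 0;      /* decimal fixed point, in cents */
          for (int i = 0; i < 100000; i++) {
              running += 0.01f;   /* 0.01 has no exact binary form */
              cents   += 1;       /* exact                         */
          }
          printf("float total: %.2f\n", running);  /* drifts away from 1000.00 */
          printf("fixed total: %ld.%02ld\n", cents / 100, cents % 100);
          return 0;
      }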

  • @gregholloway2656
    @gregholloway2656 1 month ago

    Great video, Gary. Glad you pointed out the classic 7 sig-fig rounding problem. One other snag, for programmers, is:
    if (floata == floatb) // dangerous comparison
    👍
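
    A common workaround is to compare against a tolerance instead of using ==. A minimal C sketch, where the relative epsilon is an assumed, application-specific choice:

      #include <math.h>
      #include <stdbool.h>

      /* True if a and b agree to within a relative tolerance eps.
         Pick eps to match the magnitude and accuracy of your data. */
      static bool nearly_equal(float a, float b, float eps)
      {
          return fabsf(a - b) <= eps * fmaxf(fabsf(a), fabsf(b));
      }

    Then if (nearly_equal(floata, floatb, 1e-5f)) replaces the raw comparison.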

  • @Chalisque
    @Chalisque 1 month ago +1

    While floating point numbers are stored as binary fractions between 1 and 2 with an exponent, it's actually equivalent to an integer multiplied by a power of 2. For example 1.5 is equal to 3×2^-1, and 1.25 is equal to 5×2^-2 and so on. So for example if we have a binary fraction with 8 bits after the 'decimal point', together with an exponent, we can shift the mantissa to the left by 8 places and subtract 8 from the exponent to get the same number. (I find this picture especially useful in digital audio when considering the difference between 24bit integer and 32bit float PCM.) The advantage of the binary fraction approach is one extra bit of mantissa precision for the same total number of bits. The clever observation is that for binary numbers that are not identically zero, the leading digit is _always_ 1. Thus if the number is nonzero, you don't have to actually use a bit to store the leading digit. (So they use the entirely-zero bit pattern as a special case to store zero.)
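
    The integer-times-power-of-two view is easy to demonstrate with the standard C functions frexpf and ldexp, which split and rebuild a float (a minimal sketch of the idea above):

      #include <math.h>
      #include <stdio.h>

      int main(void) {
          /* frexpf returns f in [0.5, 1) and e such that x = f * 2^e.
             Scaling f by 2^24 makes it a whole number, so every finite
             float is an integer times a power of two. */
          int e;
          float f = frexpf(1.25f, &e);         /* f = 0.625, e = 1        */
          double mant = ldexp((double)f, 24);  /* 0.625 * 2^24 = 10485760 */
          printf("1.25 = %.0f * 2^%d\n", mant, e - 24);
          /* 10485760 * 2^-23 reduces to 5 * 2^-2, as in the comment. */
          return 0;
      }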

  • @aleksandardjurovic9203
    @aleksandardjurovic9203 1 month ago

    This is great. Thank you!

  • @drfrancintosh
    @drfrancintosh 1 month ago +1

    Another great explainer. I recently read there is a new format specifically for numbers between -1 and +1, used in AI/ML. Would you be able to get into that? It's supposed to be an order of magnitude faster than IEEE 754.

  • @lalmuanpuiamizo
    @lalmuanpuiamizo 1 month ago +3

    1:42 The picture is not correct; it should be 32768, not 32786.

  • @stuartajc8141
    @stuartajc8141 1 month ago +1

    There is FP64 too, for big scary numbers (AKA Float64, or Double-Precision)

    • @GaryExplains
      @GaryExplains  1 month ago +1

      I thought I covered that in the video as long double?

    • @alatnet
      @alatnet 1 month ago

      @@GaryExplains You did, around 5:20.
      FP64 seems to just be a 64-bit representation of a floating point number, exactly as you described it in the video as double precision.

    • @stuartajc8141
      @stuartajc8141 1 month ago

      @@GaryExplains Whoops, I missed that

    • @Chalisque
      @Chalisque 1 month ago

      And FP128 and FP256, but these are not currently implemented in hardware. And if you really want to go crazy, there's arbitrary precision done in software (e.g. via the gmp library), so you can have as many bits as memory permits. (Those deep Mandelbrot zooms make copious use of this.)
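
      For the software route, a minimal sketch using the GMP library's mpf_t floats (build with -lgmp; the 256-bit precision is an arbitrary choice):

        #include <gmp.h>
        #include <stdio.h>

        int main(void) {
            mpf_t third;
            mpf_init2(third, 256);       /* 256 bits of mantissa, vs 53 in a double */
            mpf_set_ui(third, 1);
            mpf_div_ui(third, third, 3); /* third = 1/3 */
            gmp_printf("1/3 = %.70Ff\n", third);  /* ~70 correct decimal digits */
            mpf_clear(third);
            return 0;
        }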

  • @thelastofthemartians
    @thelastofthemartians 1 month ago

    I've only ever used floating point numbers as a last resort (for the reasons you have pointed out). If, for example, you want your microcontroller to monitor room air temperature, then you can easily represent -50°C to +50°C in a 2-byte integer with 2 decimal places of precision. As a bonus, your program will be smaller and faster (as if anyone cares about that these days :D ) and will exhibit a lesser "astonishment factor".
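
    A minimal C sketch of that fixed point scheme, storing hundredths of a degree in a 16-bit integer (the range and scale are the commenter's; the code is illustrative):

      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(void) {
          /* -50.00 C .. +50.00 C maps to -5000 .. +5000, which fits
             comfortably in an int16_t (-32768 .. +32767). */
          int16_t t = -2337;                     /* -23.37 C */
          printf("%d.%02d C\n", t / 100, abs(t % 100));
          return 0;
      }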

    • @toby9999
      @toby9999 1 month ago

      You could do that, but floating point numbers as implemented in most languages are more than accurate enough for storing temperatures. For instance, the "double" type in C++ is 8 bytes and is accurate to at least 14 digits for whole numbers, with a 52-bit stored mantissa. I work on commercial software where all numbers are represented internally by the 64-bit floating point type "double", with absolutely no problems. That said, I'll never use a 32-bit "float", for the reasons presented in this video.
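
      A quick check of those limits in C (note that <float.h> reports a 53-bit significand: 52 stored bits plus the implicit leading 1):

        #include <float.h>
        #include <stdio.h>

        int main(void) {
            printf("double: %d-bit significand, %d guaranteed decimal digits\n",
                   DBL_MANT_DIG, DBL_DIG);      /* 53 and 15 on IEEE 754 systems */
            /* The first whole number a double cannot hold exactly is 2^53 + 1. */
            printf("%.0f\n", 9007199254740993.0);  /* prints ...992, not ...993 */
            return 0;
        }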

  • @IvanToshkov
    @IvanToshkov 1 month ago

    10:08 Don't use floating point for currency! It's not going to be OK. Use fixed point instead.
    Floating point was designed to store both very big and very small numbers efficiently. The precision goes down as the numbers get bigger, and errors can easily crop up even with small numbers when you do arithmetic on them. This is not acceptable for monetary computations.
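
    Even a single addition shows the problem, because 0.1 and 0.2 have no exact binary representation. A minimal C sketch (integer cents shown for contrast):

      #include <stdio.h>

      int main(void) {
          double a = 0.1 + 0.2;
          printf("%.17f\n", a);                  /* 0.30000000000000004 */
          printf("%s\n", a == 0.3 ? "equal" : "not equal");  /* not equal */

          long cents = 10 + 20;                  /* 0.10 + 0.20 in cents */
          printf("%ld cents\n", cents);          /* exactly 30 */
          return 0;
      }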

  • @Eugensson
    @Eugensson 1 month ago

    Would you cover the Posit floating point format?

    • @GaryExplains
      @GaryExplains  1 month ago

      I think that might be a little too niche. Sorry.

    • @Apocalymon
      @Apocalymon 1 month ago

      @@GaryExplains It definitely is not, if my local meathead cobbler knows about it & he dislikes STEM subjects

  • @lale5767
    @lale5767 1 month ago

    11:53

  • @roysigurdkarlsbakk3842
    @roysigurdkarlsbakk3842 1 month ago

    There's FP4 too ;)

  • @Benjiq8787
    @Benjiq8787 1 month ago

    I want to hear you say "if you want to understand the 3-body problem, please let me explain"

    • @GaryExplains
      @GaryExplains  1 month ago +1

      That would be after my Nobel prize ceremony!

    • @Benjiq8787
      @Benjiq8787 1 month ago

      @@GaryExplains look forward to that ;)

    • @GaryExplains
      @GaryExplains  1 month ago +1

      😂

  • @ernstoud
    @ernstoud 1 month ago

    Whereby 42 is an integer, namely what you get if you multiply 6 by 9.

    • @GaryExplains
      @GaryExplains  1 month ago

      eh?

    • @ernstoud
      @ernstoud 1 month ago

      @@GaryExplains Your thumbnail. And the answer to life, the universe and everything. Douglas Adams' book The Hitchhiker's Guide to the Galaxy.

    • @GaryExplains
      @GaryExplains  1 month ago

      I know what 42 means, I just didn't understand your comment.

    • @ernstoud
      @ernstoud 1 month ago

      @@GaryExplains Humor, always difficult.

    • @Apocalymon
      @Apocalymon 1 month ago

      @@ernstoud 42 is too cliché now as a nerdy joke

  • @electrodacus
    @electrodacus 1 month ago

    2^15 is 32768 not 32786 :)

  • @bpark10001
    @bpark10001 1 month ago

    Your 2^15 column is in error. It should be 32768, not 32786. Floats are used WAY TOO MUCH.
    Currency is NOT represented as floats & rounded. That's why there continues to be packed BCD representation in computers.

    • @GaryExplains
      @GaryExplains  1 month ago

      Thanks for spotting the typo; several other people have spotted it as well. But it is just a typo and not really worth much fuss.

  • @CStoph1979
    @CStoph1979 1 month ago

    Only the most astute fans know the true meaning behind the number 42.
    Yes, it has one.
    No, you can't look it up.
    The answer is hilarious, and profound, and superbly, sublimely simple; it's amazing that it wasn't figured out decades ago.
    No, I can't tell you. Yes, I can be persuaded to give a hint if you'd like to figure it out for yourself.

  • @doomVoxel
    @doomVoxel 1 month ago

    What if the reason we can't unify quantum mechanics and relativity is that the further away something gets from the observer, the lower the precision of the quantization?

    • @Chalisque
      @Chalisque 1 month ago

      If the universe is a giant computer simulation, then possibly. Alas such considerations are metaphysics and beyond experiment.

  • @lezbriddon
    @lezbriddon 1 month ago

    I propose dropping all this complication and storing everything in BCD; that's 2 digits per byte, and storage is cheap now. You could even assign a byte to state the length of the BCD string and a second byte to give the length of anything fractional, so that's 255 bytes, or 510 digits, then '.', then 510 digits... It would make for interesting math routines...

    • @Kabodanki
      @Kabodanki 1 month ago +1

      Storage is cheap, but we are not talking about RAM or SSD, we are talking about CPU registers

    • @lezbriddon
      @lezbriddon 1 month ago

      @@Kabodanki Irrelevant, because in BCD you only deal in pairs of digits for any math, just like with pencil and paper... A 5-bit accumulator can hold the result of adding 2 digits, as the biggest result you will see is 9+9. Of course division is tricky, but not that hard.
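
      That digit-pair arithmetic is straightforward to sketch in C for packed BCD (two digits per byte); this adder is illustrative, not a full arithmetic routine:

        #include <stdint.h>
        #include <stdio.h>

        /* Add two packed-BCD bytes the pencil-and-paper way:
           add each digit, and carry whenever a digit exceeds 9. */
        static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry)
        {
            int lo = (a & 0x0F) + (b & 0x0F);
            int hi = (a >> 4) + (b >> 4);
            if (lo > 9) { lo -= 10; hi += 1; }
            *carry = (hi > 9);
            if (hi > 9) hi -= 10;
            return (uint8_t)((hi << 4) | lo);
        }

        int main(void) {
            int carry;
            uint8_t sum = bcd_add(0x47, 0x38, &carry);  /* 47 + 38 */
            printf("%02X carry=%d\n", sum, carry);      /* 85 carry=0 */
            return 0;
        }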

    • @toby9999
      @toby9999 1 month ago

      @@lezbriddon It's not irrelevant if you value performance. A 64-bit floating point value can be held in a 64-bit CPU register, whereas something huge like you describe cannot, and calculations would require software emulation... even slower. But if performance is not an issue, yes, that or one of many other approaches can be and are used. I do remember using BCD way back in the 70s, because registers were only 8-bit, and 8 bits was insufficient for any decent-sized numeric representation.

    • @lezbriddon
      @lezbriddon 1 month ago

      @@toby9999 Yup, my method is simply taken from clown computing 101, but it works and, unlike bitwise math, has infinite granularity, if you have infinite storage...