bro u are a life saver for real. Man i love u, thank you so much you saved my life, your explanation was perfect man, i get it all of this. Thanks so much again!!!
So what dithering is actually doing is cleaning up your signal to noise ratio. When you have a digitally generated sine wave, it's theoretically just about perfect, and so your noise floor is also just about perfect (near 0 signal). The issue is that when you reduce the bit depth of the digitally generated signal, those artifacts are ALSO above your "noise floor". So, even though they're "noise", their signal is high enough to be distinctly audible (and annoying to some of us). So, what dithering is doing, as you said Mr. Bill, is adding in noise... but what you're "accomplishing" by adding it is raising the noise floor in your signal to be at least "equal" to (you actually want slightly greater than) the intensity of your quantization error signals, thereby burying the artifacts. I'm actually a scientist and music is my hobby, we use techniques like this all the time in scientific instrumentation, it's interesting to see how it's used in digital music processing. Edit: Mr. Bill basically says exactly this (very succinctly) at like 15:55.
For easier understanding I think about these terms like this: Sample Rate = Number of samples per unit of time (usually seconds). Bit Depth = Amount of data per sample. Bit Rate = Amount of data conveyed per unit of time (usually seconds).
I usually teach it as bit depth being the points of volume "resolution", similar to common knobs having 128 points of resolution or options to choose from.
@X-TV Almost true, with N bits you have 2^N values (levels) to approximate amplitude, so it's a lot more levels than just 16/24/32, but yeah I agree that thinking in terms of number of possible values is more intuitive
Dithering is only really super important when it comes to mastering, especially very quiet music such as classical or vocals. Bit depth is the number of points on a "vertical scale" on which your waveform sits. 0 dB would be the top, but the bottom depends on how big the number is (16 bit has exponentially fewer points than 24, which has exponentially fewer than 32, etc.). The bigger the number, the further "down" the noise is. Just like the division of an odd number, the result of bit depth reduction is a fraction, which induces the artifacts that you were mentioning. Dither is a way of "dividing and rounding off the numbers" so to speak, to smooth off the artifacts. The artifacts/noise are the bottom of the signal, basically. Dither finds a way to make the top of the signal stay as clean as possible whilst reducing the "height" of the signal, bringing it closer to the bottom. The dynamic meter you are using automatically centers the waveform, adjusting the values instead of the top/bottom of the signal, so the example rendering is kind of confusing and distracts from the true point. The gist is that for the purposes of working in a DAW, you don't need it. The final output media should be the target bit depth based on the media. Generally, it only matters as you reduce depth, not increase (increasing bit depth is basically like multiplying an odd number by 2, making the result an even number for example, so dither is not important, the overall signal to noise ratio is preserved at the new, upscaled bit depth). Hope that helps others understand it.
Great explainer! This really helped me understand what the purpose of dithering is. I do a lot of photography and the way the 32 bit file can be clipped but still retain its detail at lower volume reminded me of how newer cameras are able to take photos that look like the highlights and shadows are blown out to white and black, but then if you go into a photo editor you can darken the brights and lighten the darks and end up with a normal looking photo. The files are able to retain all that original information even if you can't see/hear it initially. The information is there, you just have to pull it out. It's pretty amazing really.
Nice one. I always figured dithering was masking artifacts caused by changing bit depths, like when converting a wav to mp3 or something, if converting from 24 to 16bit. Thanks for the visuals. They give a pretty clear understanding of whats going on.
3:55 very good illustration i'm not sure if I ever really need this but its nice to know this tutorial mainly has greatly understandable explanations, so you did a great job Bill gg!! thx dooood
Brooo the amount of times I've had noise coming from a bus and not understood why, i now realise is just a plugin with dithering that has obviously been compressed and had the output gain upped. Thanks Bill!
For some clarification: 16 bit integer means a number that's stored as 16 bits (a bit is a 0 or 1 value). It can represent 2^16 different numbers, which is 65,536. We have our 65,536 numbers to represent our amplitude. We want a -1 to 1 range so we map it to a range of -32,768 to 32,767. Now you have a sine wave which at some point in time has 0.816 of amplitude. That would map to 26737.872 on the 16 bit integer scale. And now the core of the issue: when rendering your audio to a 16 bit file you have to round the value to the closest integer (26737.872 => 26738). That's the mentioned quantization, this is what causes stepping in the waveform and that's what ultimately introduces the harmonics to your audio. With 24 bit integers you have 16,777,216 possible values. More possible values, smaller steps, smoother audio wave, better sound. With 32 bits per number you can pretty accurately represent a floating point number. There's still some rounding happening but the steps are so small that they don't make any audible difference, or more likely your speakers smooth out the steps by the sheer law of physics (momentum).
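To make the rounding step above concrete, here is a minimal Python sketch of it. The scale factor of 32767 and the helper name quantize_16bit are assumptions for illustration; real converters differ in how they scale and round.

```python
import math

def quantize_16bit(sample: float) -> int:
    """Map a sample in [-1.0, 1.0] to a signed 16-bit integer by rounding.

    Hypothetical helper: scaling by 32767 is one common convention,
    not necessarily what any particular DAW does.
    """
    scaled = sample * 32767               # 0.816 -> 26737.872
    rounded = math.floor(scaled + 0.5)    # round to the nearest integer
    return max(-32768, min(32767, rounded))  # stay inside the 16-bit range

sample = 0.816
code = quantize_16bit(sample)
error = code / 32767 - sample  # quantization error, back on the -1..1 scale

print(code)   # 26738
print(error)  # a tiny positive number, smaller than one step (1/32767)
```

The error is small for a single sample, but because it follows the waveform in a repeating pattern it shows up as harmonics rather than as random hiss.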
I do intend to watch all of this, soon. But in just the first couple of minutes, you cleared up my confusion with, "It's good noise to mask bad noise". Perfect...Understood. Now I might actually start to use it :) Thanks!
You pretty much got it bang on, words are just related to the size of the data chunks, with 32-bit words being just that, each chunk of data being 32 bits (i.e. 32 binary digits) long. This is governed by the size of data chunks that the application is expecting when receiving data to process. When referring to the bit-width of the CPU, this governs what size data the CPU can accept and affects how the processor divides up (or addresses) the RAM so it knows the label of the box in which it stores things that it will need to access in the course of running a program. The biggest number you can represent in 32 bits is 4,294,967,295 or (2^32)-1, which gives 2^32 distinct addresses (4GB when each address is one byte), and is the reason that 32-bit operating systems could only have 4GB of RAM, as that was the largest memory address able to be stored by a 32-bit word and all the CPU could accept. The x86 instruction set, which is what your CPU (and the vast majority of PC CPUs) will be using, is based on the Intel 8086, which originally had a bit-width of 16 bits. Thus a "Word" in the x86 world refers to 16 bits of data. The architecture was extended to 32-bit with the Intel 80386, with the word size being referred to as a "Double Word" (DWORD), and later to 64-bit with the x86-64/AMD64 standard utilising "Quad Words" (QWORD). It's worth noting that many years ago, RAM was sold as being able to hold an amount of words for the relevant CPU architecture rather than an amount of data. This was before a lot of the standardisation was done and you had to buy specific RAM for your specific CPU. I have generalised a bit here and really have only scratched the surface but that's the basic layman gist of it. I know it's not overly relevant to the content of the video but it's some interesting information for the technically inclined nonetheless. Awesome video dude!
around 4:20, when you introduce the noise, when looking at a spectrogram, the harmonic distortion actually does go away, like it's not just psychoacoustic
Ad spectrum analyzer at around 18:00. The analyzer just shows the magnitudes for each measured frequency of the signal. The fader probably takes all of the signal into account, essentially using peak volume of the whole thing.
From what I understand (or have assumed) dithering is more of a modulation rather than just added noise or masking. If I remember correctly the noise is used to shift around phasing to make those unwanted harmonics sound like noise rather than a full spectrum of noise being layered in.
What I understand of 32-float specifically is that it uses exponents to describe a bit depth that wouldn’t otherwise be possible with fixed point (16 & 24bit), thus a higher decibel level. It still has to leave your converter under 0dbfs but the math is what allows you to recapture all the dynamics of the audio. That’s what you saw with your 32bit bounce.
Awesome tutorial. Loved the dithering and bit depth render examples. Bit depth gives you a bigger dynamic range because there is more room (in bits) to store amplitude information. In the extreme case, if you had 1-bit bit depth the amplitude of each sample would have to be stored as either a 0 or 1. Thus, a resolution of 2 (2^number_of_bits) possible amplitude values per sample. High bit depth = more memory space to store sample amplitude = higher resolution of amplitude information = larger dynamic range and sample precision
Kind of cool. Sort of opens up a question about how that noise actually masks the quantization artifacts. I may sound like an idiot here cuz this is beyond my understanding, but it would seem to me that the masking is perceptual, but in a remarkable way that is analogous to a lot of interesting stuff. It seems to me that it is physical, since wave phenomena HAS to be physical, overtaking our threshold for perception of the reproduction of the acoustic signal from the digital domain back out to the analog domain. Because of the noise, those low bitrate artifacts no longer have time to be noticed. You could break down with FFT that the artifacts are present, but they're effectively hidden not just like they were orcs hiding in the forest. Perceptually they are no longer visible on the analyzer waveform - based on its rate specs (I think DMG Equilibrium & some Meldaproduction plugins have settings to speed up this rate) and our computer display's frames per second specification. Because the granularity in which they exist, duration-wise, is insufficient for them to stand out as signals. Their existence is completely overshadowed. In this case they were like giant orcs, and they were shrunk down by the noise. I'm over my head but that seems like a deep topic. Next you can explain how double slit experiments work.
Bit rate is data rate/bandwidth. Like a 320kbps bitrate mp3. And the word you were looking for may be recalculate, as you need to re-quantize from 32bit float to 16bit when exporting again
floating-point numbers are stored in a scientific notation-ish form so even if a sample goes outside the -1 to 1 range it will sound normal when you turn the gain down. a sample over 1 or lower than -1 is clipping, but floats don't lose the actual level in relation to the track. (e.g. a sample is 1.1 and i want to turn it down by 10%, 1.1 * 0.9 = 0.99). integer data types are fully quantized numbers. there are a finite amount of numbers possible, and they are not stored with decimal points. for example 24 bits means you have 2^24 possibilities for the number (16777216). if you want positive and negative you split that range in half, and to get a zero as well you use two's complement... which is, i believe, how integer PCM is formatted
This was a very informative, well thought out video. I like that you prepared the entire thing before you started Edit: Damn. Was wondering why Serato wouldn't play a bunch of my music. I'll go change that now.
thanks for making this. good experiments! I've read that the dither is mostly important when taking your master file and encoding it down to mp3. The change in bit depth (distortion) is noticeable AFTER encoding the wav -> mp3 (introducing more distortion), and it's possible to make a more clean sounding mp3 from a wav that has a proper dither.
I remember having trouble with exporting my track in Renoise, since dithering was ruining some cool aliasing effects produced by the synth as is. and if I'm not mistaken these were the days when you could choose bit depth of DAW engine and as far as I can remember that was the case for Ableton as well. Thus I never used dithering since, so it doesn't kill any intended aliasing. But then again perhaps it could add unintended aliasing as well, but that should be obvious when you listen to result.
Someone just hipped me to your tutorials. I dig what you're doing and how you present it. I'll be signing up for the paid version soon. *edit - I missed you mentioning some of this Bit Depth essentially represents exponentials of 2: as in 2 to the (whatever number 8, 16, 24, 32) power, so with 8-Bit = 256 (steps); 16 Bit = 65536 (steps); 24 Bit = 16.7 million (steps); 32 Bit = 4.3 billion (steps) More steps, more possible volumes --- (not sure if volume is correct term) 0 (complete silence) to the max steps (from 256 with 8-bit to 4.3 billion with 32-bit) is the dynamic range, essentially the resolution of volume. Higher bit depth allows for more headroom and less likelihood for clipping --- (Similar to how higher sampling rate allows for more samples over time which produces a higher resolution of frequencies. Also, higher bit depth in images make them look more realistic and less blocky)
Little bit of context on 'Psycho-acoustically Optimized Word-length reduction': In computing science a 'word' often refers to a certain length of bits. For instance a protocol in a system could have a word-length of 32 bits. This would mean that it will mostly work with 32 bit input and/or output. In this case if you are reducing the word-length that basically means reducing the bit-depth. So essentially POW-r means it's producing noise that is optimized for dithering/masking quantization errors.
13:20 this is because each bit depth is changing your dynamic range. Each bit is about 6db. So 16bit gives you 96db, 24bit gives you 144db and 32 gives you 192db. Interesting that there wasn't much change between 16 and 24 bit though. Also, dither can help preserve your dynamic range when reducing bit depth. When going to 16bit depth you have 96db dynamic range, and with dither you can have a theoretical 120db dynamic range. Yay white noise!
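The "about 6 dB per bit" rule of thumb can be checked with a few lines of Python. This is a rough sketch for integer (fixed point) formats only; the function name dynamic_range_db is made up for illustration, and 32-bit float behaves differently because of its exponent.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24, 32):
    print(bits, round(dynamic_range_db(bits), 1))
# 16 96.3
# 24 144.5
# 32 192.7
```

Each extra bit doubles the number of levels, and doubling an amplitude ratio adds 20 * log10(2), which is roughly 6.02 dB.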
Dithering helps most when you have to go to a very low kbps mp3. Ozone has some nice tools to preview your audio in various codecs with and without dithering.
And for you Mr. Bill: Sample rate: Number of samples per second Sample/bit depth: How single sample is represented. How many bits is used to store one sample. Bit rate: Sample rate * bit depth. Basically bits per second. 32bit audio will have 2x bit rate than 16bit audio which results in 2x size of the complete file.
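As a quick sanity check of that formula, here is a small Python sketch for uncompressed PCM. The channels parameter is an addition not mentioned above (a stereo file carries two samples per sample period), and pcm_bit_rate is a hypothetical name.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Bits per second for uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

cd_audio = pcm_bit_rate(44100, 16, channels=2)

print(cd_audio)             # 1411200 bits per second
print(cd_audio / 1000)      # 1411.2 kbps
print(cd_audio / 8 / 1000)  # 176.4 kilobytes per second
```

Doubling the bit depth from 16 to 32 doubles the result, which is why the 32-bit render comes out twice the size. Note that this only holds for uncompressed files; an mp3's bit rate is set by the encoder instead.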
For all intents and purposes, should we render tracks at 16 or 32? You briefly touched on a point that 32 may not be the best for CDJs, just wondering what works best for most use cases. Thanks Mr. Bill!
Thanx a lot man! There IS a lot of info on the topic but honestly I haven't run into anyone who really knows what it is and what to do with it since 2001 when I read the book about Reason 1.0. So, we have to put the Bitter VST on EVERY VST synthesizer/sampler we use, because if any of them operates in anything but 32 bit that means it's already degrading bit depth and possibly not using a dithering algorithm. Is that right? Because look: imagine you are producing in Ableton, then exporting STEMS to... Logic. So each track which has a VST operating at 16 bit will already have that quantization noise, so then it accumulates, and if I have 40 tracks the noise will be 40 times louder than it was initially. OK, now let's say our VSTs operate in 24 bits, it still means that each of them will have noise which then will be accumulated. Now, let's suppose that we've got no VSTs involved but standard Ableton synths (no samples), so all 32 bits, but we ARE going to export that to Logic or ProTools because we want to mix it there for whatever reason and we DO export STEMS, each track individually. Let's say our Logic/ProTools can't import 32 bit files and we have to render at 24, so then it means that even without any 3rd party plugins we have to use dither while exporting, otherwise we'll have the quantization noises accumulated. Why I'm asking is because there's a thing which I don't know if you've mentioned, but dithering twice is not better than adding quantization noise twice. Dithering adds noise (well, psycho-acoustically pleasing, but still not an audio-drug, right?), so since in the example I'm using we are going to dither twice (exporting from Live 1st and then the final mix/master from the 2nd DAW where we are going to be mixing), I think there should be a very good reason for us to over-dither. The other question is: are you sure that we don't need to use dithering if we already have a dithering plugin on the master chain?
I mean... You know what I'm saying :) Xenon, FG-X, there are a lot of plugins out there for dithering/noise shaping. Then almost the last one: are you sure there's no point in uploading 24 bit files to SoundCloud, BandCamp etc? How about using 24 bit files for a music video? What about digital distribution, if you upload 24 bit (as they request) they are still going to degrade it, supposedly using dithering as well? It all looks pretty horrible since we all know platforms like Spotify have 128kb quality (or at least they had) which means it's like a very very low quality mp3. I mean, with all of it being said it looks like ppl are dithering on and on many times during the process from production to the final release. And again, over-dithering is something that no one says is good, so... you feel me? And now the very last question: what is the bit depth of mp3? Any suggestions? I don't think it's nerd stuff, cuz remember how EDM became popular and many people complain about how it all sounds in the high frequencies. What if what we are dealing with here is 1000 times multiplied quantization noise in the majority of music, simply caused by lack of knowledge on the topic? What does it do then if you are listening to it on a 16 kilowatt sound system for hours? What if it's causing brain damage? I'm over-saturating just to emphasize why it all is so important. Add the whole mastering dilemma on top of it (the so-called "loudness wars") and then all of it starts to look very important. Remember how ppl used to sample mp3s, then upload them online, then other ppl were sampling them again and uploading again? What was wrong with that was that there was no way to track how many times a certain audio signal was previously degraded. I mean, just listen to Skrillex ) Will appreciate any answers or suggestions, cuz this thing bothers me :) THANX FOR THE VIDEO! MUCH LOVE p.s. BTW 17:34 isn't it because that chart shows mono and you've got stereo?
Maybe the chart shows stereo and you've got mono, that's why your gain is x2 or 1/2 of what the chart says? Blind guess)))
Mate... I was at the dentist a few years ago get a tooth drilled. I hate jabs, and I love numbing gel. This time, the dentist didnt put numbing gel on the place he was planning to jab my jaw. I was quietly peaking coz I was anticipating the pain. But just before he jabbed me, he grabbed my lip and gently squeezed it, then jabbed. Because I was preoccupied with the squeeze on my lip, I felt ZERO pain with the jab. This seems similar, but musically, durrr
so, from my understanding, because the 32 bit render holds more information and keeps the quality of the track, you should always bounce at that bit depth before you master...?
i have a question. if my project is peaking/clipping and i render it out as a 16 bit wav file, then i import that file into a fresh project - why is that file causing the mixer to clip again? surely the clipping has been baked into the wav form and signals above 0db cannot exist. halp. is some part of the software lying to me? like the discrepancy between the spectrum and the fader at 17:28 of this vid. or is it something to do with the bit depth? like how 32 bits seem to have more headroom??
that's why I do -0.2 dB and true peak mode on the limiter. Sometimes I might even need -0.5 dB. A limiter with oversampling could help too (I use Izotope, but FabFilter has x32 oversampling), and putting it in look-ahead mode.
2 questions: 1) Should we use dithering at all when rendering or leave that to the mastering (engineer) ? 2) If using dither, which option to use for which scenario ? Thanks !
For bitdepth, in Live, the 32bit export option uses floating point numbers, whereas the other two options use integers (whole numbers). For a 16bit integer, each sample of audio would be a number between -32768 and 32767. However, as soon as you go to floating point the range is typically -1.0 to 1.0 (which would be 0 dB in volume). However, floating point gives us more flexibility and we can go outside this range without too many problems. That's why your 32bit export didn't actually clip, and why you can go over 0 dB in tracks and other internal routings within Live. You can still run into problems if you push it to the extreme, but it just behaves differently compared to working with integers. (If the 32bit export option was integers it would have also clipped). As for dither, some extra analysis by some guy: innerportalstudio.com/new-dither-examples/ - I also heard that you should choose Triangular if the output may end up being resampled/reprocessed in the future, otherwise one of the pow-r modes for a final master (pow-r1 probably).
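Here is a rough Python sketch of that difference, not a description of how Live works internally. The to_int16 helper, the sample values, and the gain of 0.5 are all made up for illustration.

```python
import math

def to_int16(sample: float) -> int:
    """Hypothetical 16-bit integer export: round, then clamp to the legal range."""
    rounded = math.floor(sample * 32767 + 0.5)
    return max(-32768, min(32767, rounded))

hot = [0.25, 1.2, -1.5]  # two of these samples are over 0 dB
gain = 0.5               # turn the file down after the export

# Path A: export as 16-bit integers (anything over full scale is clamped),
# then lower the gain. The peaks are already flattened.
int_path = [to_int16(s) / 32767 * gain for s in hot]

# Path B: keep the samples as floats, then lower the gain.
# The values over 1.0 were stored as-is, so the peaks come back intact.
float_path = [s * gain for s in hot]

print(int_path)    # second value is 0.5: the 1.2 peak was clamped to 1.0 first
print(float_path)  # second value is 0.6: the peak survived
```

In the integer path the overshoot is gone for good, so turning the file down just gives a quieter clipped waveform. In the float path the original shape is still there once the gain comes down.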
Great video. The only comment I'd offer is that dither doesn't mask quantization noise, it actually removes it. This video ( ua-cam.com/video/cIQ9IXSUzuM/v-deo.htmlm36s ) does a great job at explaining and demonstrating why.
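A toy Python sketch of why dither is more than masking: without dither, a value that sits between two steps always rounds the same way, so the detail is lost. With dither, the rounding varies from sample to sample and the average lands back on the true value. The triangular (TPDF) dither of plus or minus one step, the test value of 0.3, and the fixed random seed are all assumptions for illustration.

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest whole step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng: random.Random) -> float:
    """Sum of two uniform values gives triangular noise spanning +/- 1 LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

rng = random.Random(0)  # fixed seed so the run is repeatable
value = 0.3             # a level 0.3 of the way between two steps
n = 20000

plain = [quantize(value) for _ in range(n)]
dithered = [quantize(value + tpdf_dither(rng)) for _ in range(n)]

plain_mean = sum(plain) / n        # always 0: the 0.3 is simply gone
dithered_mean = sum(dithered) / n  # close to 0.3: the detail survives as an average

print(plain_mean, dithered_mean)
```

The price is that each individual dithered sample is noisier, which is the steady hiss you hear in place of the harmonics.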
This is fascinating, but why would we care about any of it if, as he says in the video, this is all occurring far below the range of human hearing? What is the "just in case" scenario we're preparing for by having that tiny bit of extra headroom?
Hey @@joshhanselman7618, it's a hard question to answer in a YouTube comment. My comment was specifically regarding dither, but you mention headroom, so I'm not exactly sure what you're asking about. Generally though, with respect to dither, distortion compounds. In other words, if a signal accumulates some distortion, and then that distorted signal accumulates some more distortion, all the distortion products from the first pass are also getting distorted: the distortion gets distorted. In this manner, it can quickly get into the realm of audibility. The same goes for aliasing, or any non-linearity really. So, when there's a right way to do something (dither, oversampling, etc.) it really pays to just get it right so that you're giving everything down the line the best possible chance. Happy to expand on any of that if you have specific questions.
@@FlotownMastering Gotcha, thanks, this makes sense. I could swear he mentions headroom in the video, but maybe I just assumed that if you're removing noise you're giving yourself more headroom. But your point is well taken with compounding distortion.
So if I am just bouncing drum samples I made out of Ableton to a folder to be used in other projects, am I better off selecting a 32 bit bounce so that there are no bit reduction artifacts or dithering noise at all?
Thanks for this explanation Mr. Bill. Im curious what happens when you play a 32 bit mp3 file on a 24 bit sound card. Would there be a difference if you played a 32 bit file or a 24 bit file? I guess my question is if soundcards automatically adjust bit rates, and if so, does it change the quality of the audio?
Is there any issue using 24bit samples (say drum samples you may have purchased online) in ableton 10?? I understand ableton converts samples that are edited to 32bit automatically like you demonstrated so I'm wondering if that process would lower the quality of my song. Hope my question makes sense.
works exactly the same in images, 8bit jpg and 32bit exr. 0 is black and 1 is white. with 8 bit you have a limited amount of "steps" you can have between 0 and 1 and I believe data is clipped below 0 and above 1. with 32bit you have a lot more range between 0 and 1 and data is not clipped.
bit rate is number of bits per second... so sample rate * bit depth, then divide out 1000 and thats kbps. if you divide 8 out of that you get kiloBYTES per second
okay so when exporting should you go 16 or 32 (not sending to an engineer). i know it all depends on what you’re making but is there really a purpose in rendering at 32 if you just upload to sites that convert to 16 bit, such as youtube, soundcloud, instagram etc.
I'm going to simplify it for non-math people. Please forgive any absolute inaccuracies. Applying DSP: EQ, compression, gain changes, etc. requires calculations to be made. often, these calculations create large values or large fractional values. The more detail in a number, the more space it takes to store it. The more space you allocate, the more detail you get to keep. More often than not, the detail of the calculation exceeds what 16, 24, or 32 bits can handle. The computer has to round off the last few bits of detail to get it to fit. This creates a quantization "error." The computer rounds up or down to the "least significant bit" or the smallest amount of detail it can store. Every time you run the signal through more DSP, this error grows. A single plug-in can easily have multiple stages of DSP itself. Then you stack 10 of them together. A 16 bit signal is going to break down pretty quick. A 32 bit signal is going to stay accurate a whole lot longer in the signal chain. Okay... light math analogy: I have a part I sell for 25 cents. $0.25. But I have to charge you 7% tax on that. What's the total price? $0.2675. How many pennies do you give me? 26 and 3/4 of one using a pair of tin snips? No, we just round it up to $0.27 and you give me my 27 cents. If I bought 100 of these parts in 100 individual transactions, I would have spent $27.00. If I paid for 100 all at once, I'd have spent $26.75. There's $0.25 worth of error in $27.00 worth of purchases depending on how you buy something due to truncation errors in a currency system. But wait a second. If I FLOAT the $0.2675 price and multiply by 100, I get the same price as adding up the cost of 100 units ($25) and then just taxing that. Floating means fewer errors. Fewer errors means higher quality.
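The penny example above can be run as-is in Python. This sketch uses the standard decimal module so the cents are exact; the round-half-up rule is an assumption about how the shop rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
price = Decimal("0.25")
tax = Decimal("1.07")  # 7% tax

def to_cents(amount: Decimal) -> Decimal:
    """Round to whole cents, like quantizing to the smallest step we can store."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

# 100 separate purchases: round after every single transaction.
one_by_one = sum(to_cents(price * tax) for _ in range(100))

# One purchase of 100: keep full precision, round once at the end.
all_at_once = to_cents(price * tax * 100)

print(one_by_one)                # 27.00
print(all_at_once)               # 26.75
print(one_by_one - all_at_once)  # 0.25
```

Rounding after every step is the 16-bit signal chain and rounding once at the end is the 32-bit float chain: same maths, but the errors only get one chance to creep in.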
Take home message: higher bit-depth = lower noise floor thus greater dynamic range, and digital is awesome {^_^} P.S. Bill, did you take Ivan Zawada's sound recording course?
For music distribution 16-bit / 44.1kHz - for sample distribution 24-bit / 44.1kHz (although after doing this video, I realise I should probably distribute samples at 32-bit - however, as far as I know, not all DAW's can accept 32-bit wav files)
Give this man his Grammy
Give my dad grammy now.
17:30 the operator's oscillator is at -12dB.
This was a great video. Very helpful. I understand now.
Bit rate = Bit depth × sample rate.
You are just an outstanding educator.
He makes music too 😂
I recommend checking out what visual dithering looks like, it really helped me grasp it better!
awesome dude. great in-depth explanations. THANK YOU.
So stoked on all these in depth tutorials you've been putting out! thank you
This is by far the best video on dithering that I have come across. Great practical explanation. Thanks a lot!
For some clarification:
16 bit integer means a number that's stored as 16 bits (bit is a 0 or 1 value). It can represent 2^16 different numbers which is 65,536.
We have our 65,536 numbers to represent our amplitude. We want a -1 to 1 range so we map it to a range of -32,768 to 32,767.
Now you have a sine wave which at some point in time has 0.816 of amplitude. That would map to 26737.872 on the 16 bit integer scale.
And now the core of the issue: When rendering your audio to 16 bit file you have to round the value to the closest integer (26737.872 => 26738). That's the mentioned quantization, this is what causes stepping in the waveform and that's what ultimately introduces the harmonics to your audio.
With 24 bit integers you have 16,777,216 possible values (2^24). More possible values, smaller steps, smoother audio wave, better sound.
With 32 bits per sample the value is usually stored as a floating point number, which represents the amplitude very accurately. There's still some rounding happening, but the steps are so small that they don't make any audible difference, or more likely your speakers smooth out the steps by sheer physics (inertia).
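To make the rounding step above concrete, here is a minimal Python sketch of the same arithmetic. The function name `quantize_16` is made up for illustration, and scaling by 32,767 is an assumption taken from the 0.816 => 26737.872 example; real exporters may scale slightly differently.

```python
import math

INT16_MAX = 32767  # largest positive 16-bit sample value

def quantize_16(sample: float) -> int:
    """Scale a -1..1 sample to the 16-bit range and round to the nearest integer."""
    scaled = sample * INT16_MAX
    rounded = math.floor(scaled + 0.5)  # round half up, unlike Python's round()
    return max(-32768, min(INT16_MAX, rounded))

sample = 0.816
exact = sample * INT16_MAX    # about 26737.872
stored = quantize_16(sample)  # 26738
error = stored - exact        # about 0.128 of one step: the quantization error
print(exact, stored, error)
```

The leftover `error` is what gets thrown away on every sample, and because it follows the waveform it shows up as harmonics rather than as plain hiss.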
Very less boring than I learned in sound engineer class back in the days.
Great job.
I do intend to watch all of this, soon. But in just the first couple of minutes, you cleared up my confusion with, "It's good noise to mask bad noise". Perfect...Understood. Now I might actually start to use it :)
Thanks!
You pretty much got it bang on, words are just related to the size of the data chunks with 32-bit words being just that, each chunk of data being 32 bits (i.e. 32 binary digits) long. This is governed by the size of data chunks that the application is expecting when receiving data to process.
When referring to the bit-width of the CPU, this governs what size data the CPU can accept and affects how the processor divides up (or addresses) the RAM so it knows the label of the box in which it stores things that it will need to access in the course of running a program. The biggest number you can represent in 32 bits is 4,294,967,295, or (2^32)-1, which gives 2^32 byte addresses, i.e. 4 GB, and is the reason that 32-bit operating systems could only use 4 GB of RAM: that was the largest memory address a 32-bit word could store and all the CPU could accept.
Processors based on the x86 instruction set, which is what your CPU (and the vast majority of PC CPUs) uses, trace back to the Intel 8086, which had a bit-width of 16 bits. Thus a "Word" in the x86 world refers to 16 bits of data. The architecture was extended to 32-bit with the Intel 80386, with that size being referred to as a "Double Word" (DWORD), and later to 64-bit with the x86-64/AMD64 standard, utilising "Quad Words" (QWORD).
It's worth noting that many years ago, RAM was sold as being able to hold a number of words for the relevant CPU architecture rather than an amount of data. This was before a lot of the standardisation was done and you had to buy specific RAM for your specific CPU.
I have generalised a bit here and really have only scratched the surface, but that's the basic layman's gist of it. I know it's not overly relevant to the content of the video but it's some interesting information for the technically inclined nonetheless.
Awesome video dude!
@Mr.Bill you're always clearing stuff up for me, I appreciate all the information you post. Thank you!
around 4:20, when you introduce the noise, when looking at a spectrogram, the harmonic distortion actually does go away, like it's not just psychoacoustic
yes... I know some of these words.
Yeah the higher harmonics' amplitude on average is actually reducing, so not really just psychoacoustics.. Nice observation 👍
it psychovisually also, lol
Re the spectrum analyzer at around 18:00: the analyzer just shows the magnitudes for each measured frequency of the signal. The fader probably takes all of the signal into account, essentially using the peak volume of the whole thing.
From what I understand (or have assumed) dithering is more of a modulation rather than just added noise or masking. If I remember correctly the noise is used to shift around phasing to make those unwanted harmonics sound like noise rather than a full spectrum of noise being layered in.
First video ever, 3:42 is where I subscribed in case you care about those kinda things.
What I understand of 32-float specifically is that it uses exponents to describe a bit depth that wouldn’t otherwise be possible with fixed point (16 & 24bit), thus a higher decibel level. It still has to leave your converter under 0dbfs but the math is what allows you to recapture all the dynamics of the audio. That’s what you saw with your 32bit bounce.
Awesome tutorial. Loved the dithering and bit depth render examples.
Bit depth gives you a bigger dynamic range because there is more room (in bits) to store amplitude information. In the extreme case, if you had 1-bit bit depth the amplitude of each sample would have to be stored as either a 0 or 1. Thus, a resolution of 2 (2^number_of_bits) possible amplitude values per sample. High bit depth = more memory space to store sample amplitude = higher resolution of amplitude information = larger dynamic range and sample precision
Kind of cool. Sort of opens up a question about how that noise actually masks the quantization artifacts. I may sound like an idiot here cuz this is beyond my understanding, but it would seem to me that the masking is perceptual, but in a remarkable way that is analogous to a lot of interesting stuff. It seems to me that it is physical, since wave phenomena HAS to be physical, overtaking our threshold for perception of the reproduction of the acoustic signal from the digital domain back out to the analog domain. Because of the noise, those low bitrate artifacts no longer have time to be noticed. You could break down with FFT that the artifacts are present, but they're effectively hidden not just like they were orcs hiding in the forest. Perceptually they are no longer visible on the analyzer waveform - based on its rate specs (I think DMG Equilibrium & some Meldaproduction plugins have settings to speed up this rate) and our computer display's frames per second specification. Because the granularity in which they exist, duration-wise, is insufficient for them to stand out as signals. Their existence is completely overshadowed. In this case they were like giant orcs, and they were shrunk down by the noise. I'm over my head but that seems like a deep topic. Next you can explain how double slit experiments work.
that 16, 24 and 32 bit conversion blew my mind
Bit rate is data rate/bandwidth. Like a 320kbps bitrate mp3. And the word you were looking for may be recalculate, as you need to re-quantize from 32bit float to 16bit when exporting again
floating-point numbers are stored in a scientific notation-ish form, so even when a sample goes outside the -1 to 1 range it will sound normal once you turn the gain down. a sample over 1 or lower than -1 is clipping, but floats don't lose the actual level in relation to the track. (e.g. a sample is 1.1 and i want to turn it down by 10%, 1.1 * 0.9 = 0.99).
integer data types are fully quantized numbers. there are a finite amount of numbers possible, and they are not stored with decimal points. for example 24 bits means you have 2^24 possibilities for the number (16777216). if you want positive and negative you divide that number by 2, and if you want a zero you have to use two's complement... which i believe is how PCM is formatted
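A quick sketch of what "divide by 2 and keep a zero" works out to for signed integer samples. `signed_range` is just an illustrative helper, and the little-endian byte order is an assumption (it matches what WAV files use for 16-bit PCM).

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Lowest and highest value of a two's complement integer with this many bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in (16, 24, 32):
    low, high = signed_range(bits)
    print(f"{bits}-bit: {low} to {high}")

# In two's complement, -1 is stored with every bit set.
minus_one_16 = (-1).to_bytes(2, "little", signed=True)
print(minus_one_16.hex())  # ffff
```

Zero takes one of the "positive" slots, which is why the negative side always reaches one step further than the positive side.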
So many people don't know about this stuff, glad you did a video on this dude
Thank you for this video, this is a hard topic to explain I think and you nailed it, now I understand what all of these things are for. Peace!
the spectrum shows -114 db bc the fader is just subtracting 69 db from whatever the initial signal is, which is not necessarily 0
loving reviewing you videos bill your very talented
This was a very informative, well thought out video. I like that you prepared the entire thing before you started
Edit: Damn. Was wondering why Serato wouldn't play a bunch of my music. I'll go change that now.
solid refresher thank you!
Thank you so so so much! I've been looking for such a simple and understandable video for a while
thanks for making this. good experiments!
I've read that the dither is mostly important when taking your master file and encoding it down to mp3. The change in bit depth (distortion) is noticeable AFTER encoding the wav -> mp3 (introducing more distortion), and it's possible to make a more clean sounding mp3 from a wav that has a proper dither.
Happy Birthday Mr.Bill! disabled my adblock just for this video! :)
This tutorial is gold. You did a great job on explaining it
Thx you really good explanation helped a lot
I remember having trouble with exporting my track in Renoise, since dithering was ruining some cool aliasing effects produced by the synth as is. and if I'm not mistaken these were the days when you could choose bit depth of DAW engine and as far as I can remember that was the case for Ableton as well. Thus I never used dithering since, so it doesn't kill any intended aliasing. But then again perhaps it could add unintended aliasing as well, but that should be obvious when you listen to result.
Happy Birthday! Very well explained video. Also, +1 for the Clyp shirt!
Awesome tutorial
Someone just hipped me to your tutorials. I dig what you're doing and how you present it. I'll be signing up for the paid version soon. *edit - I missed you mentioning some of this
Bit Depth essentially represents powers of 2: as in 2 to the power of (whatever number: 8, 16, 24, 32), so 8-Bit = 256 (steps); 16 Bit = 65,536 (steps); 24 Bit = 16.7 million (steps); 32 Bit = 4.3 billion (steps)
More steps, more possible volumes --- (not sure if volume is correct term)
0 (complete silence) to the max steps (from 256 with 8-bit to 4.3 billion with 32-bit) is the dynamic range, essentially the resolution of volume. Higher bit depth allows for more headroom and less likelihood for clipping --- (Similar to how higher sampling rate allows for more samples over time which produces a higher resolution of frequencies. Also, higher bit depth in images make them look more realistic and less blocky)
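For anyone who wants to check the step counts above, a small sketch. The helper names are made up, and it treats every depth as a plain integer format spread over a -1 to 1 range, which is a simplification (a 32-bit export is often floating point instead).

```python
def levels(bits: int) -> int:
    """Number of distinct values an integer sample of this bit depth can take."""
    return 2 ** bits

def step_size(bits: int) -> float:
    """Distance between neighbouring levels when they span the range -1..1."""
    return 2.0 / levels(bits)

for bits in (8, 16, 24, 32):
    print(f"{bits}-bit: {levels(bits):,} steps, each {step_size(bits):.3e} wide")
```

Every extra bit doubles the number of steps and halves the size of each one.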
Just what I was looking for. Tēnā rawa atu koe
loving your tutorials man. got a good way of explaining concepts :) keep it up
Little bit of context on 'Psycho-acoustically Optimized Word-length reduction':
In computing science a 'word' often refers to a certain length of bits. For instance a protocol in a system could have a word-length of 32 bits. This would mean that it will mostly work with 32 bit input and/or output. In this case if you are reducing the word-length that basically means reducing the bit-depth.
So essentially POW-r means it's producing noise that is optimized for dithering/masking quantization errors.
13:20 this is because each bit depth is changing your dynamic range. Each bit is about 6 dB. So 16-bit gives you 96 dB, 24-bit gives you 144 dB and 32-bit gives you 192 dB. Interesting that there wasn't much change between 16 and 24 bit though. Also, dither can maintain your dynamic range when reducing bit depth. When going to 16-bit you have 96 dB of dynamic range, and with dither you can have a theoretical 120 dB dynamic range. Yay white noise!
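The "6 dB per bit" rule of thumb comes from 20 * log10(2), which is about 6.02. A minimal sketch, assuming plain integer samples; `dynamic_range_db` is an illustrative name, not something from a DAW.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24, 32):
    print(f"{bits}-bit: {dynamic_range_db(bits):.2f} dB")
```

So the round numbers 96, 144 and 192 are just 6 multiplied by the bit count; the exact figures are a fraction of a dB higher.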
great vid..thanks...always something new to learn.
Dithering helps most when you have to go to a very low kbps mp3. Ozone has some nice tools to preview your audio in various codecs with and without dithering.
great explanation!
modular synth ..... BILL BALLIN!!!
And for you Mr. Bill:
Sample rate: Number of samples per second
Sample/bit depth: How single sample is represented. How many bits is used to store one sample.
Bit rate: Sample rate * bit depth. Basically bits per second.
32-bit audio has twice the bit rate of 16-bit audio at the same sample rate, which makes the complete file twice the size.
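The definitions above as a small sketch. The `channels` parameter is an addition of mine (the formula in the comment is per channel), and the function only applies to uncompressed PCM such as WAV.

```python
def bit_rate_bps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

cd_stereo = bit_rate_bps(44_100, 16, channels=2)     # 1,411,200 bits per second
float_stereo = bit_rate_bps(44_100, 32, channels=2)  # 2,822,400 bits per second

print(cd_stereo / 1000, "kbps")                 # kilobits per second
print(cd_stereo / 8 / 1000, "kB per second")    # kilobytes per second
print(float_stereo / cd_stereo, "x the size")   # 2.0
```

For comparison, a 320 kbps mp3 is less than a quarter of the CD-quality stereo figure, which is where the lossy compression comes in.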
nice job, love your tutorial, very very helpfull
For all intents and purposes, should we render tracks at 16 or 32? You briefly touched on a point that 32 may not be the best for CDJs, just wondering what works best for most use cases. Thanks Mr.Bill !
Thanx a lot man! There IS a lot of info on the topic but honestly I haven't run in to any1 who really knows what it is and what to do with is since 2001 when I read the book about Reason 1.0. So, we have to put the Bitter VST on EVERY VST-synthesizer/sampler we use because if any of them operates in anything but 32Bit that means it's already degrading bit-depth and possibly not using dithering algorithm. Is that right? Because look: imagine you are producing in Ableton, then exporting STEMS to...Logic. so each track which has VST operating at 16bit will already has that quantization-noise, so then it accumulates & if I have 40 tracks the noise will be 40 times louder then it was initially. OK, now let's say our VSTs operate in 24bits, it still means that each of them will have noise which then will be accumulated. Now, let's suppose that we got no VSTs involved but standard Ableton's synths (no samples), so all 32 bits but we ARE going to export that to Logic or ProTools because we want to mix it there for whatever reason and we Do export STEMS - each track individually. Let's say our Logic/ProTools can't import 32bit files and we have to render it at 24 so then it means that even without any 3rd party plugins we have to use dither while exporting otherwise we'll have the quantization noises accumulated.
Why I'm asking that is because there's a thing which I don't know if you've mentioned but dithering twice is not better then adding quantization-noises twice - dithering adds noise (well psycho-acoustically pleasing but still not an audio-drug right?), so since in example I'm using we are going to dither twice (exporting from Live 1st and then the final Mix/Master from the 2nd DAW where we are going to be mixing), so I think there should be very good reason for us to over-dither.
The other question is - are you sure that we don't need to use dithering if we already have dithering plugin on the master-chain? I mean...You know what I'm saying :) Xenon, FG-X - there are a lot of plugins out there for dithering/noise shaping.
Then almost the last one - are you sure there's no point to upload 24Bit files to SoundCloud, BandCamp etc? How about using 24Bit files for a music-video? What about digital distribution - if you'll upload 24Bit (as they request) they are still going to degrade it supposedly using dithering as well? It all looks pretty horrible since we all know platforms like Spotify have 128kb quality (or at least they had) which means it's like very very low quality mp3. I mean with all of it being said it looks like ppl are dithering on and on many times during the process from production to the final release. And again over-dithering it something what no1 says is good, so... you feel me?
And now very last question - what is the Bit Depth of mp3? Any suggestions
I dont think it's nerds stuff, cuz remember how EDM becomes popular and many people complain about how it all sounds on the High-Frequencies. What if what we are dealing with here is 1000times multiplied quantization noises in majority of music simply caused by lack of knowledge on the topic. What does it do then if you are listening to it on 16 KiloWatts sound-system for hours? What if it's causing brain-damage? I'm over-saturating just to emphasize what it all is so important. Add the whole mastering dilemma on top of it ("loudness wars" so called) and then all of it starts to look very important. Remember how ppl use to sample mp3, the upload it online, then other ppl were sampling it again and uploading again? What was wrong with that that there was no way to track how many times certain audio signal was previously degraded. I mean, just listen to Scrillex )
Will appreciate any answers or suggestions, cuz this thing bothers me :)
THANX FOR THE VIDEO! MUCH LOVE
p.s.BTW 17:34 isn't it because that chart shows mono and you got stereo? Maybe the chart shows stereo and you got mono, that's why your gain is x2 or 1/2 of what the chart says? Blind guess)))
Mate... I was at the dentist a few years ago get a tooth drilled. I hate jabs, and I love numbing gel. This time, the dentist didnt put numbing gel on the place he was planning to jab my jaw. I was quietly peaking coz I was anticipating the pain. But just before he jabbed me, he grabbed my lip and gently squeezed it, then jabbed. Because I was preoccupied with the squeeze on my lip, I felt ZERO pain with the jab.
This seems similar, but musically, durrr
Very informative!
so, from my understanding, because the 32 bit render holds more information and keeps the quality of the track, you should always bounce at that bit depth before you master...?
i have a question. if my project is peaking/clipping and i render it out as a 16 bit wav file, then i import that file into a fresh project - why is that file causing the mixer to clip again? surely the clipping has been baked into the wav form and signals above 0db cannot exist. halp.
is some part of the software lying to me? like the discrepancy between the spectrum and the fader at 17:28 of this vid.
or is it something to do with the bit depth? like how 32 bits seem to have more headroom??
that's why I do -0.2 dB and true peak mode on the limiter. Sometimes it might even need -0.5 dB. A limiter with oversampling could help too (I use iZotope, but FabFilter has x32 oversampling), as could putting it in look-ahead mode.
2:33... When are you gonna release that one?
2 questions: 1) Should we use dithering at all when rendering or leave that to the mastering (engineer) ? 2) If using dither, which option to use for which scenario ? Thanks !
For bitdepth, in Live, the 32bit export option uses floating point numbers, where-as the other two options use integers (whole numbers).
For a 16bit integer, each sample of audio would be a number between -32768 and 32767. However, as soon as you go to floating point the range is typically -1.0 to 1.0 (where 1.0 corresponds to 0 dB in volume).
However, floating point gives us more flexibility and we can go outside this range without too many problems. That's why your 32bit export didn't actually clip, and why you can go over 0 dB in tracks and other internal routings within Live.
You can still run into problems if you push it to the extreme, but it just behaves differently compared to working with integers. (If the 32bit export option was integers it would have also clipped).
As for dither, some extra analysis by some guy: innerportalstudio.com/new-dither-examples/ - I also heard that you should choose Triangular if the output may end up being resampled/reprocessed in the future, otherwise one of the pow-r modes for a final master (pow-r1 probably).
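The integer-versus-float clipping behaviour described above can be sketched like this. It is only a toy model, not how Live renders: the helper names are invented, and it assumes 16-bit samples scaled by 32,767.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def render_int16(sample: float) -> int:
    """Store a float sample as a 16-bit integer; anything past full scale is clipped."""
    scaled = round(sample * INT16_MAX)
    return max(INT16_MIN, min(INT16_MAX, scaled))

def read_int16(value: int) -> float:
    """Convert a stored 16-bit value back to the -1..1 float range."""
    return value / INT16_MAX

hot_mix = [1.1, 0.55]  # the first sample is over 0 dB
gain = 0.9             # turn the bounce down afterwards

float_path = [s * gain for s in hot_mix]
int_path = [read_int16(render_int16(s)) * gain for s in hot_mix]

print(float_path)  # roughly [0.99, 0.495]: the peak is still twice the quieter sample
print(int_path)    # roughly [0.9, 0.495]: the peak was flattened before the gain change
```

In the float path the over-0 dB peak survives and comes back into range when you turn it down; in the integer path it was already cut off at full scale when the file was written.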
Great video. The only comment I'd offer is that dither doesn't mask quantization noise, it actually removes it. This video ( ua-cam.com/video/cIQ9IXSUzuM/v-deo.htmlm36s ) does a great job at explaining and demonstrating why.
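One way to see the "removes rather than masks" point for yourself is a toy experiment with a signal smaller than one quantization step. This is a sketch under my own assumptions: triangular (TPDF) dither spanning plus or minus one step, a fixed random seed, and simple averaging in place of a spectrum analyzer. It is not the POW-r algorithm.

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest whole step (one step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng: random.Random) -> float:
    """Triangular-distributed noise between -1 and +1 step."""
    return rng.random() - rng.random()

rng = random.Random(0)  # fixed seed so the run is repeatable
quiet_level = 0.3       # a detail sitting below one quantization step
n = 100_000

plain = [quantize(quiet_level) for _ in range(n)]
dithered = [quantize(quiet_level + tpdf_dither(rng)) for _ in range(n)]

plain_mean = sum(plain) / n        # 0.0: the detail is rounded away entirely
dithered_mean = sum(dithered) / n  # close to 0.3: the detail survives inside the noise
print(plain_mean, dithered_mean)
```

Without dither the error depends on the signal, which is what creates the harmonics. With dither each sample is noisier, but the error no longer follows the signal, so on average the original level comes back.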
This answers the question I had, which was "why does the noise disappear from the spectroscope as well". Thanks for that!
everyone should watch this video.
But do you know where to find a video explaining the difference between the different types of POW?
This is fascinating, but why would we care about any of it if, as he says in the video, this is all occurring far below the range of human hearing? What is the "just in case" scenario we're preparing for by having that tiny bit of extra headroom?
Hey @@joshhanselman7618, it's a hard question to answer in a YouTube comment. My comment was specifically regarding dither, but you mention headroom, so I'm not exactly sure what you're asking about. Generally though, with respect to dither, distortion compounds. In other words, if a signal accumulates some distortion, and then that distorted signal accumulates some more distortion, all the distortion products from the first pass are also getting distorted: the distortion gets distorted. In this manner, it can quickly get into the realm of audibility. The same goes for aliasing, or any non-linearity really. So, when there's a right way to do something (dither, oversampling, etc.), it really pays to just get it right so that you're giving everything down the line the best possible chance. Happy to expand on any of that if you have specific questions.
@@FlotownMastering Gotcha, thanks, this makes sense. I could swear he mentions headroom in the video, but maybe I just assumed that if you're removing noise you're giving yourself more headroom. But your point is well taken with compounding distortion.
Happy Birthday! :D
So if I am just bouncing drum samples I made out of Ableton to a folder to be used in other projects, am I better of selecting a 32 bit bounce so that there no bit reduction artifacts or dithering noise at all?
I want to why people apply multiple stages of EQ & compression, why and how it's done?
I would say though at 09:25 you would actually need to go to 32bit anyway, to make a gain reduction as small as -0.00001dB lol
Thanks for this explanation Mr. Bill. Im curious what happens when you play a 32 bit mp3 file on a 24 bit sound card. Would there be a difference if you played a 32 bit file or a 24 bit file? I guess my question is if soundcards automatically adjust bit rates, and if so, does it change the quality of the audio?
13:31 "oh, my dear lord. that's so loud." :D haha
Can you make a video (tour) of your eurorack?
Is there any issue using 24bit samples (say drum samples you may have purchased online) in ableton 10?? I understand ableton converts samples that are edited to 32bit automatically like you demonstrated so I'm wondering if that process would lower the quality of my song. Hope my question makes sense.
works exactly the same in images, 8bit jpg and 32bit exr. 0 is black and 1 is white. with 8 bit you have a limited amount of "steps" you can have between 0 and 1 and I believe data is clipped below 0 and above 1. with 32bit you have a lot more range between 0 and 1 and data is not clipped.
bit rate is number of bits per second... so sample rate * bit depth, then divide by 1000 and that's kbps. if you divide that by 8 you get kiloBYTES per second
do you want to chose the dither sound you like or hate? 🤔
YEAhh .. the Grammys come to Mr bill!
Dror Levi U think that what he do is nothing?
the movie was not crap, and helps alot of ppl that wants to know about , resolution, and Dither ... anyway,,, give hin the fucking grammy :)
congratulation :) u are amazing
You can have my Grammy. But I've gotta warn you, she gets kind of cranky if you don't change her depends regularly.
Me: *realizes the joke*
Would the best dither option for exporting a single acoustic guitar and vocals be the powR1?
do an a/b test with a triangular and powr1 and please let me know if one sound better or different at all
Pe Co will do thanks guy
Loving the title :D
thanks for that!
okay so when exporting should you go 16 or 32 (not sending to an engineer). i know it all depends on what you’re making but is there really a purpose in rendering at 32 if you just upload to sites that convert to 16 bit, such as youtube, soundcloud, instagram etc.
Why do you have Reaper in your taskbar?
NinjaViking1337 for 3d graphics probably
Thank you.
Thank you ! :)
I'm going to simplify it for non-math people. Please forgive any absolute inaccuracies.
Applying DSP: EQ, compression, gain changes, etc. requires calculations to be made. often, these calculations create large values or large fractional values. The more detail in a number, the more space it takes to store it. The more space you allocate, the more detail you get to keep. More often than not, the detail of the calculation exceeds what 16, 24, or 32 bits can handle. The computer has to round off the last few bits of detail to get it to fit. This creates a quantization "error." The computer rounds up or down to the "least significant bit" or the smallest amount of detail it can store. Every time you run the signal through more DSP, this error grows. A single plug-in can easily have multiple stages of DSP itself. Then you stack 10 of them together. A 16 bit signal is going to break down pretty quick. A 32 bit signal is going to stay accurate a whole lot longer in the signal chain.
Okay... light math analogy: I have a part I sell for 25 cents. $0.25. But I have to charge you 7% tax on that. What's the total price? $0.2675. How many pennies do you give me? 26 and 3/4 of one using a pair of tin snips? No, we just round it up to $0.27 and you give me my 27 cents. If I bought 100 of these parts in 100 individual transactions, I would have spent $27.00. If I paid for 100 all at once, I'd have spent $26.75. There's $0.25 worth of error in $27.00 worth of purchases depending on how you buy something, due to rounding errors in a currency system. But wait a second. If I FLOAT the $0.2675 price and multiply by 100, I get the same price as adding up the cost of 100 units ($25) and then just taxing that.
Floating means fewer errors. Fewer errors means higher quality.
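The tax analogy above, worked through in code. This is a sketch using Python's `decimal` module so the cents come out exact; rounding half up to the nearest cent is my assumption about how the till rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE = Decimal("0.25")
TAX = Decimal("1.07")  # 7% tax
CENT = Decimal("0.01")

def to_cents(amount: Decimal) -> Decimal:
    """Round to whole cents, the smallest unit the currency can store."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

# 100 separate purchases: the rounding happens 100 times.
one_by_one = sum(to_cents(PRICE * TAX) for _ in range(100))

# One purchase of 100 parts: full precision until the very end, then round once.
all_at_once = to_cents(PRICE * 100 * TAX)

print(one_by_one, all_at_once, one_by_one - all_at_once)  # 27.00 26.75 0.25
```

Rounding at every step is the 16-bit signal chain; keeping the extra precision and rounding once at the end is the 32-bit float one.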
Take home message: higher bit-depth = lower noise floor thus greater dynamic range, and digital is awesome {^_^}
P.S. Bill, did you take Ivan Zawada's sound recording course?
What is INTERNETTING?
Bit Rate is like the horizontal resolution of a file across time, whereas Bit Depth is the vertical resolution as I understand it.
Nah, horizontal is sample rate. Bit rate is sample rate * bit depth.
which one do you personally export at? 16/24/32?
For music distribution 16-bit / 44.1kHz - for sample distribution 24-bit / 44.1kHz (although after doing this video, I realise I should probably distribute samples at 32-bit - however, as far as I know, not all DAW's can accept 32-bit wav files)
Mr. Bill 16 BIT because the file is smaller, or is there more behind it ?
thnks
05:20 xd
18:25 "this dithering this is not that important"
What mic are you using though?
RE-20 maybe?
It’s inevitable mr bill anderson...
Give this man a grandmother asap
It's part of a classical composition. Dah diky dak diky dak dah tittiy. It actually continues.
Let's go.
Alon Mor - Dithering. Great tune
POWr 1 sounds the most flat but personally POWr 2 seems to be the most useful, as far as electrohouse type of music goes.