John Hunter Good point, the series won't last an infinite amount of time and they only have a finite amount of time they can actually film. My suggestion is to handle the videos as a supertask, with each video taking half the time to make and coming out twice as quickly as the last one. That way they could release an infinite series of videos using only a finite amount of material. :)
Good god. That is a brilliant idea to keep the show alive for infinite time. I love it. The only problem is that the host and the technical staff need to be able to handle it (if people could actually do that, humanity would reach a whole new level).
Sripad Kowshik Subramanyam Reminds me of the old joke about the chemist, the engineer and the mathematician who go vacationing in a cabin in the woods. One night the chemist wakes up and sees a fire on the stove. He quickly looks around, finds some inert powder in the cupboard and dumps it on the fire to put it out. The next night the engineer comes in and sees another fire on the stove (none of them are apparently great with kitchen safety). The engineer thinks for a bit about how much water he'll need to douse it, goes and grabs their water canteens and pours out just enough of them to douse the flame. The third night the mathematician comes in and sees yet another fire on the stove. He quickly grabs his notepad and starts vigorously writing down equations. A few minutes later he smiles, puts the pad down on the table and says "A solution exists, QED," and goes back to bed. 😄
Who says there need to be infinitely many *distinct* episodes? Let n be the number of the last episode. Then every episode e_k with k>n is defined to be equal to episode e_n. Voilà, infinite series. :^)
It is not that precise. The video states that radiation and things like that are random, while in reality they are not truly random: they are close to impossible to replicate, but that is not the same as true randomness. Veritasium or Vsauce may have had a video about true randomness where it is described a lot more precisely. Even a service like random.org, which collects electromagnetic background noise from antennas, is not truly random; that noise has a source. I don't think true randomness exists, but we can get fairly close.
This is such a massively important topic. This issue alone has contributed to literally hundreds of thousands of dollars of my lifetime income. You've done a fantastic job of explaining a topic that, in interviews, I've found software engineers with over a decade of experience manage to get completely wrong.
@konstantin7596 Implementing telecom encryption from spec, writing and verifying secure code through static analysis, reverse engineering, and automated testing, that sort of thing.
I just put the numbers 0 - 9999 into a spreadsheet and ran the Middle Square Algorithm on them. A few curiosities arose, which show why it's bad news for this purpose...
1) There are 20 ways to produce the value 0000. Since 0000 has period 1, there are multiple ways to end up in this sinkhole.
2) Nearly 39% of the seed numbers can NEVER be produced this way.
3) A further 35% of seed numbers can only be produced by one other four-digit value.
4) The number 5625 can be generated twelve ways; 2500 can be generated eleven ways (including by itself). You might expect a number or two out of 10,000 seeds to be very popular, but knowing what they are makes them very predictable.
In reality, you're not producing random numbers in [0, 9999]: you're producing numbers in [0, 6109] with a ton of gaps and biases scattered within...!
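These counts are easy to reproduce without a spreadsheet. A minimal sketch of the four-digit middle-square step plus a preimage count (function and variable names are my own):

```python
def middle_square(n):
    """One step of the four-digit middle-square method:
    square n, zero-pad to 8 digits, keep the middle 4."""
    return (n * n) // 100 % 10000

# Count how many of the 10,000 seeds map onto each output value.
preimages = {}
for seed in range(10000):
    out = middle_square(seed)
    preimages[out] = preimages.get(out, 0) + 1

# Values with no entry in `preimages` can never be produced.
unreachable = 10000 - len(preimages)
```

For instance 0..9 and 1000, 2000, ..., 9000 all land on 0000 immediately, which already gives most of the 20 routes into that sinkhole.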
MS is an old algorithm, not the best one. But in April 2017 a modified version called the Middle Square Weyl Sequence PRNG was published. Instead of [0, 9999] it produces values in [0, 2^32 - 1]. It fixes a lot of the problems of MS, and passes every statistical test. Just take the returned value modulo something (for example mod 16384) and check the distribution: it should be uniform. Note: `s` is the seed. To initialise `s`, take a random even number and add 0xb5ad4eceda1ce2a9 to it. en.wikipedia.org/wiki/Middle-square_method#Middle_Square_Weyl_Sequence_PRNG
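For reference, a sketch of that generator in Python (the C original keeps `x`, `w`, `s` as 64-bit unsigned globals; the masking here emulates 64-bit overflow, and the seed constant is the one mentioned above):

```python
MASK64 = (1 << 64) - 1

def msws_stream(s, n):
    """Middle Square Weyl Sequence PRNG: return n 32-bit values.
    s is the odd 64-bit Weyl constant (a random even number plus
    0xb5ad4eceda1ce2a9, as suggested in the comment above)."""
    x = w = 0
    out = []
    for _ in range(n):
        x = (x * x) & MASK64          # square
        w = (w + s) & MASK64          # advance the Weyl sequence
        x = (x + w) & MASK64          # inject it into the state
        x = ((x >> 32) | (x << 32)) & MASK64   # swap halves: the "middle square"
        out.append(x & 0xFFFFFFFF)    # emit 32 bits
    return out

vals = msws_stream(2 + 0xb5ad4eceda1ce2a9, 1000)
```

The Weyl sequence `w` never repeats within the 2^64 cycle, which is what rescues the middle square from its sinkholes.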
Knuth (volume 2 of The Art of Computer Programming) has lots of analysis and practical algorithms for pseudo-random numbers. A simple recipe: generate pseudo-random numbers with one of his algorithms, then apply another neat procedure of Knuth's, shuffling the numbers through an output buffer. Another book, "Numerical Recipes", describes this too and has references back to Knuth. You can order both at any good bookshop. Beware: these are weighty books with a heavy price tag.

Also look up "primitive polynomial modulo 2", which gives you a sequence that only repeats after a looooooooooong time. It's fairly easy to find tables of these, i.e. someone has already calculated some primitive polynomials modulo 2 (Knuth describes these). Pick one with a long repeat period, and the result is a trivially simple recipe for a stream of bits that repeats only after an amazingly long time, i.e. longer than the age of the universe. Used as the basis of a random-number generator, this gives a very fast and simple algorithm.

Also look at tests of random number sequences: if you think you have a random number generator, there are ways of testing it, so hook up your RNG (Random Number Generator) to one of these tests to see how it behaves. The web has lots of information on this; Wikipedia, for example, describes several ways of testing random numbers, as do other mathematics-related sites. The tests are generally easy to perform.

I've also experimented with prime numbers. Find a sequence which can be quite simple, based on addition modulo a prime number, and which (importantly) repeats with a period of your favourite prime. Run several of these in parallel, each based on a different prime, and the period of the resulting generator will be roughly the product of all of the primes. With 6 primes of size about 1,000,000, the composite behaviour will repeat with a period of about 1,000,000^6.
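The output-buffer shuffle mentioned above is often credited to Bays and Durham and appears in Numerical Recipes as part of ran1. A sketch of the idea, wrapped around a toy LCG (the Park-Miller constants are used purely for illustration):

```python
class ShuffledLCG:
    """A small LCG whose outputs are decorrelated by shuffling
    them through a buffer, as Knuth describes."""
    def __init__(self, seed, table_size=32):
        self.m, self.a = 2**31 - 1, 16807      # Park-Miller constants
        self.state = seed % self.m or 1        # seed must be nonzero
        # Fill the shuffle table from the raw generator.
        self.table = [self._raw() for _ in range(table_size)]
        self.last = self._raw()

    def _raw(self):
        self.state = (self.a * self.state) % self.m
        return self.state

    def next(self):
        # Use the previous output to pick a slot, emit that slot,
        # and refill it with a fresh raw draw.
        j = self.last % len(self.table)
        self.last = self.table[j]
        self.table[j] = self._raw()
        return self.last
```

The shuffle doesn't lengthen the period, but it breaks up short-range serial correlations in the raw sequence.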
AFAIK the best random number generators are based on cryptographic functions. Or one could just as easily say that cryptographic methods are based on random number generators, since the whole point of cryptography is to take a message and create a patternless sequence of numbers. This video barely scratched the surface of random number generators. I once asked a predoctoral math teacher if she had considered random-number research for her doctoral thesis. Her reply was that she should have; there were a lot of research opportunities available. And that was maybe 20 years ago, so plenty of research must still be open.
Inverse transform sampling is great in theory; in fact, when I needed to generate numbers from a distribution a few years back, it was my first instinct despite not knowing what to call it at the time. Turns out, though, that - well, I'll let Wikipedia tell you: "For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution). As a result, this method may be computationally inefficient for many distributions and other methods are preferred." And that's how I discovered the Box–Muller transform. Marvelous bit of work that actually will give you a normal distribution.
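For anyone curious, the basic Box–Muller transform really is just a few lines (a sketch; `random.random` stands in for any uniform source):

```python
import math
import random

def box_muller():
    """Turn two independent uniforms on (0,1) into two
    independent standard normal deviates."""
    u1 = 1.0 - random.random()   # shift to (0,1] so log(0) can't happen
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)
```

The trick is exactly the one mentioned in a later comment: the radius of a 2D standard normal has a closed-form CDF, so you invert that and pick a uniform angle.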
It's true that a pseudo random generator must eventually repeat. However, it's not true that it must happen as soon as it hits the same output again. PRNGs can have hidden state. Stuff inside the black box that isn't part of the output. That way they may effectively loop through all possible outputs not in just one way but, in fact, potentially in all possible orders. Similarly you can build them such that they occasionally spit out the same value twice in a row without falling into a loop where they spit out ONLY that one value. They have to repeat only once their entire state (hidden state plus output) is hit again. And that could take a LOT longer than hitting every possible output once.
Adding an irrational number (or, on an actual computer, a very high-precision decimal approximation of one) to the seed on every step makes the cycle so long that humanity will have come to an end before it repeats.
You're right, but that's just deferring the problem - if you had N non-hidden states and H hidden ones, then your period is going to be (at best) N*H, so you're essentially back to square one.
I don't really see a problem here... I just pointed out that a PRNG can go through the same outputs a *lot* of times before starting to repeat. Of course if you want to go through all possible sequences you'll need to have H*N >= N! or, if you want to allow for arbitrary duplicates, it ought to be H*N >= N^N. That's quite a tall ask, usually. (That's just numerically. You'll also want to satisfy some properties to actually generate the desired sequences.) But why stop there? What if, not only you want to run through all permutations, but you want to permute the order you run through the permutations? The numbers obviously become insanely huge long before we even get to that point. Too large to contain on a computer. (By N=59, N! exceeds the estimated number of atoms in the observable universe; N^N clears it by N=48.)
Ooh! That drifting numbers background really did a number on the video quality. No pun intended. Tom Scott did a video on the topic called "Why Snow and Confetti Ruin YouTube Video Quality" (at least I think that was the title) which gives a basic explanation of the phenomenon if you're interested.
This was a great video, but I feel like the definition of randomness was a little vague. It would be great to see a video on Kolmogorov complexity to get a more in-depth insight on randomness. I'm loving the computer science direction of this channel and I can't think of a better topic to cover!
I'm sure the definition of randomness was vague because the video focused chiefly on pseudorandomness. However, it's pretty difficult to define randomness. When I took mathematical statistics for my BS in mathematics, we focused on randomness being the tendency of the image of a random variable to resemble the composition of its PDF/PMFs as the number of samples grows arbitrarily large. For instance, the PMF of flipping a fair coin is Pr(X=x) = 1/2, x∈{Heads, Tails}, so if a flip of a fair coin truly is random, then flipping it an arbitrarily large number of times should give a 1/2 chance for each side. However, if there were another process involved, such as having to choose one of four coins with replacement each flip, where one coin is fair, two give heads 75% of the time and the last gives tails 75% of the time, then if the process truly is random, the outcome should resemble the composition of those functions. Additionally, as mentioned in the video, we should expect randomness to not exhibit any identifiable pattern. However, this isn't immediately disqualifying of something as being random. For instance, I could roll a d6 18 times and get 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6. This is extremely unlikely, but can happen under a truly random system. Ultimately, though, this pattern should not continue for arbitrarily large sequences. Also, again mentioned in the video, an important aspect of randomness is irreproducibility. You should expect no two random processes to give the exact same output, though if you are extremely unlucky (or extremely lucky!) you might have to observe for a bit before the outputs diverge.
In this last season of Doctor Who, there was an episode where people are living in a simulated reality, and they learn of this because any group of people calling out random numbers at the same time always calls out the same sequence of numbers because, as explained in the episode, computers can't generate truly random numbers (quantum computers should allow true random number generation, but that's beside the point). Ultimately, though, randomness by its very nature is difficult to precisely define. It is easier to identify what *is not* random and trim it away, like an asymptotic game of mathematical jezzball.
John D. Cook wrote an interesting blog post on the topic: www.johndcook.com/blog/2012/04/19/random-is-as-random-does/
Why not define randomness in relation to an observer? If an observer can't predict the sequence, then the sequence is random for that observer. That definition does away with the whole problem. The old "true randomness" requires a sequence to be unpredictable to all possible observers. That is problematic, as we don't know what observers there can be. If there is a god, wouldn't he be an observer?
Lars Pensjö This is very similar to Kolmogorov Complexity. A series would be considered random if the shortest program that could generate the series is the length of the series plus a constant
65 days quarantined. First weed I get, I came to watch your videos. Amazing. I understand the maths, but not why I'm doing this to myself. Congratulations on the video.
Knuth in his book pointed out that a randomly generated random number generator does not necessarily generate good random numbers. He discusses an algorithm he designed using things like a human's height and other imponderables. He wrote the code for it and set it to work. Though it had a space of 32 bits (if I remember correctly), it cycled with a period of 12.
One thing I'd add to this (very good) video is that uniform random numbers, besides having a constant probability density, also have to be *independent*. So every number has to be equally likely regardless of what preceded it. Congruential generators might have a hard time achieving this since they essentially loop over the domain defined by M - and it's of course heavily dependent on the chosen A and C. Selecting primes for these parameters is usually a good start...
1. The state of a PRNG can be larger than the last bits of randomness it generated. 2. Cryptographically secure pseudo-random numbers cannot be distinguished from truly random numbers with limited resources, so unless your program is of galactic size it is guaranteed you won't notice a difference. CSPRNGs are typically significantly slower than PRNGs in general. 3. The period of the most used Mersenne Twister is much larger than 2^32.
The Mersenne Twister is really generating 19937 bit numbers. The code only gives you 32 bits at a time, giving the appearance that the numbers are repeating within a cycle but that isn't the real size of the output. If you consider the actual output value, then the period is exactly 2^19937 - 1 which is what she meant. If you consider all the inputs to the generator, then there can be no hidden state. If there was state that wasn't dependent on the input, then you wouldn't have a PRNG, since it could produce different outputs for the same input. With these technical considerations, what she said is correct, you can't cycle longer than 2^outputbits. It is just a bit more nuanced.
The speed thing rather depends on your implementation constraints. For example a CSPRNG based on a block cipher in CTR mode can be implemented in a parallel form that can be extended as far as is required to get the speed that is required. For many non cryptographic RNGs, like for example XORSHIFT or PCGs, this is not the case. The CSPRNG in most modern desktop and server CPUs is designed to exploit this parallelism. The world's fastest production RNG is a CSPRNG that is frequently reseeded (0.5 to 1 million times a second). I noticed that the video states that the Mersenne Twister is the gold standard, which is laughably false. Also the stated procedure for random fractions doesn't yield a uniform distribution when tried with floating point arithmetic. Especially if you do the multiply before the divide, pushing the first term into the lowest resolution end of the floating point representation. It's easy enough to make uniform floats, but the method in the video is simply wrong.
I think it's important to point out that the internal state of a PRNG is not required to be the same size as the output value. In any "good" implementation your period will be much larger than the resulting set, e.g. using a 64-bit (or larger) internal state and returning a 32-bit value. It is also important to note that many PRNGs apply functions to the output bits in such a way as to protect that internal state from leaking out, such as a hash:
X_{n+1} = (X_n * a + c) % 2^64
return hash(X_{n+1}) & (2^32 - 1)
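A sketch of that idea: a 64-bit state, a 32-bit output, and the state kept out of direct view. The constants below are Knuth's MMIX LCG constants; emitting only the high bits is a simple stand-in for the hash step described above:

```python
MASK64 = (1 << 64) - 1

class TruncatedLCG:
    """64-bit LCG state, 32-bit outputs: the period can far exceed
    the number of distinct output values, so an output can repeat
    without the stream entering a loop."""
    A = 6364136223846793005   # Knuth's MMIX multiplier
    C = 1442695040888963407   # Knuth's MMIX increment

    def __init__(self, seed):
        self.state = seed & MASK64

    def next32(self):
        self.state = (self.A * self.state + self.C) & MASK64
        # Emit only the high 32 bits: the low bits of a power-of-two
        # LCG have short periods, and truncation also hides the state.
        return self.state >> 32
```

Seeing one 32-bit output tells you only half the state, which is why observing a repeated output doesn't imply the generator has cycled.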
Though storage was briefly alluded to, the statement that using a “natural” sequence (e.g. via a Geiger counter) isn’t repeatable, isn’t strictly correct, e.g. if one stores the sequence thus generated. While it’s obviously more efficient to use a pseudo-random generator, storing and retrieving a sequence that’s many times the period of a typical pseudo-random generator (e.g. 2^32) isn’t all that impractical in this age of relatively fast storage. And some simulations do require much longer sequences than the periods of typical pseudo-random generators.
Ok so this might also be nitpicky but a parabola may follow quite a bit more complex a parametrization than that. For instance: (a/b²)(x-x0)² - sin(2α)(x-x0) + (1 + cos(2α))(y-y0) = 0. This is the parabola you get if you have a constant gravitational field of strength "a" pointing down (if a is negative it points up instead), and you throw a ball with starting speed "b" in a direction dictated by "α" from the point "(x0, y0)". It includes all 2D parabolas whose axis is aligned with the y axis. (In fact it has more parameters than necessary. You could do the same with y = a x² + b x + c.) But of course you could also want freely rotated parabolas (which would involve an extra parameter giving that angle) or you could look at parabolas in higher dimensional space...
LegendLength - maybe nitpicky, but try to estimate e.g. the likelihood of generating a Higgs at the LHC via the Standard Model using, e.g., the Mersenne Twister. I'm only pointing out that when a sequence longer than the period of the standard pseudo-random generators is required, one can in fact generate it and also have repeatability via storage, and that can be done feasibly.
Afaik, since a PRNG needs a seed sequence to extract entropy from, you must have a cache of real random inputs to run one. Windows supposedly does this by logging assorted junk data like the time, keystrokes, mouse pointer position and various network info at predetermined times into an ever-expanding cache of random strings.
Mersenne Twister isn't a linear congruential generator. It's a linear feedback shift register. For the normal distribution, one normally does not use the inverse distribution function, because that's not expressible in closed form. Instead, one generates two normal variates at once, because the distribution function of the radius (in polar coordinates) of a 2D normal distribution is in closed form. There are at least two ways of doing this.
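One of those ways is Marsaglia's polar method, which rejection-samples a point in the unit disc and so avoids the trig calls of basic Box–Muller (a sketch; `random.random` stands in for any uniform source):

```python
import math
import random

def polar_normal():
    """Marsaglia's polar method: two independent standard normal
    deviates per accepted point, no sin/cos needed."""
    while True:
        u = 2.0 * random.random() - 1.0
        v = 2.0 * random.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:            # reject points outside the unit disc
            f = math.sqrt(-2.0 * math.log(s) / s)
            return u * f, v * f
```

About 21% of candidate points are rejected (1 - π/4), which is usually cheaper than the two transcendental calls it saves.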
Another nice thing about picking 2^32 as your m is that 32-bit computers would do it automatically. That's just how many bits they have, so any bits after that would just get lopped off. No additional computation required.
9:56 the uniform distribution: wouldn't the value on the y-axis be some y approaching 0, rather than y=1? Otherwise there would be uncountably infinitely many values all with probability equal to 1. Or rather, in your example, y = 1/10,000?
So when I use R or Stata to simulate a normally distributed set of numbers, even if I simulate a large number it often deviates from what a standard normal distribution is supposed to look like--sometimes by pretty noticeable amounts. Why is that so if the program is just basing its values off of the equation for the CDF?
Two possible reasons. (1) It takes a lot of deviates for the curve to start looking nice, like a few million. (2) Not all algorithms are perfect. I use Zignor. I don't know what R uses.
Fun side note: in the 1960s and 1970s IBM had a random number generator called "RANDU". Donald Knuth, the author of the seminal book "The Art of Computer Programming", called it "truly horrible". Once the flaw came out, it called into question a large number of published papers that had used this function for simulations.
" in the 1960s and 1970s IBM had a random number generator call “RANDU”. " Yes, I think IBM's reply to a query about that was "We guarantee that each number individually is random, but we cannot guarantee that more than one of them is random."
Ah, Linear Feedback Shift Registers! The Playstation 2 had them on the VPU1 (sort of a vector shader before there were GPUs) and I naively used them to create x,y,z coordinates for snow and rain sprites. It looked fine as long as just a few dozen were generated, but as more were added a pattern would show; all points confined to a few slanted columns. Caused a lot of head scratching; ended up using an LFSR for x, a linear congruential generator for y, and for z something like z' = frac(z + GoldenRatio). Thought it was a hardware issue but found out later it was a problem inherent to LFSRs.
Good job! You explained the idea clearly and precisely. One thing I am still wondering is why the inverse distribution of uniformly sampled cumulative probability densities of a normal distribution becomes a normal distribution again. You have given us a very good example using the heights of US women in the video. I think I will do some more research to find a mathematical proof to convince myself.
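The proof you're looking for is essentially one line. If U is uniform on (0,1) and F is a continuous, strictly increasing CDF (such as the normal CDF), set X = F^{-1}(U). Then

Pr(X ≤ x) = Pr(F^{-1}(U) ≤ x) = Pr(U ≤ F(x)) = F(x),

since Pr(U ≤ u) = u for any u in [0,1]. So X has CDF exactly F: inverting uniformly sampled cumulative probabilities of the normal distribution hands you a normally distributed variable back. (This is the inverse transform sampling theorem; the height example in the video is exactly this with F the empirical CDF of heights.)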
Good point. And just to unpack your comment a bit for other people: the period of the common Mersenne Twister generator is 2^19937 - 1, which is a Mersenne prime.
In the case of the linear-congruential generator, you could increment the value of c after every pass, so with the right values of m, a, and c, you could have a period of m^2 instead of m.
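Whether that actually reaches m² depends heavily on the constants, and it's easy to check by brute force for small m. A sketch (the constants a=5, c=1, m=16 are just an example; with them the plain LCG achieves its full period m, while the c-incrementing variant does better than m but falls well short of m²):

```python
def period(step, state):
    """Length of the cycle that iterating `step` from `state` falls into."""
    seen = {}
    n = 0
    while state not in seen:
        seen[state] = n
        state = step(state)
        n += 1
    return n - seen[state]

m, a = 16, 5

# Plain LCG: full period m is guaranteed for these constants
# by the Hull-Dobell theorem.
plain = period(lambda x: (a * x + 1) % m, 3)

# Variant from the comment: c is itself incremented every pass,
# so the real state is the pair (x, c) and the ceiling is m*m.
variant = period(lambda xc: ((a * xc[0] + xc[1]) % m,
                             (xc[1] + 1) % m), (3, 1))
```

So the idea does extend the period, but hitting the full m² takes the "right values" the comment mentions.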
How would you go about proving that a given PRNG is uniformly distributed? How does the Mersenne Twister improve upon a linear congruential generator? What are some common randomness tests? I feel like there's a lot this video just glossed over; some further reading would be appreciated.
at 8:20 you say that the distributions of the random number generators is uniform. But isn't this only the case when the period of the sequence is equal to the size of the interval? If so then the Middle-Square Algorithm will never have a uniform distribution (the maximum period was less than the interval), and the Linear Congruential Generator will only have one when "good" values are chosen (such that the period=m). I would be interested to see another episode going into more detail about how these "good" values are found, and about the actual distribution of the Middle-Square Algorithm (as it is not uniform).
How do computers pick the a, m, and c for the linear congruential generator? Do companies all use the same one? Do they use a different pseudorandom number generator to pick them?
I wrote a random number program in an old HP 41CX calculator. It was simple. Extract the 1/10th of a sec digit, set it next to the previous extraction in the alpha register, and when you have the number of digits wanted, execute AMUN and presto, a real random number appears in the X register. Requires a, or several, random execution or presses of the key assigned to extract the digit from the clock.
I actually came up with a random algorithm called XSR for XOR with Shifting Right. It passes many random tests, and is faster and even better than the Mersenne Twister when stacked up.
Could the Cumulative Distribution Function trick (mapping from uniform [0,1] to some other distribution) be expressed in matrix multiplication? Not sure what benefit that could have, but it seem potentially powerful.
I love the show! Keep it up guys and gals! Although the reasoning for obtaining a normal RV from a uniform RV follows from the inverse transform method (as you've illustrated), I'm a bit scared that this might be advocating its use for this case rather than the Box-Muller transform in order to obtain normal variates, which is much easier (and safer) to implement on a range of platforms and in fact much more elegant! (Save for the fact that you might be throwing an unneeded normal RV away or donating it to charity, but who in this simulation driven world only requires a single normal RV - am I right?) For the Inverse Transform demonstration I would've used a Pareto / exponential distribution (granted, it would not produce nice S curved graphs as the normal CDF does, but it does give the opportunity of quickly doing an on-screen algebraic manipulation to arrive at the desired RV, which always makes my heart sing :-) )
Surely comparing the Linear Congruential Generator with the Middle Square Algorithm at 5:49, and asking which sequence looks the most random, is an unfair comparison? They have different moduli: the MSA has mod 9999, but the LCG has mod 7829. Small sample notwithstanding, the two sets provided are inherently different.
This demonstrates the difference between a mathematician and an engineer. The problem was defined with 2 significant digits. The answer can only possibly contain 2 significant digits. You don't pick up more. So, the baseball rulebook is correct. The correct answer to this problem is "12". The trailing ".019" is meaningless. Great videos, sir. Keep them coming!
Are there applications in which pseudo-random sequences are required that never repeat? One sequence that never repeats is the Thue-Morse sequence, though it does not look very random. One could, however, use it as a seed to make it look "more random". Or are there better alternatives for this?
LegendLength It is also relatively simple to get an extremely long period by using two long sequences whose lengths are relatively prime (share no common factor other than one), and then combining the two, for example by addition. The result should have a period equal to the product of the two periods of the original sequences.
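A toy demonstration of that combination trick (the two short sequences are made up purely for illustration):

```python
def cycle_gen(seq):
    """Endlessly repeat a fixed sequence, mimicking a generator
    whose period is len(seq)."""
    i = 0
    while True:
        yield seq[i]
        i = (i + 1) % len(seq)

# Two "generators" with coprime periods 5 and 7.
g1 = cycle_gen([3, 1, 4, 1, 5])
g2 = cycle_gen([2, 7, 1, 8, 2, 8, 1])

# Combine by addition (mod 10 to stay single-digit); the combined
# stream only realigns when both cycles do, i.e. every 5*7 = 35 steps.
combined = [(next(g1) + next(g2)) % 10 for _ in range(70)]
```

The same arithmetic is behind combined generators like Wichmann-Hill, which adds three small LCGs with coprime moduli.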
No deterministic procedure which can be implemented in a physical system and has a finite size can give an infinite sequence that never goes into a loop, because by the Bekenstein bound there are only so many possible physical states for the implementation.
A counterexample would be a Thue-Morse-style sequence. For example, if you start off with some finite sequence X consisting of numbers between 0 and 1, not all equal to 0.5 (and maybe some other conditions), then you can recursively extend it into a sequence that never repeats itself using X := [X, 1-X]. However, this sequence probably does not have very good properties as far as appearing random goes.
My point was in the "and has a finite size". Of course, if you have an unlimited amount of memory, then you can make a program that makes a sequence that never repeats, by doing something as simple as 01001000100001... etc. . But for any machine with a finite physical size, it has finitely many possibly physical states (because of the Bekenstein bound), and therefore must eventually repeat.
+drdca I was wondering what exactly you meant by that, because I am absolutely no expert, but I am rather certain that a simple number line will never repeat, and it can be described with x_n = x_{n-1} + 1. But I guess memory couldn't hold the numbers after they exceed its size.
I think it’s diabolical that this is the _only_ video I found that _actually gives a valid equation._ Like, I’m looking up random number generator, why aren’t you giving me one?
Was it just a coincidence that the probability density function looked like the derivative of that cumulative distribution of heights? It appeared to be exactly that and I think that makes sense but I've not seen such an elegant link between statistics and calculus before that.
9:00 just to clarify: it will never actually be 1. The way it's implemented in practice doesn't allow it, and if it did happen, all sorts of OS-crashing-level glitches would occur.
So why is it that the shuffle feature on apple music/ spotify is so bad? I have well over a thousand songs downloaded but will hear the same 10-20 songs played several times and never hear the majority of songs on my phone.
A few years ago Apple had true randomness in their shuffle function, but people complained when the same song played twice (randomness can repeat itself; you can get five consecutive 5s on a die and it's still random), so they replaced it with who knows what algorithm.
One problem alluded to, but not mentioned explicitly with pseudorandom vs random numbers, is that truly random integer sequences could have exactly two (or "n") identical numbers in a row (for example with p=1/1e6 for rolling a thousand-sided die twice) , whereas for pseudorandom sequences, if there are two identical numbers in a row, there must be ONLY that number forever after. Therefore using pseudorandoms pairwise (over the entire range of possible numbers) would never give you the same distribution you would see in actual random sequence.
In the case of banking systems with tokens, for example, the seed is the current time: we assume the laws of physics at the surface of the Earth remain constant, so if your phone's clock and the bank's clocks are synchronized, both sides get the same generated number through the implemented algorithm. If however you went a bit higher, say low Earth orbit, the time would differ due to relativity. You would be unable to log in to your bank account from space if you didn't correct for the time difference due to Earth's gravity.
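That time-synchronized scheme can be sketched in a few lines. Real tokens use HMAC-based TOTP (RFC 6238); here a plain hash stands in, and the 30-second window and function names are my own assumptions:

```python
import hashlib

def token(shared_secret: bytes, unix_time: int, window: int = 30) -> int:
    """Hash a shared secret together with the current time window;
    two parties with synchronized clocks get the same 6-digit code."""
    counter = unix_time // window            # same value for ~30 s
    digest = hashlib.sha256(
        shared_secret + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big") % 1_000_000

# Phone and bank agree as long as their clocks agree to within
# the window (real systems also accept the adjacent windows).
```

Note the "randomness" here comes entirely from the shared secret; the time only selects which value both sides derive.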
There have been a lot of issues with pseudo-random generators (Java 2013, Sony 2010, OpenSSL 2008), but that doesn't mean hardware generators are better, and humans definitely aren't. A hardware generator, even one that reads radiation, could be "hacked" by putting another device near it (of course that requires physical access to the machine, but that's always the case). Numbers/phrases generated by humans are even less random than a simple PRNG: based on a person's knowledge, habits and friends, you can estimate how likely they are to choose each number or character in a sequence.
Lots of RNG engines for programs being actively interacted with (say, a game) use user input as part of the sequence. Is this still pseudorandom, or does the realtime user input make it totally random?
It seems to be a fallacy when the spiraling lengths (1, 2, ... infinity) are said to be a cop win when the length of the last one is infinity. At what point is the overwhelming exception recognized?
The cycle repeats every time you get the same number? Wouldn't it make more sense to have the algorithm be based on the last several numbers instead of the last one? For example, in the linear congruential generator, the increment could be changed to the seed number we got on the previous turn, and the modulo could be the seed we got two turns ago. That leads to the problem of how to actually start the algorithm (How can you start something that relies on previous turns?) but at the very most, this just means we need multiple seeds, and the modulo, increment, etc. become their own seeds.
If you have X_n = f(X_{n-1}), where f is the function defining the next pseudorandom number, could we just change it to X_n = f(X_{n-1} * n) to avoid cycling? Or why don't we slice up sqrt(2) into packs of four digits and use them as a random sequence?
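The sqrt(2) idea is easy to try (a sketch using Python's decimal module; whether the digits pass randomness tests is a separate question, and anyone who knows the source can predict every value, so this is not usable where unpredictability matters):

```python
from decimal import Decimal, getcontext

def sqrt2_blocks(n_blocks, width=4):
    """Slice the decimal expansion of sqrt(2) into fixed-width
    digit blocks and read them as 'random' numbers."""
    getcontext().prec = n_blocks * width + 10   # a few guard digits
    digits = str(Decimal(2).sqrt()).replace("1.", "", 1)
    return [int(digits[i * width:(i + 1) * width]) for i in range(n_blocks)]

# First four-digit blocks: 4142, 1356, 2373, ...
```

sqrt(2) is conjectured (not proven) to be a normal number, which is the property that would make every digit block equally frequent in the long run.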
I hope there's a follow up episode on some of the tests used to gauge a PRNG's statistical randomness. Looking at the 3-dimensional planes generated by LCGs is a pretty direct way to see how bad they are (in that respect).
No, it's not. It's the integral from -∞ to x of the probability distribution. In this case, the probability distribution is a gaussian; its corresponding integral doesn't have a closed form in terms of elementary functions.
No. It's the 'erf' function. That's short for "error function". In some situations you can cheat and use any function with that general form, but usually it pays to be accurate.
What is "truly" random in everyday life is more of a philosophical question than a mathematical one. If a poker website has an RNG that combines seeds from micro temperature fluctuations, internet traffic, the number of players on the website and other variables, I'm going to call it random for the purpose of trusting the website to deal me cards "randomly".
Custom hardware for randomness is less and less of a problem, now that basically every consumer computer (including phones) comes with a camera in it. Random movement of electrons in the CCD sensor produces thermal noise that you can use. Note that this does not depend on actual light entering the camera.
Hi Nathan, you are right, a webcam can be used. Some years ago I made a web-based tool that runs inside the browser and captures the webcam to generate random numbers. You can put a sticker over the camera and the sensor still generates pixel noise. For people interested in it: retura.de/camrng/camrng.html. It's in German: click on "Webcam aktivieren" to start the capturing and then on START. The graph then shows the quality, or "randomness", of the data by calculating pi with a Monte Carlo simulation and the ratio between ones and zeros. Greetings, David
You had a video about normal numbers half a year ago. These are exactly the numbers x in [0,1] such that the sequence x_{n+1}=10 x_n mod 1 equidistribute in [0,1]. I'm not sure how close is the connection to pseudo random numbers, but it is true that almost every number in [0,1] is normal (so it is a "good seed").
(Weird comment from a game programmer on getting away with sampling a PRNG with a small period. Ignore as needed :P) An interesting thing WRT period for procedurally generated content (eg game levels) is that it doesn't necessarily have to be particularly large. You are advancing the PRNG through its sequence every time it is sampled, but you can sample it in such a way that the repetition is not apparent to the player. For example, if we have a small-period PRNG determining the heights of tiles in a level, it will become apparent if the period is significantly less than the number of tiles generated. But if we generate a tile's height and then, before we sample for the next tile, sample the PRNG a second time or more ("How many blades of grass are on this tile?" or "Shall we place a monster here?" - "Which direction is the monster facing?"), then by the time we get to calculating the next tile's height, the PRNG could be at a (pseudo)random point in its sequence. The sequence repeats, but less noticeably for something the player will recognise (tile heights), unless the player can view a lot more of the generated level. This can work against us too, though: a program sampling a PRNG can request random numbers in step with its period, so every tile would have the same height, the same number of blades of grass, the same monster spawned. Or if the program made periodLength/2 samples per tile, every ~other~ tile would be the same. This is largely irrelevant these days, since most computers can comfortably generate very long pseudorandom sequences, and do it very quickly. But it is a trick that can be used if you're careful and computation power is extremely limited :)
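The effect described above is easy to reproduce with toy numbers (purely illustrative parameters): a full-period LCG mod 8 feeds tile heights, drawing one sample per tile makes the heights visibly repeat every 8 tiles, and drawing exactly 8 samples per tile hits the degenerate case where every tile comes out identical:

```python
def toy_prng(seed=3, a=5, c=1, m=8):
    # tiny full-period LCG (period m = 8), standing in for a small-period PRNG
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def tile_heights(n_tiles, draws_per_tile):
    g = toy_prng()
    heights = []
    for _ in range(n_tiles):
        heights.append(next(g))            # the first draw decides the height
        for _ in range(draws_per_tile - 1):
            next(g)                        # extra draws ("grass", "monsters"...)
    return heights
```

With one draw per tile the pattern `heights[:8] == heights[8:]` shows up immediately; with 8 draws per tile the height sample always lands on the same point of the cycle.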
I'm showing this to all my coworkers. Really good way to explain how things work behind the scenes for all those different distributions in various random modules/libraries
But what if we want to create a pseudorandom sequence that seems to contain all numbers - that is, where any real number between, for instance, 0 and 1 can be produced? Wouldn't it be possible to achieve this by using a formula that typically produces transcendental numbers? It is true that only certain numbers will be contained, but we don't know which, creating the impression that any number can occur. But... functions that produce transcendental numbers can also produce rational numbers for certain input values. However, I wonder if it is possible to include irrational, non-transcendental numbers as well.
I had a disagreement with my college professor about the randomness of a register on the Atari 8-bit computer. The POKEY chip includes an internal 17-bit polynomial counter from which 8 bits can be read. POKEY advances the counter on every clock cycle, so it's completely independent of the main CPU. In normal operation, the timing of when the CPU reads this register is impacted not just by the number of clock cycles it takes to run the software that reads it but also by interrupts from such things as video refresh and keyboard events. Can this register be considered truly random or not?
I have seen some pseudo random number generators that use a bell-curve distribution, given its mean and the standard deviation. However, I don't quite understand how they generate the numbers, because they seem to be unbounded, differently from the first two algorithms shown in this video.
What about a version of the Linear Congruential Generator that has an increment increment -- ie the increment value is increased periodically. Surely, this would make having a repeated output a more harmless outcome. If you ended up back on a number you've already visited, you're unlikely to repeat a complete chain - just revisit a single number and move on. Would this be more, or less likely to return to a duplicate number though, even if it doesn't cycle in the same way?
It's dangerous getting your randomness from something other users can affect. If I want to push around your random number generation, all I have to do is calculate the right things at the right time to delay your calculation. On computers with spinning-platter hard drives, air turbulence in the drive affects the seek time, so that's a pretty good source of randomness -- as long as you're getting the timing from your own seeks. If you'll take any seek, again all I have to do is flood the system with disk access requests of my choice, whatever that particular drive can do most consistently.
[00:59] But you also want the other statistical moments to be enforced... [01:55] "Not reproducible" is the wrong framing here: the seed can be recorded... [05:41] The MSA (middle square algorithm) appears to have strong low-frequency spectral energy...
Very cool, I had never heard of inverse transform sampling before. I can think of a few programming applications where it would have been useful. Thanks!
Couldn't you increase the periodicity of any PRNG greatly by making the parameters (e.g. the a, c and maybe m in the linear congruential generator) also outputs of their own PRNG, initialised with their own seeds? It would still be deterministic, but determined by the combination of initial seed values of all inputs.
The modulus and multiplier parameters are very important in determining the period of the function. Bad values can result in a very short period, so they need to be chosen carefully; randomly chosen values aren't necessarily good enough. I think you get better periods from primes, but I'd have to look it up to confirm.
So if there are certain values that a Pseudo RNG can never get, how does it get around this to appear truly random? How do the RNG people make the PDF as close to uniform as possible and what does the PDF of those actually look like? What about repeating values? For example, how do I randomly generate two or more numbers in a row? What's the method behind that?
Great questions. These are exactly the properties we make sure an RNG has. If you are following NIST's prescriptions, you can look at the CTR-DRBG algorithm. This uses AES in CTR mode, which never returns the same value twice (because it's passing a counter into a bijective mapping). The state is the key K and counter V (for vector). To make it able to return values from the full set of output possibilities, the K and V state is randomized by passing it through AES a couple of times (the details are in the spec), so the bijective mapping changes to a new one, and thus values can repeat with probabilities equal to those of fully entropic bits. There are many other designs; this is the most common one.
Inverse transform sampling also only generates a pseudo-normal distribution: the true probability differs at each point, but here it will be constant within each given range, however short we make the ranges.
In the middle square algorithm, wouldn't a small value 'jam' it (leading to an endless string of zeroes)? Any number less than 100 yields a number lower than itself, eventually leading to zero. 100 itself also yields itself, creating another loop.
And I was watching a video from GDC of some game devs talking about procedural content, which usually makes use of pseudorandom numbers just this afternoon.
But how is the seed obtained in the first place? There are calculators which produce (pseudo)random numbers and don't have a clock or anything like that from which they could get a seed
Hi, I’m trying to simulate gas particles in a 2D box, and would like to generate velocities from the 2D Maxwell-Boltzmann distribution. To do this I’ve integrated the probability density function to get the cumulative distribution function, which I found to be -e^(-av²), with a = m/2kT. However, plotting this function against velocity I get probabilities between -1 and 0, which can’t be right. I was wondering if maybe you knew why the function goes from -1 to 0, and/or if there is something I could do to ‘fix’ it. Thank you in advance, and great video btw!
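For what it's worth, the stray sign above is most likely the missing constant of integration: integrating the 2D Maxwell-Boltzmann speed density f(v) = 2av·e^(-av²) from 0 gives F(v) = 1 - e^(-av²), which runs from 0 to 1 as a CDF should. Inverting that F gives a direct inverse-transform sampler; a sketch (with `a` standing for the commenter's m/2kT):

```python
import math, random

def sample_mb_speed(a, u=None):
    # invert F(v) = 1 - exp(-a v^2)  =>  v = sqrt(-ln(1 - u) / a)
    if u is None:
        u = random.random()  # uniform on [0, 1)
    return math.sqrt(-math.log(1.0 - u) / a)
```

Feeding uniform u through this inverse CDF yields speeds with the 2D Maxwell-Boltzmann distribution, exactly the inverse transform sampling from the video.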
Why not having 2 seeds (or more) and the sequence depending on 2 (or more) terms of the series? Wouldn't that increase the period of the pseudorandom pattern? I mean if a part of the series shows up as 257, 469 and another part shows up as 319 and 469, these are two different parts of the series. Wouldn't that improve the algorithm?
It's going to be really awkward when Infinite Series, eventually ends.
John Hunter Good point, the series won’t last an infinite amount of time and they only have a finite amount of time they can actually film. My suggestion is to handle the videos as a supertask, with each video taking half the time to make and coming out twice as quickly as the last one. That way they could release an infinite series of videos using only a finite amount of material. :)
Good god. That is a brilliant idea to keep the show alive for infinite time. I love it.
Only problem is the host and the technical staff need to be able to handle it (if people were able to do it then humanity would reach a whole new level).
Sripad Kowshik Subramanyam Reminds me of the old joke about the chemist, the engineer and the mathematician who go vacationing in a cabin in the woods. One night the chemist wakes up and sees a fire on the stove. He quickly looks around, finds some inert powder in the cupboard and dumps it on the fire to put it out. The next night the engineer comes in and sees another fire on the stove (none of them are apparently great with kitchen safety). The engineer thinks for a bit about how much water he’ll need to douse it, goes and grabs their water canteens and pours out just enough of them to douse the flame. The third night the mathematician comes in and sees yet another fire on the stove. He quickly grabs his notepad and starts vigorously writing down equations. A few minutes later he smiles, puts the pad down on the table and says “A solution exists, QED,” and goes back to bed. 😄
Doug Rosengard good one.
Who says there need to be infinitely many *distinct* episodes?
Let n be the number of the last episode. Then every episode e_k with k>n is defined to be equal to episode e_n. Voilà, infinite series. :^)
Kinda funny watching the video quality getting slaughtered at 2:21. Thanks to Tom Scott, I now know why!
Well spotted. I was concentrating on the audio, not the video, so I missed Dr Houston-Edwards' face pixelating. Tom Scott: UA-cam's cool uncle.
Thanks for that reference. I was wondering what happened there.
add more confetti add more confetti !! lol
On a mobile screen at 1080p it's hardly noticeable. Never been this disappointed about a sharp picture.
Kailei R m.ua-cam.com/video/r6Rp-uo6HmI/v-deo.html found it!
PBS explains these crazy topics with such precision and clarity, especially Infinite Series. Kudos for an excellent channel!
omg lmao this ego
r/iamverysmart
+
Also this thread made me giggle~
It is not that precise: the video states that radiation and the like are random, while in reality they are not truly random, just close to impossible to replicate. Maybe Veritasium or Vsauce had a video about true randomness where it is described much more precisely. Even a service like random.org, which collects electromagnetic background noise from antennas, is not truly random; that noise has a source. I don't think true randomness exists, but we can get fairly close.
This right down my alley. Computer Science and Math. My goodness.
She sings her siren's song!
unfortunately this channel is gone :(
This is such a massively important topic. This issue alone has contributed to literally hundreds of thousands of dollars of my lifetime income. You've done a fantastic job of explaining a topic that, in interviews, I've found software engineers with over a decade of experience manage to get completely wrong.
What are you doing? :)
@@konstantin7596 implementing telecom encryption from spec, writing and verifying secure code through static analysis, reverse engineering, and automated testing, that sort of thing.
I just put the numbers 0 - 9999 into a spreadsheet and ran the Middle Square Algorithm on them. A few curiosities arose, which signify reasons it's Bad News for this purpose....
1) There are 20 ways to produce a 0000 value. As 0000 has period 1, there are multiple ways to end up in this sinkhole.
2) Nearly 39% of the seed numbers can NEVER be produced this way.
3) A further 35% of seed numbers can only be produced by one other four-digit value
4) The number 5625 can be generated twelve ways. 2500 can be generated eleven ways (including itself). You might expect a number or two out of 10,000 seeds to be very popular, but knowing what they are means they're very predictable.
In reality, you're not producing random numbers in [0, 9999]: you're producing numbers in [0, 6109] with a ton of gaps and biases scattered within....!
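The spreadsheet census above is easy to redo in a few lines; the counts depend on the same middle-square convention (square the 4-digit seed, zero-pad to 8 digits, keep the middle four):

```python
from collections import Counter

def middle_square(n):
    # square, zero-pad to 8 digits, keep the middle four
    return (n * n // 100) % 10000

# how many seeds map onto each output value
preimages = Counter(middle_square(n) for n in range(10000))
```

Sorting `preimages` then reveals the sinkholes, the unreachable values, and the overly popular outputs the commenter describes.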
that's why a good pseudorandom number generator should be used and not a bad one
MS is an old algorithm, not the best one. But in April 2017 people implemented a Middle Square Weyl Sequence PRNG. It is a modified version of MS and, instead of [0, 9999], produces [0, 2^32 - 1] possible values. It fixes a lot of the problems of MS and passes every statistical test. Just take a modulo of the returned value (for example mod 16384) and check the distribution; it should be uniform.
Note: `s` is the seed. To initialise `s` take a random even number and add 0xb5ad4eceda1ce2a9 to it.
en.wikipedia.org/wiki/Middle-square_method#Middle_Square_Weyl_Sequence_PRNG
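A sketch of that Middle Square Weyl Sequence generator, following the reference implementation on the Wikipedia page linked above (64-bit C arithmetic emulated with masking; the seed constant is the one quoted in the comment):

```python
MASK64 = (1 << 64) - 1

def msws(s=0xB5AD4ECEDA1CE2A9):
    """Middle Square Weyl Sequence PRNG: the Weyl sequence (w += s)
    keeps the middle-square step from collapsing to zero or short cycles."""
    x = w = 0
    while True:
        x = (x * x) & MASK64                  # square
        w = (w + s) & MASK64                  # advance the Weyl sequence
        x = (x + w) & MASK64                  # inject it into the state
        x = ((x >> 32) | (x << 32)) & MASK64  # take the "middle" via a rotate
        yield x & 0xFFFFFFFF                  # 32-bit output
```

Unlike plain middle square, there is no zero sinkhole: even starting from x = 0, the Weyl term immediately re-seeds the state.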
Knuth (volume 2 of The Art of Computer Programming) has lots of analysis and practical algorithms for pseudo-random numbers. Simple recipe: pseudo-random numbers (one of his algorithms) followed by another neat procedure of Knuth's which is shuffling the numbers in an output buffer.
Another book, "Numerical Recipes", refers to this. Numerical Recipes has references to Knuth. You can order the Knuth book and Numerical Recipes at any good bookshop. Beware: these are weighty books with a heavy price tag. Also look up "primitive polynomial modulo 2", which gives you a sequence that only repeats after a looooooooooong time. It's fairly easy to find tables of these, i.e. someone has calculated some of these primitive polynomials modulo 2. (Knuth describes these.) Pick one with a long repeat period, and the result is a trivially simple recipe for a stream of bits that repeats only after an amazingly long time, i.e. longer than the age of the universe. Using this as the basis of a random-number generator gives a very fast and simple algorithm.
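The "primitive polynomial modulo 2" recipe mentioned above is exactly a linear feedback shift register. A 16-bit Galois LFSR with the commonly cited taps 0xB400 (corresponding to the primitive polynomial x^16 + x^14 + x^13 + x^11 + 1) visits all 2^16 - 1 nonzero states before repeating:

```python
def lfsr16(seed=0xACE1):
    # Galois LFSR: shift right, and XOR in the tap mask when a 1 falls off.
    # Taps 0xB400 come from a primitive polynomial mod 2, so the period
    # is maximal: 2^16 - 1 (all nonzero 16-bit states).
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state
```

With a 64-bit or wider register and suitable taps, the same few lines give the "longer than the age of the universe" periods described above.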
Look at tests of random number sequences: if you think you have a random number generator, there are ways of testing it, so hook up your RNG (Random Number Generator) to one of these tests to see how it behaves. The web has lots of information on this. Wikipedia, for example, has descriptions of ways of testing random numbers. So do other mathematics-related web sites. The tests are generally easy to perform.
I've also experimented with prime numbers. Find a sequence which can be quite simple, based on addition modulo the prime number, and (importantly) repeats with a period of your favorite prime number. Have several of these in parallel, each based on a different prime number, and the period of the resulting generator will be something like the product of all of the prime numbers. If you have 6 prime numbers which are of size about 1,000,000 then the composite behaviour will repeat with a period of about 1,000,000^6.
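The parallel-primes idea above can be checked with toy primes (3, 5 and 7, purely illustrative): three trivial counters with prime periods run in parallel, and the combined state repeats only after lcm(3, 5, 7) = 105 steps, the product of the primes.

```python
def combined(periods=(3, 5, 7)):
    # each component just counts modulo its own prime;
    # the joint state cycles with period lcm(3, 5, 7) = 3 * 5 * 7 = 105
    state = [0] * len(periods)
    while True:
        state = [(x + 1) % p for x, p in zip(state, periods)]
        yield tuple(state)
```

Scaling the same construction to six primes around 1,000,000 gives the roughly 1,000,000^6 joint period the comment describes.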
AFIK - The best random number generators are based on cryptology functions. Or, one could just as easily say that cryptology methods are based on random number generators, since the whole point of cryptology is to take a message and create a patternless sequence of numbers.
This video barely scratched the surface of random number generators. I asked a predoctoral math teacher if she had considered random number research for her doctoral thesis. Her reply was that she should have. There was a lot of research opportunities available. And that was maybe 20 years ago, so more research must still be available.
Random-number-generating algorithms are so interesting.
Inverse transform sampling is great in theory; in fact, when I needed to generate numbers from a distribution a few years back, it was my first instinct despite not knowing what to call it at the time.
Turns out, though, that - well, I'll let Wikipedia tell you: "For a continuous distribution, however, we need to integrate the probability density function (PDF) of the distribution, which is impossible to do analytically for most distributions (including the normal distribution). As a result, this method may be computationally inefficient for many distributions and other methods are preferred."
And that's how I discovered the Box--Muller Transform. Marvelous bit of work that actually will give you a normal distribution.
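For anyone curious, the Box-Muller transform itself is short: two independent uniforms become two independent standard normals:

```python
import math, random

def box_muller():
    """Turn two uniform variates into two independent N(0, 1) variates."""
    u1 = 1.0 - random.random()   # shift into (0, 1] so log(u1) is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)
```

It sidesteps the inverse-CDF problem because the *radius* of a 2D Gaussian has an invertible closed-form CDF, even though the 1D Gaussian's CDF doesn't.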
It's true that a pseudo random generator must eventually repeat. However, it's not true that it must happen as soon as it hits the same output again.
PRNGs can have hidden state: stuff inside the black box that isn't part of the output. That way they may effectively loop through all possible outputs not in just one way but, in fact, potentially in all possible orders. Similarly, you can build them such that they occasionally spit out the same value twice in a row without falling into a loop where they spit out ONLY that one value.
They have to repeat only once their entire state (hidden state plus output) is hit again. And that could take a LOT longer than hitting every possible output once.
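A tiny illustration of the point (toy numbers): take a full-period 16-bit LCG but only show its top two bits. The visible outputs take just four values and repeat constantly, even twice in a row, yet the generator isn't in a cycle until the full 16-bit state comes back around:

```python
def lcg16_top2(seed=1, a=77, c=1, m=1 << 16):
    # full-period LCG (the Hull-Dobell conditions hold for a=77, c=1, m=2^16),
    # but the 14 low state bits are hidden: only the top 2 bits are output
    x = seed
    while True:
        x = (a * x + c) % m
        yield x >> 14
```

Here the hidden state has period 2^16 while the output alphabet is only {0, 1, 2, 3}, so repeated outputs tell you nothing about where in the cycle you are.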
Adding an irrational number (or, on an actual computer, a very high-precision decimal) to the seed on every step will make the cycle so long that humanity will end before it repeats.
Reaaaally depends on the rest of your setup. But yes, irrational numbers are a nice source of entropy.
You're right, but that's just deferring the problem - if you had N non-hidden states and H hidden ones, then your period is going to be (at best) N*H, so you're essentially back to square one.
I don't really see a problem here...
I just pointed out that a PRNG can go through the same outputs a *lot* of times before starting to repeat.
Of course if you want to go through all possible sequences you'll need to have H*N >= N! or, if you want to allow for arbitrary duplicates, it ought to be H*N >= N^N. That's quite a tall ask, usually. (that's just numerically. You'll also want to satisfy some properties to actually generate the desired sequences)
But why stop there? What if, not only you want to run through all permutations, but you want to permute the order you run through the permutations?
The numbers obviously become insanely huge long before we even get to that point. Too large to contain on a computer. (By N=59, N! exceeds the estimated number of atoms in the observable universe, N^N clears that by 48)
Hidden state or not, once you're back to the same exact state you started with, you are bound to repeat.
Ooh! That drifting numbers background really did a number on the video quality.
No pun intended.
Tom Scott did a video on the topic called "Why Snow and Confetti Ruin UA-cam Video Quality" (at least I think that was the title) which gives a basic explanation of the phenomenon if you're interested.
This was a great video, but i feel like the definition of randomness was a little vague. It would be great to see a video on Kolmogorov complexity to get a more in-depth insight on randomness. Im loving the computer science direction of this channel and i can't think of a better topic to cover!
I'm sure the definition of randomness was vague because the video focused chiefly on pseudorandomness. However, it's pretty difficult to define randomness.
When I took mathematical statistics for my BS in mathematics, we focused on randomness being the tendency of the image of a random variable to resemble the composition of its PDF/PMFs as the number of samples grows arbitrarily large.
For instance, the PMF of flipping a fair coin is Pr(X=x) = 1/2, x∈{Heads, Tails}, so if a flip of a fair coin truly is random, then flipping it an arbitrarily large number of times should give a 1/2 chance for each side.
However, if there were another process involved, such as having to choose one of four coins with replacement each flip, where one coin is fair, two give heads 75% of the time and the last gives tails 75% of the time, then if the process truly is random, the outcome should resemble the composition of those functions.
Additionally, as mentioned in the video, we should expect randomness to not exhibit any identifiable pattern. However, this isn't immediately disqualifying of something as being random. For instance, I could roll a d6 18 times and get 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6. This is extremely unlikely, but can happen under a truly random system. Ultimately, though, this pattern should not continue for arbitrarily large sequences.
Also, again mentioned in the video, an important aspect of randomness is irreproducibility. You should expect no two random processes to give the exact same output, though if you are extremely unlucky (or extremely lucky!) you might have to observe for a bit before the outputs diverge. In this last season of Doctor Who, there was an episode where people are living in a simulated reality, and they learn of this because any group of people calling out random numbers at the same time always calls out the same sequence of numbers because, as explained in the episode, computers can't generate truly random numbers (quantum computers should allow true random number generation, but that's beside the point).
Ultimately, though, randomness by its very nature is difficult to precisely define. It is easier to identify what *is not* random and trim it away, like an asymptotic game of mathematical jezzball.
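The fair-coin convergence described above is quick to simulate (using Python's seeded PRNG purely as a stand-in for a fair coin):

```python
import random

def heads_fraction(n_flips, seed=0):
    # flip a simulated fair coin n_flips times; return the fraction of heads
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips
```

As n grows, the fraction settles toward the PMF value Pr(X=Heads) = 1/2, which is exactly the "image resembles the PMF" notion from the comment.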
John D. Cook wrote an interesting blog post on the topic: www.johndcook.com/blog/2012/04/19/random-is-as-random-does/
Why not define randomness in relation to an observer? If an observer can't predict the sequence, then the sequence is random for that observer. That definition does away with the whole problem.
The old "true randomness" requires a sequence to be unpredictable to all possible observers. That is problematic, as we don't know what observers there can be. If there is a god, he would be an observer?
Lars Pensjö This is very similar to Kolmogorov Complexity. A series would be considered random if the shortest program that could generate the series is the length of the series plus a constant
Interesting, I will look it up!
65 days quarantined. First weed I get, I came to watch your videos. Amazing. I understand the maths, but not why I'm doing this to myself. Congratulations on the video.
Knuth in his book pointed out that a randomly generated random number generator does not necessarily generate good random numbers. He discusses an algorithm he designed using things like a human's height and other imponderables. He wrote the code for it and set it to work. Though it had a space of 32 bits (if I remember correctly), it cycled with a period of 12.
One thing I'd add to this (very good) video is that uniform random numbers, besides having a constant probability density, have to be *independent*. So every number has to be equally likely regardless of what preceded it. Congruential generators might have a hard time achieving this, since they essentially loop over the domain defined by M, and it's of course heavily dependent on the chosen A and C. Selecting primes for these parameters is usually a good start...
1. The state of a PRNG can be larger than the last bits of randomness it generated.
2. Cryptographically secure pseudorandom numbers cannot be distinguished from truly random numbers with limited resources, so unless your program is of galactic size, it is guaranteed you won't notice a difference. CSPRNGs are typically significantly slower than PRNGs in general.
3. The period of the most used Mersenne Twister is much larger than 2^32.
The Mersenne Twister is really generating 19937 bit numbers. The code only gives you 32 bits at a time, giving the appearance that the numbers are repeating within a cycle but that isn't the real size of the output. If you consider the actual output value, then the period is exactly 2^19937 - 1 which is what she meant.
If you consider all the inputs to the generator, then there can be no hidden state. If there was state that wasn't dependent on the input, then you wouldn't have a PRNG, since it could produce different outputs for the same input. With these technical considerations, what she said is correct, you can't cycle longer that 2^outputbits. It is just a bit more nuanced.
The speed thing rather depends on your implementation constraints. For example a CSPRNG based on a block cipher in CTR mode can be implemented in a parallel form that can be extended as far as is required to get the speed that is required. For many non cryptographic RNGs, like for example XORSHIFT or PCGs, this is not the case. The CSPRNG in most modern desktop and server CPUs is designed to exploit this parallelism. The worlds fastest production RNG is a CSPRNG that is frequently reseeded (0.5 to 1 millions times a second). I noticed that the video states that the Mersenne Twister is the gold standard, which is laughably false. Also the stated procedure for random fractions doesn't yield a uniform distribution when tried with floating point arithmetic. Especially if you do the multiply before the divide pushing the first term into the lowest resolution end of the floating point representation. It's easy enough to make uniform floats, but the method in the video is simply wrong.
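On the uniform-floats point above: the usual safe construction (essentially what CPython's `random.random()` does internally) is to take 53 random bits, the width of a double's mantissa, and scale exactly once:

```python
import random

def uniform01(getrandbits=random.getrandbits):
    # 53 random bits scaled by 2^-53: every result is exactly representable
    # as a double, avoiding the precision loss of multiply-then-divide
    return getrandbits(53) * (2.0 ** -53)
```

Every output lies in [0, 1), and no bit pattern gets squeezed into the low-resolution end of the floating-point range.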
Finally a complete explanation as to why pseudo random exists :D
I think it's important to point out that the internal state of a PRNG is not required to be the same size as the output value. In any "good" implementation your period will be much larger than the resulting set, e.g. using a 64-bit (or larger) internal state and returning a 32-bit value. It is also important to note that many PRNGs apply functions to the output bits, such as a hash, in such a way as to protect that internal state from leaking out.
X_{n+1} = (X_n * a + c) % 2^64
return hash(X_n) & (2^32 - 1)
Persi Diaconis does an excellent lecture series on what it means to be random.
Though storage was briefly alluded to, the statement that using a “natural” sequence (e.g. via a Geiger counter) isn’t repeatable, isn’t strictly correct, e.g. if one stores the sequence thus generated. While it’s obviously more efficient to use a pseudo-random generator, storing and retrieving a sequence that’s many times the period of a typical pseudo-random generator (e.g. 2^32) isn’t all that impractical in this age of relatively fast storage. And some simulations do require much longer sequences than the periods of typical pseudo-random generators.
Ok, so this might also be nitpicky, but a parabola may follow a considerably more complex parametrization than that.
For instance:
a / b² x² - sin(2α) (x-x0) + (1 - cos(2α)) (y-y0) = 0
This is the parabola you get if you have a constant gravitational field pointing down with strength "a" (if a is negative it points up instead), and you throw a ball with starting velocity "b" in a direction dictated by "α" when you start at "(x0, y0)".
It includes all 2D parabolas that are aligned with the x and y axes. (In fact it has more parameters than necessary; you could do the same with y = a x² + b x + c.) But of course you could also want freely rotated parabolas (which would involve an extra parameter giving that angle), or you could look at parabolas in higher-dimensional space...
LegendLength - maybe nitpicky, but try to estimate e.g. the likelihood of generating a Higgs at the LHC via the standard model using, e.g., the Mersenne Twister. I'm only pointing out that when a sequence longer than the period of the standard pseudorandom generators is required, one can in fact generate it and also have repeatability via storage, and that can be done feasibly.
Afaik, since a PRNG needs a seed sequence to extract entropy from, you must have a cache of real random inputs to run one. Windows supposedly does this by logging assorted junk data like keystroke timings, mouse pointer position and various network info at predetermined times into an ever-expanding cache of random strings.
Mersenne Twister isn't a linear congruential generator. It's a linear feedback shift register.
For the normal distribution, one normally does not use the inverse distribution function, because that's not expressible in closed form. Instead, one generates two normal variates at once, because the distribution function of the radius (in polar coordinates) of a 2D normal distribution is in closed form. There are at least two ways of doing this.
Pierre Abbat, it actually is essentially an LCG, but it has a tremendous amount of state.
Couldn't agree more, but morons might say No, it’s based on a shift register, not an LCG.
Another nice thing about picking 2^32 as your m is that 32-bit computers would do it automatically. That's just how many bits they have, so any bits after that would just get lopped off. No additional computation required.
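That free wraparound is the same operation as masking with 2^32 - 1. A sketch using the LCG constants from Numerical Recipes (one of the books recommended in this thread):

```python
A, C = 1664525, 1013904223   # LCG multiplier and increment from Numerical Recipes
M32 = (1 << 32) - 1

def lcg_step(x):
    # on a 32-bit machine the multiply-add overflows and wraps for free;
    # in Python we emulate that wrap with a mask, which equals mod 2^32
    return (A * x + C) & M32
```

The `& M32` costs one AND in Python, and nothing at all in 32-bit machine arithmetic, which is exactly the "no additional computation" point.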
this is the best explanation I have ever seen for random number algorithms
I love Infinite Series but I gotta say, the Comp-Sci related videos are above and beyond spectacular
9:56 the Uniform distribution: wouldn't the value in the y-axis be a value y, such that y approaches 0?
Not y=1, because there would be uncountably many values, all with probability equal to 1?
Or rather in your example y=10,000?
So when I use R or Stata to simulate a normally distributed set of numbers, even if I simulate a large number it often deviates from what a standard normal distribution is supposed to look like--sometimes by pretty noticeable amounts. Why is that so if the program is just basing its values off of the equation for the CDF?
Two possible reasons. (1) It takes a lot of deviates for the curve to start looking nice, like a few million. (2) Not all algorithms are perfect. I use Zignor; I don't know what R uses.
Fun side note: in the 1960s and 1970s IBM had a random number generator called “RANDU”. Donald Knuth, the author of the seminal book “The Art of Computer Programming”, called it “truly horrible”. Once the flaw came out, it called into question a large number of published papers that used this function for simulations.
Interesting, thanks!
" in the 1960s and 1970s IBM had a random number generator call “RANDU”. "
Yes, I think IBM's reply to a query about that was "We guarantee that each number individually is random, but we cannot guarantee that more than one of them is random."
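RANDU is easy to resurrect and its flaw easy to verify: the recurrence is x' = 65539·x mod 2^31, and because 65539 = 2^16 + 3, every consecutive triple satisfies x_{n+2} = 6·x_{n+1} − 9·x_n (mod 2^31), which is why all triples fall on just 15 planes in 3D. A quick check:

```python
def randu(seed):
    """IBM's infamous RANDU: x' = 65539 * x mod 2^31 (seed should be odd)."""
    x = seed
    while True:
        x = (65539 * x) % (2 ** 31)
        yield x

gen = randu(1)
xs = [next(gen) for _ in range(1000)]

# The fatal identity: every consecutive triple is linearly dependent,
# since 65539^2 = 6*65539 - 9 (mod 2^31).
flaw_holds = all(
    (xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % (2 ** 31) == 0
    for i in range(len(xs) - 2)
)
```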
You should've talked about Linear Feedback Shift Registers... And output bias/properties. But this is still a great video.
Ah, Linear Feedback Shift Registers! The Playstation 2 had them on the VPU1 (sort of a vector shader before there were GPUs) and I naively used them to create x,y,z coordinates for snow and rain sprites. It looked fine as long as just a few dozen were generated, but as more were added a pattern would show; all points confined to a few slanted columns. Caused a lot of head scratching; ended up using an LFSR for x, a Linear Congruential Generator for y, and for z something like z' = frac(z + GoldenRatio). Thought it was a hardware issue but found out later it was a problem inherent to LFSRs.
First time I've seen a video this interesting where I couldn't focus on the content because the speaker was too beautiful.
Good job! You explained the idea clearly and precisely. One thing I am still wondering is why the inverse distribution of uniformly sampled cumulative probability densities of a normal distribution becomes a normal distribution again. You have given us a very good example using the heights of US women in the video. I think I will do some more research to find a mathematical proof to convince myself.
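The proof idea is short: if U is uniform on [0,1] and F is the target CDF, then P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x), so F⁻¹(U) has exactly the distribution with CDF F. Python's standard library exposes the normal inverse CDF directly, so you can check it empirically (the 165 cm / 7 cm height parameters here are just an illustration, not the video's exact numbers):

```python
import random
from statistics import NormalDist, fmean, stdev

# Inverse transform sampling: push uniform deviates through the inverse CDF
# of the target distribution (here: heights ~ N(165 cm, 7 cm), example values).
heights = NormalDist(mu=165, sigma=7)

random.seed(0)
# inv_cdf requires p strictly inside (0, 1); random.random() is [0, 1),
# but p = 0.0 is astronomically unlikely in a finite run like this.
samples = [heights.inv_cdf(random.random()) for _ in range(20_000)]
```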
No reference to why it is called Mersenne Twister? 2^19937-1
Good point. And just to unpack your comment a bit for other people: the period of the common Mersenne Twister generator is 2^19937-1, which is a Mersenne prime.
In the case of the linear-congruential generator, you could increment the value of c after every pass, so with the right values of m, a, and c, you could have a period of m^2 instead of m.
but there can only be m states so the period cannot be greater than m
Complete and to the point! Great video.
Her videos are always so well made
How would you go about proving that a given PRNG is uniformly distributed? How does the Mersenne Twister improve upon a linear congruential generator? What are some common randomness tests? I feel like there's a lot this video just glossed over; some further reading would be appreciated.
at 8:20 you say that the distributions of the random number generators are uniform. But isn't this only the case when the period of the sequence is equal to the size of the interval? If so, then the Middle-Square Algorithm will never have a uniform distribution (the maximum period was less than the interval), and the Linear Congruential Generator will only have one when "good" values are chosen (such that the period = m). I would be interested to see another episode going into more detail about how these "good" values are found, and about the actual distribution of the Middle-Square Algorithm (as it is not uniform).
how do computers pick the a, m, and c for the linear congruential generator? do companies all use the same one? do they use a different pseudorandom number generator to pick them?
Wikipedia has a thorough page on the Linear Congruential Generator.
I wrote a random number program on an old HP 41CX calculator. It was simple: extract the 1/10th-of-a-second digit, set it next to the previous extraction in the alpha register, and when you have the number of digits you want, execute AMUN and presto, a random number appears in the X register. Requires one, or several, random presses of the key assigned to extract the digit from the clock.
I actually came up with a random algorithm called XSR for XOR with Shifting Right. It passes many random tests, and is faster and even better than the Mersenne Twister when stacked up.
Could the Cumulative Distribution Function trick (mapping from uniform [0,1] to some other distribution) be expressed as matrix multiplication? Not sure what benefit that could have, but it seems potentially powerful.
It depends on the curve. Some CDFs have no closed-form inverse and so need iterative numerical methods, which makes the process quite inefficient.
I love the show! Keep it up guys and gals!
Although the reasoning for obtaining a normal RV from a uniform RV follows from the inverse transform method (as you've illustrated), I'm a bit scared that this might be advocating its use for this case rather than the Box-Muller transform in order to obtain normal variates, which is much easier (and safer) to implement on a range of platforms and in fact much more elegant! (Save for the fact that you might be throwing an unneeded normal RV away or donating it to charity, but who in this simulation driven world only requires a single normal RV - am I right?)
For the Inverse Transform demonstration I would've used a Pareto / exponential distribution (granted, it would not produce nice S curved graphs as the normal CDF does, but it does give the opportunity of quickly doing an on-screen algebraic manipulation to arrive at the desired RV, which always makes my heart sing :-) )
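Seconded — that algebraic manipulation for the exponential case fits on one line: F(x) = 1 − e^{−λx}, so solving u = F(x) gives x = −ln(1−u)/λ. A sketch:

```python
import math
import random

def exp_variate(lam, u):
    """Inverse transform for Exponential(lam): invert u = 1 - exp(-lam*x)
    to get x = -ln(1 - u) / lam."""
    return -math.log(1.0 - u) / lam

random.seed(1)
lam = 2.0
samples = [exp_variate(lam, random.random()) for _ in range(20_000)]
mean = sum(samples) / len(samples)  # should approach 1/lam = 0.5
```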
On that graph on the right at 10:20, the line from 0 to 1 is not parallel with the axis
Surely comparing the Linear Congruential Generator with the Middle Square Algorithm at 5:49, and asking which sequence looks the most random, is an unfair comparison?
They have different moduli: the MSA has mod 9999, but the LCG has mod 7829. Small sample notwithstanding, the two sets provided are inherently different.
This demonstrates the difference between a mathematician and an engineer. The problem was defined with 2 significant digits. The answer can only possibly contain 2 significant digits. You don't pick up more. So, the baseball rulebook is correct. The correct answer to this problem is "12". The trailing ".019" is meaningless.
Great videos, sir. Keep them coming!
Are there applications in which a pseudo random sequences are required that never repeat. One sequence that never repeats would be a Thue-Morse sequence, however it does not look very random. However one could use it as a seed to make it look "more random". Or are there better alternatives for this?
LegendLength It is also relatively simple to get an extremely long period by using two long sequences whose lengths are relatively prime (share no common factor other than one), and then combining the two, for example by addition. The result should have a period equal to the product of the two periods of the original sequences.
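A toy demonstration of that combination idea, using two deliberately tiny LCGs whose parameters satisfy the Hull-Dobell full-period conditions, with coprime periods 16 and 9; the pair of states first returns to its start after lcm(16, 9) = 144 steps:

```python
def lcg_step(a, c, m, x):
    """One step of a linear congruential generator."""
    return (a * x + c) % m

# Two toy full-period LCGs: period 16 (m=16, a=5, c=1) and period 9 (m=9, a=4, c=1).
# Stepping them in lockstep, the combined state has period lcm(16, 9) = 144.
x, y = 0, 0
steps = 0
while True:
    x = lcg_step(5, 1, 16, x)
    y = lcg_step(4, 1, 9, y)
    steps += 1
    if (x, y) == (0, 0):  # back to the starting state
        break
```

The same reasoning scales up: combining generators of period ~2^32 each gives a combined period near 2^64.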
No deterministic procedure which can be implemented in a physical system and has a finite size can give an infinite sequence that never goes into a loop, because there are only so many possible physical states for the implementation, by the bekenstein bound.
A counterexample would be a Thue-Morse sequence. For example, if you start off with some finite sequence X consisting of numbers between 0 and 1, not all equal to 0.5 (and maybe some other conditions), then you can recursively extend this sequence so that it never repeats itself, using X := [X, 1-X]. However, this sequence probably does not have very good properties as far as appearing random.
My point was in the "and has a finite size". Of course, if you have an unlimited amount of memory, then you can make a program that makes a sequence that never repeats, by doing something as simple as 01001000100001... etc. . But for any machine with a finite physical size, it has finitely many possibly physical states (because of the Bekenstein bound), and therefore must eventually repeat.
+drdca I was wondering what exactly you meant by that, because I am absolutely no expert, but I am rather certain that a simple number line will never repeat, and it can be described with x_n = x_{n-1} + 1. But I guess memory couldn't hold the numbers after they exceed its size.
I really want to watch these videos, but I'm waiting for the series to finish so I can binge watch the entire series.
Can Inverse Transform Sampling be used to transform a different distribution into a uniform distribution?
I think it’s diabolical that this is the _only_ video I found that _actually gives a valid equation._ Like, I’m looking up random number generator, why aren’t you giving me one?
Someone should recommend this video to the guys who 'programmed' the randomizer for my car music player
Was it just a coincidence that the probability density function looked like the derivative of that cumulative distribution of heights? It appeared to be exactly that and I think that makes sense but I've not seen such an elegant link between statistics and calculus before that.
PDF is always the derivative of the cumulative function. The exact opposite of a coincidence!
Clear and precise explanations, thank you! Could you show me the literature used for this video or recommend some further literature?
This video's topic was _so random!_
9:00 Just to clarify: it will never actually be 1. The way it's implemented in practice doesn't allow it, and if it did happen, all sorts of OS-crashing-level glitches would occur.
So why is it that the shuffle feature on apple music/ spotify is so bad? I have well over a thousand songs downloaded but will hear the same 10-20 songs played several times and never hear the majority of songs on my phone.
A few years ago Apple had true randomness in their shuffle function, but people complained when the same song was sometimes played twice (randomness can repeat itself; you can get two consecutive 5s on a die and it's still random), so they replaced it with who knows what algorithm.
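For what it's worth, the usual fix isn't a better RNG but a different sampling scheme: a Fisher-Yates shuffle guarantees every song plays exactly once per pass, and the "feels random" tweaks are layered on top of that. A sketch:

```python
import random

def fisher_yates(items, rng=random):
    """Unbiased in-place shuffle: each ordering is equally likely (given an
    ideal RNG), and every item appears exactly once per pass."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randrange(i + 1)  # pick uniformly from the not-yet-fixed prefix
        a[i], a[j] = a[j], a[i]
    return a

random.seed(7)
playlist = list(range(1000))
shuffled = fisher_yates(playlist)
```

If you instead draw each song independently at random, repeats within a short window are expected — which is likely what the shuffle complaints were about.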
One problem alluded to, but not mentioned explicitly with pseudorandom vs random numbers, is that truly random integer sequences could have exactly two (or "n") identical numbers in a row (for example with p=1/1e6 for rolling a thousand-sided die twice) , whereas for pseudorandom sequences, if there are two identical numbers in a row, there must be ONLY that number forever after. Therefore using pseudorandoms pairwise (over the entire range of possible numbers) would never give you the same distribution you would see in actual random sequence.
Only if the state size equals the output size. This is not true of sensible PRNGs.
In banking systems with tokens, for example, the seed is the current time: since we assume the laws of physics at the surface of the Earth remain constant, if your phone's clock and the bank's clocks are synchronized, you both get the same generated number through the implemented algorithm. If, however, you went a bit higher, say to low Earth orbit, the time would differ due to general relativity. You would be unable to log in to your bank account from space if you didn't correct for the time difference due to Earth's gravity.
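The scheme described is essentially TOTP (RFC 6238): both sides derive the same short code from a shared secret and the current 30-second time window, so any uncorrected clock skew breaks the match. A minimal sketch, with a hypothetical shared secret:

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password: HMAC the current time window, then
    dynamically truncate to a short decimal code (per RFC 4226/6238)."""
    counter = unix_time // step                  # which 30-second window we're in
    msg = struct.pack(">Q", counter)             # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                   # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Phone and bank agree as long as their clocks land in the same window:
secret = b"hypothetical-shared-secret"
same_window = totp(secret, 1_000_000_000) == totp(secret, 1_000_000_010)
```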
This channel is perfect.
There have been a lot of issues with pseudorandom generators (Java 2013, Sony 2010, OpenSSL 2008), but that doesn't mean hardware generators are better, and humans definitely aren't. A hardware generator, even one that uses a radiation reader, could be "hacked" by putting another device near it (of course this requires physical access to the machine, but that's always the case). Numbers and phrases generated by humans are even less random than a simple PRNG: based on a person's knowledge, habits, and friends, you can determine how probable it is that they will choose a particular number or character next in a sequence.
Thank god for this video... Teaching me this subject way better than my college professor!
9:58 I have completed a probability course at university and only now I discover this?
As a computer scientist this is unbelievably important.
Lots of RNG engines for programs being actively interacted with (say, a game) use user input as part of the sequence. Is this still pseudorandom, or does the real-time user input make it totally random?
It seems to be a fallacy when the spiraling lengths (1, 2, ..., infinity) are said to be a cop win when the length of the last one is infinity. At what point is the overwhelming exception recognized?
The cycle repeats every time you get the same number? Wouldn't it make more sense to have the algorithm be based on the last several numbers instead of the last one? For example, in the linear congruential generator, the increment could be changed to the seed number we got on the previous turn, and the modulo could be the seed we got two turns ago.
That leads to the problem of how to actually start the algorithm (How can you start something that relies on previous turns?) but at the very most, this just means we need multiple seeds, and the modulo, increment, etc. become their own seeds.
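The "depend on several previous values" idea is a real family of generators: lagged Fibonacci generators do exactly this, and they solve the bootstrap problem the way you suggest — the initial buffer of seeds is produced by a simpler PRNG. A sketch with example lags (24, 55):

```python
from collections import deque

def lagged_fibonacci(seed_values, j=24, k=55, m=2**32):
    """Additive lagged Fibonacci generator: x_n = (x_{n-j} + x_{n-k}) mod m.
    Needs k seed values, typically bootstrapped from a simpler generator."""
    assert len(seed_values) == k
    buf = deque(seed_values, maxlen=k)
    while True:
        x = (buf[-j] + buf[-k]) % m
        buf.append(x)  # maxlen drops the oldest value automatically
        yield x

def lcg(x, a=1664525, c=1013904223, m=2**32):
    """Simple LCG (example constants) used only to seed the buffer."""
    while True:
        x = (a * x + c) % m
        yield x

seeder = lcg(12345)
gen = lagged_fibonacci([next(seeder) for _ in range(55)])
out = [next(gen) for _ in range(100)]
```

Because the state is the whole 55-value buffer rather than one number, repeating a single output does not force the sequence into a cycle.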
If we have X_n = f(X_{n-1}), where f is the function that defines the new pseudorandom number, could we just change it to X_n = f(X_{n-1} * n) to avoid cycling? Or why don't we slice sqrt(2) into packs of four digits and use them as a random sequence?
I hope there's a follow up episode on some of the tests used to gauge a PRNG's statistical randomness. Looking at the 3-dimensional planes generated by LCGs is a pretty direct way to see how bad they are (in that respect).
And then a follow up on how awful the tests are.
10:43 isn't that the sigmoid function? 1 / (1 + e^-x)
Throw some parameters in there to account for scale, mean, and standard deviation, and yes. The sigmoid fits it.
No, it's not. It's the integral from -∞ to x of the probability distribution. In this case, the probability distribution is a gaussian; its corresponding integral doesn't have a closed form in terms of elementary functions.
No. It's (a shifted and scaled version of) the 'erf' function. That's short for "error function". In some situations you can cheat and use any function with that general form, but usually it pays to be accurate.
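Both replies can be checked numerically: the standard normal CDF is Φ(x) = (1 + erf(x/√2))/2, and the logistic sigmoid merely approximates it.

```python
import math

def normal_cdf(x):
    """Exact standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(x):
    """Logistic function: a similar S-shape, but not the same curve."""
    return 1.0 / (1.0 + math.exp(-x))

# Agreement at the center, divergence elsewhere:
center_gap = abs(normal_cdf(0.0) - sigmoid(0.0))      # both are exactly 0.5
worst_gap = max(abs(normal_cdf(x / 10) - sigmoid(x / 10))
                for x in range(-50, 51))               # gap peaks around |x| ≈ 1.2
```

(With a scale factor, σ(1.702x), the logistic gets within roughly 0.01 of Φ, which is why it's a common cheap stand-in.)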
What is "truly" random in every day life is more of a philosophical question than a mathematical one. If a poker website has a RNG that combines seed from micro temperature fluctuations, internet traffic, number of players on website and other variables, I'm going to call it random for the purpose of me trusting the website to deal me cards "randomly".
Custom hardware for randomness is less and less of a problem, now that basically every consumer computer (including phones) comes with a camera in it. Random movement of electrons in the CCD sensor produces thermal noise that you can use. Note that this does not depend on actual light entering the camera.
Hi Nathan,
you are right, a webcam can be used. I made a web-based tool some years ago that runs inside the browser and captures the webcam to generate random numbers.
You can put a sticker on the cam and the sensor still generates pixel noise.
For people interested in it:
retura.de/camrng/camrng.html
It's in German... click on "web cam aktivieren" to start the capturing and then on START.
The graph then shows the quality or "randomness" of the data, by calculating pi with a Monte Carlo simulation and the ratio of ones to zeros.
Greetings, David
Oh, I forgot, I made an English version too:
retura.de/camrng/camrng_en.html
Nice DIN font choice, fits in nicely.
4:57 Save the random numbers to disk, then when bug-testing, pipe them back into the program.
You had a video about normal numbers half a year ago. These are exactly the numbers x in [0,1] such that the sequence x_{n+1} = 10 x_n mod 1 equidistributes in [0,1]. I'm not sure how close the connection to pseudorandom numbers is, but it is true that almost every number in [0,1] is normal (so it is a "good seed").
(Weird comment from a game programmer on getting away with sampling a PRNG with a small period. Ignore as needed :P)
An interesting thing WRT period for procedurally generated content (e.g. game levels) is that it doesn't necessarily have to be particularly large. You are advancing the PRNG through its sequence every time it is sampled, but you can sample it in such a way that the repetition is not apparent to the player.
For example, if we have a small-period PRNG determining the heights of tiles in a level, it will become apparent if the period is significantly less than the number of tiles generated. But if we generate a tile's height and then, before we sample for the next tile, sample the PRNG a second time or more ("How many blades of grass are on this tile?" or "Shall we place a monster here?" "Which direction is the monster facing?"), by the time we get to calculating the next tile's height, the PRNG could be at a (pseudo)random point in its sequence. The sequence repeats, but less noticeably for something the player will recognise (tile heights), unless the player can view a lot more of the generated level.
This can work against us too, though: a program sampling a PRNG can request random numbers in step with its period, so every tile would have the same height, the same number of blades of grass, the same monster spawned. Or if the program made periodLength/2 samples per tile, every ~other~ tile will be the same.
This is largely irrelevant these days since most computers can comfortably generate very long pseudorandom sequences, and do it very quickly. But it is a trick that can be used, carefully, when computational power is extremely limited :)
I'm showing this to all my coworkers. Really good way to explain how things work behind the scenes for all those different distributions in various random modules/libraries
But what if we want to create a pseudorandom sequence that seems to contain all numbers — that is, where any real number between, for instance, 0 and 1 could be produced? Wouldn't it be possible to achieve this by using a formula that typically produces transcendental numbers? It is true that only certain numbers would actually be produced, but we don't know which, creating the impression that any number can occur. But... functions that produce transcendental numbers can also produce rational numbers for certain input values. However, I wonder if it is possible to include irrational, non-transcendental numbers as well.
great! worth a second watch to really understand pseudo RNG for newbies like myself
I had a disagreement with my college professor about the randomness of a register on the Atari 8-bit computer. The POKEY chip includes an internal 17-bit polynomial counter from which 8 bits can be read. POKEY advances the counter on every clock cycle, so it's completely independent of the main CPU. In normal operation, the timing of when the CPU reads this register is impacted not just by the number of clock cycles it takes to run the software that reads it but also by interrupts from such things as video refresh and keyboard events. Can this register be considered truly random or not?
Nope. Set an interrupt on a timer at (2^17)-1 clocks and read the value on each interrupt.
I have seen some pseudorandom number generators that use a bell-curve distribution, given its mean and standard deviation. However, I don't quite understand how they generate the numbers, because they seem to be unbounded, unlike the first two algorithms shown in this video.
What about a version of the Linear Congruential Generator that has an incrementing increment, i.e. the increment value is increased periodically?
Surely this would make a repeated output a more harmless outcome. If you ended up back on a number you've already visited, you're unlikely to repeat a complete chain; you'd just revisit a single number and move on.
Would this be more, or less likely to return to a duplicate number though, even if it doesn't cycle in the same way?
QuannanHade Theeeen it's not a LINEAR congruential generator anymore; more like a quadratic congruential generator.
Thanks indeed. Could we have an episode on extractor functions? Like how to simulate a fair die using an unfair one.
Ahmed Almutairi, oh, that's a great topic. Yes please!
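Seconding the request — meanwhile the classic trick is von Neumann's extractor: take the flips in pairs, output the first bit when the two differ (01 → 0, 10 → 1), and discard 00 and 11. The output is exactly unbiased for any fixed bias, at the cost of throwing bits away. A sketch with a hypothetical 70/30 coin:

```python
import random

def von_neumann(biased_bits):
    """Turn a biased-but-independent bit stream into an unbiased one:
    pairs 01 -> 0, 10 -> 1, and 00/11 are discarded."""
    it = iter(biased_bits)
    for a, b in zip(it, it):  # consume bits two at a time
        if a != b:
            yield a

rng = random.Random(5)
biased = [1 if rng.random() < 0.7 else 0 for _ in range(100_000)]
fair = list(von_neumann(biased))
ones = sum(fair) / len(fair)  # near 0.5 despite the 0.7 input bias
```

It works because P(01) = P(10) = p(1−p) regardless of p; the price is that only about 2p(1−p) of the pairs survive.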
Thank you very much for this video! It was very clear and informative for me :)
For random number generation in computers, can we use the modulus of the number of processor cycles that passed during a long calculation?
It's dangerous getting your randomness from something other users can affect. If I want to push around your random number generation, all I have to do is calculate the right things at the right time to delay your calculation.
On computers with spinning-platter hard drives, air turbulence in the drive affects the seek time, so that's a pretty good source of randomness -- as long as you're getting the timing from your own seeks. If you'll take any seek, again all I have to do is flood the system with disk access requests of my choice, whatever that particular drive can do most consistently.
[00:59] But you also want the other statistical moments to be enforced...
[01:55] "Not reproducible" is the wrong framing: the seed can be recorded...
[05:41] The MSA appears to have strong low-frequency spectral energy...
First of all, thanks for making such great videos for free. I have a question: how can we generate a non-uniform distribution on [0,1]?
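Inverse transform sampling works for this too; for example, for the density f(x) = 2x on [0,1] the CDF is F(x) = x², so F⁻¹(u) = √u skews uniform draws toward 1. A sketch:

```python
import math
import random

random.seed(3)
# Target density: f(x) = 2x on [0,1]  =>  F(x) = x^2  =>  F^{-1}(u) = sqrt(u)
samples = [math.sqrt(random.random()) for _ in range(50_000)]
mean = sum(samples) / len(samples)  # E[X] = integral of x*2x dx = 2/3
```

Any invertible CDF on [0,1] gives the same recipe; √u is just the simplest non-trivial example.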
Very cool, I had never heard of inverse transform sampling before. I can think of a few programming applications where it would have been useful. Thanks!
Couldn't you greatly increase the period of any PRNG by making the parameters (e.g. the a, c, and maybe m in the linear congruential generator) outputs of their own PRNGs, initialised with their own seeds? It would still be deterministic, but determined by the combination of the initial seed values of all inputs.
Yes you can have hidden state or internal counters that increment.
Sometimes. Other times, you end up with the whole combination falling into a (much) shorter orbit, while also being significantly slower...
The modulus and multiplier parameters are very important in determining the period of the function. Bad values can result in a very short period, so they need to be chosen carefully; randomly chosen values aren't necessarily good enough. I think you get better periods from primes, but I'd have to look it up to confirm.
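The effect of bad parameter choices is easy to measure on toy sizes by brute-forcing the cycle length:

```python
def lcg_period(a, c, m, seed=1):
    """Steps around the loop the LCG eventually falls into from this seed."""
    seen = {}
    x, step = seed, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]  # length of the cycle actually entered

good = lcg_period(a=21, c=1, m=256)  # satisfies the Hull-Dobell full-period conditions
bad = lcg_period(a=2, c=0, m=256)    # multiplier 2 with c=0 collapses almost instantly
```

The Hull-Dobell theorem gives the exact conditions for a full period m (c coprime to m; a−1 divisible by every prime factor of m, and by 4 if m is), which is where the folklore about primes and careful parameter choice comes from.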
That was a great video!
Love your videos!🤗
So if there are certain values that a Pseudo RNG can never get, how does it get around this to appear truly random? How do the RNG people make the PDF as close to uniform as possible and what does the PDF of those actually look like? What about repeating values? For example, how do I randomly generate two or more numbers in a row? What's the method behind that?
Great questions. These are exactly the properties we make sure an RNG has. If you are following NIST's prescriptions, you can look at the CTR-DRBG algorithm. This uses AES in CTR mode, which never returns the same value twice (because it's passing a counter into a bijective mapping). The state is the key K and counter V (for vector). To make it able to return values from the full set of output possibilities, the K & V state is randomized by passing it through AES a couple of times (the details are in the spec), so the bijective mapping changes to a new one and values can be repeated with probabilities equal to those of fully entropic bits. There are many other designs. This is the most common one.
Inverse transform sampling should also generate only a pseudo-normal distribution: for a true normal the probability density differs at every point, but in this case it will be constant over given ranges, however short we make them.
Awesome video🔥
In the middle square algorithm, wouldn't a small value 'jam' it (leading to an endless string of zeroes)? Any number less than 100 would yield a number lower than itself, eventually leading to zero. 100 itself also yealds itself, creating another loop.
Why don't you use a random number generator to choose between spelling "yield" and spelling "yeald" ?
@@simonmultiverse6349 Man, you went through all that trouble to comment on a typo from three years ago?
@@whycantiremainanonymous8091 I've been waiting for three years and I couldn't take it any more!
I wish I paid attention in maths.
What variables do I have to use to get the uniform [0, 1] in the linear congruential generator?
That's weird, I was just thinking about this this morning. Perfect!
And I was watching a video from GDC of some game devs talking about procedural content, which usually makes use of pseudorandom numbers just this afternoon.
But how is the seed obtained in the first place? There are calculators which produce (pseudo)random numbers and don't have a clock or anything like that from which they could get a seed.
Question: Would a truly random number generator have to use quantum mechanical randomness? Even thermodynamical noise is not truly random. Right?
Hi, I’m trying to simulate gas particles in a 2D box, and would like to generate velocities from the 2D Maxwell-Boltzmann distribution. To do this I’ve changed the probability density function into the cumulative distribution function, which I found to be -e^(-av**2), with a = m/2kT. However, plotting this function against velocity I get probabilities between -1 and 0, which can’t be right. I was wondering if maybe you knew either why the function goes from -1 to 0 and/or if there is something I could do to ‘fix’ it. Thank you in advance, and great video btw!
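The missing piece is almost certainly the constant of integration: integrating the 2D Maxwell-Boltzmann speed density from 0 gives F(v) = 1 − e^{−av²}, which runs from 0 to 1 rather than −1 to 0, and inverting gives v = √(−ln(1−u)/a). A sketch, with a set to an arbitrary example value:

```python
import math
import random

a = 0.5  # m / (2 k T), an arbitrary example value

def mb_speed(u):
    """Inverse transform for the 2D Maxwell-Boltzmann speed distribution:
    F(v) = 1 - exp(-a v^2)  (note the +1 from the integration constant),
    so v = sqrt(-ln(1 - u) / a)."""
    return math.sqrt(-math.log(1.0 - u) / a)

random.seed(9)
speeds = [mb_speed(random.random()) for _ in range(50_000)]
mean_sq = sum(v * v for v in speeds) / len(speeds)  # E[v^2] = 1/a
```

(The check works because −ln(1−U) is a standard exponential, so v² has mean 1/a.)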
Well, I was planning on doing Perlin Noise today anyway. I guess I'll write my own PRNG too.
I have a set of random numbers, but I would like to know how to calculate which random number will appear next.
Why not have 2 seeds (or more), with the sequence depending on 2 (or more) previous terms of the series? Wouldn't that increase the period of the pseudorandom pattern? I mean, if one part of the series shows up as 257, 469 and another part shows up as 319, 469, these are two different parts of the series. Wouldn't that improve the algorithm?