**Corrections**
Slide 5: Due to a mistake in my code, plot shows PDF of Pareto with α = -0.84 instead of 1.16 😅 (GitHub repo and blog have been corrected)
Slide 6: The power law PDF should have exponent -(α+1), not -(α-1). Accordingly, the α values in the legend are off by 2, i.e. the curves shown correspond to α = -0.84, 0, 1
Slide 18: fuhgetaboutit (α
This was probably one of the best explanations I could've viewed on stats as someone with mediocre understanding of the subject.
Glad it was clear :)
Nice. Power laws are modelled very well by graphs, as you are probably aware as a physicist. There is a lot of pioneering work in this area by Mark Newman and Lada Adamic. From a practical standpoint, when you perform learning over graphs with fat-tailed degree distributions, a few nodes are very highly connected and most others have small degrees. The crux of learning over graphs is that the prediction or model for each node is influenced only by its neighbors. For high-degree nodes, you sample a fraction of the neighboring nodes to predict, but for regular nodes with few neighbors, you consider a 1- or 2-hop neighborhood. There are reasonably good graph ML libraries now, for example PyTorch Geometric or DGL. An algorithm called GraphSAGE can be a good candidate. Two really good, practical papers to read on this subject are "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction" by Weiss/Provost and "Thresholding for Making Classifiers Cost-sensitive" by Sheng/Ling. Cheers, enjoyed this. @ShawhinTalebi
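A minimal sketch of the neighbor-sampling idea described above, written in plain Python rather than PyTorch Geometric or DGL. The function names, the `fanout` cap, and the toy graph are assumptions made for illustration. This is not GraphSAGE itself, only the sampling step that keeps a hub's many neighbors from dominating.

```python
import random


def sample_neighbors(adjacency, node, fanout, rng):
    """Return at most `fanout` neighbors of `node`, sampled without replacement."""
    neighbors = adjacency[node]
    if len(neighbors) <= fanout:
        return list(neighbors)  # low-degree node: keep every neighbor
    return rng.sample(neighbors, fanout)  # hub: keep only a random subset


def sampled_neighborhood(adjacency, node, fanout, hops, rng):
    """Collect the nodes reached within `hops` hops, capping each expansion at `fanout`."""
    seen = {node}
    frontier = {node}
    for _ in range(hops):
        next_frontier = set()
        for current in frontier:
            for neighbor in sample_neighbors(adjacency, current, fanout, rng):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return seen


# Hypothetical toy graph: node 0 is a hub with ten neighbors,
# every other node has only one or two.
adjacency = {0: list(range(1, 11))}
for leaf in range(1, 11):
    adjacency[leaf] = [0]
adjacency[10] = [0, 11]
adjacency[11] = [10]

rng = random.Random(0)
hub_sample = sample_neighbors(adjacency, 0, fanout=3, rng=rng)
leaf_hood = sampled_neighborhood(adjacency, 11, fanout=3, hops=2, rng=rng)
```

Here the hub contributes only three of its ten neighbors, while the low-degree node 11 sees its full 2-hop neighborhood. Graph ML libraries ship their own samplers for this, so treat the code as a way to see the idea rather than something to use in training.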
Thanks for the recs Rajiv! Newman's "Networks" textbook was actually the first time I learned about power law distributions. I added those to my reading list :)
At 4:22 what exactly is x and what is P(X=x)? In the Gaussian example I would expect x to be wealth that increases from left to right and y to be the number of observations. Yet with that in mind, shouldn't the Pareto curve be such that x=w from low to high, with a lot of observations at the bottom and only very few at the high end of wealth? How is it that it is typically explained that the 20% at the bottom end of the x-axis have 80% of everything?
Good question. Supposing wealth follows the given Pareto distribution, X could be net worth and P(X=x) is the probability that a random selection from the population will have net worth equal to x. The long tail here reflects the intuition that there are many more people with net worths around $100,000 than $100,000,000, since as x increases, P(X=x) decreases.
The 80-20 rule comes in when you add up all the wealth of the top 20%: you would get about 80% of the total wealth in the population.
Hope that clears things up, happy to expand on any point.
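As a rough check of the "add up the wealth of the top 20%" step, the sketch below uses the closed-form share for a Pareto (Type I) distribution: with tail index α > 1, the richest fraction p holds p^((α-1)/α) of the total. The function name `top_share` is an assumption for illustration, and the formula only applies when wealth is exactly Pareto distributed.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the richest fraction p, for tail index alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return p ** ((alpha - 1) / alpha)


alpha = 1.16
share_top_20 = top_share(0.2, alpha)  # wealth share of the richest 20%
share_bottom_80 = 1 - share_top_20    # what is left for the other 80%

# Tail index that gives an exact 80-20 split: log(5) / log(4)
alpha_exact = math.log(5) / math.log(4)
```

With α = 1.16 the top 20% come out at roughly 80% of the total, which leaves roughly 20% for the remaining 80% of people. Both ways of stating the rule describe the same split.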
@ShawhinTalebi thanks for your reply Shawhin. Adding up would mean an integral between certain values of w. So P(w)dw between w_min and some cut off w*. Wouldn't this then say that 80% of all people (high P(w)s) have only 20% of all wealth (correlating to the small bandwidth of w's on the x-axis from left to right)? (and not vice versa, which is what I see everywhere)
btw shouldn't the alpha be negative? If it was positive the curve would be upward sloping. Sorry for all the questions. I am writing a book called Influencers and Followers. I believe that the way neurons fire when we choose (using a power law) also influences wealth distribution at the society level (also a power law with alpha -1.16).
That's right, 80% would have only 20% of wealth.
Good question. This is a matter of convention. Here, I define a power law as ~x^-(α+1), so the negative is baked into the definition. Sounds like an interesting book!
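To make the sign convention concrete, here is a small sketch of the Pareto (Type I) density using the exponent from the Slide 6 correction, -(α+1). The function name and the `x_min` parameter are assumptions for illustration. The point is that a positive α still gives a downward-sloping curve, because the minus sign lives in the exponent.

```python
import math


def pareto_pdf(x, alpha, x_min=1.0):
    """Pareto (Type I) density: alpha * x_min**alpha / x**(alpha + 1) for x >= x_min."""
    if alpha <= 0 or x_min <= 0:
        raise ValueError("alpha and x_min must be positive")
    if x < x_min:
        return 0.0  # no probability mass below the minimum value
    return alpha * x_min ** alpha / x ** (alpha + 1)


alpha = 1.16
densities = [pareto_pdf(x, alpha) for x in (1.0, 10.0, 100.0)]

# On log-log axes the density is a straight line with slope -(alpha + 1).
slope = (math.log(densities[2]) - math.log(densities[1])) / (
    math.log(100.0) - math.log(10.0)
)
```

The densities shrink as x grows, and the log-log slope equals -(α+1), i.e. -2.16 for α = 1.16.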
This is some really good stuff! Thanks!
Nice to hear about this idea in such a short video. It took Taleb volumes, and most readers like us never got that this is what he was talking about 😜
😂😂 The volumes are usually a prerequisite for a more concise description.
Most of this video is me parroting things I learned from Taleb.
Absolutely @ShawhinTalebi, hardly anyone reads the original works of even paradigm-shifters. All we have is regurgitated versions, we just Kant do it 🤪
15:25 years later I understand why my model wouldn't fit / regress to the data, thank you 🤣🤣
We have @nntalebproba to thank XD
Just so I’m clear: does regression not work well in any situation, or just for power laws?
Regression doesn't work for Power Laws. There are many situations where it works great, especially if the data are normally distributed.
great video, thank you, sound very low though!
Thanks! Sorry about the audio 😬
How can we financially support you? Do you have Patreon?
Thank you for your generosity :)
I currently accept caffeinated beverages here: www.buymeacoffee.com/shawhint
Amazing video
Addicted to this channel.
I hope it's one of those good addictions 😅
@ShawhinTalebi I've already watched probably 40% of your videos, and almost all of the long ones except the causality playlist, which I'm saving for a time I can sit and really watch.
Thanks for watching. Feel free to reach out with any questions or suggestions for future content :)
@ShawhinTalebi off the top of my head: anything on the intersection of time series/signal analysis and stats/machine learning, and also more info on what exactly you do as a contract data scientist. Maybe some info on if/how fresh undergrads could get into that or find a mentor in that space (without breaking the bank; I wish I could do what you did and hire out time to talk to people, but it's not in the cards just yet).
I especially like your nice mix of formal treatment and personability. So many channels that treat things more rigorously are impossible to listen to for a whole lecture without backgrounding the audio and missing most of it. I also really like how you mention your specific sources vocally and actually put them all in the description.
@chrstfer2452 Thanks, this is super helpful! I've been putting together ideas for a time series forecasting/classification series. Any suggestions are appreciated.
For breaking into freelance, I have an article on the subject which I'll make into a video based on your feedback: medium.com/the-data-entrepreneurs/how-to-start-freelancing-in-data-science-150551f25fda
We also recently put out videos about this on The Data Entrepreneurs channel: www.youtube.com/@TheDataEntrepreneurs/videos
If you ever want to chat about data science/entrepreneurship, feel free to set up some office hours: calendly.com/shawhintalebi/office-hours
21:00 Give me that meme
😂😂😂
I shared it here: www.linkedin.com/posts/shawhintalebi_statistics-8020rule-fattails-activity-7132748486512447488-waTm?
@ShawhinTalebi Love it!