I think the reason they got a different value for distance is that you actually calculated the distance to the goal line. Shooting from the corner flag would have a distance of 0 with your method. You should use something like this, I believe: distance = sqrt(x*x + y*y)
Re the average distance, soccer pitches in the UK vary quite a bit. You could use the home team column to reference data for actual length per pitch.
Yeah I went pretty basic here, you could go get the pitch lengths for every shot and get a more accurate calculation for sure. Would have probably extended the tutorial by 30 min tho haha
@@McKayJohns You can tell some of the pitch lengths are different because of the size of them.
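For anyone who wants to try the straight-line version suggested above, here is a rough sketch. It assumes X and Y are on a 0-100 scale (as the `df['X'] * 1.2` line in the traceback further down this thread suggests), that the goal sits at X = 100, Y = 50, and that every pitch is 120 x 80 yards. The pitch size and the `shot_distance` name are my assumptions, not something from the video.

```python
import numpy as np
import pandas as pd

# Assumed pitch dimensions in yards; real pitches vary, as noted above.
PITCH_LENGTH = 120.0
PITCH_WIDTH = 80.0


def shot_distance(df: pd.DataFrame) -> pd.Series:
    """Straight-line distance in yards from each shot to the centre of the goal.

    Assumes df["X"] and df["Y"] are on a 0-100 scale with the goal
    at X = 100, Y = 50.
    """
    # Distance along the pitch to the goal line.
    dx = PITCH_LENGTH - df["X"] * PITCH_LENGTH / 100
    # Sideways offset from the centre of the goal.
    dy = (df["Y"] - 50) * PITCH_WIDTH / 100
    # Pythagoras: sqrt(dx*dx + dy*dy).
    return np.hypot(dx, dy)
```

With this, a shot from the corner flag comes out at half the pitch width rather than 0, and `shot_distance(df).mean()` gives the average distance.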
Dope! Real challenge is to do it in Plotly to add another dimension of interactivity!!
Thanks for this! Kept me engaged throughout the whole video
Glad you enjoyed!
Guys, I would also recommend McKay's courses. Very useful.
is he mckay’s evil more intelligent twin
bot
Cool! Unless I missed it, I don't think you filtered out the PK shots, which is probably why the stats were a bit different. You can see some PK shots when you ran df.head. Nice video! 😊
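A quick sketch of how the penalty filter could look. It assumes the shot data has a `situation` column where penalties are labelled "Penalty"; check `df['situation'].unique()` on your own file first, since both the column name and the label are assumptions here. `drop_penalties` is just a helper name for illustration.

```python
import pandas as pd


def drop_penalties(df: pd.DataFrame,
                   column: str = "situation",
                   penalty_label: str = "Penalty") -> pd.DataFrame:
    """Return a copy of the shots without penalty kicks.

    The column name and label are assumptions about the data;
    adjust them to whatever your CSV actually contains.
    """
    mask = df[column] != penalty_label
    return df[mask].reset_index(drop=True)
```

Running the summary stats on `drop_penalties(df)` instead of `df` gives non-penalty numbers.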
Very informative! Thanks
Bro, that's super stuff...
Great video man - how were you able to get a full season set of data for a single player from understat? I'm not familiar with that site so just exploring it now but can only see game by game entries - is that information behind a paywall or do you compile it yourself from some sort of scraping protocol?
I used a Python package called Understat to scrape it. You could also loop over all matches and extract all of the shots that way
@@McKayJohns ah cool thanks for confirming - do you have a video on that package/process?
Hi McKay. Honestly, your videos are awesome
Thank you!
Hey, make more of these kind of videos.
I would like to try this out with another player. Where do I get the data.csv from?
I scraped the data from understat.com: ua-cam.com/video/YKfvs_i5r-g/v-deo.html
thanks for the tut!
Does fbref have info on shot location for players?
No but sites like sofascore, fotmob, and whoscored all do
Good video! What monitor are you using? I noticed the frame rate is so smooth when you move your mouse
It's actually just my macbook screen haha and I'm just doing a screen recording
He's recording at 60fps unlike other tutorials in UA-cam.
Hey dude, where did you get this type of detailed data? When I tried to find such specific data on the understat site, I couldn't find it.
I made another video on how to scrape understat on my channel
Good video, but instead of calling the scatter function 5 times to plot the legend, you can call it just once, as follows:
import numpy as np
x_legend = np.linspace(start=0.37, stop=0.6, num=5)  # 5 evenly spaced x positions
y_legend = np.asarray([0.53] * 5)  # same y for every marker
sizes = np.linspace(start=100, stop=500, num=5)  # marker sizes from small to large
ax1.scatter(x=x_legend,
            y=y_legend,
            s=sizes)
Yes that’s a good solution. For simplicity I ended up doing what I did
Cool vid! What website were the Haaland stats used in this video scraped from?
Understat 👍
Hello, lovely video you have here. I wanted to ask, how did you get the raw file?
I scraped it from understat.com. I've got a video here that explains how to do it: ua-cam.com/video/YKfvs_i5r-g/v-deo.html
How do you obtain the dataset from Understat in the first place?
Hey, I had the same problem. You need to watch the understat video, then extract the data you want from the API, and then do some data manipulation to get the exact data you want.
I was able to create a shot map for Garnacho's 2024 season that way.
How to export as pdf or report?
In Jupyter notebook, top left go to File->Download as PDF
Bro, would we learn web scraping in your football course?
You'll learn how to scrape specific sites like fotmob, sofascore and fbref!
@@McKayJohns So what other add-on things will we learn in your web scraping course?
Thank you very much mate, but I have a problem with "df" is not defined
Hm, usually that means the code wasn't run. Did you run the line that imports the data?
Hi! Is there any PPP available on your course?
If you send me an email: mckayjohns@gmail.com we can work something out :)
How do I get the csv file?
I wanna get into cricket analytics. Would your web scraping course be helpful, or the football course?
The web scraping course is great for learning how to build a web scraping pipeline, while the football course focuses more on teaching Python and football analytics.
Hi, does fbref have shot location data that I can use to create this chart? If not, where can I get the data easily?
The player played in the MLS last season.
I got this data from Understat. You can also use sofascore.
so cool
Is it possible to do this with data from the new season?
Yes I would just label it as shots through whatever match week you’re on
How would I export it to a png file?
In the GitHub code there is a line of code at the very bottom that shows how to do it
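In case it helps, a minimal sketch of saving a matplotlib figure as a PNG. This is not the line from the GitHub code, just the standard `Figure.savefig` call wrapped in a helper; `save_png`, the file name and the dpi value are placeholders for illustration.

```python
import matplotlib.pyplot as plt


def save_png(fig, path="shot_map.png", dpi=300):
    """Write the figure to a PNG file and return the path.

    bbox_inches="tight" trims the empty margin around the plot.
    """
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


# Usage, assuming `fig` is the figure object the shot map was drawn on:
# save_png(fig, "haaland_shot_map.png")
```

If you never kept a reference to the figure, `plt.gcf()` returns the current one.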
why do I get this error?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[28], line 4
2 total_goals = df[df['result'] == 'Goal'].shape[0]
3 total_xG = df['xG'].sum()
----> 4 xG_per_shot = total_xG / total_shots
5 points_average_distance = df['X'].mean()
6 actual_average_distance = 120 - (df['X'] * 1.2).mean()
TypeError: unsupported operand type(s) for /: 'str' and 'int'
For some reason xG has been saved as strings, so df['xG'].sum() joins them into one long string instead of adding them up.
Try converting the column before line 3 with df['xG'] = df['xG'].astype(float) and it should work.
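Here is a small sketch of the conversion for the error above, assuming the xG column was read in as text. `pd.to_numeric` turns the strings into floats before summing; `xg_per_shot` is a made-up helper name, not something from the video.

```python
import pandas as pd


def xg_per_shot(df: pd.DataFrame, column: str = "xG") -> float:
    """Average xG per shot, converting a text column to numbers first."""
    # Raises a ValueError if a value cannot be parsed, which surfaces
    # bad rows instead of silently skipping them.
    xg = pd.to_numeric(df[column])
    return xg.sum() / len(df)
```

Checking `df.dtypes` right after loading the CSV shows whether xG came in as `object` (text) or `float64`.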
Is there any way to get this data from a website other than understat? I want to make the same plot for another player who is not on there.
You can scrape from other sites like sofascore or fotmob 👍
@@McKayJohns's course has a method that taught me to get data from Fotmob, and it's very well explained. I think the x,y data on Fotmob is scaled differently though, as some x values are greater than 100, unless I'm picking up an incorrect data item.
Your course is so damn expensive bruh, I'm just a college student wanting to learn more about football data analytics.