- 194
- 102 029
TidyX
United States
Joined Mar 1, 2020
TidyX is a screencast where we discuss Data Science topics and walk through code line by line, explaining what the authors did and how the functions they used work. We also break down the visualizations they create and talk about how to apply similar approaches to other data sets. The objective is to help more people learn R and get involved in the TidyTuesday community.
The hosts are Ellis Hughes (@ellis_hughes) and Patrick Ward (@OSPpatrick).
Ellis has been working with R since 2015 and has a background working as a statistical programmer in support of both Statistical Genetics and HIV Vaccines. He also runs the Seattle UseR Group.
Patrick's current work centers on research and development in professional sport with an emphasis on data analysis in American football. Previously, he was a sport scientist within the Nike Sports Research Lab. Research interests include training and competition analysis as they apply to athlete health, injury, and performance.
Gapminder Camcorder - Be Kind Rewind Code Explanation | TidyX Episode 186
In this episode, we start a series explaining the examples Ellis showed from his "Be Kind, Rewind" talk at posit::conf(2024). First up, we jump into creating a captivating animated visualization of the Gapminder dataset using R and the {camcorder} package. We break down the code step-by-step, from setting up the animation recording to customizing the plot aesthetics. Learn how to generate smooth and informative animations that tell a compelling story about global trends in GDP per capita and life expectancy.
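As a rough illustration of the workflow the episode covers (not the episode's actual code), here is a minimal {camcorder} sketch; the data, file paths, and aesthetics below are invented stand-ins:

```r
library(camcorder)
library(ggplot2)

# Start recording: every plot printed from here on is saved as a PNG frame
rec_dir <- file.path(tempdir(), "gapminder_frames")
gg_record(dir = rec_dir, device = "png",
          width = 8, height = 5, units = "in", dpi = 150)

# Tiny gapminder-like stand-in data (the episode uses the real dataset)
dat <- data.frame(
  year      = rep(c(1957, 2007), each = 3),
  gdpPercap = c(1000, 5000, 12000, 3000, 15000, 40000),
  lifeExp   = c(45, 60, 68, 55, 72, 80)
)

# One frame per year; each print() is captured by {camcorder}
for (yr in unique(dat$year)) {
  p <- ggplot(subset(dat, year == yr), aes(gdpPercap, lifeExp)) +
    geom_point(size = 4) +
    scale_x_log10() +
    labs(title = paste("Gapminder-style frame:", yr),
         x = "GDP per capita (log scale)", y = "Life expectancy")
  print(p)
}

# Stitch the recorded frames into an animated GIF
gg_playback(name = file.path(tempdir(), "gapminder.gif"),
            frame_duration = 0.5)
```

The key idea is that {camcorder} turns ordinary print-a-plot code into animation frames, so the plotting loop needs no animation-specific logic.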
Join us for an episode that will have your simulations running at the speed of light!
Like, Subscribe, and find us on social media! (@ellis_hughes, @OSPpatrick, @tidy_explained).
If you like what we are doing, please sign up to be a patron on Patreon!
www.patreon.com/Tidy_Explained
Email us with any comments, questions, or suggestions at tidy.explained@gmail.com.
Links:
Open an issue on the TidyX Github page!
github.com/thebioengineer/TidyX/issues
Patreon:
www.patreon.com/Tidy_Explained
TidyX Code:
github.com/thebioengineer/TidyX/tree/master/TidyTuesday_Explained/186-Gapminder_Camcorder
Views: 255
Videos
Independence Days with {purrr} | TidyX Episode 185
496 views · 4 months ago
TidyX Episode 185: Independence Days with {purrr} Using Wikipedia's list of independence days, we'll show you how to use some advanced {purrr} to work with the data, construct new functions, and work with extracted data from webpages to transform it into usable formats. We aim to answer the amusing quip that every 4 days a country celebrates its independence from the UK with this dataset! Join...
Hello Kitty: Intro to {purrr} | TidyX Episode 184
526 views · 5 months ago
TidyX Episode 184: Hello Kitty: Intro to {purrr} This intro highlights purrr's core functionalities and different ways to write the functions, from named to anonymous functions, keeping types consistent, or even applying functions to filter and pull out contents from lists. Learn the basics to understand how we can apply these techniques to more complicated structures! Join us for an episode th...
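A minimal sketch of the {purrr} basics this description mentions — typed map variants, named vs. anonymous functions, and filtering/extracting list contents. The example data are invented, not the episode's code:

```r
library(purrr)

# Invented example data: a named list of score vectors
scores <- list(alice = c(80, 90), bob = c(70, 75, 95))

# map() always returns a list, whatever the mapped function returns
map(scores, mean)

# Typed variants keep output types consistent (here, a numeric vector)
avg1 <- map_dbl(scores, mean)          # named function
avg2 <- map_dbl(scores, \(x) mean(x))  # R >= 4.1 lambda syntax
avg3 <- map_dbl(scores, ~ mean(.x))    # classic purrr formula shorthand

# Filtering and extracting list contents
long <- keep(scores, \(x) length(x) > 2)  # elements passing a predicate
first_bob <- pluck(scores, "bob", 1)      # a single element by name/position
```

All three map_dbl() calls are equivalent; the typed variants error out if the function returns anything other than a length-one value of the expected type, which is what "keeping types consistent" buys you.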
Within-group regression using {purrr} | TidyX Episode 183
422 views · 5 months ago
TidyX Episode 183: Within-group regression using {purrr} Unleash the power of {purrr} to perform within-group regressions! This episode we'll explore fitting separate linear models for different groups in your data, using the Palmer Penguins dataset as an example. Using map(), we'll quickly build models, extract key statistics, and visualize how groups differ. Join us to start the journey on ma...
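A minimal sketch of the within-group pattern described above, using the built-in mtcars data (split by cylinder count) in place of the Palmer Penguins dataset; this is an illustration of the technique, not the episode's code:

```r
library(purrr)

# Split into one data frame per group, then fit one model per group
models <- mtcars |>
  split(mtcars$cyl) |>
  map(\(df) lm(mpg ~ wt, data = df))

# Extract a key statistic from each fit with the typed map variants
slopes <- map_dbl(models, \(m) coef(m)[["wt"]])      # slope per cyl group
r2     <- map_dbl(models, \(m) summary(m)$r.squared)  # fit quality per group
```

Comparing the per-group slopes is how you see whether the mpg-vs-weight relationship differs across cylinder groups.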
Turbocharge Your Simulations with Parallel Processing! ⚡️ | TidyX Episode 182
236 views · 5 months ago
TidyX Episode 182: Turbocharge Your Simulations with Parallel Processing! ⚡️ Ever feel like your simulations take forever to run? This TidyX episode injects a dose of speed with parallel processing using the snowfall package! We'll revisit nested for loops for simulation, then supercharge them to run across multiple cores. Learn how to run simulations in parallel for faster results using the sn...
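A minimal sketch of the {snowfall} workflow the episode describes; sfInit(), sfLapply(), and sfStop() are the package's documented entry points, while the simulation itself is a toy stand-in:

```r
library(snowfall)

# A toy "simulation": the mean of many random draws
sim_once <- function(i) mean(rnorm(1e4))

sfInit(parallel = TRUE, cpus = 2)      # start a 2-worker cluster
results <- sfLapply(1:100, sim_once)   # like lapply(), but spread across cores
sfStop()                               # always shut the cluster down

length(results)
```

The appeal of sfLapply() is that it is a drop-in replacement for lapply(), so a serial simulation loop parallelizes with almost no restructuring.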
I Likert Coffee | TidyX Episode 181
339 views · 5 months ago
TidyX Episode 181: I Likert Coffee Calling all coffee lovers! ☕️ This episode of TidyX gets to the grounds of coffee expertise with a TidyTuesday survey. We'll brew up some data analysis to see if age affects how people rate their coffee knowledge. Get ready for Likert scales, wrangling data, and statistical throwdowns to see which age group claims the coffee crown! Like, Subscribe, and find us on ...
How much stuff have we sent to Space? | TidyX Episode 180
174 views · 6 months ago
TidyX Episode 180: How much stuff have we sent to Space? Ever wondered how much stuff has rocketed into space? This episode we do a 180 and look at how we started TidyX by looking at a TidyTuesday dataset to explore objects launched into space! We'll learn how to wrangle the data, calculate launch counts by year, and create visualizations with ggplot2. Plus, we'll discover a cool trick for face...
How many SpaghettiOs does it take to write LOTR? | TidyX Episode 179
522 views · 6 months ago
TidyX Episode 179: How many SpaghettiOs does it take to write LOTR? We embark on a hilarious journey to answer the age-old question: how many SpaghettiOs would it take to write a whole book? Inspired by abstract_tyler's instagram reel ( pC6hUeRVp24H/), we use the power of R to find out! Prepare for some serious spaghetti-fueled fun as we delve into the world of R for data wran...
Player Time Chart - FIBA API Part 2 | TidyX Episode 178
150 views · 7 months ago
TidyX Episode 178: Player Time Chart - FIBA API Part 2 In this follow-up to Episode 177, we dive deeper into the intricacies of FIBA basketball game data. Building upon our previous exploration, we refine our methods to generate insightful player time charts. Join us as we unravel the complexities of lineup analysis and visualize player dynamics over the course of a game. Get ready for another ...
Who's Next? FIBA API Viewer Question | TidyX Episode 177
158 views · 8 months ago
TidyX Episode 177: Who's Next? FIBA API Viewer Question We tackle a real-world challenge brought to us by our viewer, Cohen MacDonald. Cohen found an undocumented API that has a bunch of game data from FIBA and has some great ideas on what to do with it. However, there's one problem: the dataset does not contain which players are on the court at what time, just who subs in or out. With an intrig...
Are you sure? | TidyX Episode 176
355 views · 8 months ago
TidyX Episode 176: Are you sure? In this episode, we're comparing pitchers using the power of Random Forests and Bayesian statistics to make comparisons between pitchers' likelihood of making it into the Hall of Fame! We show how to make simple simulations of individual player performance and differences, and finally make a function to let you easily compare players. Like, Subscribe, and find us...
Random Strike Zone Forest: Tidyverse Takes on Hall of Fame Hurlers | TidyX Episode 175
275 views · 8 months ago
TidyX Episode 175: Random Strike Zone Forest: Tidyverse Takes on Hall of Fame Hurlers We explored the world of data modeling using Tidyverse and Purrr to predict the next MLB Hall of Fame pitchers. Stay tuned for some fascinating insights into our modeling process! We use the same datasets as we have the last several weeks, and apply logic and code to create, evaluate, and tune our models. Like...
AI Speed Ball: Predicting the 2024 Pitcher HOF Class in 20 Minutes | TidyX Episode 174
203 views · 9 months ago
TidyX Episode 174: AI Speed Ball: Predicting the 2024 Pitcher HOF Class in 20 Minutes We're bringing the heat with AI! Join us as we step up to the plate and predict the next MLB Hall of Fame pitchers using the power of TensorFlow and Keras. With a killer convolutional neural network in our arsenal, we're ready to knock it out of the park! We go over normalization techniques, how to set up your...
Pitch into the Bayes - "20" minute MLB Hall of Fame Pitchers predictions | TidyX Episode 173
334 views · 9 months ago
TidyX Episode 173: Pitch into the Bayes - "20" minute MLB Hall of Fame Pitchers predictions Step up to the mound in TidyX Episode 173 as we predict MLB Hall of Fame pitchers using the power of Bayesian models! Join us as we switch up our game plan, leaving no curveball unturned with rstanarm. We inspect the models and results with prediction intervals and probabilities, bringing a new dimension...
20 minutes to Predict MLB HOF Pitchers - Class of 2024 | TidyX Episode 172
252 views · 9 months ago
TidyX Episode 172: 20 minutes to Predict MLB HOF Pitchers - Class of 2024 Join us as we look into the numbers behind predicting MLB Hall of Fame pitchers! This episode includes crafting a dataset from the {Lahman} package, creating logistic regression models, and finally assessing them via model summary tools and visualizing techniques. Stay tuned for insights and adjustments as we navigate the...
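A minimal base-R sketch of the logistic-regression step described here; the data are synthetic stand-ins for the {Lahman} pitching statistics, and all coefficients are invented for illustration:

```r
set.seed(42)

# Synthetic career totals for 200 pitchers (stand-in for {Lahman} data)
pitchers <- data.frame(
  wins       = rpois(200, 150),
  strikeouts = rpois(200, 2000)
)
# Fabricated outcome: induction probability loosely driven by both stats
pitchers$inducted <- rbinom(
  200, 1,
  plogis(-20 + 0.08 * pitchers$wins + 0.004 * pitchers$strikeouts)
)

# Logistic regression: model induction as a function of career stats
fit <- glm(inducted ~ wins + strikeouts, data = pitchers, family = binomial)
summary(fit)

# Predicted induction probability for a hypothetical pitcher
p_hof <- predict(fit, newdata = data.frame(wins = 180, strikeouts = 2500),
                 type = "response")
```

The real episode builds its dataset from {Lahman} tables; the modeling call (glm with family = binomial) and the type = "response" prediction are the generic pattern.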
Bae in the Fast Lane - Bayesian linear regression in 20 Minutes | TidyX Episode 171
542 views · 10 months ago
Beyond Basic For loops: Tidy Expressions | TidyX Episode 170
603 views · 10 months ago
Predicting Hall Of Famers in 20 Minutes | TidyX Episode 169
600 views · 10 months ago
Hall of Fame Showdown - Base R Plot Edition | TidyX Episode 168
213 views · 11 months ago
Grand Slam - Knocking it Out of the Park with Base R Density Plots | TidyX Episode 167
229 views · 11 months ago
The Line Plot Saga | TidyX Episode 166
253 views · 11 months ago
The Power of Plotting Compels You | TidyX Episode 165
473 views · 11 months ago
Advanced Shiny - Running Multiple Linked Shiny Apps | TidyX Episode 164
872 views · 1 year ago
Creating Player url links in datatable and Shiny | TidyX Episode 163
505 views · 1 year ago
Advanced Shiny - Web Scraping and Dynamic Linking | TidyX Episode 162
630 views · 1 year ago
Shinylive - Is this thing on? | TidyX Episode 161
2K views · 1 year ago
R Packages - Feeling a little Testy | TidyX Episode 159
309 views · 1 year ago
R Packages - Write a vignette for yourself | TidyX Episode 158
379 views · 1 year ago
Say, it would be great if you created an update to this video, because some of the functions used have changed.
The excel sheet looks dreadful.
I find the part with cbind confusing.
You guys are code ninjas!
Why not use lubridate for dates? By the way, I like the discussion around addressing the different issues that arise in attempting to solve a given problem.
I know wikipedia allows scraping but that is not true for other sites. I typically check the permissions with the bow(url) function from the polite package.
Does broom work with lists or only data frames?
I got the p-values with the following code: mutate(model_p = map(model_summary, ~ .x$coef[1, 4])) %>% unnest(cols = model_p)
You can simply add as.character(x) to the function (map_chr) without using the pipe.
The . in split does not work on my end if I use the base pipe.
Thank you guys! This was really helpful.
Flame tree next, please.
gganimate? it will be great one comparison between packages
very cool and so useful, simply because event-level dataframes are ubiquitous (especially in a work/company setting), particularly when fetching all this data from nested/tree-like structured JSONs - you have a big fan in me here, gentlemen - thanks!! (I'm willing to bet Cohen is a data analyst/scientist for a company involved in sports/NBA bookkeeping or analytics lmao)
great stuff as always! Clear and easy to understand what can easily be confusing, so thanks! (Very) unrelated to this, but would you be interested in diving into the {logger} package, either deep or as an introduction? I think it's created with the intent of mimicking Python's version, and while I find Python's pretty straightforward, for some reason R's version is a little more obscure/harder to grasp for me.
Thanks for the comment! I'll have to look at the logger package a bit again - I think I've used it in the past, but there may be a few other things/concepts you need to know before it makes sense ~ Ellis
@@TidyX_screencast no worries if you can't of course - thanks Ellis either way !
Great content 🎉
Why was the panel data package (plm) or LSDV Reg method not used?
Nice video! Do you know how the modify() function works in purrr? I really struggle to see the difference with the map() function.
You can try the \(df) notation for anonymous functions: purrr::map(my_list, \(df) lm(y ~ x1 + x2, data = df)). This expects a list of data frames called my_list and regresses y against x1 + x2 in each of them, specifying each data frame in the list as the data for the regression. It is the same as what you guys did, but I think this notation was introduced later and is now considered better than the ~ .x notation.
Great job!
This was great! Pretty concise, and I learnt a lot. Cleaning data is a lot less time-consuming and more intuitive for me now. I didn't know base R was so good at dealing with strings.
Thanks for the comment! R handles strings quite well in a variety of different ways. We just scratched the surface here ~ Ellis
Take the code I commented on episode 144 and replace purrr::map() functions with furrr::future_map() functions.
Awesome. One package I like a lot is the furrr package. You use the function "future_map" as you do with the "map" function from the purrr package, but in parallel. Pretty easy.
Great approach for doing multiple comparisons. Could you not just replace filter(r1 != r2) with filter(as.numeric(r1) > as.numeric(r2))? I think broom::tidy() on the output of the t-test might have made it a bit easier to combine and extract the data you wanted, although your approach works fine.
Haven't tried that. Will give it a shot! Thanks! ~patrick
The same solution came to my mind last week for a similar problem; filtering the indices with i > j just gave me the upper triangle (except the diagonal) of the Cartesian product matrix. 👍
Good stuff gents. Appreciate the extra package shared tho.
Another great episode. Looking forward to the episode 200 party edition! In your plots the y axis was count data. It irritates me that ggplot will often put decimal points on scales even if you bother to define the variable as an integer. For one plot it's not too hard to manually input the breaks etc. Is there an easy way of getting round this problem when you are generating lots of plots, as in your example? It should be as simple as saying "this is integer data, 10.5 is meaningless!"
Great episode! I learnt about the invisible() function, which seems particularly appropriate for an episode on the Lord of the Rings!
I hadn't even thought about how well that worked! That's great! ~ Ellis
How do you introduce and make predictions on another new dataset (not the test data)?
Any plans to do any more NBA stuff? I'd love to see something like trying to predict some of the awards as the season is winding down, maybe Most Improved player as predicted using ML - similar to the HOF pitching
This was a fun one
Great video, guys! And it's great that the baseball season is around the corner, so lots of people should be itching for new data points (I mean, games lol)
Excellent showcasing, but we also need to know more about the map() function of {purrr}.
🐈🐈
now do barry bonds
I'm not sure if the "A Aron" was a Key and Peele joke or not, but an interesting video either way
You guys rockin', keep it up!
Good work. You guys can get the innings pitched if you divide the IPOuts by 3 (IPOuts = Outs pitched)
Great!
Nice work guys! It would be great if you could do this with tidymodels framework.
First
You should really focus on advanced ggplot, tidymodels, and Shiny stuff....
Thanks for the reply. We have many episodes on ggplot2 and entire series on tidymodels and shiny. Is there something in particular you'd be interested in seeing? ~patrick
Fantastic presentation! Thanks! I am very interested in learning more on Bayesian statistics.
We did a whole series on Bayes, starting with episode 99! Bit.ly/TidyX_Ep99
Thanks a lot @@TidyX_screencast
If you must use a for loop...

# Using a for loop
library(rlang)
library(stringr)

wgt_stat_list <- list()
for (i in 1:5) {
  wgt_stat_list[[i]] <- get_wgt_score(
    fake_dat,
    sym(str_c("stat", i)),
    sym(str_c("stat", i, "_n"))
  )
}
Also, thank you both for this video!
I'd like to offer two solutions, both from the Tidyverse; the second one uses {rlang} for tidy evaluation (avoid for loops!!!):

# Doing it with a pivot_longer and pivot_wider
fake_dat |>
  pivot_longer(starts_with("stat")) |>
  mutate(
    stat_number = str_extract(name, "\\d"),
    name = str_remove(name, "\\d")
  ) |>
  pivot_wider(
    names_from = name,
    values_from = value
  ) |>
  group_by(athlete, stat_number) |>
  summarise(
    total_obs = sum(stat_n),
    wgt_stat = weighted.mean(stat, stat_n)
  )

# Using tidy eval
library(rlang)
get_wgt_score <- function(dat, variable, N) {
  dat |>
    group_by(athlete) |>
    summarise(
      total_obs = sum(!!variable),
      wgt_stat = weighted.mean(!!variable, !!N)
    ) |>
    mutate(stat = as_string(variable))
}

map2(
  str_c("stat", 1:5),
  str_c("stat", 1:5, "_n"),
  \(x, y) fake_dat |> get_wgt_score(sym(x), sym(y))
) |>
  list_rbind()
Hey Ellis - great walk-through. I see you predicted both Beltre and Mauer, both of whom made it. A-Rod won't make it due to external shenanigans, as you note. The other player who made it was Todd Helton. Going to do pitchers next?
Really appreciate the for loop solution, as I often run into this situation myself and I always wondered if you could do this without running the same code multiple times. This has been super helpful!
You can accomplish it with a single pivot_longer if the column names have a consistent pattern/separator. I gave the stat variables a new suffix ("_score") to make that happen.

fake_dat |>
  rename_with(\(x) str_replace(x, "(.+\\d$)", "\\1_score")) |>
  pivot_longer(
    c(ends_with("_score"), ends_with("_n")),
    names_to = c("stat", ".value"),
    names_sep = "_",
  ) |>
  summarize(
    total_obs = sum(n),
    wgt_stat = weighted.mean(score, n),
    .by = c(athlete, stat)
  )
Keeping Patrick on his toes :)
I am right now building a shiny application prototype that I would like to demo to my colleagues. We use Dataiku in my organization and it has the capability to host shiny apps. However, since our tech guys are all pythonistas, most R packages that I am using don't run well on our Dataiku deployment (our tech guys don't care). I explored other alternatives for shiny app deployment but I couldn't find one that is practical enough. I am now going to try this *shinylive* option. Wish me luck.