TidyX
  • 194
  • 102 029
Gapminder Camcorder - Be Kind Rewind Code Explanation | TidyX Episode 186
TidyX Episode 186: Gapminder Camcorder - Be Kind Rewind Code Explanation
In this episode, we start a series explaining the examples Ellis showed from his "Be Kind, Rewind" talk at posit::conf(2024). First up, we jump into creating a captivating animated visualization of the Gapminder dataset using R and the {camcorder} package. We break down the code step-by-step, from setting up the animation recording to customizing the plot aesthetics. Learn how to generate smooth and informative animations that tell a compelling story about global trends in GDP per capita and life expectancy.
Join us for an episode that will have your simulations running at the speed of light!
Like, Subscribe, and find us on social media! (@ellis_hughes, @OSPpatrick, @tidy_explained).
If you like what we are doing, please sign up to be a patron on Patreon!
www.patreon.com/Tidy_Explained
Email us with any comments, questions, or suggestions at tidy.explained@gmail.com.
Links:
Open an issue on the TidyX Github page!
github.com/thebioengineer/TidyX/issues
Patreon:
www.patreon.com/Tidy_Explained
TidyX Code:
github.com/thebioengineer/TidyX/tree/master/TidyTuesday_Explained/186-Gapminder_Camcorder
Views: 255

Videos

Independence Days with {purrr} | TidyX Episode 185
Views: 496, 4 months ago
TidyX Episode 185: Independence Days with {purrr} Using Wikipedia's list of independence days, we'll show you how to use some advanced {purrr} to work with the data, construct new functions, and work with extracted data from webpages to transform it into usable formats. We aim to answer the amusing quip that every 4 days a country celebrates its independence from the UK with this dataset! Join...
Hello Kitty: Intro to {purrr} | TidyX Episode 184
Views: 526, 5 months ago
TidyX Episode 184: Hello Kitty: Intro to {purrr} This intro highlights purrr's core functionalities and different ways to write the functions, from named to anonymous functions, keeping types consistent, or even applying functions to filter and pull out contents from lists. Learn the basics to understand how we can apply these techniques to more complicated structures! Join us for an episode th...
Within-group regression using {purrr} | TidyX Episode 183
Views: 422, 5 months ago
TidyX Episode 183: Within-group regression using {purrr} Unleash the power of {purrr} to perform within-group regressions! This episode we'll explore fitting separate linear models for different groups in your data, using the Palmer Penguins dataset as an example. Using map(), we'll quickly build models, extract key statistics, and visualize how groups differ. Join us to start the journey on ma...
Turbocharge Your Simulations with Parallel Processing! ⚡️ | TidyX Episode 182
Views: 236, 5 months ago
TidyX Episode 182: Turbocharge Your Simulations with Parallel Processing! ⚡️ Ever feel like your simulations take forever to run? This TidyX episode injects a dose of speed with parallel processing using the snowfall package! We'll revisit nested for loops for simulation, then supercharge them to run across multiple cores. Learn how to run simulations in parallel for faster results using the sn...
I Likert Coffee | TidyX Episode 181
Views: 339, 5 months ago
TidyX Episode 181: I Likert Coffee Calling all coffee lovers! ☕️ This episode of TidyX gets to the grounds of coffee expertise with a TidyTuesday survey. We'll brew up some data analysis to see if age affects how people rate their coffee knowledge. Get ready for Likert scales, wrangling data, and statistical throwdowns to see which age group claims coffee crown! Like, Subscribe, and find us on ...
How much stuff have we sent to Space? | TidyX Episode 180
Views: 174, 6 months ago
TidyX Episode 180: How much stuff have we sent to Space? Ever wondered how much stuff has rocketed into space? This episode we do a 180 and look at how we started TidyX by looking at a TidyTuesday dataset to explore objects launched into space! We'll learn how to wrangle the data, calculate launch counts by year, and create visualizations with ggplot2. Plus, we'll discover a cool trick for face...
How many SpaghettiOs does it take to write LOTR? | TidyX Episode 179
Views: 522, 6 months ago
TidyX Episode 179: How many SpaghettiOs does it take to write LOTR? We embark on a hilarious journey to answer the age-old question: how many SpaghettiOs would it take to write a whole book? Inspired by abstract_tyler's Instagram reel ( pC6hUeRVp24H/), we use the power of R to find out! Prepare for some serious spaghetti-fueled fun as we delve into the world of R for data wran...
Player Time Chart - FIBA API Part 2 | TidyX Episode 178
Views: 150, 7 months ago
TidyX Episode 178: Player Time Chart - FIBA API Part 2 In this follow-up to Episode 177, we dive deeper into the intricacies of FIBA basketball game data. Building upon our previous exploration, we refine our methods to generate insightful player time charts. Join us as we unravel the complexities of lineup analysis and visualize player dynamics over the course of a game. Get ready for another ...
Who's Next? FIBA API Viewer Question | TidyX Episode 177
Views: 158, 8 months ago
TidyX Episode 177: Who's Next? FIBA API Viewer Question We tackle a real-world challenge brought to us by our viewer, Cohen MacDonald. Cohen found an undocumented API that has a bunch of game data from FIBA and has some great ideas on what to do with it. However, there's one problem: the dataset does not contain which players are on the court at what time, just who subs in or out. With an intrig...
Are you sure? | TidyX Episode 176
Views: 355, 8 months ago
TidyX Episode 176: Are you sure? In this episode, we're comparing pitchers using the power of Random Forests and Bayesian statistics to make comparisons between pitchers' likelihood of making it into the Hall of Fame! We show how to make simple simulations of individual player performance and differences, and finally make a function to let you easily compare players. Like, Subscribe, and find us...
Random Strike Zone Forest: Tidyverse Takes on Hall of Fame Hurlers | TidyX Episode 175
Views: 275, 8 months ago
TidyX Episode 175: Random Strike Zone Forest: Tidyverse Takes on Hall of Fame Hurlers We explored the world of data modeling using Tidyverse and Purrr to predict the next MLB Hall of Fame pitchers. Stay tuned for some fascinating insights into our modeling process! We use the same datasets as we have the last several weeks, and apply logic and code to create, evaluate, and tune our models. Like...
AI Speed Ball: Predicting the 2024 Pitcher HOF Class in 20 Minutes | TidyX Episode 174
Views: 203, 9 months ago
TidyX Episode 174: AI Speed Ball: Predicting the 2024 Pitcher HOF Class in 20 Minutes We're bringing the heat with AI! Join us as we step up to the plate and predict the next MLB Hall of Fame pitchers using the power of TensorFlow and Keras. With a killer convolutional neural network in our arsenal, we're ready to knock it out of the park! We go over normalization techniques, how to set up your...
Pitch into the Bayes - "20" minute MLB Hall of Fame Pitchers predictions | TidyX Episode 173
Views: 334, 9 months ago
TidyX Episode 173: Pitch into the Bayes - "20" minute MLB Hall of Fame Pitchers predictions Step up to the mound in TidyX Episode 173 as we predict MLB Hall of Fame pitchers using the power of Bayesian models! Join us as we switch up our game plan, leaving no curveball unturned with rstanarm. We inspect the models and results with prediction intervals and probabilities, bringing a new dimension...
20 minutes to Predict MLB HOF Pitchers - Class of 2024 | TidyX Episode 172
Views: 252, 9 months ago
TidyX Episode 172: 20 minutes to Predict MLB HOF Pitchers - Class of 2024 Join us as we look into the numbers behind predicting MLB Hall of Fame pitchers! This episode includes crafting a dataset from the {Lahman} package, creating logistic regression models, and finally assessing them via model summary tools and visualizing techniques. Stay tuned for insights and adjustments as we navigate the...
Bae in the Fast Lane - Bayesian linear regression in 20-Minutes |TidyX Episode 171
Views: 542, 10 months ago
Beyond Basic For loops: Tidy Expressions | TidyX Episode 170
Views: 603, 10 months ago
Predicting Hall Of Famers in 20 Minutes | TidyX Episode 169
Views: 600, 10 months ago
Hall of Fame Showdown - Base R Plot Edition | TidyX Episode 168
Views: 213, 11 months ago
Grand Slam - Knocking it Out of the Park with Base R Density Plots | TidyX Episode 167
Views: 229, 11 months ago
The Line Plot Saga | TidyX Episode 166
Views: 253, 11 months ago
The Power of Plotting Compels You | TidyX Episode 165
Views: 473, 11 months ago
Advanced Shiny - Running Multiple Linked Shiny Apps | TidyX Episode 164
Views: 872, 1 year ago
Creating Player url links in datatable and Shiny | TidyX Episode 163
Views: 505, 1 year ago
Advanced Shiny - Web Scraping and Dynamic Linking | TidyX Episode 162
Views: 630, 1 year ago
Shinylive - Is this thing on? | TidyX Episode 161
Views: 2K, 1 year ago
Shiny URL Queries | TidyX Episode 160
Views: 411, 1 year ago
TidyX Posit::Conf 2023 Recap
Views: 204, 1 year ago
R Packages - Feeling a little Testy | TidyX Episode 159
Views: 309, 1 year ago
R Packages - Write a vignette for yourself | TidyX Episode 158
Views: 379, 1 year ago

COMMENTS

  • @CaribouDataScience, 17 days ago

    Say, it would be great if you created an update to this video, because some of the functions used have changed.

  • @haraldurkarlsson1147, 1 month ago

    The excel sheet looks dreadful.

  • @haraldurkarlsson1147, 1 month ago

    I find the part with cbind confusing.

  • @haraldurkarlsson1147, 1 month ago

    You guys are code ninjas!

  • @haraldurkarlsson1147, 1 month ago

    Why not use lubridate for dates? By the way, I like the discussion around addressing the different issues that arise in attempting to solve a given problem.

  • @haraldurkarlsson1147, 1 month ago

    I know Wikipedia allows scraping, but that is not true for other sites. I typically check the permissions with the bow(url) function from the polite package.

  • @haraldurkarlsson1147, 1 month ago

    Does broom work with lists or only data frames?

  • @haraldurkarlsson1147, 1 month ago

    I got the p-values with the following code: mutate(model_p = map(model_summary, ~.x$coef[1 , 4]) ) %>% unnest(cols = model_p)
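
A minimal sketch of the same idea, under stated assumptions: the built-in `mtcars` data and the names `models`, `model_summary`, and `slope_p` are stand-ins, not the episode's objects, and `map_dbl()` is swapped in for `map()` plus `unnest()` so the result is already a numeric vector. If the model includes an intercept, row 1 of the coefficient table is the intercept and row 2 is the first predictor, so the index decides which p-value comes back.

```r
library(purrr)  # assumed installed

# Hypothetical stand-in for the nested models: one lm per cylinder group.
models <- map(split(mtcars, mtcars$cyl), \(df) lm(mpg ~ wt, data = df))
model_summary <- map(models, summary)

# The coefficient table has columns Estimate, Std. Error, t value, Pr(>|t|).
# Row 1 is the intercept and row 2 is the slope, so [2, 4] is the slope's p-value.
slope_p <- map_dbl(model_summary, \(s) s$coefficients[2, 4])
slope_p
```

Indexing by name, for example `s$coefficients["wt", "Pr(>|t|)"]`, may be easier to read than numeric positions.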

  • @haraldurkarlsson1147, 1 month ago

    You can simply add as.character(x) to the function (map_chr) without using the pipe.

  • @haraldurkarlsson1147, 1 month ago

    The . in split does not work on my end if I use the base pipe.
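
A likely explanation, sketched under assumptions: the native pipe `|>` does not define the magrittr `.` pronoun, so an expression such as `split(.$group)` has nothing to refer to. One workaround is to pass the data to an anonymous function. The built-in `mtcars` data and the name `by_cyl` are stand-ins for the episode's objects.

```r
# magrittr version, for comparison: mtcars %>% split(.$cyl)
# Native pipe (R >= 4.1): pass the data to an anonymous function instead of `.`.
by_cyl <- mtcars |> (\(d) split(d, d$cyl))()

# One data frame per cylinder count.
names(by_cyl)
```

The extra parentheses around the lambda and the trailing `()` are needed because the right-hand side of `|>` must be a function call.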

  • @jodarove, 2 months ago

    Thank you guys! this was really helpful

  • @coenraadmarais4074, 2 months ago

    Flame tree next, please.

  • @eustrainroblero729, 3 months ago

    gganimate? A comparison between the packages would be great.

  • @k5555-b4f, 4 months ago

    very cool and so useful simply because event-level dataframes are ubiquitous (especially in a work/company setting) especially with fetching all this data from a nested/tree like structured jsons - you have a big fan in me here gentlemen - thanks !! (i'm willing to bet Cohen is a data analyst/scientist for a company involved in sports/NBA bookkeeping or analytics lmao)

  • @k5555-b4f, 4 months ago

    great stuff as always ! clear and easy to understand what easily can be confusing so, thanks ! (very) unrelated to this but would you be interested in diving (either deep) or as an introduction into the logger package, i think it's created with the intent of mimicking python's version and while i find python's pretty straightforward, for some reason R's version is a little more obscure/harder to grasp for me

    • @TidyX_screencast, 4 months ago

      Thanks for the comment! I'll have to look at the logger package a bit again - I think I've used it in the past, but there may be a few other things/concepts you need to know before it makes sense ~ Ellis

    • @k5555-b4f, 4 months ago

      @TidyX_screencast no worries if you can't of course - thanks Ellis either way !

  • @alelust7170, 4 months ago

    Great content 🎉

  • @hosseinkhandani3937, 5 months ago

    Why was the panel data package (plm) or LSDV Reg method not used?

  • @blaisepascal3905, 5 months ago

    Nice video! Do you know how the modify() function works in purrr? I really struggle to see the difference with the map() function.

  • @djangoworldwide7925, 5 months ago

    You can try using the \(df) notation for anonymous functions: purrr::map(my_list, \(df) lm(y~x1+x2, data = df)). This call expects a list of data frames called my_list. It then regresses y against x1+x2 in each of these data frames, passing each data frame in the list as the data for the regression. It is the same as what you guys did, but I think this notation was introduced more recently and is now considered better than the ~ .x notation.
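
To make the comparison concrete, here is a small sketch that fits the same per-group model with both notations. The built-in `mtcars` data, the formula `mpg ~ wt + hp`, and the object names are assumptions chosen for illustration, not the episode's code.

```r
library(purrr)  # assumed installed

# Hypothetical list of data frames: mtcars split by cylinder count.
my_list <- split(mtcars, mtcars$cyl)

# R >= 4.1 lambda shorthand: \(df) is the same as function(df).
models_lambda <- map(my_list, \(df) lm(mpg ~ wt + hp, data = df))

# Older purrr formula shorthand, where .x is the current list element.
models_formula <- map(my_list, ~ lm(mpg ~ wt + hp, data = .x))

# Both notations produce the same coefficients for every group.
map(models_lambda, coef)
```

The lambda form names its argument explicitly, which tends to read more clearly once the function body grows beyond one call.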

  • @liangzhao2659, 5 months ago

    Great job!

  • @kurokami254, 5 months ago

    This was great! Pretty concise and learnt a lot. Cleaning data is a lot less time consuming and intuitive for me now. Didn't know base R was so good at dealing with strings

    • @TidyX_screencast, 5 months ago

      Thanks for the comment! R handles strings quite well in a variety of different ways. We just scratched the surface here ~ Ellis

  • @rayflyers, 5 months ago

    Take the code I commented on episode 144 and replace purrr::map() functions with furrr::future_map() functions.

  • @guirodriguues, 5 months ago

    Awesome. One package I like a lot is the furrr package. You use the function "future_map" as you do with the "map" function from the purrr package, but in parallel. Pretty easy.

  • @mikep8857, 6 months ago

    Great approach for doing multiple comparisons. Could you not just replace filter(r1 != r2) with filter(as.numeric(r1) > as.numeric(r2))? I think broom::tidy() on the output of the t test might have made it a bit easier to combine and extract the data you wanted, although your approach works fine.

    • @patrickward6067, 5 months ago

      Haven't tried that. Will give it a shot! Thanks! ~patrick

    • @omarahomar, 5 months ago

      The same solution came to my mind last week for a similar problem, filter the indices if i>j just gave me the upper triangle (except diag.) of the Cartesian product matrix. 👍
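
The filter suggested in this thread can be sketched in base R. This assumes `r1` and `r2` are factors sharing the same levels (the level names below are made up), and it uses `expand.grid()` and `subset()` as stand-ins for whatever crossing and filtering functions the episode used.

```r
# Hypothetical factor of age groups; the level names are invented.
groups <- factor(c("18-24", "25-34", "35-44", "45-54"))

# All ordered pairs (the Cartesian product of the groups with themselves).
pairs <- expand.grid(r1 = groups, r2 = groups)

# r1 != r2 keeps both (a, b) and (b, a): n * (n - 1) rows.
both_orders <- subset(pairs, r1 != r2)

# Comparing the integer codes keeps each unordered pair once: choose(n, 2) rows.
unique_pairs <- subset(pairs, as.numeric(r1) > as.numeric(r2))

nrow(both_orders)
nrow(unique_pairs)
```

Keeping one triangle of the pair matrix halves the number of tests to run and avoids reporting each comparison twice.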

  • @scotmorrsn, 6 months ago

    Good stuff gents. Appreciate the extra package shared tho.

  • @mikep8857, 6 months ago

    Another great episode. Looking forward to the episode 200 party edition! In your plots the y axis was count data. It irritates me that ggplot will often put decimal points on scales even if you bother to define the variable as an integer. For one plot it's not too hard to manually input the breaks etc. Is there an easy way of getting round this problem when you are generating lots of plots, as in your example? It should be as simple as saying "this is integer data, 10.5 is meaningless!"
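
One possible approach to the question above, offered as a sketch and not as the hosts' answer: `scale_y_continuous()` accepts a function for `breaks` that receives the axis limits, so a small reusable helper can drop fractional breaks across many plots. The helper name `int_breaks` and the columns in the commented usage are hypothetical.

```r
# Hypothetical helper: turn "pretty" axis breaks into whole numbers only.
int_breaks <- function(limits) {
  unique(floor(pretty(limits)))
}

# Usage sketch with ggplot2 (assumed installed), shown as a comment:
#   ggplot(df, aes(year, n)) +
#     geom_col() +
#     scale_y_continuous(breaks = int_breaks)

int_breaks(c(0, 3))
```

Because the helper only depends on the limits, the same function can be passed to every plot generated in a loop or with map().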

  • @mikep8857, 6 months ago

    Great episode! I learnt the invisible function which seems particularly appropriate for an episode on the Lord of the Rings!

    • @TidyX_screencast, 6 months ago

      I hadn't even thought about how well that worked! That's great! ~ Ellis

  • @IO-qt8kv, 7 months ago

    How do you introduce and make predictions on other new dataset (not the test data)

  • @tomkmb4120, 7 months ago

    Any plans to do any more NBA stuff? I'd love to see something like trying to predict some of the awards as the season is winding down, maybe Most Improved player as predicted using ML - similar to the HOF pitching

  • @tomkmb4120, 7 months ago

    This was a fun one

  • @rafabws, 8 months ago

    Great video, guys! And it's great that the baseball season is around the corner, so lots of people should be itching for new data points (I mean, games lol)

  • @ahmedmo8814, 9 months ago

    Excellent showcase, but we also need to know more about the map function of purrr

  • @scotmorrsn, 9 months ago

    🐈🐈

  • @joshsmith8389, 9 months ago

    now do barry bonds

  • @tdawry, 9 months ago

    I'm not sure if the "A Aron" was a Key and Peele joke or not, but an interesting video either way

  • @ciensalud, 9 months ago

    You guys rockin', keep it up!

  • @fredericrioux6937, 9 months ago

    Good work. You guys can get the innings pitched if you divide the IPOuts by 3 (IPOuts = Outs pitched)

  • @IgnacioAguilarToledo, 9 months ago

    Great!

  • @ToniGril, 10 months ago

    Nice work guys! It would be great if you could do this with tidymodels framework.

  • @cornellmihkail1238, 10 months ago

    First

  • @djangoworldwide7925, 10 months ago

    You should really focus on advanced ggplot, tidy models and shiny stuff....

    • @patrickward6067, 10 months ago

      Thanks for the reply. We have many episodes on ggplot2 and entire series on tidymodels and shiny. Is there something in particular you'd be interested in seeing? ~patrick

  • @yarriofultramar, 10 months ago

    Fantastic presentation! Thanks! I am very interested in learning more on Bayesian statistics.

    • @TidyX_screencast, 10 months ago

      We did a whole series on Bayes, starting with episode 99! Bit.ly/TidyX_Ep99

    • @Aaqib.., 10 months ago

      Thanks a lot @TidyX_screencast

  • @ArcenisRojas, 10 months ago

    If you must use a for loop...

    # Using a for loop
    library(rlang)
    wgt_stat_list <- list()
    for (i in 1:5) {
      wgt_stat_list[[i]] <- get_wgt_score(
        fake_dat,
        sym(str_c("stat", i)),
        sym(str_c("stat", i, "_n"))
      )
    }

  • @ArcenisRojas, 10 months ago

    Also, thank you both for this video!

  • @ArcenisRojas, 10 months ago

    I'd like to offer two solutions, both from the Tidyverse; the second one uses {rlang} for tidy evaluation (avoid for loops!!!):

    # Doing it with pivot_longer and pivot_wider
    fake_dat |>
      pivot_longer(starts_with("stat")) |>
      mutate(
        stat_number = str_extract(name, "\\d"),
        name = str_remove(name, "\\d")
      ) |>
      pivot_wider(names_from = name, values_from = value) |>
      group_by(athlete, stat_number) |>
      summarise(
        total_obs = sum(stat_n),
        wgt_stat = weighted.mean(stat, stat_n)
      )

    # Using tidy_eval
    library(rlang)
    get_wgt_score <- function(dat, variable, N) {
      dat |>
        group_by(athlete) |>
        summarise(
          total_obs = sum(!!N),
          wgt_stat = weighted.mean(!!variable, !!N)
        ) |>
        mutate(stat = as_string(variable))
    }
    map2(
      str_c("stat", 1:5),
      str_c("stat", 1:5, "_n"),
      \(x, y) fake_dat |> get_wgt_score(sym(x), sym(y))
    ) |>
      list_rbind()

  • @brianingersoll5604, 10 months ago

    Hey Ellis - great walk-through. I see you predicted both Beltre and Mauer, both of whom made it. A-Rod won't make it due to external shenanigans, as you note. The other player who made it was Todd Helton. Going to do pitchers next?

  • @zl1061, 10 months ago

    Really appreciate the for loop solution as often running into this situation myself and I always wondered if you can do this without running the same code multiple times. This has been super helpful!

  • @rayflyers, 10 months ago

    You can accomplish it with a single pivot longer if the column names have a consistent pattern/separator. I gave the stat variables a new suffix ("_score") to make that happen.

    fake_dat |>
      rename_with(\(x) str_replace(x, "(.+\\d$)", "\\1_score")) |>
      pivot_longer(
        c(ends_with("_score"), ends_with("_n")),
        names_to = c("stat", ".value"),
        names_sep = "_"
      ) |>
      summarize(
        total_obs = sum(n),
        wgt_stat = weighted.mean(score, n),
        .by = c(athlete, stat)
      )

  • @Matt_Kumar, 10 months ago

    Keeping Patrick on his toes :)

  • @caty863, 10 months ago

    I am right now building a shiny application prototype that I would like to demo to my colleagues. We use Dataiku in my organization and it has the capability to host shiny apps. However, since our tech guys are all pythonistas, most R packages that I am using don't run well on our Dataiku deployment (our tech guys don't care). I explored other alternatives for shiny app deployment but I couldn't find one that is practical enough. I am now going to try this *shinylive* option. Wish me luck.