Intermediate R Workshop - Data Management with dplyr and tidyr

  • Published 3 Dec 2024

COMMENTS • 52

  • @vijay006
    @vijay006 3 years ago +17

    Dear Professor, I have looked at many R tutorial videos for self-learning. Your lectures are by far the best with absolute clarity and with a great deal of explanation. Slides are very useful. Thank you so much..

  • @茱莉-x2o
    @茱莉-x2o 2 years ago +1

    I wouldn't watch any other R tutorial after watching this awesome tutorial. Thank you!

  • @StefanoVerugi
    @StefanoVerugi 3 years ago +3

    One of the most complete, professional, well-explained and slide-supported videos on R I have found on YT, thank you

    • @cclanfear
      @cclanfear 3 years ago

      Thanks, much appreciated!

  • @BiologistDillon
    @BiologistDillon 3 years ago +1

    Just a note at 8:00. Object assignment does not _need_ to be at the start of a pipe. It can be at the end through the use of a right-facing arrow -> or equals sign. Sure, it may be standard to go at the top, but with pipes reading left to right, top to bottom, I find it more intuitive to assign the output of the pipe at the end of said pipe. For example:
    input_df %>% filter(some_var == "value") %>% group_by(some_var) -> output_df
    From a consistency standpoint, I still tend to assign the output at the start of the pipe, but it absolutely doesn't need to always be at the start.

    • @cclanfear
      @cclanfear 3 years ago +1

      In most classes I note the right assignment operator (as well as bidirectional operator), but not in these quick workshops as I don't spend much time on what is considered good R style (e.g., no equals assignment).
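
A minimal sketch of the two assignment styles discussed in this thread, assuming dplyr is installed. The toy data frame and the names `output_left` and `output_right` are hypothetical; `input_df` and `some_var` follow the comment above. Note that `=` only assigns to the name on its left, so at the end of a pipe the right-facing arrow `->` is the operator that applies.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
input_df <- data.frame(
  some_var = c("value", "other", "value"),
  x = c(1, 2, 3)
)

# Conventional style: the assignment sits at the start of the pipe
output_left <- input_df %>%
  filter(some_var == "value")

# Right assignment: the result is stored at the end of the pipe
input_df %>%
  filter(some_var == "value") -> output_right

# Both styles store the same filtered data frame
identical(output_left, output_right)
```

Because `->` binds less tightly than `%>%`, the whole pipeline is evaluated before the assignment happens.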

  • @krismopolitan
    @krismopolitan 3 years ago +3

    I'm mad at everyone in the world for not pointing me to this video sooner!

  • @windowviews150
    @windowviews150 2 years ago

    This is the best tidyverse video on YouTube. You have a new subscriber. Thanks for sharing!

  • @dantelangone4829
    @dantelangone4829 2 years ago +1

    Thank you for posting this on YouTube. I have the habit of going through lessons at 1.5x (or even faster), but I loved this class and enjoyed every minute of the full length. I loved your comment about "spiraling" on a problem; that is very applicable to life (and graduate research in general) and not only to programming.

    • @cclanfear
      @cclanfear 2 years ago

      Thank you, much appreciated!

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 2 years ago

    I have been working with R for a few years now and this is some of the best stuff I have come across. Thanks!

  • @_subrata
    @_subrata 2 years ago

    I'm new to R and stumbled upon your video. This is the best thing that has happened to me. Thank you so much, Charles

  • @cristianmarcelovillegaslob9860
    @cristianmarcelovillegaslob9860 2 years ago +1

    amazing video! congrats

  • @israkvelvarga993
    @israkvelvarga993 3 years ago +1

    Thank you for sharing this class, I have learned a lot! Greetings from Costa Rica

  • @ruthreshmahadevan3023
    @ruthreshmahadevan3023 3 years ago +2

    Thank you Charles for the crystal clear explanations... 🙏

  • @glenpjanson5538
    @glenpjanson5538 2 years ago +1

    Sir, you are great!! Really love your classes.

  • @sakhawat3003
    @sakhawat3003 3 years ago

    a little correction at 40:30, it's not allowed anymore to use funs() inside summarize_at(). Rather it is suggested to use list(). For example: list(mean=mean, sd=sd)
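
The replacement suggested above can be sketched as follows, assuming dplyr 1.0.0 or later. The built-in `mtcars` data and the columns `mpg` and `hp` are stand-ins chosen for illustration, not the data used in the video, and `by_list` and `by_across` are hypothetical names.

```r
library(dplyr)

# Older style with funs(), shown only for reference and not run here:
#   summarize_at(vars(mpg, hp), funs(mean, sd))

# Replacement suggested in the comment: a named list of functions
by_list <- mtcars %>%
  summarize_at(vars(mpg, hp), list(mean = mean, sd = sd))

# The same summary written with across(), available from dplyr 1.0.0
by_across <- mtcars %>%
  summarize(across(c(mpg, hp), list(mean = mean, sd = sd)))

# Both results name their columns <column>_<function>
names(by_list)
names(by_across)
```

The two calls return the same four summary columns, although the column order may differ between them.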

  • @ButterfaceGMusicSlump
    @ButterfaceGMusicSlump 1 year ago

    Super great video! Thank you for the upload.

  • @birdfullo7
    @birdfullo7 4 years ago +2

    Dude loves pipes.

  • @uAkide
    @uAkide 3 years ago

    What a useful presentation, very clear, easy to follow, well explained, thank you for preparing this video!

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 2 years ago

    Charles,
    You might not be interested in this, but I think dplyr's "lag" function also works in the Usable Dates part. That is, mutate(date = date.entered + lag(week, default=0) * 7). By setting default=0 you avoid an NA in the first row.
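
A small sketch of the `lag()` suggestion above, assuming dplyr is installed. The `songs` data frame is hypothetical, and the comparison column `date_direct`, computed as `date.entered + (week - 1) * 7`, is an assumed baseline rather than a quote from the slides. The two versions agree only when weeks run 1, 2, 3, ... without gaps within each track and the rows are sorted by week.

```r
library(dplyr)

# Hypothetical long-format chart data: one row per track and week
songs <- data.frame(
  track = rep(c("A", "B"), each = 3),
  date.entered = rep(as.Date(c("2000-01-01", "2000-02-05")), each = 3),
  week = rep(c(1, 2, 3), times = 2)
)

result <- songs %>%
  group_by(track) %>%
  mutate(
    # lag() shifts week down by one row; default = 0 fills the first row of each group
    date_lag = date.entered + lag(week, default = 0) * 7,
    # Assumed baseline for comparison: offset by (week - 1) weeks
    date_direct = date.entered + (week - 1) * 7
  ) %>%
  ungroup()

result
```

Grouping by `track` matters here: without it, the first row of track "B" would inherit the last week of track "A".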

  • @revolution77N
    @revolution77N 3 years ago

    Best dplyr course ever! Thank you so much man!!

  • @Motivational_Child
    @Motivational_Child 4 years ago

    Excellent presentation! Thanks for sharing.

  • @BukenyaKizito-p5v
    @BukenyaKizito-p5v 7 months ago

    So smooth. Thank you

  • @hobao4965
    @hobao4965 2 years ago

    thank you so much for such a great lecture !!

  • @heraldfinch7010
    @heraldfinch7010 5 years ago

    A really really good explanation ..

  • @cyberbrodi
    @cyberbrodi 4 years ago

    Very helpful Charles! Thank you!

  • @mightyowl1668
    @mightyowl1668 4 years ago

    awesome presentation! thanks a lot

  • @eileenxu1423
    @eileenxu1423 4 years ago

    Great video! Well explained!

  • @spirosgyparakis8888
    @spirosgyparakis8888 3 years ago

    amazing job! Thanks a lot

  • @1622roma
    @1622roma 2 years ago

    I keep getting errors at 1:11:13:
    Error: unexpected symbol in:
    " select(-minutes, -seconds)
    summary"
    A few code snippets from your slides keep giving me an error message, for example:
    billboard_1 %>%
    select(artist, track, weeks_at_1) %>%
    distinct((artist, track, weeks_at_1) %>%
    arrange(desc(weeks_at_1)) %>%
    head(7)
    I named the data billboard_1 instead, because billboard_2000 gave me problems.
    Please check!

    • @cclanfear
      @cclanfear 2 years ago

      I'd advise looking carefully at the code and perhaps checking the website: clanfear.github.io/Intermediate_R_Workshop/ In your code example, for instance, you have an extra ( in your distinct() call. In your select() error you likely also have an added or missing character like ( or ,

  • @hongkaizhang3000
    @hongkaizhang3000 2 years ago

    Very good! Thank you!

  • @mariocailotto7128
    @mariocailotto7128 4 years ago

    Very nice video, thank you!

  • @SevenRavens007
    @SevenRavens007 4 years ago

    Awesome thanks for sharing!

  • @Neontrain
    @Neontrain 3 years ago

    Wow, this is one of the most informative videos on this I've seen. I've been having an issue with a data frame: I can't calculate the mean/max of a column that has null values in it. My line currently looks like:
    trackman %>% group_by(tagged_pitch_type) %>% summarise(mean(as.numeric(spin_rate, na.rm = TRUE)))
    It works for a few of the pitch types, but the ones that have null values don't work.

    • @cclanfear
      @cclanfear 3 years ago

      You mean NA values, not NULL values, I assume. If so, you just have an argument out of place in summarize: summarise(spin_rate = mean(as.numeric(spin_rate), na.rm = TRUE)).
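
The fix in the reply above can be sketched as follows, assuming dplyr is installed. The `trackman` data frame here is a hypothetical four-row stand-in that reuses the column names from the question. The point is that `na.rm = TRUE` must be an argument of `mean()`, not of `as.numeric()`.

```r
library(dplyr)

# Hypothetical stand-in for the commenter's data; spin_rate is stored as text
trackman <- data.frame(
  tagged_pitch_type = c("Fastball", "Fastball", "Slider", "Slider"),
  spin_rate = c("2200", NA, "2500", "2600")
)

# Without na.rm, a single NA makes the mean NA
mean(c(2200, NA))

# na.rm = TRUE belongs to mean(), outside the as.numeric() call
spin_summary <- trackman %>%
  group_by(tagged_pitch_type) %>%
  summarise(spin_rate = mean(as.numeric(spin_rate), na.rm = TRUE))

spin_summary
```

With the argument in the right place, the "Fastball" group is averaged over its one non-missing value instead of returning NA.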

  • @patrickmuvunyi55
    @patrickmuvunyi55 3 years ago

    46:31
    Thanks Charles for your straightforward explanation! However, I have been trying to apply the group_by function, but R gives me this, Error: unexpected ')' in " n = n())"

    • @cclanfear
      @cclanfear 3 years ago +1

      You likely have an extra ) somewhere or are missing something higher up in the code. Check RStudio for a red mark on the left side pointing out an error in the code.

    • @patrickmuvunyi55
      @patrickmuvunyi55 3 years ago

      Thanks! To make my question clear, here is my code
      library(magrittr)
      aa %>%
      group_by(Year) %>%
      summarise(Life.expectancy mean = mean(Life.expectancy),
      Life.expectancy median = median(Life.expectancy),
      n = n()) %>%
      And they gave me this in console:
      Error: unexpected symbol in " Life.expectancy median"
      > n = n()) %>%
      Error: unexpected ')' in " n = n())"

    • @cclanfear
      @cclanfear 3 years ago +1

      @patrickmuvunyi55 Two issues: (1) The code ends with a pipe, (2) there are spaces in variable names, which is not permitted unless surrounded by backticks (`). Fixed:
      aa %>%
      group_by(Year) %>%
      summarise(Life.expectancy.mean = mean(Life.expectancy),
      Life.expectancy.median = median(Life.expectancy),
      n = n())

    • @patrickmuvunyi55
      @patrickmuvunyi55 3 years ago

      @cclanfear I appreciate it, Sir! It has finally worked! I am not going to fail this anymore!

    • @miztx2syuiip590
      @miztx2syuiip590 2 years ago

      for me and current gen i find that starting with Pen first works much better and ending with Pen inputting more in between making it flow liquidity so much easier-- Teplace pen with Pipe my my too many pens before pipes airplanes eject whoa
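
As a sketch of the backtick option mentioned in the fix earlier in this thread: column names with spaces are allowed when quoted with backticks. The `aa` data frame below is a hypothetical stand-in, and the example loads dplyr, which is an assumption on my part, since `group_by()`, `summarise()` and `n()` come from dplyr and not from magrittr.

```r
library(dplyr)

# Hypothetical stand-in for the commenter's data
aa <- data.frame(
  Year = c(2000, 2000, 2001, 2001),
  Life.expectancy = c(70, 72, 71, 75)
)

# Backticks allow spaces in the new column names; the pipe ends after summarise()
life_summary <- aa %>%
  group_by(Year) %>%
  summarise(
    `Life.expectancy mean` = mean(Life.expectancy),
    `Life.expectancy median` = median(Life.expectancy),
    n = n()
  )

names(life_summary)
```

Names without spaces, as in the fixed code above, are usually easier to work with because they do not need backticks in later steps.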

  • @diegopacheco9367
    @diegopacheco9367 4 years ago

    nice video.

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 2 years ago

    I guess gather and spread are now pivot_longer and pivot_wider... It is hard to keep up with these changes (gather and spread actually made sense to me).

    • @cclanfear
      @cclanfear 2 years ago

      Yep, I was fine with gather and spread, but in changing these they also added some useful features. They're a bit more powerful.
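
A minimal sketch of the newer functions mentioned in this thread, assuming tidyr 1.0.0 or later. The `wide` data frame and its `wk1` and `wk2` columns are hypothetical, loosely modeled on chart data in wide format. The `names_prefix` and `values_drop_na` arguments are examples of the extra features the reply refers to.

```r
library(tidyr)

# Hypothetical wide data: one column per week, NA once a track leaves the chart
wide <- data.frame(
  track = c("A", "B"),
  wk1 = c(10, 20),
  wk2 = c(8, NA)
)

# pivot_longer() takes over the role of gather():
# strip the "wk" prefix from the names and drop rows whose rank is NA
long <- pivot_longer(
  wide,
  cols = -track,
  names_to = "week",
  names_prefix = "wk",
  values_to = "rank",
  values_drop_na = TRUE
)

# pivot_wider() takes over the role of spread() and restores the wide layout
back <- pivot_wider(
  long,
  names_from = week,
  values_from = rank,
  names_prefix = "wk"
)

long
back
```

Going back to wide format reintroduces the NA for track "B" in week 2, which illustrates how such missing values can arise purely from the wide layout.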

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 2 years ago

    In regards to the preponderance of NAs in the Billboard rank data would it not be rendered useless in terms of data analysis as it stands (without some sort of fancy imputation etc)?

    • @cclanfear
      @cclanfear 2 years ago +1

      Billboard has two types of NAs: (1) False NAs that appear when a song is no longer on the billboard, which are the result only of the data being in wide format. (2) What appear to be true NAs where some songs are no longer tracked after like 20 weeks (truncated observations). If you were modeling these data, you could use something like a survival model for the truncated observations.

    • @haraldurkarlsson1147
      @haraldurkarlsson1147 2 years ago +1

      @cclanfear
      I was wondering about the same thing. Survival analysis might work here. NAs would be censored data. This might be an interesting problem to tackle in class (if you have not done so already). Thanks.

  • @deaglanjakob3736
    @deaglanjakob3736 3 years ago

    No one caught onto the incontinent joke :(

  • @pinnewsyi
    @pinnewsyi 2 years ago

    HII