Clean your data with R. R programming for beginners.

  • Published 21 Sep 2024
  • If you are an R programming beginner, this video is for you. In it, Dr Greg Martin shows you, step by step, how to clean your dataset before doing any additional analysis. This is part of a series that covers exploring data, cleaning data, manipulating (or wrangling) data, describing data, visualizing data and, finally, analyzing data. The tutorial uses data built into R so you can replicate the work on your computer at home. Dr Martin uses the Tidyverse packages, which provide additional functions such as select, filter and mutate. This tutorial also deals with missing data. So if you are a data scientist, or interested in quantitative analysis or research, this is a good video to start with.

COMMENTS • 161

  • @RProgramming101
    @RProgramming101  1 year ago +1

    Get my FREE cheat sheets for R programming and statistics (including transcripts of these lessons) here: www.learnmore365.com/pages/membership-r-programming-data-visualization-and-research-methods

  • @GFXHDTV
    @GFXHDTV 5 months ago +10

    00:01 Cleaning your data involves systematic exploration, cleaning, manipulation, visualization, and analysis.
    01:44 Installing packages in R expands functionality.
    05:31 Converting character variable to factor variable in R
    07:31 Using the factor function to swap levels in R
    11:11 Understanding the difference between 'or' and 'and' in filtering data.
    13:03 Handling missing data is crucial for accurate analysis.
    17:02 Understanding how to handle missing values in data sets is crucial for data cleaning in R.
    18:49 Handle missing data with nuanced approach, not just sweeping deletion
    22:16 Identify and handle duplicates in data frames
    24:04 Selecting and filtering data using base R method
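
    The 05:31 and 07:31 steps in the list above (character to factor, then changing the level order) can be sketched in base R as follows. The data frame here is made up for illustration and is not the dataset used in the video.

    ```r
    # Toy data standing in for a character column such as gender (values are made up).
    df <- data.frame(gender = c("masculine", "feminine", "masculine"),
                     stringsAsFactors = FALSE)

    # Convert the character column to a factor; levels default to alphabetical order.
    df$gender <- as.factor(df$gender)
    default_levels <- levels(df$gender)

    # Re-declare the factor with an explicit level order to swap the levels.
    df$gender <- factor(df$gender, levels = c("masculine", "feminine"))
    swapped_levels <- levels(df$gender)
    ```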

  • @harrisonnash4948
    @harrisonnash4948 2 years ago +12

    Life saver, using this video in a last-minute dash to finish some coursework

    • @RProgramming101
      @RProgramming101  2 years ago +1

      Thanks for the feedback Harrison. Glad I could help.

  • @max5916
    @max5916 2 years ago +27

    We really appreciate your channel, the best YouTube channel for learning R. We are looking forward to seeing more, especially on survival analysis and parametric and non-parametric tests

  • @johnrussell5715
    @johnrussell5715 2 years ago +5

    I do love your enthusiasm Greg, it really keeps me interested in watching through to the end!

  • @deniseortiz8567
    @deniseortiz8567 1 year ago +1

    Out of all the online classes and videos I have done, I WISH I STARTED WITH THIS ONE!! Thank you!

    • @RProgramming101
      @RProgramming101  1 year ago +1

      Glad it was helpful! Thanks for your amazing feedback

  • @domyndegeya7760
    @domyndegeya7760 1 year ago +2

    Thank you so much Dr Martin. As a beginner, the way you explain R programming makes me love the language more and makes coding easier to deal with. Keep up with more great videos.

    • @RProgramming101
      @RProgramming101  1 year ago +1

      Thank you for the feedback. Glad you enjoyed it! You got this!

  • @Shawn-gm4cf
    @Shawn-gm4cf 2 years ago +10

    Honestly love all your videos. Detailed explanation and yet simple and straightforward. Keep up the great work.

  • @antonreinhold8478
    @antonreinhold8478 2 years ago +5

    Thank you so much, you're such a huge help! I don't think I would pass my 'digital data analysis' course without your channel

  • @nonoobott8602
    @nonoobott8602 2 years ago +6

    Your tutorials are unarguably the most explicit and practical for the use of R. Beyond using R as a tool, you explain a lot of statistical concepts. Thanks for all you do, I've learned a lot from your channel

  • @GallantDanny
    @GallantDanny 1 year ago +3

    Excellent! Thanks for putting out such great content that's not only useful, but easy to follow!

  • @phillippin6699
    @phillippin6699 9 months ago

    I don't usually comment, but this man here is the best I've come across on YouTube. Damn, too good

  • @JamesEllis-i4j
    @JamesEllis-i4j 1 year ago

    Incredible channel, the material is better than any course I have paid for. The delivery and the breakdown of topics into separate videos are perfect for learning. Thank you for sharing your expertise and time.

  • @Sorjen108
    @Sorjen108 10 months ago

    Easy, Peasy, Lemon Squeezy!
    Best R Programming Channel
    Keep it going!

  • @AlexKashie
    @AlexKashie 11 months ago

    I am here for the "super duper easy"... Keep up the great work Dr, thanks.

  • @summer7361
    @summer7361 2 years ago

    You sir have an incredible voice for teaching. Glad I found your channel

  • @itsmitasha
    @itsmitasha 2 months ago

    Amazing tutorial!! Thank you so much! ⭐️⭐️⭐️⭐️⭐️

  • @folashadeolaitan6222
    @folashadeolaitan6222 1 year ago

    You are an awesome teacher. I want to give you a hug right now! 😊 Thank you for making it so easy to follow through.

  • @yamimartina
    @yamimartina 2 years ago +1

    Hi there, Greg! Thanks a lot for these videos, love your style. I've learned A LOT!

    • @RProgramming101
      @RProgramming101  2 years ago

      Great to hear, Yamila! Thank you for the feedback!

  • @HiltonT69
    @HiltonT69 2 years ago

    Very clear, useful and interesting. I'm just getting into R and this helped me understand how it can be used for sensible data cleaning.

  • @elenag.224
    @elenag.224 2 years ago

    You really make programming seem "easy-peasy lemon squeezy", keep it up!

    • @RProgramming101
      @RProgramming101  2 years ago

      Thanks for watching!!! I appreciate your feedback!

  • @danquixote6072
    @danquixote6072 2 years ago +6

    Very useful thank you - especially the section on NAs and recoding. Also appreciate the editing, effects, sound quality and close ups of the code. I’ve recently been using the star wars database for my English teaching lessons to help students with the interrogative. How tall is R2D2? How much does Darth Vader weigh? Etc. One request if possible - Times, Dates and TimeDates. Thank you again, your videos have been very helpful.

  • @Researcholigist
    @Researcholigist 3 months ago

    Thank you so much for the great and useful videos

  • @IarukaSkYouk
    @IarukaSkYouk 1 year ago

    oh my god man you are a godsent!!!!
    I've been learning R in the google data course from Coursera and they don't teach much.

  • @KarstenDrKempf
    @KarstenDrKempf 2 years ago

    Very helpful. The perfect mixture of the background ideas (what data to dismiss) and the R way to do so.
    Hope to see more videos.
    Enjoy X-mas

  • @yidanjiang7599
    @yidanjiang7599 2 years ago +1

    Truly helpful! Amazing video for tidyverse

  • @elliebrown7694
    @elliebrown7694 2 years ago +1

    Thank you so much for creating so much accessible and engaging content. For a beginner the way you teach is very clear and easy to understand and your passion for R has made me love it even more! (I also appreciate being hyped up for learning by some drum and bass in the intro)

  • @lets_code_this2678
    @lets_code_this2678 1 year ago

    Yo, you are the best programming YouTube channel bro

    • @RProgramming101
      @RProgramming101  1 year ago

      Wow I appreciate the kind words. Your support encourages me to create more content that you'll enjoy! Thank you

  • @anujakori
    @anujakori 4 months ago

    Thank you so much. Your videos are very helpful.

  • @goon5031
    @goon5031 2 years ago

    Love this. I didn't know there was an in-house R dataset for star wars.

  • @vivicaanuforo4754
    @vivicaanuforo4754 1 year ago

    YOU ARE SUCH A GOOD TEACHER... THANK YOU

  • @spraypaul
    @spraypaul 1 year ago

    You probably mention that you are looking at the whole observation for Duplicates - but I was merrily making my own example vector with a duplicate name, however with different ages.
    Took me a while to realise to use below to get the same as you:
    friends[!duplicated(friends$Names) , ]
    or
    friends %>% distinct(Names)
    Great content 🙏

    • @spraypaul
      @spraypaul 1 year ago

      of course they wouldn't be true duplicates in my vector....
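
    A small sketch of the distinction in this thread, whole-row duplicates versus duplicates on one key column. The `friends` data below is hypothetical, rebuilt from the description, and the sketch assumes dplyr is installed.

    ```r
    library(dplyr)

    # Hypothetical data: the name "Ann" repeats, but with a different age each time.
    friends <- data.frame(Names = c("Ann", "Bob", "Ann"),
                          Age   = c(30, 25, 41))

    # Whole-row check: no row repeats in full, so nothing is dropped.
    whole_row <- friends[!duplicated(friends), ]

    # Key-column check in base R: keeps the first row seen for each name.
    by_name_base <- friends[!duplicated(friends$Names), ]

    # dplyr equivalent; .keep_all = TRUE retains the other columns
    # (distinct(Names) on its own returns only the Names column).
    by_name_dplyr <- friends %>% distinct(Names, .keep_all = TRUE)
    ```

    Which row survives for a repeated name is simply the first one encountered, so sorting the data beforehand decides which record is kept.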

  • @melinaguillon2449
    @melinaguillon2449 3 months ago

    You're amazing at teaching this. Thank you!

  • @shaikhahmedbd
    @shaikhahmedbd 1 year ago

    Thank you so much! love all your videos! simple and straightforward! with detailed explanation.

    • @RProgramming101
      @RProgramming101  1 year ago

      I'm so glad to have you as a subscriber! Thank you for being a part of this community.

  • @iblisthemage
    @iblisthemage 4 months ago

    Great content.
    Recode is superseded… we need a new video on this topic 🙂
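
    For anyone looking for the newer spelling: the dplyr documentation points from `recode()` to `case_match()`. A minimal sketch, assuming dplyr 1.1.0 or later; the `people` tibble is made up for illustration.

    ```r
    library(dplyr)  # case_match() assumes dplyr >= 1.1.0

    # Toy data; the NA is there to show what happens to unmatched values.
    people <- tibble(gender = c("masculine", "feminine", NA))

    # Each formula maps an old value (left) to a new value (right).
    # Values that match nothing, including NA here, come out as NA.
    recoded <- people %>%
      mutate(gender_code = case_match(gender,
                                      "masculine" ~ 1,
                                      "feminine"  ~ 2))
    ```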

  • @emiliezeuthen7631
    @emiliezeuthen7631 2 years ago

    Very useful Greg, such a big help for an R n00b

  • @raulpalomares1092
    @raulpalomares1092 2 years ago

    You're a hell of a teacher! congrats!!

  • @annleonard9713
    @annleonard9713 2 years ago

    This is brilliant, I am on my third video and I am amazed at how easy this is made to seem👏

  • @felipecruz3061
    @felipecruz3061 1 year ago

    Your channel is a life saver man. Thank you

  • @samikzr
    @samikzr 2 years ago

    Much appreciated. Amazing skills in such a simplified way thank you.

  • @Aaqib..
    @Aaqib.. 2 years ago

    Thanks a lot sir. I can't be grateful enough for your videos

  • @Junecode
    @Junecode 4 months ago

    Thanks for the lessons so far. Love it

  • @krono32
    @krono32 3 months ago

    Very helpful! Thank you, sir!:D

  • @jerryeyong5585
    @jerryeyong5585 5 months ago

    Excellent lectures and a good lecturer also

  • @lauramagangfopossi1770
    @lauramagangfopossi1770 2 years ago

    Thanks. I really appreciate your video

  • @MaltePeter
    @MaltePeter 2 years ago +1

    Such good videos for learning R programming and such a nice series. When is the next episode about manipulating your data coming out? Can't wait for it!

  • @Lin-pj5bo
    @Lin-pj5bo 2 years ago

    Thank you very very very much!! Really appreciate your wonderful video. 👍👍👍

  • @altareq24953
    @altareq24953 10 months ago

    best practice lecture for R

  • @TaraGhimite
    @TaraGhimite 1 year ago

    Thank you, Dr Martin

  • @raulpalomares1092
    @raulpalomares1092 2 years ago

    Thumbs up for you Greg!

  • @yulinliu850
    @yulinliu850 2 years ago +2

    Thanks a lot!

  • @kamogelokhumalo4792
    @kamogelokhumalo4792 2 years ago

    Thank you very much Sir. This was quite easy to understand as a beginner. Would you kindly maybe make a series of these videos for us beginners because you have many videos and we wouldn't know which video to watch after this one. I hope that makes sense. Anyway thank you a lot for these videos

  • @erpampa94
    @erpampa94 1 year ago

    this tutorial is Excellent! thank you!

  • @solafajobi
    @solafajobi 1 year ago

    Thank you for this. Very helpful.

  • @RosatiSamuel
    @RosatiSamuel 5 months ago

    GREAT VIDEO

  • @tarkanh2519
    @tarkanh2519 1 year ago

    Perfect lecture ! benefited a lot.

  • @kamaboko1
    @kamaboko1 5 months ago

    Solid tut. Thank you.

  • @krazitired
    @krazitired 8 months ago

    Thank you for this amazing resource. Very helpful for someone like myself who is learning R without any meaningful stats experience aside from a semester at uni.
    Is anyone learning along able to share their experience of using the mutate and recode functions? I haven't had any success using them whilst following along with this video and a previous one, when trying to recode the gender to M and F, or 1 and 2. I've had to work around it using:
    starwars %>%
      select(name, gender) %>%
      mutate(gender = if_else(gender == "masculine", "1", "2"))
    But I'd really like to know what I'm doing wrong using recode, as I think my code looks the same as Greg's!
    starwars %>%
      select(name, gender) %>%
      mutate(gender = recode(gender, "masculine" = 1, "feminine" = 2))
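
    While the cause is being tracked down, one way to recode without calling `recode()` at all is a named lookup vector in base R. This is a sketch of an alternative technique, not what the video does, and the `gender` vector is made up.

    ```r
    # Toy input; NA is included to show that missing values stay missing.
    gender <- c("masculine", "feminine", NA, "masculine")

    # Named vector: names are the old values, elements are the new codes.
    codes <- c(masculine = 1, feminine = 2)

    # Index the lookup by the old values; unname() drops the carried-over names.
    gender_code <- unname(codes[gender])
    ```

    The same expression can sit inside `mutate()`, for example `mutate(gender = unname(codes[gender]))`.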

  • @sagarlokare5269
    @sagarlokare5269 1 year ago

    Bravo, this content was very good

  • @noahsalazar2738
    @noahsalazar2738 2 years ago

    You Sir, are a legend!

  • @shafeen1058
    @shafeen1058 1 year ago

    You explained it thoroughly ❤❤

  • @rnarith855
    @rnarith855 2 years ago

    Appreciate your helpful videos

  • @marianam5181
    @marianam5181 7 months ago

    Thank you!! 👑

  • @riptideking
    @riptideking 2 years ago +1

    Thank you for the video sir

  • @space5more
    @space5more 5 months ago

    Excellente~! Thanks -

  • @BloomingtonFPV
    @BloomingtonFPV 2 years ago +2

    If you are here from Tom's Bayesian stats class, give this video a thumbs up!

    • @mwegankanda6594
      @mwegankanda6594 10 months ago

      nobody seemed to be. But here's a thumbs up (I'm here cause of stats class too lol)

  • @kamekaze997
    @kamekaze997 1 year ago

    Love your videos, man - they're clear and concise, easy to follow! Would you be open to creating content based on the Google Data Analytics Cert?
    One of the case studies they have is about a fictional bike study called Cyclistic.
    And there is only one person on YT who does it in R (Caribou Data Science) but it's not as seamless or clear as you make your videos out to be! :)

  • @eyadha1
    @eyadha1 1 year ago

    great! thank you very much

    • @RProgramming101
      @RProgramming101  1 year ago

      Thanks for the great feedback- Much appreciated !!

  • @findthetruth3021
    @findthetruth3021 2 years ago

    Thank you so much, you're such a great help! please show us how to create Dashboards via Shiny. Thanks a lot.

  • @citizenhk1040
    @citizenhk1040 2 years ago

    Thank you very much! Very helpful

  • @adrianareitano3
    @adrianareitano3 2 years ago +1

    Hi,
    Love your videos, they are so helpful. Could you do a video on loops in R? Thanks!

  • @dollysiharath4205
    @dollysiharath4205 1 year ago

    Thank you!

  • @diddysysavane6006
    @diddysysavane6006 10 months ago

    For the replace NA step, it just filters the dataset but doesn't change anything in the dataset itself, so the dataset remains uncleaned.
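
    This follows from how pipelines work in R: a pipeline returns a new data frame and leaves the original untouched unless the result is assigned. A minimal sketch, assuming dplyr and tidyr are installed; `pets` and `pets_clean` are hypothetical names.

    ```r
    library(dplyr)
    library(tidyr)

    # Toy data with one missing value.
    pets <- tibble(name = c("Rex", "Tom"), colour = c("brown", NA))

    # This only prints a modified copy; `pets` itself still contains the NA.
    pets %>% mutate(colour = replace_na(colour, "none"))

    # Assigning the result is what stores the cleaned version.
    pets_clean <- pets %>% mutate(colour = replace_na(colour, "none"))
    ```

    Saving under a new name rather than overwriting `pets` also keeps the original around in case the cleaning step needs to be revisited.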

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 2 years ago

    Doc Martin!
    As usual an excellent video! I have one question regarding the recoding. In binary cases, is it not typical to use 0 and 1 in order to be able to do some stats on the data?
    Thanks and Happy Holidays!
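
    A short sketch of the 0/1 coding asked about here, in base R. The values are made up; the point is that a logical comparison converts directly to 0 and 1, and the mean of such a column is then a proportion.

    ```r
    # Toy input with one missing value.
    gender <- c("masculine", "feminine", "masculine", NA)

    # TRUE becomes 1, FALSE becomes 0, NA stays NA.
    is_masculine <- as.integer(gender == "masculine")

    # Share of non-missing rows coded 1.
    share_masculine <- mean(is_masculine, na.rm = TRUE)
    ```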

  • @JAMESYUN-e3t
    @JAMESYUN-e3t 11 months ago

    Really excellent PT.
    Then how is R associated with Python algorithms?

  • @DrJohnnyJ
    @DrJohnnyJ 2 years ago +1

    I can't get the line to work: filter(hair_color %in% c(“blond”, “brown”) & height < 180)
    Error: unexpected symbol in " filter(hair_color c"
    > height < 180)
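
    The cause cannot be confirmed from the message alone, but two things are worth checking: the quotes must be straight (`"`) rather than curly, and the `%in%` operator has to survive the copy and paste, since the error text shows it missing. A sketch of the same filter typed out in full, on a made-up tibble and assuming dplyr is installed:

    ```r
    library(dplyr)

    # Toy stand-in for the dataset used in the video.
    chars <- tibble(name       = c("A", "B", "C"),
                    hair_color = c("blond", "brown", "black"),
                    height     = c(170, 190, 160))

    # Straight quotes, %in% intact, and the whole condition inside one filter() call.
    result <- chars %>%
      filter(hair_color %in% c("blond", "brown") & height < 180)
    ```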

  • @pipertripp
    @pipertripp 2 years ago +2

    Just a heads up to everyone, at the end of the vid when you're doing the recode bit, you might hit this error:
    Error: Problem with `mutate()` column `gender`.
    i `gender = recode(gender, masculine = 1, feminine = 2)`.
    x unused arguments (masculine = 1, feminine = 2)
    if you get this error, force R to use dplyr's version of recode like this:
    starwars %>%
    select(name, gender) %>%
    mutate(gender_coded = dplyr::recode(gender, "masculine"=1, "feminine"=2))
    I'm not sure why I had to do this, as I had already run the library(tidyverse) command, but replacing recode with dplyr::recode sorted it out.

    • @pipertripp
      @pipertripp 2 years ago

      Figured it out. I also have the "car" package (for the Variance Inflation Factor function "vif" that I'm also using) in my project, so there was a name collision and it was taking recode from car (Companion to Applied Regression) rather than from dplyr.
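
    A quick way to see which package a bare function name currently resolves to, plus the namespace-qualified call that sidesteps the collision. A sketch assuming dplyr is installed; the `gender` vector is made up.

    ```r
    library(dplyr)

    # Name of the package whose recode() a bare call would use right now.
    # With car attached after dplyr, this would report "car" instead.
    recode_owner <- environmentName(environment(recode))

    # The pkg::fun form always picks the named package, whatever is attached.
    gender <- c("masculine", "feminine")
    codes <- dplyr::recode(gender, "masculine" = 1, "feminine" = 2)
    ```

    Packages attached later mask earlier ones, so the order of the `library()` calls decides which `recode` wins when the name is not qualified.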

  • @nabilafandih
    @nabilafandih 1 year ago

    variable types
    select and filter
    find and deal with missing data
    find and deal with duplicates
    recode values

  • @robsonreis76
    @robsonreis76 2 years ago

    Pretty handy tips

  • @microbemike9693
    @microbemike9693 2 years ago

    How come I am just discovering this channel?

  • @juliablazy4011
    @juliablazy4011 1 year ago

    super thanks

  • @mayanksrivastava7540
    @mayanksrivastava7540 1 month ago

    I don't know why, but when I run the complete cases code, it shows an error that the "." object is not found.
    Please help, sir
    starwars|>
    select(name,gender,hair_color,height)|>
    filter(!complete.cases(.))
    I ran this code
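
    The `.` placeholder belongs to the magrittr pipe (`%>%`); the native pipe (`|>`) does not define it, which fits the "object not found" message. Two ways around it are sketched below on a made-up tibble, assuming R 4.1 or later and dplyr 1.0.4 or later (for `if_any()`).

    ```r
    library(dplyr)

    # Toy data: rows "B" and "C" each have a missing value.
    people <- tibble(name       = c("A", "B", "C"),
                     height     = c(170, NA, 160),
                     hair_color = c("blond", "brown", NA))

    # Option 1: keep the original code but use the magrittr pipe, where `.` is defined.
    incomplete_magrittr <- people %>%
      filter(!complete.cases(.))

    # Option 2: stay with the native pipe and ask dplyr for rows with an NA in any column.
    incomplete_native <- people |>
      filter(if_any(everything(), is.na))
    ```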

  • @chizfoodiehub3444
    @chizfoodiehub3444 1 year ago

    What if the integer was a chr data type and you want to change it to an integer or double?
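
    In base R, `as.integer()` and `as.numeric()` handle this conversion. A minimal sketch with made-up values:

    ```r
    # Numbers stored as text.
    x <- c("10", "25", "7")

    x_int <- as.integer(x)   # integer vector
    x_dbl <- as.numeric(x)   # double vector

    # Text that is not a number becomes NA (R also raises a warning, silenced here).
    messy <- suppressWarnings(as.integer(c("12", "n/a")))
    ```

    Checking for unexpected NAs after the conversion is a cheap way to spot entries that were not clean numbers.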

  • @SuccessGossips
    @SuccessGossips 10 months ago

    love it

  • @davidispiryan5689
    @davidispiryan5689 2 years ago

    Hello Sir, can you please make a video about the shiny package on medical data?
    Thanks a lot, great videos !!

  • @b5lovermore
    @b5lovermore 1 year ago

    When you start to explain how to find complete and incomplete cases at 16:09, what do you do if you want to find incomplete cases for the entire dataset? Would you just omit the "select" portion of the code?
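
    Dropping the `select()` step does apply the check to every column. A sketch on a made-up tibble (not the dataset from the video), assuming dplyr is installed, with a per-column count of missing values added as a companion check:

    ```r
    library(dplyr)

    # Toy data: rows 2 and 3 each have one missing value.
    survey <- tibble(id   = 1:4,
                     age  = c(34, NA, 51, 28),
                     city = c("Oslo", "Lima", NA, "Pune"))

    # No select(): every column takes part in the completeness check.
    incomplete <- survey %>% filter(!complete.cases(.))

    # How many rows are incomplete, and how many NAs each column holds.
    n_incomplete <- sum(!complete.cases(survey))
    na_per_column <- colSums(is.na(survey))
    ```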

  • @fenysnake
    @fenysnake 1 year ago

    Warning in install.packages("tidyverse") :
    'lib = "C:/Program Files/R/R-4.2.3/library"' is not writable
    Error in install.packages("tidyverse") : unable to install packages
    do you have any insight why I get this message? I'm starting with R for a statistics class and I see you recommend this package? my laptop is archaic...

  • @amandihiyare1184
    @amandihiyare1184 2 years ago

    Could you do a video on data management using R please

  • @andrewjohnson4352
    @andrewjohnson4352 1 year ago

    Got it!

    • @RProgramming101
      @RProgramming101  1 year ago

      Thank you so much for watching and leaving a comment! I appreciate your support.

  • @peterscheerer2346
    @peterscheerer2346 1 year ago

    Can anybody help me with how to disaggregate data that exists in the same column? In this example, let's say you wanted a second column (or new variable) for secondary hair color, for those values which contain a primary and then a secondary hair color, i.e. "brown, grey", "auburn, white" etc. I am actually working on a file which contains addresses, and in many cases the apartment number is not separated into another column. However, it should be for the import into the database I am working on, so I want to create logic to clean and disaggregate these pieces into separate columns. Any help would be greatly appreciated.
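
    One possible approach is `separate()` from tidyr, sketched here on made-up hair colour values. The new column names `hair_primary` and `hair_secondary` are hypothetical, and the sketch assumes the two parts are always divided by a comma followed by a space. Real address data is usually messier and may need a regular expression instead.

    ```r
    library(tidyr)

    # Toy data: two rows hold two colours, one row holds a single colour.
    chars <- data.frame(name       = c("A", "B", "C"),
                        hair_color = c("brown, grey", "auburn, white", "black"))

    # Split on ", " into two columns; fill = "right" pads single values with NA.
    hair_split <- separate(chars, hair_color,
                           into = c("hair_primary", "hair_secondary"),
                           sep  = ", ",
                           fill = "right")
    ```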

  • @reecebinx4191
    @reecebinx4191 1 year ago

    I'm a newbie to R. Is there an open community where people can help you with your work?

  • @jamesleleji6984
    @jamesleleji6984 1 year ago

    How can you convert data type using mutate in dplyr?
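
    Inside `mutate()`, a column is converted by overwriting it with the result of a conversion function. A minimal sketch on a made-up tibble, assuming dplyr is installed:

    ```r
    library(dplyr)

    # Toy data where every column arrives as text.
    raw <- tibble(id    = c("1", "2", "3"),
                  score = c("4.5", "3.0", "5.2"),
                  group = c("a", "b", "a"))

    # Overwrite each column with its converted version.
    typed <- raw %>%
      mutate(id    = as.integer(id),
             score = as.numeric(score),
             group = as.factor(group))
    ```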

  • @crazyytha
    @crazyytha 11 months ago

    Thanks so much for the engaging yet very useful video! Can I ask why the following chunk of code is not able to recode the missing value to the assigned value?
    starwars %>%
      select(name, gender) %>%
      mutate(gender2 = if_else(gender == "masculine", 1, if_else(is.na(gender), 3, 2)))
    Thanks a lot in advance!
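
    The likely reason: for a missing gender, `gender == "masculine"` is itself NA, and `if_else()` returns NA for an NA condition before the inner `if_else()` is ever consulted. Testing for NA first avoids this. A sketch on a made-up tibble, assuming dplyr is installed:

    ```r
    library(dplyr)

    # Toy data with one missing gender.
    people <- tibble(gender = c("masculine", "feminine", NA))

    # Original structure: the NA row stays NA.
    nested <- people %>%
      mutate(gender2 = if_else(gender == "masculine", 1,
                               if_else(is.na(gender), 3, 2)))

    # Check for NA first; case_when() uses the first condition that is TRUE.
    fixed <- people %>%
      mutate(gender2 = case_when(is.na(gender)         ~ 3,
                                 gender == "masculine" ~ 1,
                                 TRUE                  ~ 2))
    ```

    `if_else()` also has a `missing` argument, so `if_else(gender == "masculine", 1, 2, missing = 3)` is another way to get the same result.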

  • @vincenzo4259
    @vincenzo4259 2 years ago

    Thanks

  • @GracieJiuJitsu1015
    @GracieJiuJitsu1015 1 year ago

    I'm having a problem where I want to mutate two variables with values 0, 1 and NA into a new variable with the sum of 0 and 1; however, R in my case counts NA as 0. Is there an easy fix for this, to exclude the NA?
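
    The code behind this question is not shown, so the following is a guess at the cause: summing with `na.rm = TRUE` (for example via `rowSums()`) drops the NA, which has the same effect as counting it as 0, whereas plain `+` keeps the result missing. A base R sketch on made-up data:

    ```r
    # Toy data: row 3 is missing `a`, row 4 is missing `b`.
    df <- data.frame(a = c(0, 1, NA, 1),
                     b = c(1, 1, 1, NA))

    # Plain addition: any NA in the row makes the sum NA.
    df$sum_plus <- df$a + df$b

    # rowSums with na.rm = TRUE: the NA is dropped, so it behaves like a 0.
    df$sum_narm <- rowSums(df[, c("a", "b")], na.rm = TRUE)
    ```

    Using the plain `+` version inside `mutate()` keeps rows with a missing input as NA in the new variable.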

  • @korman9872
    @korman9872 1 year ago

    Tx sir

  • @sertansafak2056
    @sertansafak2056 1 year ago

    22:22 There were 5 hair_color values missing. 3 of them were droids, so they don't have hair. You can change those to none, but the other 2 weren't droids; they may have some hair, but that data is missing. Why didn't you remove those two rows?

  • @MrDarkplace22
    @MrDarkplace22 2 years ago

    Just curious: is there any reason why you haven't enabled coloured brackets? As of the last update, it would make the code easier to read

  • @nitamaitra2921
    @nitamaitra2921 2 years ago

    What is the difference between a data frame and a tibble?
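
    In short, a tibble is a data frame with a few stricter defaults. One visible difference is what single-column subsetting returns, sketched below on made-up data and assuming the tibble package is installed:

    ```r
    library(tibble)

    df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    tb <- as_tibble(df)

    # A tibble still counts as a data frame.
    tb_is_df <- is.data.frame(tb)

    # Single-column subsetting: a data frame simplifies to a vector,
    # a tibble stays a one-column tibble.
    from_df <- df[, "x"]
    from_tb <- tb[, "x"]
    ```

    Tibbles also print more compactly, showing only the first rows together with each column's type.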

  • @mercywaithira3240onlinemaths
    @mercywaithira3240onlinemaths 2 years ago

    And how do you undo a command that you had already executed?