Clean your data with R. R programming for beginners.
- Published 21 Sep 2024
- If you are an R programming beginner, this video is for you. In it, Dr Greg Martin shows you step by step how to clean your dataset before doing any additional analysis. This is part of a series that covers exploring data, cleaning data, manipulating (or wrangling) data, describing data, visualizing data and, finally, analyzing data. The tutorial uses data built into R so you can replicate the work on your computer at home. Dr Martin uses the Tidyverse packages, which provide additional functions like select, filter, mutate, etc. This tutorial also deals with missing data. So if you are a data scientist, or interested in quantitative analysis or research, this is a good video to start with.
Get my FREE cheat sheets for R programming and statistics (including transcripts of these lessons) here: www.learnmore365.com/pages/membership-r-programming-data-visualization-and-research-methods
Great lesson, thanks
00:01 Cleaning your data involves systematic exploration, cleaning, manipulation, visualization, and analysis.
01:44 Installing packages in R expands functionality.
05:31 Converting character variable to factor variable in R
07:31 Using the factor function to swap levels in R
11:11 Understanding the difference between 'or' and 'and' in filtering data.
13:03 Handling missing data is crucial for accurate analysis.
17:02 Understanding how to handle missing values in data sets is crucial for data cleaning in R.
18:49 Handle missing data with a nuanced approach, not just sweeping deletion
22:16 Identify and handle duplicates in data frames
24:04 Selecting and filtering data using base R method
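A minimal sketch of the factor steps listed at 05:31 and 07:31. It uses a made-up vector rather than the video's own data, so the object names here are assumptions:

```r
# Hypothetical example data: a character vector of categories
sizes_chr <- c("small", "large", "medium", "small")

# Convert character to factor; levels default to alphabetical order
sizes_fct <- as.factor(sizes_chr)
levels(sizes_fct)

# Re-specify the level order explicitly with factor()
sizes_ordered <- factor(sizes_chr, levels = c("small", "medium", "large"))
levels(sizes_ordered)
```

Setting the levels explicitly matters mostly for plots and summaries, which follow the level order rather than the order of the data.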
Life saver using this video in a last min dash to finish some coursework
Thanks for the feedback Harrison. Glad I could help.
We really appreciate your channel, the best YouTube channel for learning R. We are looking forward to seeing more, especially on survival analysis and parametric and non-parametric tests.
Wow, thank you!
I do love your enthusiasm Greg, it really keeps me interested in watching through to the end!
Thanks John - much appreciated!
Out of all the online classes and videos I have done, I WISH I STARTED WITH THIS ONE!! Thank you!
Glad it was helpful! Thanks for your amazing feedback
Thank you so much, Dr Martin. As a beginner, the way you explain R programming makes me love the language more and makes coding easier to deal with. Keep up the great videos.
Thank you for the feedback. Glad you enjoyed it! You got this!
Honestly love all your videos. Detailed explanation and yet simple and straightforward. Keep up the great work.
Thank you so much, you're such a huge help! I don't think I would pass my 'digital data analysis' course without your channel
Happy to help!
Your tutorials are unarguably the most explicit and practical for the use of R. Beyond using R as a tool, you explain a lot of statistical concepts. Thanks for all you do, I've learned a lot from your channel
Excellent! Thanks for putting out such great content that's not only useful, but easy to follow!
Wow - what a nice thing to say (thanks!!)
I don't usually comment, but this man here is the best I've come across on YouTube. Damn, too good
Incredible channel, the material is better than any course I have paid for. The delivery and the breakdown of topics into separate videos are perfect for learning. Thank you for sharing your expertise and time.
Easy, Peasy, Lemon Squeezy!
Best R Programming Channel
Keep it going!
I am here for the "super duper easy"... Keep up the great work Dr, thanks.
You sir have an incredible voice for teaching. Glad I found your channel
Glad to hear it! Thank you!
Amazing tutorial!! Thank you so much! ⭐️⭐️⭐️⭐️⭐️
You are an awesome teacher. I want to give you a hug right now! 😊 Thank you for making it so easy to follow through.
Hi there, Greg! Thanks a lot for these videos, love your style. I've learned A LOT!
Great to hear, Yamila! Thank you for the feedback!
Very clear, useful and interesting. I'm just getting into R and this helped me understand how it can be used for sensible data cleaning.
Great to hear! Thanks for watching!
You really make programming seem "easy-peasy lemon squeezy", keep it up!
Thanks for watching!!! I appreciate your feedback!
Very useful thank you - especially the section on NAs and recoding. Also appreciate the editing, effects, sound quality and close ups of the code. I’ve recently been using the star wars database for my English teaching lessons to help students with the interrogative. How tall is R2D2? How much does Darth Vader weigh? Etc. One request if possible - Times, Dates and TimeDates. Thank you again, your videos have been very helpful.
Thank you so much for the great and useful videos
oh my god man you are a godsent!!!!
I've been learning R in the google data course from Coursera and they don't teach much.
Very helpful. the perfect mixture between the background ideas ( what data to dismiss) and the R way to do so.
Hope to see more videos.
Enjoy X-mas
Truly helpful! Amazing video for tidyverse
Glad it was helpful!
Thank you so much for creating so much accessible and engaging content. For a beginner the way you teach is very clear and easy to understand and your passion for R has made me love it even more! (I also appreciate being hyped up for learning by some drum and bass in the intro)
Yo, you are the best programming YouTube channel, bro
Wow I appreciate the kind words. Your support encourages me to create more content that you'll enjoy! Thank you
Thank you so much. Your videos are very helpful.
Glad you like them!
Love this. I didn't know there was an in-house R dataset for star wars.
Yeah - I love it!
YOU ARE SUCH A GOOD TEACHER... THANK YOU
You probably mention that you are looking at the whole observation for Duplicates - but I was merrily making my own example vector with a duplicate name, however with different ages.
Took me a while to realise to use below to get the same as you:
friends[!duplicated(friends$Names) , ]
or
friends %>% distinct(Names)
Great content 🙏
of course they wouldn't be true duplicates in my vector....
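A small sketch of the difference described above, using a made-up friends data frame (the data and object names are assumptions). One thing to note: distinct(Names) on its own returns only the Names column, so it is not quite the same as the base R line unless .keep_all = TRUE is added.

```r
library(dplyr)

# Hypothetical data: "Ann" appears twice with different ages
friends <- data.frame(
  Names = c("Ann", "Bob", "Ann"),
  Age   = c(30, 25, 31)
)

# Whole-row check: no row is an exact copy of another, so nothing is dropped
whole_row <- friends[!duplicated(friends), ]

# Check on Names only: keeps the first row found for each name
by_name_base <- friends[!duplicated(friends$Names), ]

# dplyr equivalent; .keep_all = TRUE retains the other columns
by_name_dplyr <- friends %>% distinct(Names, .keep_all = TRUE)
```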
You're amazing at teaching this. Thank you!
you are most welcome melina.
Thank you so much! love all your videos! simple and straightforward! with detailed explanation.
I'm so glad to have you as a subscriber! Thank you for being a part of this community.
Great content.
Recode is superseded… we need a new video on this topic 🙂
Very useful Greg, such a big help for an R n00b
You're a hell of a teacher! congrats!!
This is brilliant, I am on my third video and I am amazed at how easy this is made to seem👏
Your channel is a life saver, man. Thank you
Much appreciated. Amazing skills in such a simplified way thank you.
Glad it was helpful! Thanks!
Thanks a lot sir. I can't be grateful enough for your videos
Thanks for the lessons so far. Love it
Very helpful! Thank you, sir!:D
Glad it was helpful!
Excellent lectures and a good lecturer also
Thanks. I really appreciate your video
You are welcome! Thank you!
Such good videos for learning R programming and such a nice series. When is the next episode about manipulating your data coming out? Can't wait for it!
haha - any day now... (perhaps tonight)
Thank you very very very much!! Really appreciate your wonderful video. 👍👍👍
You are very welcome. Glad you enjoyed it!
best practice lecture for R
Thank you, Dr Martin
Thumbs up for you Greg!
Thanks a lot!
You're welcome!
Thank you very much Sir. This was quite easy to understand as a beginner. Would you kindly maybe make a series of these videos for us beginners because you have many videos and we wouldn't know which video to watch after this one. I hope that makes sense. Anyway thank you a lot for these videos
I will try my best
this tutorial is Excellent! thank you!
Thank you for this. Very helpful.
You're very welcome! Glad it was helpful.
GREAT VIDEO
Perfect lecture ! benefited a lot.
Glad it was helpful! Thank you :)
Solid tut. Thank you.
Thank you for this amazing resource. Very helpful for someone like myself who is learning R without any meaningful stats experience aside from a semester at uni.
Is anyone learning along able to share their experience of using the mutate and recode functions. I haven't had any success using this whilst following along with this video and a previous one when trying to recode the gender to M and F, or 1 and 2. I've had to work around using :
starwars %>%
select(name,gender) %>%
mutate(gender=if_else(gender=="masculine", "1", "2"))
But I'd really like to know what I'm doing wrong using recode as I think my code looks the same as Greg's!
starwars %>%
select(name,gender) %>%
mutate(gender=recode(gender, "masculine"=1, "feminine"=2))
Bravo, this content was very good
You Sir, are a legend!
You explained it thoroughly ❤❤
Glad it was helpful!
Appreciate your helpful videos
Happy to help! Thank you.
Thank you!! 👑
Thank you for the video sir
Most welcome! Thanks for watching.
Excellente~! Thanks -
If you are here from Tom's Bayesian stats class, give this video a thumbs up!
nobody seemed to be. But here's a thumbs up (I'm here cause of stats class too lol)
Love your videos, man - they're clear and concise, easy to follow! Would you be open to creating content based on the Google Data Analytics Cert?
One of the case studies they have is about a fictional bike study called Cyclistic.
And there is only one person on YT who does it in R (Caribou Data Science) but it's not as seamless or clear as you make your videos out to be! :)
great! thank you very much
Thanks for the great feedback- Much appreciated !!
Thank you so much, you're such a great help! please show us how to create Dashboards via Shiny. Thanks a lot.
Thank you very much! Very helpful
Glad it helped! Thanks for watching!
Hi,
Love your videos they are so helpful. Could you do a video on loops in r ?? Thanks!
Thanks for the suggestion. Will do.
Thank you!
You're welcome!
For the replace NA step, it just shows the result but doesn't change anything in the dataset itself, so the dataset remains uncleaned.
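This is expected behaviour in R: a pipeline returns a modified copy and leaves the original object untouched unless the result is assigned. A minimal sketch, assuming the dplyr starwars data and tidyr's replace_na(); starwars_clean is a name chosen for this example:

```r
library(dplyr)
library(tidyr)

# Running the pipeline alone only prints a modified copy
starwars %>%
  mutate(hair_color = replace_na(hair_color, "none"))

# Assign the result to keep it
starwars_clean <- starwars %>%
  mutate(hair_color = replace_na(hair_color, "none"))

sum(is.na(starwars$hair_color))        # original still has missing values
sum(is.na(starwars_clean$hair_color))  # cleaned copy has none
```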
Doc Martin!
As usual, an excellent video! I have one question regarding the recoding. In binary cases, is it not typical to use 0 and 1 in order to be able to do some stats on the data?
Thanks and Happy Holidays!
Really excellent PT.
Then how is R associated with Python algorithms?
I can't get the line to work: filter(hair_color %in% c(“blond”, “brown”) & height < 180)
Error: unexpected symbol in " filter(hair_color c"
> height < 180)
same here, showing ERROR in 'filter()'
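One likely cause, judging only from the line as posted: the quotation marks are typographic (curly) quotes, which R does not accept as string delimiters. They often creep in when code is copied from a web page or a word processor. Another possibility is running the filter line on its own, without the piped data before it. A sketch with straight quotes and the full pipeline, assuming the dplyr starwars data:

```r
library(dplyr)

# Same filter typed with plain straight quotes (")
blond_brown_short <- starwars %>%
  select(name, hair_color, height) %>%
  filter(hair_color %in% c("blond", "brown") & height < 180)

blond_brown_short
```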
Just a heads up to everyone, at the end of the vid when you're doing the recode bit, you might hit this error:
Error: Problem with `mutate()` column `gender`.
i `gender = recode(gender, masculine = 1, feminine = 2)`.
x unused arguments (masculine = 1, feminine = 2)
if you get this error, force R to use dplyr's version of recode like this:
starwars %>%
select(name, gender) %>%
mutate(gender_coded = dplyr::recode(gender, "masculine"=1, "feminine"=2))
I'm not sure why I had to do this, as I had already run the library(tidyverse) command, but replacing recode with dplyr::recode sorted it out.
Figured it out. I also have the "car" package (for the Variance Inflation Factor function "vif" that I'm also using) in my project and so there was a name collision and it was taking recode from car (Companion to Applied Regression) rather than from dplyr.
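A short sketch of two ways around the clash described above, assuming the dplyr starwars data. The first namespaces the call as in the comment; the second uses case_when(), which is swapped in here as an alternative and is not what the video shows. The object names are made up for the example.

```r
library(dplyr)

# Namespacing the call avoids picking up another package's recode()
coded_recode <- starwars %>%
  select(name, gender) %>%
  mutate(gender_coded = dplyr::recode(gender, "masculine" = 1, "feminine" = 2))

# Alternative with case_when(); rows matching no condition become NA
coded_case_when <- starwars %>%
  select(name, gender) %>%
  mutate(gender_coded = case_when(
    gender == "masculine" ~ 1,
    gender == "feminine"  ~ 2
  ))
```

Since recode() is marked as superseded in recent dplyr versions, as another comment here points out, case_when() may be the safer habit for new code.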
variable types
select and filter
find and deal with missing data
find and deal with duplicates
recode values
Pretty handy tips
Glad you think so!
How come I am just discovering this channel?
super thanks
I don't know why, but when I run the complete cases code, it shows an error that the object "." is not found.
Please help sir
starwars|>
select(name,gender,hair_color,height)|>
filter(!complete.cases(.))
I ran this code
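The "." placeholder belongs to the magrittr pipe %>%; the native pipe |> does not define it, which would explain the "object not found" error. Two sketches that should avoid it, assuming the dplyr starwars data (the object names are made up for the example):

```r
library(dplyr)

# Option 1: keep the dot, but use the magrittr pipe
incomplete_magrittr <- starwars %>%
  select(name, gender, hair_color, height) %>%
  filter(!complete.cases(.))

# Option 2: keep the native pipe (R 4.1 or later) and
# test for NA column by column instead of using the dot
incomplete_native <- starwars |>
  select(name, gender, hair_color, height) |>
  filter(if_any(everything(), is.na))
```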
What if the integer was a chr data type and you want to change it to an integer or double
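A minimal sketch for this case, using a made-up character vector (the names are assumptions):

```r
# Hypothetical character vector holding whole numbers
ages_chr <- c("12", "45", "7")

ages_int <- as.integer(ages_chr)  # character -> integer
ages_dbl <- as.numeric(ages_chr)  # character -> double

class(ages_int)
class(ages_dbl)
```

Any value that cannot be read as a number, such as "unknown", becomes NA with a warning, so it is worth checking for new NAs after converting.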
love it
Hello Sir, can you please make a video about the shiny package with medical data?
Thanks a lot, great videos !!
When you start to explain how to find complete and incomplete cases at 16:09, what do you do if you want to find incomplete cases for the entire dataset? Would you just omit the "select" portion of the code?
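Dropping the select() step should be the right idea. One caveat: the dplyr version of starwars includes list columns (films, vehicles, starships), so the sketch below checks only the non-list columns. The object name is made up for the example.

```r
library(dplyr)

# Rows with at least one NA in any non-list column of the full dataset
incomplete_all <- starwars %>%
  filter(if_any(!where(is.list), is.na))

nrow(incomplete_all)
```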
Warning in install.packages("tidyverse") :
'lib = "C:/Program Files/R/R-4.2.3/library"' is not writable
Error in install.packages("tidyverse") : unable to install packages
do you have any insight why I get this message? I'm starting with R for a statistics class and I see you recommend this package? my laptop is archaic...
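That warning usually means R is trying to install into the system library under Program Files, which needs administrator rights; it has little to do with the age of the laptop. A common workaround, sketched here on the assumption that a per-user library is acceptable, is to install into the folder named by the R_LIBS_USER environment variable. The helper name is made up for this example.

```r
# Hypothetical helper: install a package into the per-user library
install_to_user_lib <- function(pkg) {
  user_lib <- Sys.getenv("R_LIBS_USER")
  # Create the per-user library folder if it does not exist yet
  dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
  install.packages(pkg, lib = user_lib)
}

# Usage (needs an internet connection):
# install_to_user_lib("tidyverse")
```

Restarting R afterwards should let it pick up the new folder. Running R or RStudio as administrator is another option some people use.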
Could you do a video on data management using R please
Got it!
Thank you so much for watching and leaving a comment! I appreciate your support.
Can anybody help me with how to disaggregate data that exists in the same column? i.e. in this example lets say you wanted to have a second column(or new variable) for secondary hair color for those values which contain a primary then a secondary hair color i.e "brown, grey", "auburn, white" etc. I am actually working on a file which contains addresses and in many cases the apartment number is not actually separated into another column. However, it should be for the import into the database I am working on and therefore I want to try to create logic to clean and disaggregate these pieces into separate columns. Any help would be greatly appreciated.
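One possible approach, sketched with tidyr's separate() on made-up data (the data and column names are assumptions). It relies on the two parts being split by a consistent delimiter, here a comma followed by a space:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with one or two colours in the same column
hair <- tibble(
  name       = c("A", "B", "C"),
  hair_color = c("brown, grey", "auburn, white", "black")
)

# Split on ", " into two columns; rows with a single value get NA in the second
hair_split <- hair %>%
  separate(hair_color,
           into = c("hair_primary", "hair_secondary"),
           sep  = ", ",
           fill = "right")
```

For addresses, the same idea only works if the apartment number follows a predictable pattern; if it does not, a regular expression is probably needed instead.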
I'm a newbie to R. Is there an open community where people can help you with your work?
How can you convert data type using mutate in dplyr?
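A minimal sketch: wrap the column in one of the as.*() conversion functions inside mutate(). The data and names below are made up for the example.

```r
library(dplyr)

# Hypothetical data where every column was read in as text
raw <- tibble(id = c("1", "2", "3"), group = c("a", "b", "a"))

typed <- raw %>%
  mutate(
    id    = as.integer(id),   # character -> integer
    group = as.factor(group)  # character -> factor
  )

glimpse(typed)
```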
Thanks so much for the engaging yet very useful video! Can I ask why the following chunk of code is not able to recode missing values to the assigned value?
starwars %>%
select(name, gender) %>%
mutate(gender2 = if_else(gender == "masculine", 1, if_else(is.na(gender), 3, 2)))
Thanks a lot in advance!
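A likely explanation: when gender is NA, the first test gender == "masculine" is itself NA, and if_else() returns NA for that row, so the inner is.na() check never gets a say. Testing for NA first avoids this. A sketch assuming the dplyr starwars data:

```r
library(dplyr)

recoded <- starwars %>%
  select(name, gender) %>%
  mutate(gender2 = if_else(is.na(gender), 3,
                           if_else(gender == "masculine", 1, 2)))

# Count of each code, including the former NAs now coded 3
count(recoded, gender2)
```

if_else() also has a missing argument, so if_else(gender == "masculine", 1, 2, missing = 3) should give the same result in one call.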
Thanks
I'm having a problem where I want to mutate two variables with values 0, 1 and NA into a new variable with the sum of 0 and 1; however, R in my case counts NA as 0. Is there an easy fix for this, to exclude the NA?
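Which fix applies depends on what "exclude" should mean, so this sketch shows three behaviours side by side on made-up data (the column names are assumptions):

```r
# Hypothetical 0/1 variables with missing values
d <- data.frame(a = c(0, 1, NA, NA), b = c(1, 1, 0, NA))

# a + b: the sum is NA whenever either value is missing
d$sum_strict <- d$a + d$b

# rowSums(na.rm = TRUE): missing values are skipped, so they act like 0
d$sum_skip_na <- rowSums(d[, c("a", "b")], na.rm = TRUE)

# Skip NAs, but keep NA when both values are missing
both_missing <- is.na(d$a) & is.na(d$b)
d$sum_mixed <- ifelse(both_missing, NA, d$sum_skip_na)

d
```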
Tx sir
Welcome!!
22:22 There were 5 missing hair_color values; 3 of them were droids, so they don't have hair. You can change those to none, but the other 2 weren't droids; they may have some hair, but that data is missing. Why didn't you remove those two rows?
Just curious is there any reason why you haven't enabled coloured brackets as per the last update it would make the code easier to read
what is the difference between a data frame and a tibble
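In short, a tibble is a data frame with a few stricter, tidier behaviours. A small sketch of two visible differences, using made-up data (the object names are assumptions):

```r
library(tibble)

plain_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(plain_df)

# A tibble is still a data frame, with extra classes on top
class(plain_df)
class(tb)

# Selecting one column with [ , ] drops to a vector for a data frame...
is.vector(plain_df[, "x"])
# ...but stays a one-column tibble for a tibble
is_tibble(tb[, "x"])
```

Tibbles also print only the first rows along with each column's type, which is why starwars looks the way it does in the console.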
And how do you undo a command that you had already executed?