Introduction to R and RStudio part 2
- Published 18 Sep 2024
- Introductory video tutorial on using R and RStudio (part 2).
Please view in HD (cog in bottom right corner).
Link to Pokemon.xlsx file: drive.google.c...
Link to Pokemon.csv file: drive.google.c...
Link to R script: drive.google.c...
I wish you had made a career out of YouTube tutorials, this is the best R tutorial I've found. You seem to understand very well what is important to know for beginners and convey it beautifully. Thank you so much!
Thanks Hefin for making these amazing tutorials available to us!
To those who are having issues when executing plot(pokemon) and returning errors like "need finite 'ylim'" or "NAs introduced by coercion", it is because when importing the data, RStudio labeled the categorical variables as "Characters" instead of "Factors" (and incidentally "Numeric" instead of "Integer" for other variables, but don't think it matters).
You can confirm this by checking the structure of your dataframe using str(pokemon) and comparing it to Hefin's at around the 6:30 mark.
For reasons beyond my beginner understanding, the only way to import it properly is to use read.csv("pokemon.csv"). read_csv, read.xlsx, read_xlsx, read_excel all do not output the right variable types. If you tried to import the file by clicking the file in the working directory on the bottom right, it automatically executes read_csv or read_excel. Clicking the CSV file to import will allow you to press each heading and change the type of variable it is, but for some reason if you want to change the character to a factor, it'll ask you to enter every factor which is not feasible.
Alternatively, you can force convert your already imported file. I only know how to do it column by column and not by batch, but it would look something like this: pokemon$Pokemon
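For anyone wanting the batch version of that conversion, here is a minimal sketch in base R. The small data frame is a made-up stand-in for the imported file (the values are hypothetical), and it assumes you want every character column turned into a factor, which is roughly what R did by default before version 4.0.

```r
# Hypothetical stand-in for the imported data; values are illustrative only
pokemon <- data.frame(
  Pokemon = c("Bulbasaur", "Charmander", "Squirtle"),
  Type.I  = c("Grass", "Fire", "Water"),
  Atk     = c(49L, 52L, 48L),
  stringsAsFactors = FALSE
)

# Find which columns are character vectors
is_chr <- vapply(pokemon, is.character, logical(1))

# Convert all of those columns to factors in one step
pokemon[is_chr] <- lapply(pokemon[is_chr], factor)

# Check the structure: character columns are now factors, Atk is untouched
str(pokemon)
```

Note that this also converts the name column to a factor; if you would rather keep names as plain text, exclude that column from `is_chr` before converting.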
Thanks for posting this! Packages get updated so frequently that some of the functions have now changed, resulting in the error you mentioned. I'm glad you managed to find a way around it. If you use the read_csv function from the readr package, you can (and it is good practice to) specify the type you want each column to be read in as.
@@hefinrhys8572 Thanks for clarifying and thanks again for this amazing tutorial. I couldn't get read_csv("pokemon.csv", col_types = "ifiiiiiiiffffff") to reproduce the same results as read.csv; the graphs looked quite different when I plotted it. Anyway it doesn't matter, I think I just need to spend more time learning and practicing as I still have difficulty with even some of the more basic functions. Thanks!
Hi, I'm getting this error even though I am using the function read.csv("Pokemon.csv").
I didn't really understand how to fix this error reading your explanation. Could you explain again how to get around this problem one more time?
EDIT: Just got this fixed. I just added another parameter in the read.csv function:
pokemon
@@pyunglee8321 Thanks. This worked.
You need to do more videos with R. This is by far the greatest introduction to R. I am writing this comment almost two years after I started learning R.
Wow! glad i found your channel. Subject selection and teaching skills are just great. Subscribed immediately and hope you can resume these tutorials. Thank you for recording these classes and sharing them with us to learn. Keep them coming please
This tutorial series is great! But I think one of the main reasons it appealed to me so much is that i've played the Pokemon games extensively, so the data made perfect sense (it's nice seeing actual numbers of how much better mah bois Tyranitar and Alakazam are).
I wonder how many people following this guide are familiar with the games and the mechanics. I have a suspicion that a lot of people are completely lost with the Pokemon jargon. However, this series is an excellent example of how to teach data science using unique datasets from anything that inspires passion. I would not have been as interested or invested into crunching numbers on financial data, for instance.
Given the quantity of material out there on this topic, your video still stands out. Proved very helpful. Thanks.
Your introductory videos on R are extremely helpful for me. Thank you so much!
Both of your intro to R tutorials are the best online I've found. I really appreciate the organization and creative data set that made the tutorial both accessible and enjoyable. I'll be moving on to the GG plot one next. I was hoping to bother you with one question. After running the lm with type III SS, if I use the summary() instead of Anova(), will the output still use III SS? I've been trying to figure out the difference between the two, but am not able to find specific information regarding the SS. Thanks for making such great material open to everyone!
I'm so sorry this answer is so late, and I hope you found an answer, but in case it's useful for you still or for others:
The p values in the output of summary() are just the p values for individual parameters, i.e. for each individual slope, and these aren't calculated using the sums of squares. These p values are calculated by calculating the t value for each parameter, and then finding the probability of drawing a value of t at least this extreme from a t distribution with the same degrees of freedom. The sums of squares are calculated to conduct tests of main effects, or of interactions. The difference in SS is needed here because we're testing whether the model fit changes when we remove all the parameters corresponding to one factor. For example, to test whether there is a significant main effect of the Type.I factor, we compare the SS between the model with, and without, all the slopes corresponding to levels of the Type.I factor.
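To make the distinction concrete, here is a rough sketch using only base R. It uses the built-in iris data as a stand-in (an assumption, since the Pokemon file isn't available here), with Species playing the role of the Type.I factor. It illustrates the general idea of per-parameter t tests versus a model comparison, and is not a reproduction of what Anova() from the car package does internally.

```r
# Full model: one factor (Species) plus one numeric predictor
fit <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
coefs <- summary(fit)$coefficients

# Per-parameter p values: t = estimate / standard error,
# then a two-sided tail probability from the t distribution
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

# These match the p values printed by summary()
cbind(summary = coefs[, "Pr(>|t|)"], manual = p_manual)

# Main effect of the factor: compare the model with and without
# all of the Species parameters, using the change in residual SS
reduced <- lm(Sepal.Length ~ Petal.Length, data = iris)
main_effect <- anova(reduced, fit)
main_effect
```

The first table has one row per parameter, while the model comparison gives a single F test for the whole factor, which is why the two outputs answer different questions.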
Thanks to your videos I am back to being a student for a few hours a week, after being the "Prof asking his student..." for almost 15 yrs. Thx
I often pause your videos for several minutes to experiment and play around with stuff but you explain everything very well and in a really good order. Although nearly 3 years old this video explained the logic behind coding in R much better than the tutorial on the RStudio website. I was a huge Pokemon nerd through grade school so the pokemon data made a lot of sense to me too. I have no prior coding experience and I've only used Stata a little bit before for econometrics.
Thank you, sir! Your lectures are really excellent!
Simple and direct to the point, thanks a lot for sharing
This video was the best tutorial for R and RStudio I have found. Thank you very much!!!
Thank you Hefin for giving us such an informative and clear lecture.
Thank you for a very informative understandable explanation of R. I will be attending all classes :-)
Good morning guys! I had a quick question when running the code:
plot(pokemon[,"Type.I"], pokemon[ "Atk"])
This happens:
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
If anyone know what I did wrong it would be greatly appreciated! Thank you!
You need to change "Type.I" from a character to a factor like this:
pokemon$Type.I = as.factor(pokemon$Type.I)
Then try to make the plot again!
I hope it works!
yeah, the video is really good like it explores all the facts needed for one to have a good grasp of r
thanx
Hefin…you’re a Fucking Beast, I swears! ❤️🙏👍
Very clear and informative, nice video, thanks. Pre-studying R for a master's in finance
First of all, I want to thank you a lot for your introductory videos on R, you have already helped me a lot.
Secondly, and following your pokemon data frame, I've been looking around to find ways to create two pokemon groups that show no statistical difference between each other, based on the HP, Atk, and Def of the pokemon. Could you enlighten me on this a little bit?
Cheers
Thanks! So you want to create two new imaginary classes of data that have no difference between them? If you sample random values from a distribution (using the rnorm function, for example) then you can draw random values for each variable for as many new subjects as you like. Then just randomly partition the subjects into the two different classes. The two classes will have been drawn from the same distribution. If you draw each class from different distributions, you can control how different the samples are by altering their means and standard deviations. Hope that helps!
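A minimal sketch of that idea, for anyone following along. The means and standard deviations below are made-up numbers chosen only for illustration, not values taken from the Pokemon data.

```r
set.seed(1)  # make the random draws reproducible

n <- 20  # total number of simulated subjects

# Draw each variable from a single normal distribution
# (means and SDs are hypothetical)
sim <- data.frame(
  HP  = rnorm(n, mean = 70, sd = 25),
  Atk = rnorm(n, mean = 75, sd = 30),
  Def = rnorm(n, mean = 70, sd = 30)
)

# Randomly partition the subjects into two equally sized classes
sim$Group <- factor(sample(rep(c("A", "B"), each = n / 2)))

# Compare the group means for each variable
aggregate(cbind(HP, Atk, Def) ~ Group, data = sim, FUN = mean)
```

Because both groups come from the same distributions, any difference between their means is down to sampling noise alone.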
@@hefinrhys8572 Thank you for your answer :)
Answering the question on your answer, yes. That is what I want, to create two groups of data subsetting the main data frame. So what I have is a data frame similar to the pokemon one and I want to tell R to from that data frame select 20 pokemon and put them in two groups, but always taking in mind that they have to be very similar based on their attack, def, means.
I don't think I can use the rnorm function since the data I need is already there. I see how I could use it just to create the two groups, but without taking their properties into account.
Oh I see. Well you could write a function that would start by sampling a pokemon from 1 group, then find the pokemon from the other that is most similar to it. Then sample again from the first group, then sample from the second to find the pokemon that makes the mean of the values of that group, closest to the mean of the first group.
Another way to do it would be to create many random samples from each group, and choose the pair of samples that are most similar to each other. Does that make sense? You can calculate distance (e.g. Euclidean distance) between different pokemon, or different means of your samples, using the dist() function (it will return a distance matrix).
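Here is one way the "many random samples" approach could look, as a rough sketch. The stats data frame is randomly generated and purely hypothetical, the group size of 10 and the 1000 attempts are arbitrary choices, and the similarity measure is the Euclidean distance between the two groups' mean vectors.

```r
set.seed(42)

# Hypothetical stand-in for the real data: 40 pokemon with three stats
stats <- data.frame(
  HP  = round(runif(40, 30, 120)),
  Atk = round(runif(40, 30, 130)),
  Def = round(runif(40, 30, 130))
)

best_groups <- NULL
best_dist <- Inf

for (i in seq_len(1000)) {
  # Draw 20 distinct rows and split them into two groups of 10
  idx <- sample(nrow(stats), 20)
  g1 <- idx[1:10]
  g2 <- idx[11:20]

  # Euclidean distance between the two groups' mean HP/Atk/Def
  d <- as.numeric(dist(rbind(colMeans(stats[g1, ]), colMeans(stats[g2, ]))))

  # Keep the most similar pair of groups seen so far
  if (d < best_dist) {
    best_dist <- d
    best_groups <- list(group1 = g1, group2 = g2)
  }
}

best_dist
colMeans(stats[best_groups$group1, ])
colMeans(stats[best_groups$group2, ])
```

If the variables are on very different scales, it may be worth standardising them with scale() before computing distances so that no single stat dominates.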
Thanks Hefin for the tutorial.
I have an issue though, I have tried running this code psychic
I have also been getting the same but cannot figure out why! Thanks in advance for any tips or suggestions :)
Hefin, can you help us out?
Thanks a lot for providing this useful tutorial for free
I'd like to ask you something. After going through the two videos you provided on your channel, are viewers exempt from taking an R or RStudio course/workshop? In other words, how much of R's major content do the two videos cover?
There is a huge amount of information to learn about R. This video gives you a good starting point, so I would suggest you try to use R for your own projects, and as you identify gaps in your knowledge, then pursue learning material that covers that. I highly recommend you look into the "tidyverse" set of packages, notably ggplot2 (for plotting), and dplyr (for performing data wrangling).
thank you Hefin, I really appreciate your help
For some reason @ 11:55, I had to convert the Type.I column to factor first. Is there a setting where it automatically changes categorical columns into factor? My plot won't work unless I convert factor(pokemon[, "Type.I"])
Hi Hassan, yes you're correct. Since I published this R has been updated and this behaviour is different. Prior to R 4.0, the language would convert character variables to factor by default. This is now no longer the default, so you need to explicitly convert the variable to a factor yourself :) hope that clears it up.
@@hefinrhys8572 appreciated mate
For me it worked with: dat$Type.I
@@hefinrhys8572 Thank you very much. This issue was driving me crazy!
Hello, I am also having this issue. I have changed "Type.I" and "Atk" into factors (Pokemon$Atk
Many thanks for this very helpful tutorial. As i run the function plot(pokemon[, "Type.I"], pokemon[, "Atk"]), i get error message as follows: Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
Not sure why. Please help
This is because R has changed since I made this video, and no longer reads character columns as factor by default. So you need to convert the Type.I column into a factor, using:
pokemon$Type.I
@@hefinrhys8572 Really appreciate your prompt response, and it works now.
@@hefinrhys8572 Mine still isn't working. Could you please write the whole command down for me please
No need, I got it
@@hefinrhys8572 This also helped me so thanks
thanks for the tutorial. Great Job. Keep doing videos please.
Thanks for these videos, best I have found so far!
Thank you, Hefin. You have done a great job. But I have a problem. When I run your line 185 code(1:23:50) , it says that testInteractions(twoWay,pairwise="Type.I",
Error in colMeans(mf[numeric.predictors]) : 'x' must be numeric. Could you tell me how to fix it?
Same thing here! Have not figured it out
you've probably solved it already. But just in case, "Captive" needs to be converted to factor like "Type.I" was: pokeSubset2$Captive = factor(pokeSubset2$Captive)
@@martasampaio5627 thank you!
I get an error at 1:23:05 Why please? I typed in this: testInteractions(twoWay, pairwise = "Captive", fixed "Type.I"). Help please. Thanks
Thanks a lot for this. I struggled to plot "Type.I" because R sees the non-integer data as Characters, when R needs to see them as Factors. For anyone else having this issue, I fixed this by converting character columns to Factors using the following code:
pokemon$Type.I
Thanks for the videos. Really great tutorials
Hi Hefin, I had entered: plot(pokemon[,'Type.I'], pokemon[,'Atk']).
However, I received an error :
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
Could u pls help me with this? My code works when I use variables that are integers, but it does not if one of the variables is a character (like 'Type.I'). Thank you!
same here!
Thanks for the videos, very informative and easy to follow :)
Hi Hefin,
I'm really enjoying these videos, they're extremely helpful!
When I tried plotting it comes up with "Error in plot.new() : figure margins too large"
what am I doing wrong?
Thanks so much :)
Try dragging the plotting window in rstudio so it's larger. It struggles to draw large plots when the window is too small.
@@hefinrhys8572 so simple! Thanks 🤦
Hi!, first of all thank you for uploading these videos they're so helpful for someone like me who's just starting, i have some questions i hope i can get an answer.
I'm doing my thesis based on an experimental model. My objective is to analyze gene expression in a control subject and a treated one, over different days. From what I understand, I have to look at the distribution of my data to decide whether to apply parametric or non-parametric tests. Can I see the distribution with a histogram only, or what else should I do?
Now, if you could just create a video explaining the relevance of all these different types of test and when to use them in a tad bit more detail, my life would be complete! =)
Thanks Hefin! Great tutorial. I learnt a lot :)
Best tutorial ever. Please check the video, it's not clear.
Thank you for the tutorial !!
Really helpful 😊😊
It is greatly helpful for beginners!
Thank you so much for these videos!
Hello, Hefin Rhys, thank you so much for simplifying R. Can I please get the Pokemon csv file to continue with the second part of the introduction to R and RStudio? The links shared indicate that the files have been deleted.
Sorry about this, can you try again and let me know if it works?
@@hefinrhys8572 It works. Thank you.
Loved it, amazing
The best tutorial
Thank you so much for this Hefin
I have this error when trying to execute testInteractions(twoWay, pairwise = "Captive", fixed = " Type.I")
Error in colMeans(mf[numeric.predictors]) : 'x' must be numeric
In addition: Warning message:
In testInteractions(twoWay, pairwise = "Captive", fixed = " Type.I") :
Some factors with specified contrasts are not in the model and will be ignored.
amazing and very very helpful! thank you!
When I run plot(pokemon[,"Type.I"], pokemon[,"Atk"]), I don't get the same plot. Is that possible? Also, I don't see poison on the x-axis, even though it is included in the Type.I variable. What is wrong with my plot here?
Hey there, I have been following this tutorial but getting a lot of errors saying need finite 'xlim' values...has something changed that this no longer works?
Hmm, sorry you're having this issue. Are you sure you loaded the dataset correctly? This error can occur if you're trying to plot a vector of NAs, or a character vector. Take a look here and let me know if it helps: stackoverflow.com/questions/21349368/error-in-plot-window-need-finite-xlim-values
@@hefinrhys8572 I had the same error loading the .csv file because the column Type.I is a column of characters. I solved it using as.factor() as suggested by the thread on StackOverflow. Thanks again for this precious material, perfect for having a good overview of R basics!
Thanks. Works for me too. The new line is
plot(as.factor(pokemon[, "Type.I"]), pokemon[,"Atk"])
Hello Hefin: Would it be possible for me to obtain the Pokemon files you refer to in the opening of this tutorial?
Hi! Have a look at the links in the video description. Let me know if they don't work for you.
Many thanks for the class, 10/10 at 1.25x
great course, thank you very much
How can i change heading of the dataframe?
What if I want to remove only one heading from the dataframe, while the column itself stays as it is (a headless column)?
You want to have a column without a name? All columns must be named in a dataframe. If you want a tabular datastructure with no column names or row names, you should use a matrix instead. Is that what you mean?
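As a small illustration of both options, here is a sketch with a made-up two-column data frame (the column names and values are hypothetical): renaming a heading in a data frame, and dropping names altogether by moving to a matrix.

```r
# Hypothetical example data
df <- data.frame(
  Name = c("Bulbasaur", "Charmander"),
  Atk  = c(49, 52)
)

# Change one heading: data frame columns must keep some name
names(df)[names(df) == "Atk"] <- "Attack"
names(df)

# For unnamed columns, use a matrix and strip its dimnames
m <- as.matrix(df[, "Attack", drop = FALSE])
dimnames(m) <- NULL
m
```

Keep in mind that a matrix holds a single data type, so mixing text and numbers in one matrix would turn everything into character.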
Hi Hefin,
I got this message and i can't continue the lecture with you. Would you please help me with this message?
What i have to do to fix this problem?
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
Hi Hamad, this might be because you're using a newer version of R. R recently changed so that by default, character data are loaded in as character vectors, not as factors (which they used to be read in as). So try loading in the data, adding the argument stringsAsFactors = TRUE. For example:
read.csv("Pokemon.csv", stringsAsFactors = TRUE) # for the csv file
read.xlsx("Pokemon.xlsx", sheetIndex = 1, stringsAsFactors = TRUE) # for the xlsx file
Let me know if this helps!
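To see the effect of that argument without needing the original file, here is a small self-contained sketch. It writes a three-row, made-up CSV to a temporary file (the rows are illustrative, not the real dataset) and reads it back both ways.

```r
# Write a tiny hypothetical CSV to a temporary file
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("Pokemon,Type.I,Atk",
    "Bulbasaur,Grass,49",
    "Charmander,Fire,52",
    "Squirtle,Water,48"),
  csv_file
)

# Default behaviour (character columns stay character in R >= 4.0)
as_default <- read.csv(csv_file)
class(as_default$Type.I)

# Ask for the old behaviour explicitly
as_factors <- read.csv(csv_file, stringsAsFactors = TRUE)
class(as_factors$Type.I)
levels(as_factors$Type.I)
```

With Type.I stored as a factor, plot(as_factors$Type.I, as_factors$Atk) should draw one box per type instead of failing with the "need finite 'xlim' values" error.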
@@hefinrhys8572 Hi Hefin,,,
Yes it works ! thank you for your help !
@Hannah Barnes do you think you might have the same issue as described here: stackoverflow.com/questions/51651451/performing-a-t-test-in-r-with-categorical-variables
Thank you very much .
Thank you, very helpful:)
Hello. Thank you for Your tutorial. When I try to download the dataset from your drive, it says "File is in owner's trash". Would it please be possible to make those files available again?
Sorry about that. Can you try again and let me know if it works?
@@hefinrhys8572 It works. Thank You.
Thanks for this... everything was going smoothly until I could not install Mosaic.... get this error message. Any help?
* installing *source* package 'broom' ...
** package 'broom' successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
namespace 'dplyr' 0.8.5 is being loaded, but >= 1.0.0 is required
ERROR: lazy loading failed for package 'broom'
* removing 'C:/Users/angus/Documents/R/win-library/3.5/broom'
In R CMD INSTALL
Warning in install.packages :
installation of package ‘broom’ had non-zero exit status
ERROR: dependencies 'latticeExtra', 'broom' are not available for package 'mosaic'
* removing 'C:/Users/angus/Documents/R/win-library/3.5/mosaic'
In R CMD INSTALL
Warning in install.packages :
installation of package ‘mosaic’ had non-zero exit status
It looks like you don't have an up to date version of the dplyr package. Try running install.packages("dplyr") first, then try again :)
@@hefinrhys8572 Thank you... however, this did not seem to work... I updated my entire RStudio... read through all the documentation... still get an error that mosaic would not load and when trying to install the package I get this...
I would really love to learn R... but this type of stuff drives me crazy and makes me just want to stick with SAS
FYI - I tried installing dplyr, broom separately to no avail. latticeExtra did not show up as available
* installing *source* package 'broom' ...
** package 'broom' successfully unpacked and MD5 sums checked
** R
** inst
** byte-compile and prepare package for lazy loading
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
namespace 'dplyr' 0.8.5 is being loaded, but >= 1.0.0 is required
ERROR: lazy loading failed for package 'broom'
* removing 'C:/Users/angus/Documents/R/win-library/3.5/broom'
In R CMD INSTALL
Warning in install.packages :
installation of package ‘broom’ had non-zero exit status
ERROR: dependencies 'latticeExtra', 'broom' are not available for package 'mosaic'
* removing 'C:/Users/angus/Documents/R/win-library/3.5/mosaic'
In R CMD INSTALL
Warning in install.packages :
installation of package ‘mosaic’ had non-zero exit status
Figured this out... please don't respond... thanks for all your help.
sweet intro
Thank you.
abline(RegModel) ... try this if you can't draw the regression line ;)
helloo.....i watched ur 1st R basic tutorial.its really fantastic.but i am having issue with pokemon episode.i created a folder named "pokemon" and pasted 2 files pokemon.xls and pokemon.csv but after typing the same commands of urs after pokemon
Thank you! What is your working directory? You can find out with getwd(). If the files are inside a folder in your working directory, R won't find them if you just refer to them by name. If the folder you created is inside your working directory, then you can just use read.csv("pokemon/pokemon.csv"). Does that make sense? Your working directory is the directory R will look in for files you mention by name. If you have files not in your working directory, you need to refer to them using their relative filepath. Let me know if this helps.
@@hefinrhys8572 yeah thankyou so much for immediate reply...it really means a lot...😊😊let me try if there is still any problem.i will let u know..
@@hefinrhys8572 what do you mean by working directory?? can u tell me where on my computer i should paste the folder (pokemon) which contains both the xls and csv files so that i can resume my work.....im stuck....please help me....
Hi, @@jayachandra4572 so watch from 2:06 to 3:30 where I explain about the working directory. A working directory in an R session is where R will look for files when you refer to them by name. The easiest way to do this is to create an R project (as shown in this part of the video). When you create an R project, RStudio creates an .RProj file, and wherever this file is, that folder is your working directory.
The reason for using a working directory is to limit how far R needs to search in order to find a file. Imagine you have the pokemon.csv file somewhere on your computer. When you tell R to read.csv("pokemon.csv"), you can't expect R to search your entire computer until it finds the file (and this would take a long time). Instead, we tell R where to look for files, and this is your working directory.
So you created a folder *inside* your working directory called pokemon, and put the pokemon.csv file inside that. Notice that I didn't do that. Now, when you refer to pokemon.csv by name, R can't find it, because it isn't in the working directory, it's in a folder *inside* the working directory. For R to find this file, either move it to your working directory, or specify its file path relative to your working directory. By this I mean read.csv("pokemon/pokemon.csv"). This tells R to look for the pokemon.csv file, in the pokemon folder, inside the working directory.
Does that make sense?
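A quick way to check all of this from the console is sketched below. The folder and file names simply mirror the ones described in this thread, so adjust them to match your own setup.

```r
# Where is R currently looking for files?
getwd()

# Build the relative path to the file inside the "pokemon" folder
csv_path <- file.path("pokemon", "pokemon.csv")
csv_path

# Only try to read the file if R can actually see it from here
if (file.exists(csv_path)) {
  pokemon <- read.csv(csv_path, stringsAsFactors = TRUE)
  str(pokemon)
} else {
  message("Can't find ", csv_path, " from ", getwd(),
          ". Move the file or change the path.")
}
```

Running list.files() is another handy check, since it prints everything R can see in the working directory.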
Error in plot.window(...) : need finite 'xlim' values
for plot(pokemon[, "Type.I"], pokemon[, "Atk"])
Anybody else facing the same issue?
You need to change "Type.I" from a character() to a factor() like this:
pokemon$Type.I = as.factor(pokemon$Type.I)
Then try to make the plot again!
I hope it works!
@@joaoteixeira5961 Thanks it worked out for me.
Can somebody tell me how to install mosaic and get it to run?
hello...im getting this "NAs introduced by coercion" when i gave the command plot(pokemon)....how to overcome this issue
Hmm, I can't reproduce this error. Are you sure you've copied the code exactly as shown? Can you paste the code you've run here?
@@hefinrhys8572 Hi, I got the same error. It says "In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion" and gives an error about xlim when I use the code "plot(pokemon[, "Type.I"], pokemon[, "Atk"])". I could solve the second problem by coding it as "plot(pokemon[, "Type.I"], pokemon[, "Atk"], xlim = c(0,200))". But then it only gives the error "In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion" again. I couldn't figure out what's wrong.
@@hefinrhys8572 I couldn't**
@@hefinrhys8572 Hi, I also have the same error: trying to do the command plot(pokemon[ , "Type.I"], pokemon [ , "Atk"])
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
Oh we all have this error
Hello Hefin.. I'm a researcher working with fMRI analysis. I would like to know how to simulate fMRI data using R with neuRosim...Could you plz make a video on that available.
I have no idea what just happened.
u need to follow the 1st part of the video