Data Science with Yan
United States
Joined Feb 24, 2021
Welcome to my channel! And welcome to the world of Statistics and Data Science!
I am a college professor, and some of the videos are initially made to help college students do better in their study of courses and research projects in Statistics and Data Science. I have been creating and sharing videos whenever I find the time.
Flights dataset | data manipulation | deal with date/day of the week | calendar heat map | R tidyverse
This video uses a flight dataset to go over a few typical data manipulations and explorations.
(I tend to speak slowly when explaining things. You may change the speed to 1.25 or 1.5 if you like to hear it faster.)
R code for this video is here: github.com/yz-DataScience/R-for-data-science/blob/main/a%20flight%20dataset%20for%20data%20manipulation%20and%20heatmap.R
Download the dataset used in the video: github.com/yz-DataScience/R-for-data-science/blob/main/ny_airports_nov2022.csv (click the downward arrow to download it, then import it into RStudio).
Original dataset source: raw.githubusercontent.com/iramler/stat234/main/notes/data/ny_airports_nov2022.csv
Views: 125
Videos
Split strings in R | change variables from characters to numeric | strsplit( )
105 views · 1 year ago
Using R to draw US maps | regions | Selected States in the United States PART 1
555 views · 1 year ago
This starts a series of videos on drawing US maps using R. Visualizing data with location information can be very helpful because it shows clearly where most of the data comes from, which parts of the US have higher poverty, and so on. The R code for this video is here: github.com/yz-DataScience/R-for-data-science/blob/main/R for US map part1.R
One step boxplot considering two factors | customize your plot color palette
31 views · 1 year ago
check for normal conditions | normality test | histogram | qq plot
876 views · 1 year ago
In this video, you are going to learn three tools for checking the normality condition: 1. histogram 2. Q-Q plot 3. Shapiro-Wilk normality test. The R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/check for normality.R
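The three checks listed above can be sketched in base R as follows. This is only a minimal illustration on simulated data (the vector `x` is an assumption, not the dataset used in the video), and it is not the author's script.

```r
# Simulated data for illustration only (not the dataset from the video).
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)

# 1. Histogram: look for a roughly symmetric, bell-shaped distribution.
hist(x, main = "Histogram of x", xlab = "x")

# 2. Q-Q plot: the points should fall close to the reference line.
qqnorm(x)
qqline(x)

# 3. Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)
sw$p.value
```

The plots and the test are best read together: the test alone can flag tiny departures in large samples and miss real ones in small samples.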
Logarithmic regression| non-linear regression| lm in R| visualization of models
4.5K views · 1 year ago
In this video, you are going to see a dataset named growth data. Using this data, I will guide you through exploring the data, fitting a model, and visualizing the model. R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/non linear regression.R Bluebell and Growth data can be found here: github.com/yz-DataScience/R-for-data-science Make sure you import from excel if you download...
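A logarithmic regression of this kind can be fitted with lm( ) by transforming the predictor. The sketch below uses simulated data (the variables `x` and `y` and the true coefficients 3 and 2 are assumptions for illustration, not the growth data from the video).

```r
# Simulated data that follows y = 3 + 2 * log(x) plus noise (illustration only).
set.seed(1)
x <- 1:50
y <- 3 + 2 * log(x) + rnorm(length(x), sd = 0.3)

# Fit the logarithmic model: still linear in the coefficients, so lm() works.
fit_log <- lm(y ~ log(x))
coef(fit_log)

# Visualize the data and the fitted curve on a fine grid of x values.
plot(x, y, main = "Logarithmic regression")
x_grid <- data.frame(x = seq(1, 50, length.out = 200))
lines(x_grid$x, predict(fit_log, newdata = x_grid), col = "blue", lwd = 2)
```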
polynomial regression using R | non-linear regression | curved regression
2.4K views · 1 year ago
In this video, you will learn to build a polynomial regression through a data set example. R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/non linear regression.R Bluebell and Growth data can be found here: github.com/yz-DataScience/R-for-data-science Make sure you import from excel if you download the excel file. In the video, I imported the .csv file. Part 2 of this vid...
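As a rough sketch of polynomial regression in R, the example below fits a straight line and a quadratic to simulated data and compares them. The data and the true coefficients are assumptions made up for illustration; they are not the Bluebell or Growth data.

```r
# Simulated curved data (illustration only).
set.seed(2)
x <- seq(-3, 3, length.out = 60)
y <- 1 + 0.5 * x - 1.5 * x^2 + rnorm(60, sd = 0.5)

# Straight-line fit versus quadratic fit.
fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ poly(x, 2, raw = TRUE))  # same model as y ~ x + I(x^2)

# Partial F-test: does the squared term improve the fit?
anova(fit_lin, fit_quad)

# Plot the data with both fitted curves.
plot(x, y, main = "Linear vs. quadratic fit")
lines(x, fitted(fit_lin), col = "red", lty = 2)
lines(x, fitted(fit_quad), col = "blue", lwd = 2)
```

Using `raw = TRUE` keeps the coefficients on the original x and x^2 scale, which makes them easier to interpret than the default orthogonal polynomials.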
The full process of one-way ANOVA | EDA | aggregate| model process| using R| RStudio
195 views · 2 years ago
In this video, you will learn the full process of conducting a one-way ANOVA: - setting up the research question - the EDA (boxplot and aggregated data) - the modeling process - the interpretations. The full R code is available here: github.com/yz-DataScience/R-for-data-science/blob/main/one-way ANOVA R code.R
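The steps above can be sketched in base R roughly as follows. The built-in PlantGrowth dataset is used here as a stand-in (an assumption; the video may use different data), with the research question "does mean plant weight differ across the three groups?"

```r
# EDA: side-by-side boxplots of the response by group.
boxplot(weight ~ group, data = PlantGrowth,
        main = "Plant weight by group", ylab = "weight")

# EDA: group means via aggregate().
group_means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
group_means

# Model: one-way ANOVA.
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)
anova_table

# Interpretation aid: pairwise comparisons if the overall F-test is significant.
TukeyHSD(fit)
```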
outliers and influential points| how to identify| understand them using data in R
1.3K views · 2 years ago
Outliers and influential points in one short video! I created a few example datasets to show you: 1) how to check for an outlier of the regression model 2) how to check whether an outlier is an influential point. The R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/outliers and influential points.R
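One possible way to run both checks in base R is shown below. The data are simulated, with one deliberately planted point (observation 21) that is far from the trend and has an extreme x value. The cut-offs used (|standardized residual| > 2 and Cook's distance > 4/n) are common rules of thumb, not necessarily the ones used in the video.

```r
# Simulated data with one planted unusual point (illustration only).
set.seed(1)
x <- c(1:20, 40)
y <- 2 + 0.5 * x + rnorm(21, sd = 1)
y[21] <- 5  # far below the trend, at an extreme x value

fit <- lm(y ~ x)

# 1) Outlier check: standardized residuals that are large in absolute value.
std_res <- rstandard(fit)
which(abs(std_res) > 2)

# 2) Influence check: Cook's distance for each observation.
cooks_d <- cooks.distance(fit)
which(cooks_d > 4 / length(x))

# Refit without the suspect point and compare the coefficients.
fit_without <- lm(y ~ x, subset = -21)
rbind(with_point = coef(fit), without_point = coef(fit_without))
```

If the coefficients change a lot once the point is removed, that point is influential.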
working directory in R | check and change | export a R dataset to csv document
248 views · 2 years ago
The working directory is important! Change it to a folder that you like! For a reading document, see here: bookdown.org/ndphillips/YaRrr/the-working-directory.html
Leave one out and k-fold cross validation| using R| cv.glm | train and test data | prediction error
11K views · 2 years ago
In this video, I show you in R: 1. how to split data into training and test data 2. how to use the training data to build the model and the test data to check the model 3. how to do leave-one-out cross-validation using cv.glm() 4. how to do k-fold cross-validation using cv.glm(). To learn about the idea behind cross-validation, please check out this video: ua-cam.com/video/x1gz-M4VT14/v-deo.html It takes you 8 minute...
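The four steps can be sketched as below. This is a minimal example under stated assumptions: the built-in mtcars data and the model mpg ~ wt + hp are stand-ins chosen for illustration, and the 70/30 split is arbitrary. cv.glm() comes from the boot package.

```r
library(boot)  # provides cv.glm()

# 1. Split the data into training (about 70%) and test sets.
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# 2. Build the model on the training data, check it on the test data.
fit_train <- glm(mpg ~ wt + hp, data = train)
test_mse <- mean((test$mpg - predict(fit_train, newdata = test))^2)
test_mse

# 3. Leave-one-out CV: K defaults to the number of observations.
fit_all <- glm(mpg ~ wt + hp, data = mtcars)
loocv <- cv.glm(mtcars, fit_all)
loocv$delta[1]  # raw cross-validated prediction error

# 4. k-fold CV with K = 5.
kfold <- cv.glm(mtcars, fit_all, K = 5)
kfold$delta[1]
```

Note that glm() with its default gaussian family fits the same model as lm(); glm() is used because cv.glm() expects a glm object.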
smoothing splines in R | degrees of freedom in smooth.spline | data predictions| data matches
4.1K views · 2 years ago
In this video, you will learn about smoothing splines and how the fit changes as you change the degrees of freedom. We will use the R function smooth.spline( ) on a dataset named Auto. Again, the full R code from this video is posted here: github.com/yz-DataScience/R-for-data-science/blob/main/smooth.spline in R.R
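To see the effect of the degrees of freedom, the sketch below fits two smoothing splines with different df values. The built-in mtcars data is used as a stand-in for the Auto dataset (an assumption, so that the example runs without extra packages), and the df values 3 and 10 are arbitrary.

```r
# Stand-in data: horsepower vs. miles per gallon from the built-in mtcars.
x <- mtcars$hp
y <- mtcars$mpg

# Low df gives a smooth, nearly straight curve; high df gives a wigglier curve.
fit_smooth <- smooth.spline(x, y, df = 3)
fit_wiggly <- smooth.spline(x, y, df = 10)

plot(x, y, xlab = "hp", ylab = "mpg", main = "Smoothing splines")
lines(fit_smooth, col = "blue", lwd = 2)
lines(fit_wiggly, col = "red", lwd = 2)
legend("topright", legend = c("df = 3", "df = 10"),
       col = c("blue", "red"), lwd = 2)

# Predictions from the smoother fit at new x values.
pred <- predict(fit_smooth, x = c(100, 150, 200))
pred$y
```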
data visualization| ggplot2| dplyr| data manipulation| Bar plot with error bars using R
1.3K views · 2 years ago
In this video, you will see me combine two popular R packages and draw the graph in one step: dplyr for data manipulation and ggplot2 for data visualization. You can follow me to create your dataset and practice drawing the graph. You can also import your own data and follow along to create the graph for your data.
creating dummy variables automatically using R | dummy_cols function|
11K views · 2 years ago
Hello everyone, in this video, you will learn to use an R function that creates dummy variables automatically. The function is named dummy_cols( ). It is quick and easy, and it can create dummy variables for every categorical variable in a dataset. This is a follow up to my previous video on using ifelse( ) to create dummy variables and categorical variables: ua-cam.com/video/rLXlab5Kz...
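A small sketch of the idea follows. The toy data frame `df` is made up for illustration. The dummy_cols( ) call assumes the fastDummies package is installed, so it is guarded; a base R equivalent with model.matrix( ) is shown first for comparison.

```r
# Toy data with one categorical variable (illustration only).
df <- data.frame(id = 1:4,
                 color = c("red", "blue", "red", "green"),
                 stringsAsFactors = FALSE)

# Base R: one 0/1 indicator column per level ("- 1" drops the intercept,
# so no level is left out).
dummies_base <- model.matrix(~ color - 1, data = df)
dummies_base

# fastDummies::dummy_cols() appends the indicator columns to the data frame.
# Guarded because the package may not be installed.
if (requireNamespace("fastDummies", quietly = TRUE)) {
  df_dummies <- fastDummies::dummy_cols(df, select_columns = "color")
  print(df_dummies)
}
```

When the dummies go into a regression with an intercept, one level has to be left out to avoid perfect multicollinearity; dummy_cols( ) has a `remove_first_dummy = TRUE` argument for that.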
How to plot any function curves in R | draw function curves using R | plot( ) | curve ( ) R function
12K views · 2 years ago
In this video, you will learn to draw function curves using R. From the examples, you will know how to draw any function curve. I find drawing function curves very helpful because it helps you to see the increasing or decreasing trend of the function. It also gives you some idea about when the function achieves its minimum or maximum value. The R code from this video is also available here: git...
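As a minimal sketch of drawing a function curve with curve( ), the example below plots a quadratic and then locates its minimum numerically. The function `f` and the plotting ranges are assumptions chosen for illustration.

```r
# An example function with a clear minimum (illustration only).
f <- function(x) x^2 - 4 * x + 3

# Draw the curve over a chosen range; the trend and the minimum are visible.
curve(f, from = -1, to = 5, ylab = "f(x)", main = "f(x) = x^2 - 4x + 3")

# curve() also accepts an expression written in terms of x.
curve(sin(x), from = 0, to = 2 * pi, ylab = "sin(x)")

# Confirm numerically where f reaches its minimum on the plotted interval.
opt <- optimize(f, interval = c(-1, 5))
opt$minimum
```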
What is cross validation? Why do we need it? Leave one out and k-fold cross validation
6K views · 2 years ago
what is poisson regression | what are really GLM?| using R | fit the model | real data examples
1.3K views · 2 years ago
logistic regression using R | when to use | fit | interpret coefficients| odds | chi-square test
5K views · 3 years ago
Hypothesis tests on Multiple linear regression using R | T-test| partial F-test| model comparison
4.9K views · 3 years ago
Multiple linear regression model using R | lm( ) | variations of MLR | visualize results coefplot( )
2.5K views · 3 years ago
boxplot for comparison | before and after| group cross group comparison| ggplot2| R
4.1K views · 3 years ago
Modeling using R | simple linear regression| correlations, visualizations, fit a model lm() function
1K views · 3 years ago
Create dates and times in R lubridate package| make_datetime( ) function | ymd( ) in RStudio
2.9K views · 3 years ago
combine different datasets into one | relational data | R for data science | left_join function in R
785 views · 3 years ago
Tidy messy data | R for data science | tidyr tidyverse package
1.2K views · 3 years ago
tibbles and data frames in R | R for data science| book club| How to create tibbles and subset it
3.6K views · 3 years ago
EDA part 2| ultimate guide to visualize covariations on two variables | R for data science book club
359 views · 3 years ago
EDA exploratory data analysis part 1 distributions of one variable | R for data science book club
477 views · 3 years ago
A guide to help you organize your R scripts | how to find your old code file quickly
1.5K views · 3 years ago
Atomic Habits: Get better each day | R for loops, data manipulation, data visualization all in one
186 views · 3 years ago
Sound of sweat, with a clear explanation, thank you!
Many thanks for the tutorial. Unfortunately, the script is difficult to read in some parts, and I can't follow how to do the analysis in R.
well explained, appreciated
Thank you. Could you please share a resource for learning how to check these assumptions for linear regression?
Very nice video! May I ask how I can plot an ellipse?
Can you share the R download link, please?
How can you do leave-one-out cross-validation for mixed-effects models?
It helped me a lot, thank you!
This was very helpful, thank you!!
If it were possible to paste or attach it here, I would do it. I mean that I would show my error, or maybe my R software is missing something.
My error is about stacked bar plots unless I made a mistake
Stacked bar plot should be in another video, not this one
@datasciencewithyan4124 In fact, I'm asking whether my R software is missing something. Another example could be like what your biologist friend asked you to do. Please.
Sorry too, I shouldn't have asked, because email is personal, like you said. If I can paste what I made in order to show my error, I'll do it. Thank you!
Excuse me, a month ago I sent a message saying that I would like to send an email so you could see what I made; unfortunately, there was no reply. In fact, I did not succeed.
Hi Sorry, I don’t provide personal email or individual help. What was your error message? Can you copy it here?
hi! i want to make a stacked bar plot, however it keeps adding up the values for the different groups I have (graphing years on x -axis, and different income categories in the fill bars). e.g., high income is 20 people, low income 40 ... however it is adding up all the groups e.g., 40 + 20, and graphing the total, instead of having the highest value at the top of the bar it has the total. does this make sense? how can i fix this? this is my code:
library(ggplot2)
library(dplyr)
# Filter the data for the required entities and select relevant columns
filtered_data <- disasters %>%
  filter(Entity %in% c("High income", "Low income", "Lower middle income", "Upper middle income")) %>%
  select(Year, Entity, no..deaths)
# Calculate the maximum value for each year
max_values <- filtered_data %>%
  group_by(Year) %>%
  summarise(max_deaths = max(no..deaths))
# Merge the maximum values with the filtered data
merged_data <- filtered_data %>%
  left_join(max_values, by = "Year")
# Create the stacked bar plot with individual entity values
ggplot(filtered_data, aes(x = Year, y = no..deaths, fill = Entity)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of Deaths by Entity and Year", x = "Year", y = "Number of Deaths") +
  scale_fill_manual(values = c("High income" = "pink", "Low income" = "green", "Lower middle income" = "yellow", "Upper middle income" = "orange")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ylim(0, NA) + # Remove y-axis limits
  scale_x_continuous(breaks = seq(1900, max(filtered_data$Year), by = 10)) # Set breaks to increase by 10 years
ggplot(disasters_money, aes(y = no..deaths, x = Year, fill = Entity)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of Deaths by Entity and Year", x = "Year", y = "Number of Deaths") +
  scale_fill_manual(values = c("High income" = "pink", "Low income" = "green", "Lower middle income" = "yellow", "Upper middle income" = "orange"))
Very nice. For fans of ggplot2 there is the ggformula package. It has the gf_fun() function that will generate curves in the spirit of ggplot2.
Otherwise type my name and you will find my LinkedIn or Twitter or Instagram. Please.
So that you can see what I made, you may send me your second or third e-mail address. Please. To open and view it after receiving it, you may need to switch to Internet Explorer. I still can't make it the way you did. I need a hand. Thank you!
Thank you so much. I have a practical tomorrow and I will be asked to do this, I was struggling until I found your video. TYSM again!!
The time is international or military time. Standard the world over.
mam, your content is best 😇
Thank you😀
well explained. Thank you
thanks
Rstudio, R 4.2.3
Perhaps, if I may have an email address, I will save what I have done and send it to you. Thank you!
I tried to follow you; unfortunately, my boxplot appears as a minus sign or a dash.
Can you paste your R code so I can take a look?
Dummy variables are not showing using fixed effect, R drops the variables because of multicollinearity...I don't know what to do now
It is possible that one variable tells all the information about the other variable. You may consider removing the variable you don’t want to include.
Hi, Yan. What annotation pen did you use to draw the curve line in the video? Thanks.
The picture is generated by R. The annotation was drawn with the zoom tool while I recorded the video.
Thank you! @datasciencewithyan4124
Thanks a lot! I have been stuck on that question.
Could you make a video on the same example using repeated-measures ANOVA?
Thanks a ton.
Most welcome!
You need to make a new video because your video and audio are out of sync, do you check this stuff before you hit upload?
It is not out of sync from my side.
thank you so helpful!!!
Very clear explanation! Learned a lot from your video❥(^_-)
what software do you use?
This is R, used in RStudio
thank you so much @datasciencewithyan4124
I didn't find the package
Excellent explanation, thank you!
Hi, I need to do regression analysis for multiple variables using a non-linear model. Is there a type of model you can suggest, or put me in the right direction? Thank You !
send the data
Could you please make a video on how to handle imbalanced datasets in a data science project?
What's your Xiaohongshu?
nice video, i like, i comment, and i subscribble. thanks for the information as always!
Thank you. Very useful. You are a very good teacher.
Thanks
Thanks a lot for your explanation. Really helpful and well presented
You are welcome!
Thank you. Great video
Thank you! I was searching for the code to run this:
Given: In the built-in R dataset iris, there are outliers in the Sepal.Width column.
Question: Create a new column that flags whether each record is an outlier ('Yes' for outliers and 'No' for other records).
Answer (built through this video):
summary(iris$Sepal.Width) # Get quartile values from the summary
q1=2.8
q3=3.3
InterQuartileRange=IQR(iris$Sepal.Width)
LowerWhisker=q1-(1.5*InterQuartileRange)
UpperWhisker=q3+(1.5*InterQuartileRange)
iris$is_Outlier=ifelse(iris$Sepal.Width>UpperWhisker | iris$Sepal.Width<LowerWhisker,"yes","no")
iris
good
Thank you very much, but I encountered a difficulty conducting a leave-one-out sensitivity analysis in R for a prevalence or single-proportion meta-analysis.
Error in install.packages : Updating loaded packages
> ggplot(crab_data,aes(x=species,y=weight,fill=stage))+
+ geom_boxplot()
Error in ggplot(crab_data, aes(x = species, y = weight, fill = stage)) : could not find function "ggplot"
> ggplot(crab_data,aes(x=species,y=weight,fill=stage))+
+ geom_boxplot()
Error in ggplot(crab_data, aes(x = species, y = weight, fill = stage)) : could not find function "ggplot"
Download the latest version
Sorry, besides the R software we downloaded, we also need to install ggplot2 before running this, right?
you're a life saviour!
Glad you find it helpful
Thank you. Great video