Data Science with Yan
  • 51
  • 195 660
Flights dataset | data manipulation | deal with dates/day of the week | calendar heat map | R tidyverse
This video uses a flight dataset to go over a few typical data manipulations and explorations.
(I tend to speak slowly when explaining things. You may change the speed to 1.25 or 1.5 if you like to hear it faster.)
R code for this video is here: github.com/yz-DataScience/R-for-data-science/blob/main/a%20flight%20dataset%20for%20data%20manipulation%20and%20heatmap.R
Download the dataset used in the video: github.com/yz-DataScience/R-for-data-science/blob/main/ny_airports_nov2022.csv (click the downward arrow to download it, then import it into RStudio).
Original dataset source: raw.githubusercontent.com/iramler/stat234/main/notes/data/ny_airports_nov2022.csv
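As a rough sketch of the date handling the title mentions (not the code from the video), base R can derive the day of the week from a date column. The `flights` data frame and its `date` and `n_flights` columns are made up for illustration; the real CSV's column names may differ.

```r
# Hypothetical stand-in for the flights data; column names are assumptions.
flights <- data.frame(
  date = as.Date(c("2022-11-01", "2022-11-02", "2022-11-05", "2022-11-06")),
  n_flights = c(310, 295, 240, 260)
)

# ISO weekday number: 1 = Monday, ..., 7 = Sunday (does not depend on locale).
flights$wday_num <- as.integer(format(flights$date, "%u"))

# Flag weekend days, a common grouping before drawing a calendar heat map.
flights$is_weekend <- flights$wday_num >= 6

print(flights)
```

Using the numeric `%u` format rather than `weekdays()` keeps the result the same regardless of the system language.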
Views: 125

Videos

Split strings in R | change variables from characters to numeric | strsplit( )
105 views · 1 year ago
Using R to draw US maps | regions | Selected States in the United States PART 1
555 views · 1 year ago
This starts a series of videos drawing the US maps using R. It can be very helpful to visualize data with location information since it shows clearly where the data is mostly from, or which part of the US is more in poverty, etc. The R code for this video is here: github.com/yz-DataScience/R-for-data-science/blob/main/R for US map part1.R
One step boxplot considering two factors | customize your plot color palette
31 views · 1 year ago
check for normal conditions | normality test | histogram | qq plot
876 views · 1 year ago
In this video, you will learn three tools for checking the normality condition: 1. histogram 2. Q-Q plot 3. Shapiro-Wilk normality test. The R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/check for normality.R
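The three checks can be sketched in base R as follows. This is a minimal illustration on simulated data, not the script linked above; the sample `x` is an assumption.

```r
set.seed(1)                          # make the simulated sample reproducible
x <- rnorm(100, mean = 50, sd = 5)   # hypothetical data to check

# 1. Histogram: is the shape roughly symmetric and bell-like?
hist(x, main = "Histogram of x")

# 2. Q-Q plot: do the points fall close to the reference line?
qqnorm(x)
qqline(x)

# 3. Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)
print(sw)
```

The plots and the test answer slightly different questions, so it usually helps to look at all three rather than rely on the p-value alone.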
Logarithmic regression| non-linear regression| lm in R| visualization of models
4.5K views · 1 year ago
In this video, you will see a dataset named Growth. Using this data, I will guide you through exploring it, fitting a model, and visualizing the model. R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/non linear regression.R Bluebell and Growth data can be found here: github.com/yz-DataScience/R-for-data-science Make sure you import from excel if you download...
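A logarithmic regression can be fit with `lm()` by transforming the predictor. The sketch below uses simulated data as a stand-in for the Growth dataset, whose actual columns are not shown here.

```r
set.seed(2)
x <- 1:30
y <- 3 + 2 * log(x) + rnorm(30, sd = 0.2)   # simulated, hypothetical data

# The model is linear in log(x), so lm() can fit it directly.
fit <- lm(y ~ log(x))
print(coef(fit))

# Visualize the data together with the fitted curve.
plot(x, y, main = "Logarithmic regression")
lines(x, fitted(fit), col = "blue")
```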
polynomial regression using R | non-linear regression | curved regression
2.4K views · 1 year ago
In this video, you will learn to build a polynomial regression through a data set example. R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/non linear regression.R Bluebell and Growth data can be found here: github.com/yz-DataScience/R-for-data-science Make sure you import from excel if you download the excel file. In the video, I imported the .csv file. Part 2 of this vid...
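A minimal polynomial regression sketch, assuming simulated data rather than the Bluebell or Growth files, compares a straight-line fit with a quadratic one:

```r
set.seed(3)
x <- seq(-3, 3, length.out = 40)
y <- 1 + 0.5 * x - 2 * x^2 + rnorm(40, sd = 0.5)   # simulated curved data

fit_lin  <- lm(y ~ x)            # straight line
fit_quad <- lm(y ~ x + I(x^2))   # adds a squared term

# Partial F-test: does the squared term improve the fit?
print(anova(fit_lin, fit_quad))

plot(x, y, main = "Linear vs quadratic fit")
lines(x, fitted(fit_lin), lty = 2)
lines(x, fitted(fit_quad), col = "red")
```

`I(x^2)` is needed so that `^` is treated as arithmetic rather than as formula syntax.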
The full process of one-way ANOVA | EDA | aggregate| model process| using R| RStudio
195 views · 2 years ago
In this video, you will learn the full process of conducting a one-way ANOVA: setting up the research question, the EDA (boxplot and aggregated data), the modeling process, and the interpretation. The full R code is available here: github.com/yz-DataScience/R-for-data-science/blob/main/one-way ANOVA R code.R
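The same sequence of steps can be sketched with the built-in `PlantGrowth` dataset. Using this dataset is an assumption for illustration; the video's own data may differ.

```r
data(PlantGrowth)   # built-in: plant weight for three treatment groups

# EDA: compare the group distributions and the group means.
boxplot(weight ~ group, data = PlantGrowth)
agg <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
print(agg)

# Model: one-way ANOVA of weight on group.
fit <- aov(weight ~ group, data = PlantGrowth)
print(summary(fit))   # the F-test asks whether any group means differ
```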
outliers and influential points| how to identify| understand them using data in R
1.3K views · 2 years ago
Outliers and Influential points in one short video!! I created a few datasets to create some examples to show you: 1) how to check an outlier of the regression model 2) how to check if an outlier is an influential point. The R code is here: github.com/yz-DataScience/R-for-data-science/blob/main/outliers and influential points.R
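One common way to run both checks in base R uses standardized residuals and Cook's distance. The data and the cutoffs below are illustrative assumptions, not necessarily what the video uses.

```r
set.seed(4)
x <- c(1:20, 40)                      # the last x value is far from the rest
y <- c(2 * (1:20) + rnorm(20), 20)    # and its y is far below the trend

fit <- lm(y ~ x)

# 1) Outliers: large standardized residuals (|value| > 2 is a rule of thumb).
std_res <- rstandard(fit)
print(which(abs(std_res) > 2))

# 2) Influential points: large Cook's distance (4/n is one common cutoff).
cooks <- cooks.distance(fit)
print(which(cooks > 4 / length(x)))
```

Refitting the model without a flagged point and comparing the coefficients is a simple way to confirm how influential it is.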
working directory in R | check and change | export a R dataset to csv document
248 views · 2 years ago
The working directory is important! Change it to a folder that you like! For a reading document, see here: bookdown.org/ndphillips/YaRrr/the-working-directory.html
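A short sketch of checking and changing the working directory and then exporting a dataset to CSV. The temporary folder and the file name are placeholders; substitute a folder of your own.

```r
old_wd <- getwd()        # check the current working directory
print(old_wd)

setwd(tempdir())         # change it (here: a temporary folder as a stand-in)

# Export a built-in dataset to a CSV file in the new working directory.
write.csv(mtcars, "mtcars_export.csv", row.names = FALSE)
print(file.exists("mtcars_export.csv"))

setwd(old_wd)            # switch back when done
```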
Leave one out and k-fold cross validation| using R| cv.glm | train and test data | prediction error
11K views · 2 years ago
In this video, I show you in R: 1. how to split data into training and test data 2. how to use training data to build the model and test data to check the model 3. how to do cross-validation using cv.glm() 4. how to do k-fold cross validation using cv.glm() To learn about the idea behind cross-validation, please check out this video: ua-cam.com/video/x1gz-M4VT14/v-deo.html It takes you 8 minute...
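The four steps can be sketched as below, assuming the `boot` package (a recommended package bundled with most R installations) and the built-in `mtcars` data in place of the video's dataset.

```r
library(boot)   # provides cv.glm()

set.seed(5)
n <- nrow(mtcars)

# 1. Split the data into training (about 70%) and test sets.
train_idx <- sample(n, size = round(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# 2. Fit on the training data, then measure error on the test data.
fit_train <- glm(mpg ~ wt, data = train)
test_mse  <- mean((test$mpg - predict(fit_train, newdata = test))^2)

# 3. Leave-one-out cross-validation (the default when K is not given).
fit_all <- glm(mpg ~ wt, data = mtcars)
loocv   <- cv.glm(mtcars, fit_all)$delta[1]

# 4. k-fold cross-validation, here with K = 8 folds.
kfold <- cv.glm(mtcars, fit_all, K = 8)$delta[1]

print(c(test_mse = test_mse, loocv = loocv, kfold = kfold))
```

`cv.glm()` expects a model fit with `glm()`; with the default Gaussian family this is the same fit as `lm()`.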
smoothing splines in R | degrees of freedom in smooth.spline | data predictions| data matches
4.1K views · 2 years ago
In this video, you will learn about smoothing splines and how they change as you change the degrees of freedom. We will use the smooth.spline( ) R function on a dataset named Auto. Again, the full R code from this video is posted here: github.com/yz-DataScience/R-for-data-science/blob/main/smooth.spline in R.R
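To see the effect of the degrees of freedom, the sketch below fits two smoothing splines to the built-in `cars` data. This dataset is a stand-in, since `Auto` comes from an add-on package.

```r
x <- cars$speed
y <- cars$dist

# Fewer degrees of freedom give a smoother curve; more give a wigglier one.
fit_smooth <- smooth.spline(x, y, df = 3)
fit_wiggly <- smooth.spline(x, y, df = 10)

plot(x, y, xlab = "speed", ylab = "dist")
lines(fit_smooth, col = "blue")
lines(fit_wiggly, col = "red")

# Predict the response at a new x value from the smoother fit.
pred <- predict(fit_smooth, x = 15)
print(pred)
```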
data visualization| ggplot2| dplyr| data manipulation| Bar plot with error bars using R
1.3K views · 2 years ago
In this video, you will see me combine two popular R packages and draw the graph in one step: dplyr for data manipulation and ggplot2 for data visualization. You can follow me to create your dataset and practice drawing the graph. You can also import your own data and follow along to create the graph for your data.
creating dummy variables automatically using R | dummy_cols function|
11K views · 2 years ago
Hello everyone, in this video, you will learn to use a function from R to create automatic dummy variables. The function is named as dummy_cols( ). It is quick and easy. And it can create dummy variables for each and every categorical variable in a dataset. This is a follow up to my previous video on using ifelse( ) to create dummy variables and categorical variables: ua-cam.com/video/rLXlab5Kz...
How to plot any function curves in R | draw function curves using R | plot( ) | curve ( ) R function
12K views · 2 years ago
In this video, you will learn to draw function curves using R. From the examples, you will know how to draw any function curve. I find drawing function curves very helpful because it helps you to see the increasing or decreasing trend of the function. It also gives you some idea about when the function achieves its minimum or maximum value. The R code from this video is also available here: git...
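A minimal base R sketch of drawing a function curve and reading off roughly where it reaches its minimum. The quadratic is an arbitrary example, not one taken from the video.

```r
# curve() evaluates the expression in x on a grid and draws it;
# it also returns the grid points invisibly.
pts <- curve(x^2 - 4 * x + 3, from = -1, to = 5, ylab = "f(x)")
abline(h = 0, lty = 2)   # reference line at y = 0

# Approximate location of the minimum on the plotted grid.
x_min <- pts$x[which.min(pts$y)]
print(x_min)
```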
What is cross validation? Why we need it? Leave one out and k-fold cross validation
6K views · 2 years ago
what is poisson regression | what are really GLM?| using R | fit the model | real data examples
1.3K views · 2 years ago
logistic regression using R | when to use | fit | interpret coefficients| odds | chi-square test
5K views · 3 years ago
Hypothesis tests on Multiple linear regression using R | T-test| partial F-test| model comparison
4.9K views · 3 years ago
Multiple linear regression model using R | lm( ) | variations of MLR | visualize results coefplot( )
2.5K views · 3 years ago
boxplot for comparison | before and after| group cross group comparison| ggplot2| R
4.1K views · 3 years ago
Modeling using R | simple linear regression| correlations, visualizations, fit a model lm() function
1K views · 3 years ago
Create dates and times in R lubridate package| make_datetime( ) function | ymd( ) in RStudio
2.9K views · 3 years ago
combine different datasets into one | relational data | R for data science | left_join function in R
785 views · 3 years ago
Tidy messy data | R for data science | tidyr tidyverse package
1.2K views · 3 years ago
tibbles and data frames in R | R for data science| book club| How to create tibbles and subset it
3.6K views · 3 years ago
EDA part 2| ultimate guide to visualize covariations on two variables | R for data science book club
359 views · 3 years ago
EDA exploratory data analysis part 1 distributions of one variable | R for data science book club
477 views · 3 years ago
A guide to help you organize your R scripts | how to find your old code file quickly
1.5K views · 3 years ago
Atomic Habits: Get better each day | R for loops, data manipulation, data visualization all in one
186 views · 3 years ago

COMMENTS

  • @asnakewworku4163 · 7 days ago

    Sound of sweat, with a clear explanation, thank you!

  • @roselvyjuarez4391 · 24 days ago

    Many thanks for the tutorial. Unfortunately, the script is difficult to read in some parts and I can’t comprehend to do the analysis in R.

  • @statisticsbymalik4158 · 5 months ago

    well explained, appreciated

  • @himankaghosh7307 · 5 months ago

    Thank You, could you please share any resource to learn checking these assumptions for linear regression

  • @MathimatikametonGiorgo · 6 months ago

    Very nice video! May i ask you, how can i plot an ellipse?

  • @yapya4199 · 6 months ago

    Can u drop R downloader link please

  • @rafaelpugoni · 6 months ago

    How you can do the leave one out cross validation for mixed effects models?

  • @alessandromorra7797 · 6 months ago

    It helped me a lot, thank you!

  • @hecreatescoding · 7 months ago

    This was very helpful, thank you!!

  • @tchistermorrelebissa8628 · 7 months ago

    If it was possible to paste or add here I'll do it. I mean that I'll show up my error or maybe my r software is missing something

  • @tchistermorrelebissa8628 · 7 months ago

    My error is about stacked bar plots unless I made a mistake

    • @datasciencewithyan4124 · 7 months ago

      Stacked bar plot should be in another video, not this one

    • @tchistermorrelebissa8628 · 7 months ago

      @datasciencewithyan4124 In fact, I'm asking whether my R software is missing something. Another example could be like what your biologist friend asked you to do. Please.

  • @tchistermorrelebissa8628 · 7 months ago

    Sorry too, I wouldn't asked, because email is personal like you said. If I can paste what I made in order to show my error I'll do it. Thank you!

  • @tchistermorrelebissa8628 · 7 months ago

    Excuse me, month ago, I sent a message that I would like to send an email in order to see what I made unfortunately no reply. In fact, I did not succeed.

    • @datasciencewithyan4124 · 7 months ago

      Hi Sorry, I don’t provide personal email or individual help. What was your error message? Can you copy it here?

  • @ellapisani4259 · 7 months ago

    hi! i want to make a stacked bar plot, however it keeps adding up the values for the different groups I have (graphing years on x -axis, and different income categories in the fill bars). e.g., high income is 20 people, low income 40 ... however it is adding up all the groups e.g., 40 + 20, and graphing the total, instead of having the highest value at the top of the bar it has the total. does this make sense? how can i fix this? this is my code:

        library(ggplot2)
        library(dplyr)

        # Filter the data for the required entities and select relevant columns
        filtered_data <- disasters %>%
          filter(Entity %in% c("High income", "Low income", "Lower middle income", "Upper middle income")) %>%
          select(Year, Entity, no..deaths)

        # Calculate the maximum value for each year
        max_values <- filtered_data %>%
          group_by(Year) %>%
          summarise(max_deaths = max(no..deaths))

        # Merge the maximum values with the filtered data
        merged_data <- filtered_data %>%
          left_join(max_values, by = "Year")

        # Create the stacked bar plot with individual entity values
        ggplot(filtered_data, aes(x = Year, y = no..deaths, fill = Entity)) +
          geom_bar(stat = "identity") +
          labs(title = "Number of Deaths by Entity and Year", x = "Year", y = "Number of Deaths") +
          scale_fill_manual(values = c("High income" = "pink", "Low income" = "green", "Lower middle income" = "yellow", "Upper middle income" = "orange")) +
          theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
          ylim(0, NA) +  # Remove y-axis limits
          scale_x_continuous(breaks = seq(1900, max(filtered_data$Year), by = 10))  # Set breaks to increase by 10 years

        ggplot(disasters_money, aes(y = no..deaths, x = Year, fill = Entity)) +
          geom_bar(stat = "identity") +
          labs(title = "Number of Deaths by Entity and Year", x = "Year", y = "Number of Deaths") +
          scale_fill_manual(values = c("High income" = "pink", "Low income" = "green", "Lower middle income" = "yellow", "Upper middle income" = "orange"))

  • @haraldurkarlsson1147 · 7 months ago

    Very nice. For fans of ggplot2 there is the ggformula package. It has the gf_fun() function that will generate curves in the spirit of ggplot2.

  • @tchistermorrelebissa8628 · 8 months ago

    Otherwise type my name and you will find my LinkedIn or Twitter or Instagram. Please.

  • @tchistermorrelebissa8628 · 8 months ago

    In order to find what I made you may send me your second or third e-mail. PLZ. To enter into it and see it you may convert it to Internet explorer after receiving it. Still not make it in the way you did. I need a hand. Thank you!

  • @Gifritaaa · 8 months ago

    Thank you so much. I have a practical tomorrow and I will be asked to do this, I was struggling until I found your video. TYSM again!!

  • @haraldurkarlsson1147 · 9 months ago

    The time is international or military time. Standard the world over.

  • @anikdebbarman1448 · 9 months ago

    mam, your content is best 😇

  • @阿莲游记 · 9 months ago

    well explained. Thank you

  • @adityamisra7702 · 9 months ago

    thanks

  • @tchistermorrelebissa8628 · 9 months ago

    Rstudio, R 4.2.3

  • @tchistermorrelebissa8628 · 9 months ago

    Perhaps, if I may have an email address I will save what I done and sent it to You. Thank you!

  • @tchistermorrelebissa8628 · 9 months ago

    I tried to follow You unfortunately my boxplot it appears minus or dash

  • @dka9756 · 10 months ago

    Dummy variables are not showing using fixed effect, R drops the variables because of multicollinearity...I don't know what to do now

    • @datasciencewithyan4124 · 10 months ago

      It is possible that one variable tells all the information about the other variable. You may consider removing the variable you don’t want to include.

  • @CanDoSo_org · 10 months ago

    Hi, Yan. What annotation pen did you use to draw the curve line in the video? Thanks.

    • @datasciencewithyan4124 · 10 months ago

      The picture is generated by R. The annotations were drawn with Zoom's annotation tool while I recorded the video.

    • @CanDoSo_org · 10 months ago

      Thank you! @datasciencewithyan4124

  • @user-kg4kk1dg8p · 10 months ago

    Thanks a lot! I have been stuck on that question.

  • @WheatAndrogenesisRatib1993 · 11 months ago

    How you make a video for same example using repeated measure anova

  • @AbdulRahman-r4i5h · 1 year ago

    Thanks a ton.

  • @a1cswiz1611 · 1 year ago

    You need to make a new video because your video and audio are out of sync, do you check this stuff before you hit upload?

  • @rustamatahoja · 1 year ago

    thank you so helpful!!!

  • @yuchuxie8170 · 1 year ago

    Very clear explanation! Learned a lot from your video❥(^_-)

  • @alianatasha3296 · 1 year ago

    what software do you use?

  • @aatot5100 · 1 year ago

    I didn't find the package

  • @danielmartineau3089 · 1 year ago

    Excellent explanation, thank you!

  • @antomathew9502 · 1 year ago

    Hi, I need to do regression analysis for multiple variables using a non-linear model. Is there a type of model you can suggest, or put me in the right direction? Thank You !

  • @varalakshmipokala9653 · 1 year ago

    could you please make a video on how to handle imbalance datasets in data science project

  • @jamesgazeley · 1 year ago

    What's your Xiaohongshu?

  • @iancastille9934 · 1 year ago

    nice video, i like, i comment, and i subscribble. thanks for the information as always!

  • @bkarim7349 · 1 year ago

    Thank you. Very useful. You are a very good teacher.

  • @eliascapini2049 · 1 year ago

    Thanks a lot for your explanation. Really helpful and well presented

  • @bkarim7349 · 1 year ago

    Thank you. Great video

  • @Sruthia-p8s · 1 year ago

    Thank you! I was searching for the code to run this:
    Given: In the existing R dataset iris, there are outliers in the values of the Sepal.Width column.
    Question: Create a new column that signifies whether the record is an outlier ('Yes' for outliers and 'No' for other records).
    Answer, built through this video:

        summary(iris$Sepal.Width)  # Get quartile values from the summary
        q1 = 2.8
        q3 = 3.3
        InterQuartileRange = IQR(iris$Sepal.Width)
        LowerWhisker = q1 - (1.5 * InterQuartileRange)
        UpperWhisker = q3 + (1.5 * InterQuartileRange)
        iris$is_Outlier = ifelse(iris$Sepal.Width > UpperWhisker | iris$Sepal.Width < LowerWhisker, "yes", "no")
        iris

  • @khanakbarzafar303 · 1 year ago

    good

  • @gueshmebrahtom3443 · 1 year ago

    Thank you very much, but I encountered a difficulty conducting a leave-one-out sensitivity analysis using R for a prevalence or single-proportion meta-analysis

  • @tchistermorrelebissa8628 · 1 year ago

    Error in install.packages : Updating loaded packages
    > ggplot(crab_data,aes(x=species,y=weight,fill=stage))+
    +   geom_boxplot()
    Error in ggplot(crab_data, aes(x = species, y = weight, fill = stage)) : could not find function "ggplot"
    > ggplot(crab_data,aes(x=species,y=weight,fill=stage))+
    +   geom_boxplot()
    Error in ggplot(crab_data, aes(x = species, y = weight, fill = stage)) : could not find function "ggplot"

  • @morrelebissa9029 · 1 year ago

    Sorry, except R software we downloaded, we also need to download ggplot (ggplot2) before running, right?

  • @sylviapinheiro2591 · 1 year ago

    you're a life saviour!

  • @bkarim7349 · 1 year ago

    Thank you. Great video