Statistics Ninja
  • 517
  • 388 766

Videos

05 Factorial designs principles and applications (67 views, 1 month ago)
04 Normal data (31 views, 1 month ago)
03 Experimental Data Setup - Blocking and Stratification (19 views, 1 month ago)
02 Setting Up Experiments (51 views, 1 month ago)
01 Introduction to Experimental Design (83 views, 1 month ago)
21 Ensemble Methods in Supervised Learning - Filmed during Hurricane Milton (59 views, 1 month ago)
  This video was filmed while Hurricane Milton was about to make landfall.
How frequent is Friday the 13th? (70 views, 2 months ago)
20 Model Selection in Supervised Learning (87 views, 2 months ago)
19 Model Validation (63 views, 3 months ago)
18 Parameter Tuning in Supervised Learning (55 views, 3 months ago)
17 Neural Networks in Supervised Learning (106 views, 3 months ago)
16 K-Nearest Neighbors Models in Supervised Learning (169 views, 3 months ago)
15 Supervised Learning with Gradient Boosting (93 views, 3 months ago)
14 Random Forest Models in Supervised Learning (110 views, 4 months ago)
Using a large language model for sentiment analysis (164 views, 4 months ago)
Using a large language model to classify topics (135 views, 4 months ago)
Using a large language model for classification supervised learning (349 views, 4 months ago)
16 Histogram-based Gradient Boosting Regression Tree (263 views, 4 months ago)
13 Supervised learning with decision trees (56 views, 5 months ago)
12 Supervised learning with support vector machines (70 views, 5 months ago)
11 Supervised learning with logistic regression (109 views, 6 months ago)
10 Generalized linear models in supervised learning (56 views, 6 months ago)
09 Feature Selection in Supervised Learning (209 views, 6 months ago)
08 Lasso, Ridge, and Elastic-Net Regression in Supervised Learning (112 views, 6 months ago)
07 Linear Regression in Supervised Learning (106 views, 6 months ago)
06 Model Complexity and Generalization in Supervised Learning (98 views, 6 months ago)
05 Evaluating Classification Supervised Learning Model Quality (53 views, 7 months ago)
04 Evaluating Regression Supervised Learning Model Quality (97 views, 7 months ago)
03 Preparing data for regression supervised learning (122 views, 7 months ago)

COMMENTS

  • @prateekkumar.1325 · 25 days ago

    nice sir! Thanks

  • @kalechips965 · 28 days ago

    Thanks for the video! What if both your sample and the population are imbalanced, but to different degrees (e.g., 5:1 vs. 10:1)? Would changing class weights to reflect the population imbalance rather than the sample imbalance be a solution? If so, how does this affect the calibration of the model?

    • @statisticsninja · 28 days ago

      @@kalechips965 Excellent question! It depends on your project. I would alter the weights to address the imbalance. If you need the model score to equal the true probability, I would rescale the model scores so that the mean training score equals the fraction of positives in the training data. I would not worry about this if you do not need to interpret the model scores; just find the cutoff that gives the best sensitivity, specificity, precision, recall, etc.
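As an illustrative sketch of the rescaling described in this reply (the data and variable names below are invented, not code from the video):

```python
import numpy as np

# Illustrative data: 0/1 training labels and raw model scores in [0, 1].
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=1000)
scores = rng.uniform(0.0, 1.0, size=1000)

# Rescale so the mean training score equals the fraction of positives.
pos_frac = y_train.mean()
rescaled = scores * (pos_frac / scores.mean())
```

Note that multiplicative rescaling can push individual scores above 1, which matters only if the rescaled scores are reported as probabilities.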

    • @kalechips965 · 27 days ago

      @statisticsninja I appreciate the tips! One more question that's a bit more complex. As mentioned above, my sample has a 5:1 class imbalance. I've done a stratified split according to this imbalance, creating (1) a training set for model development (within which cross-validation subsets are themselves split using stratified k-folds) and (2) a test/holdout set purely for final model evaluation. For reproducibility, I have set a fixed seed variable and passed it to any method that has a "random_state" argument.

      There are two issues. First, my cross-validated training and test metrics are similar, but these same metrics can be more than 10% lower for my final test set. Second, multiple runs of my script using different random seeds cause the overall results to vary appreciably. I think these issues are related. The closely matched validation training/test scores suggest my models learn well from each validation subset, but the fact that the validation test scores tend to be higher than the (unseen) holdout test scores indicates that the models do not generalise well. I think this points to a data issue - I would guess that the overall sample is unbalanced or biased in ways that are not captured by the class-stratified sampling. For example, some important features may be underrepresented in certain subgroups of a class, and their distribution would vary significantly across splits, affecting classifier performance. Does this interpretation make sense to you?

  • @wanqingtai1490 · 1 month ago

    You are amazing. I am using R leaflet to generate linear transects on the map with individual pairs of locations. Although I didn't find the solution in this video, I still tried the techniques in your video. It's amazing and helped me a lot. Thank you for posting this guide. Very detailed and understandable. Best wishes!

  • @minghuachang8126 · 1 month ago

    Do you have a reference to any literature that you've used this in?

    • @statisticsninja · 1 month ago

      @@minghuachang8126 You can use the citation() function in R to get the citation information for a package. Typically it references the journal article that announced the package.

  • @nasheedjafri3564 · 2 months ago

    What if you get a steep negative slope line in your added variable plot?

    • @statisticsninja · 2 months ago

      @@nasheedjafri3564 If a variable has a steep slope, positive or negative, then you want to include it in your model.

  • @pauleseme5517 · 2 months ago

    THANKS for the series, it's helping me 🙏

  • @fathymohamed4312 · 2 months ago

    Can you please make a tutorial for spatial machine learning in Python?

    • @statisticsninja · 2 months ago

      @@fathymohamed4312 What type of spatial data do you have? The simplest approach would be to treat your spatial data as predictor variables. R has a lot more spatial tools than Python because R is more common in science.

    • @fathymohamed4312 · 2 months ago

      @@statisticsninja thank you Sir

  • @nadimalfana · 3 months ago

    This is gold, thank you! I just want to ask how you conclude the dimensionality of this set of items, since PCA, EFA, and IRT CFA tell you different numbers of factors?

    • @statisticsninja · 3 months ago

      @@nadimalfana That is a good question. It is a subjective decision. Try to balance the goals and constraints of your project, and what you see in the data. Make the best call you can. There is usually a range of reasonable values.

    • @nadimalfana · 3 months ago

      @@statisticsninja Woah, that's tough decision haha... Thanks!

  • @carmelbaris7088 · 3 months ago

    this is just what I needed. thankssss!!!!!

  • @Jason-o5s · 3 months ago

    Cheer~~~a charge or claim that someone has done something undesirable---an accusation.😅

  • @bilhanbel34 · 5 months ago

    I have a model in my task: one numerical and two categorical variables. When I create a formula like you do here, formula1 = 'numerical ~ C(cat1) + C(cat2)', I observe category one slightly below 5%, so I can reject the null hypothesis. However, in another video they use one categorical variable with one numerical variable, so formula2 = 'numerical ~ cat1', and there I observe that category one is 9%. What exactly is the difference when we use these two formulas, and which formula should we use?

  • @GB-qc8un · 5 months ago

    Hi, I liked your video tutorial; it is quick and easy to follow. May I ask how you would add your own data, say the number of published studies in each state? How do you incorporate that into your map? Cheers!

    • @statisticsninja · 5 months ago

      The easiest way is to get an sf object with the spatial features and a data.frame, then merge your data.frame with merge.sf(). If you have the spatial features without a data.frame, then you need to match your data.frame to the geometries.

  • @bridgettsmith7206 · 5 months ago

    Thanks

  • @bridgettsmith7206 · 6 months ago

    Thanks

  • @bridgettsmith7206 · 6 months ago

    Thanks

  • @bridgettsmith7206 · 6 months ago

    Thanks

  • @brunobarreto8812 · 6 months ago

    When you use only categorical data where the options for the questions are the same, do you not need to normalize the data?

    • @statisticsninja · 6 months ago

      For that situation the data are all on the same measurement scale, so I would not normalize. If I had data on different scales, such as height in inches and weight in lbs, then I would normalize.
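A minimal sketch of the normalization decision above, in Python (the height and weight values are invented):

```python
import numpy as np

# Features on different measurement scales (made-up values).
height_in = np.array([60.0, 64.0, 68.0, 72.0, 76.0])
weight_lb = np.array([110.0, 140.0, 155.0, 190.0, 220.0])

def zscore(x):
    # Center and scale so each feature has mean 0 and standard deviation 1,
    # preventing the larger-scale feature from dominating distance-based methods.
    return (x - x.mean()) / x.std()

height_z = zscore(height_in)
weight_z = zscore(weight_lb)
```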

  • @bridgettsmith7206 · 6 months ago

    Thanks

  • @sinan_islam · 6 months ago

    You should create a playlist for multiple correspondence analysis.

  • @bridgettsmith7206 · 7 months ago

    Thanks

  • @jericajadesy998 · 7 months ago

    Is it possible to generate the communalities and/or proportion of total variance explained in BEFA? Looking for some metric to assess the model fit of my model. Thank you!

    • @statisticsninja · 7 months ago

      I could not find a function to do this for you. You could manually compute regression diagnostics using the equation in the BayesFM::befa model specification. Use your befa output to fill in everything except the errors. Be sure to switch the columns and signs of your befa output first.

    • @ThankfulAlways · 7 months ago

      How would I do this? The only output of BEFA are the factor loadings, variances of error terms, factor correlations and the indicator values. Is it possible to manually generate with only these values?

    • @statisticsninja · 7 months ago

      @@ThankfulAlways I found a better way. Use parameters::model_parameters and parameters::efa_to_cfa to get the parameters from your befa model. Then fit a confirmatory factor analysis model using your preferred package. This will give you the full power of your favorite factor analysis package.

    • @ThankfulAlways · 7 months ago

      @@statisticsninja oh really? I will do some research and try it out. Never tried doing confirmatory factor analysis before. Not sure how it's different from exploratory factor analysis. Will read on that. Thanks a lot!

  • @MrArdahazal · 8 months ago

    Hi, how can I get the item information and test information parameters? Can you help me with the syntax?

    • @statisticsninja · 8 months ago

      After you fit your model, enter your model into the str() function. It will print the slots of your model. You can use the slot names to extract what you need.

    • @MrArdahazal · 8 months ago

      @@statisticsninja Thank you for reply, I hope you would share a video on how to obtain item and test information functions in multidimensional confirmatory IRT. 😀

    • @statisticsninja · 8 months ago

      @@MrArdahazal Which function are you using to fit your model?

    • @MrArdahazal · 8 months ago

      @@statisticsninja Hi, I constructed a model as follows, as far as I understood your presentation. I also want to obtain the test information function and item information function parameters, but I haven't understood them. I tried to code them at the bottom of the syntax, but I am not sure.

      # Model
      library(mirt)
      library(latticeExtra)
      cfa <- mirt::mirt.model(input = '
        pl = 1-9
        sb = 10-15
        db = 16-21
        dk = 22-28
        oy = 29-34
        COV = pl*sb, pl*db, pl*dk, pl*oy, sb*db, sb*dk, sb*oy, db*dk, db*oy, dk*oy')
      ACS <- mirt(data = thesis, model = cfa, method = "MHRM", itemtype = 'graded',
                  SE = FALSE, SE.type = "MHRM", TOL = 1e-2)
      ACST <- coef(ACS, IRTpars = TRUE, simplify = TRUE)
      options(max.print = 1000000)
      print(ACST, digits = 2)

      # For the whole scale: test information matrix
      Theta <- matrix(seq(-4, 4, by = .01))
      thetas <- fscores(ACS, method = "EAP", rotate = "oblimin", QMC = TRUE)
      tinfo <- testinfo(ACS, thetas, degrees = c(0, 0, 0, 0, 0))
      plot(thetas, tinfo, type = "l")

      # For Item 1: item information matrix
      Theta <- as.matrix(expand.grid(-4:4, -4:4, -4:4, -4:4, -4:4))
      iteminfo1 <- extract.item(ACS, 1)
      iteminfo <- iteminfo(iteminfo1, thetas, degrees = c(0, 0, 0, 0, 0),
                           total.info = TRUE, multidim_matrix = TRUE)
      options(max.print = 1000000)

    • @statisticsninja · 8 months ago

      @@MrArdahazal I hope this helps. I posted the RMarkdown file in my website's shared files section. ua-cam.com/video/k_oNhQ9Fy6w/v-deo.html

  • @mounkailagarba9952 · 9 months ago

    Thank

  • @Philantrope · 9 months ago

    Helpful insights - very well done. Thank you!

  • @Shog-Qi · 9 months ago

    You are so to the point. I really believe American professors are built different!

  • @random_16Aj · 9 months ago

    Hey, I'm unsure if you will respond, but my boss wants me to do multiple imputation and I have never done that before. I have a large dataset, cleaned and manipulated. I am unsure what the predictor matrix is. Also, how do I know which imputed dataset is better? I asked my boss about using random forest, because it handles both numeric and categorical data. Are there any insights, or any book or article that would be helpful?

    • @statisticsninja · 9 months ago

      The predictor matrix is all of your predictors in a matrix or data frame. For multiple imputation, I prefer missRanger. A way to compare imputation methods would be to copy your data, randomly replace values with missing values, try several imputation methods, then compare the imputed values to the original values.
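The comparison procedure in this reply can be sketched as follows. Mean imputation stands in for whichever methods you are comparing (missRanger, MICE, etc.), and all data here are simulated:

```python
import numpy as np

rng = np.random.default_rng(1)
x_true = rng.normal(50.0, 10.0, size=500)   # complete column, kept as ground truth

# Randomly replace about 10% of the values with missing values.
mask = rng.random(500) < 0.10
x_missing = x_true.copy()
x_missing[mask] = np.nan

# Candidate method: mean imputation (swap in a real method in practice).
fill = np.nanmean(x_missing)
x_imputed = np.where(np.isnan(x_missing), fill, x_missing)

# Score the method by comparing imputed values to the held-out true values.
rmse = np.sqrt(np.mean((x_imputed[mask] - x_true[mask]) ** 2))
```

Repeating this for each candidate method and picking the lowest error gives a direct, data-driven comparison.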

  • @RyanChen-g4q · 10 months ago

    thank you

  • @random_cape_town · 10 months ago

    Much appreciated

  • @Riedas · 10 months ago

    What a great intro to sf and its functionality. Thank you so much and keep up the good work!

  • @ahmedduzce · 1 year ago

    Can I use R for satellite remote sensing, and can you advise me where to start?

    • @statisticsninja · 1 year ago

      I have used R for analyzing satellite data; the project was a success. I would start by analyzing your data independently of the spatial coordinates, then analyze the coordinate distribution independently of the data, then perform the full spatial data analysis.

    • @ahmedduzce · 11 months ago

      @@statisticsninja Thank you so much for your kind response. I hope you can guide me on what course I should take to master R for analysing remote sensing data in a short time. I started a long time ago, but unfortunately I found I have to learn many things to be able to use R to analyse remote sensing data.

  • @dokutorian · 1 year ago

    Hello, maptools::Rgshhs is not available anymore. Is there any other code we can use?

  • @apoorvasingh9747 · 1 year ago

    Hi Aaron. I tried to use the list_cv section of the code on my data and, strangely, it created a list of length 0. Could you suggest what I can try? Also, the dataset is very small.

    • @statisticsninja · 1 year ago

      Make sure your data is in a data.frame and not a tibble.

  • @robertnelson3561 · 1 year ago

    I need help fixing an R application error:
    Error in .local(obj, ...): cannot derive coordinates from non-numeric matrix
    When we use the raster::intersect(a, b) method we get this error on the new server, but it works fine on the old R Shiny server.

    • @statisticsninja · 1 year ago

      Make sure your data frame has only numeric or integer columns, and pass your data frame to as.matrix() or matrix(). Make sure you are not using a tibble.

    • @robertnelson3561 · 1 year ago

      @@statisticsninja okay let me check, thanks

    • @robertnelson3561 · 1 year ago

      @@statisticsninja We are using a spatial polygon and points to intersect, which is not a matrix.

      coordinates(o_yb) <- ~easting+northing  # convert the locations into a SpatialPoints object
      proj4string(o_yb) <- CRS("+init=epsg:27700")
      # for each order, get the ycodes which intersect the building boundaries
      o_yb <- do.call(rbind, lapply(o$order_id[o$in_building == 1], function(x) {
        # x <- 1  # testing
        # get the building boundaries for the order
        t_bld <- bld[bld$fid %in% o_bld$fid[o_bld$order_id == x], ]
        do.call(rbind, lapply(o_yb@data$key[o_yb@data$order_id == x], function(x1) {
          # x1 <- "YDLHP"  # testing
          t_o_yb <- o_yb[(o_yb@data$order_id == x & o_yb@data$key == x1), ]
          t1_bld <- r_intersect(t_bld, t_o_yb)  # check if the ycode is in the building
          if (length(t1_bld) == 0) return(NULL)
          # if length is greater than 0, make a data.frame, else return NULL
          t_o_yb@data[, c("order_id", "key")]
        }))
      }))

  • @abdulbouraa4529 · 1 year ago

    How do you check the quality of your imputation? I'm confused.

    • @statisticsninja · 1 year ago

      You can check whether there is a statistical difference between the imputed and non-imputed data; you can run anomaly detection and see whether imputed records make up a disproportionate share of the anomalies; you can also train another model on the non-imputed data to predict the imputed column, and check the residuals when you predict the imputed values.

    • @abdulbouraa4529 · 1 year ago

      @@statisticsninja Thank you for your help. Do you know what test I could use? I am a passionate amateur, so I have some gaps that I'm trying to fill.

    • @statisticsninja · 1 year ago

      @@abdulbouraa4529 I would compare the pre-imputation column to the post-imputation column by comparing the histograms and running a Kolmogorov-Smirnov test, ks.test().
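A sketch of this check using scipy's two-sample analogue of R's ks.test() (the data here are simulated stand-ins for the observed and imputed values):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
observed = rng.normal(0.0, 1.0, size=300)   # stand-in for the pre-imputation values
imputed = rng.normal(0.0, 1.0, size=300)    # stand-in for the imputed values

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the imputed
# values follow a different distribution than the observed ones.
stat, pvalue = ks_2samp(observed, imputed)
```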

    • @abdulbouraa4529 · 1 year ago

      @@statisticsninja Thank you very much !

  • @LyricalSerenade · 1 year ago

    Where should I put this code?

  • @danielalcidesmaravibarrera2780

    I like your explanation. I'm trying to use this on a dataset with numerical and ordinal questions.

  • @andresimi · 1 year ago

    How do I get test information from a multidimensional model?

    • @statisticsninja · 1 year ago

      For a mirt model from the mirt package, the itemfit() and M2() functions extract test statistics. You could also use the @Fit slot from the mirt object directly.

    • @andresimi · 1 year ago

      @@statisticsninja But is there a way to evaluate the Test Information Curves? Or some analogous function?

    • @statisticsninja · 1 year ago

      @@andresimi Which package and function are you using to fit your model?

    • @andresimi · 1 year ago

      @@statisticsninja I am using the mirt package with the plot(type = "info") function. Right now, I split the multidimensional model into unidimensional ones so the curves are interpretable. I was wondering if this is ok, or if there is another way of doing this.

  • @katibsareeh9056 · 1 year ago

    Hi, thanks for this interesting video. I performed Fisher's exact test on a 4x2 table in SPSS and got a significant difference (P = 0.010). What post hoc test should I use following that? Is it the adjusted standardized residuals? And if I want to calculate the P value for each adjusted standardized residual, how can I do that?

    • @statisticsninja · 1 year ago

      You can look at 2x2 subtables and run hypothesis tests to identify which conditional distributions differ from the Fisher exact test null hypothesis. Consider using a p-value correction such as Bonferroni. If your variables have an independent-dependent variable relationship, you can run a chi-squared test on pairs of conditional distributions. The standardized residuals will show which cells are most different from the Fisher null hypothesis.
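The 2x2 subtable idea with a Bonferroni correction can be sketched in Python with scipy (the counts below are invented):

```python
from itertools import combinations

import numpy as np
from scipy.stats import fisher_exact

# Hypothetical 4x2 table of counts (rows = categories, columns = outcomes).
table = np.array([[20,  5],
                  [12, 18],
                  [ 7, 25],
                  [30, 10]])

# Post hoc: Fisher's exact test on every 2x2 subtable (pair of rows),
# with a Bonferroni correction for the number of comparisons.
pairs = list(combinations(range(table.shape[0]), 2))
adjusted = {}
for i, j in pairs:
    _, p = fisher_exact(table[[i, j], :])
    adjusted[(i, j)] = min(p * len(pairs), 1.0)  # Bonferroni-adjusted p-value
```

Pairs whose adjusted p-value stays below the chosen alpha are the ones driving the overall significant result.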

  • @brianjing6319 · 1 year ago

    Thank you again

  • @brianjing6319 · 1 year ago

    Thank you

  • @JOANNROBLE · 1 year ago

    Can you show us the data in a csv file?

    • @statisticsninja · 1 year ago

      I posted the .dat files in the shared files section of my website. You can load them the same way you would load a .txt file.

  • @yuankaizhang1080 · 1 year ago

    Thanks!

  • @yuankaizhang1080 · 1 year ago

    Very helpful video! Could you please briefly explain how to get the factor score in BEFA? Thanks!

    • @statisticsninja · 1 year ago

      If you set save.lvs = TRUE when you fit your model, blavaan::blavPredict() with type = "lvmeans" will give the factor scores of your fitted data. I could not get blavaan::blavPredict() or predict() to work with newdata. For new data, extract the coefficient estimates and multiply the observed values by their coefficients.

  • @RufastoSpiffyFrench · 1 year ago

    Very helpful! Thank you very much for posting this!

  • @sophss25 · 1 year ago

    Hello! I cannot seem to find the dove dataset on your website - is there any way I could find it elsewhere? Thanks kindly!

    • @statisticsninja · 1 year ago

      The data set is on the book's website, asdar-book.org. The homepage has links to each chapter's data.

    • @sophss25 · 1 year ago

      @@statisticsninja Wow, thank you so much! I really appreciate the quick reply and your fabulous videos!

  • @mikodine · 1 year ago

    Hey Aaron, love your videos, thanks so much for the content! Do you have a Github where we can find your code?

  • @zacharyadams3772 · 1 year ago

    This man is trying so hard to make Survival analysis not sound morbid lol

  • @bridgettsmith7206 · 1 year ago

    Thanks

  • @bridgettsmith7206 · 1 year ago

    Thanks

  • @KaustubhAmle · 1 year ago

    Could you give us your Jupyter notebook?

    • @statisticsninja · 1 year ago

      I posted my R markdown file to the shared files section of my website.