Multiple Linear Regression with R | 2. Data Preparation

  • Published Nov 2, 2024

COMMENTS • 45

  • @faizahmedmughal9550 4 years ago +1

    I have learnt a lot from your videos. Thank you

    • @bkrai 4 years ago

      You are welcome!

  • @stelluspereira 4 years ago +1

    Thank you, Dr. Rai

    • @bkrai 4 years ago

      Welcome!

  • @ArpitSingh-dz7gt 4 years ago +1

    I got to know something more about data cleansing, thank you sir!!

    • @bkrai 4 years ago

      Welcome!

  • @nicolassimon9340 4 years ago +1

    Hi,
    Many thanks, very clear as usual.
    I have one suggestion about replacing by the mean:
    vehicle$lh[vehicle$lh==0]

    • @bkrai 4 years ago

      That's even better!
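
The one-line suggestion above is cut off after the logical index. A minimal sketch of the idea it seems to point at, assuming zeros in `lh` stand for missing values and that the replacement is the mean of the non-zero entries (the right-hand side and the toy data frame are assumptions, not the commenter's actual code):

```r
# Toy stand-in for the video's data; the values are invented for illustration.
vehicle <- data.frame(lh = c(0, 2.5, 4, 0, 5.5))

# Assumption: zeros mark missing labor hours, so compute the mean
# from the non-zero entries only.
lh_mean <- mean(vehicle$lh[vehicle$lh != 0])

# Logical indexing on the left-hand side replaces only the zero entries.
vehicle$lh[vehicle$lh == 0] <- lh_mean

print(vehicle$lh)
```

Indexing on both sides keeps the replacement to a single assignment, with no intermediate `NA` step.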

  • @mohamedabdullah9061 4 years ago +1

    Thank you, sir. Please post videos on multiclass classification.

    • @bkrai 4 years ago +1

      You can refer to this:
      ua-cam.com/play/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG.html

  • @emmanuelobeng4931 4 years ago +1

    Thank you very much, but I really need your assistance.

    • @bkrai 4 years ago

      Let me know your question.

  • @smsm314 4 years ago +1

    Great work. Congratulations.

    • @bkrai 4 years ago

      Thank you! Cheers!

  • @mrishojohn7551 4 years ago +1

    Thanks a lot, Dr. You have made the R language very easy for me to read. I have a question. I analysed income data that showed positive skew, so I did not fit a normal distribution to it. What can I do to fit a distribution? I want an interval estimate of the population income from the sample data. I think an exponential distribution would suit the data, but how do I fit it in R?
    Thanks, Dr.

    • @bkrai 4 years ago

      You can try a log transformation. I would also suggest trying this:
      ua-cam.com/video/_3xMSbIde2I/v-deo.html
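
A minimal sketch of the log-transformation route suggested in the reply above, using base R only. The income values are simulated and the distribution parameters are assumptions. Note that the back-transformed interval describes the geometric mean (the median, if the data are roughly lognormal), not the arithmetic mean.

```r
set.seed(1)

# Simulated, positively skewed "income" data (hypothetical values).
income <- rlnorm(200, meanlog = 10, sdlog = 0.8)

# Work on the log scale, where the data are closer to symmetric.
log_income <- log(income)

# 95% t-interval for the mean of log(income).
ci_log <- as.numeric(t.test(log_income)$conf.int)

# Back-transform: interval for the geometric mean of income.
ci_geo <- exp(ci_log)
geo_mean <- exp(mean(log_income))

print(geo_mean)
print(ci_geo)
```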

  • @irsadahamad4760 4 years ago +1

    Thanks Sir

    • @bkrai 4 years ago

      Welcome

  • @pshani3512 4 years ago +1

    Sir, your videos have been very helpful for self-learning R. Always very clear. Thank you so much!
    Could you please tell me whether there is a method to analyze and interpret how well our model works with testing data? Can we use a t-test to compare the mean of the outcome predicted by the model with the mean of the original outcome in the testing data?

    • @bkrai 4 years ago +1

      You can plot actual versus predicted values for the test data and obtain R-sq.
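
A minimal sketch of that check on simulated data. The variable names, the 80/20 split, and the definition of test R-sq used here (1 - SSE/SST on the test set) are assumptions; the squared correlation between actual and predicted values is another common choice.

```r
set.seed(123)

# Simulated data with two predictors (hypothetical stand-in for a real dataset).
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 3 + 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

# 80/20 train/test split.
idx <- sample(n, size = 0.8 * n)
train <- dat[idx, ]
test <- dat[-idx, ]

# Fit on training data, predict on test data.
fit <- lm(y ~ x1 + x2, data = train)
pred <- predict(fit, newdata = test)

# Test-set R-squared: 1 - SSE / SST.
sse <- sum((test$y - pred)^2)
sst <- sum((test$y - mean(test$y))^2)
r2_test <- 1 - sse / sst
print(r2_test)

# Actual vs predicted plot; points near the 45-degree line indicate good agreement.
if (interactive()) {
  plot(test$y, pred, xlab = "Actual", ylab = "Predicted")
  abline(0, 1)
}
```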

    • @pshani3512 4 years ago +1

      @@bkrai Thank you very much, Sir!
      Is there a cut-off of R-sq value which is required to have a good agreement? I have read "R-sq value

    • @bkrai 4 years ago +1

      Instead of a cutoff, you can use it as a benchmark. Say you run a model and get an R-sq of 0.65, then make changes to the model and get an R-sq of 0.74. Now you know that the changes are improving the model.

    • @pshani3512 4 years ago +1

      @@bkrai Thank you very much Sir....!

    • @bkrai 4 years ago

      Welcome!

  • @darktemplar2827 4 years ago +1

    Great work Dr. Do you mind putting together some videos on how to analyze Liberty scale data in R? Thanks in advance

    • @darktemplar2827 4 years ago +1

      I meant Likert scale data, sorry about that

    • @bkrai 4 years ago +1

      Great suggestion! I've added it to my list.

  • @shinerajukappil6295 4 years ago +1

    Can we use the same multiple linear regression procedure on time-series data to find the factors affecting a dependent variable? I have converted the raw data into differences, i.e. present value minus past value, to remove the autocorrelation that occurs in time series. The model variables are now:
    Change in production - dependent variable
    Change in rainfall - independent variable
    Change in temperature - independent variable
    Change in area - independent variable
    The data cover 17 years. Am I on the right track? I followed the same approach in R that I used to build a multiple regression model for cross-sectional data.

    • @bkrai 4 years ago

      You need time-series with regressors:
      ua-cam.com/play/PL34t5iLfZdduRvHafEKM6vrDmfnlUfzAy.html
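
For reference, the differencing step described in the question can be sketched in base R as follows. The data are simulated and the column names are hypothetical. This only illustrates the first-difference regression the commenter describes; it is not the time-series-with-regressors approach recommended in the reply.

```r
set.seed(42)

# 17 years of simulated annual data (hypothetical values).
years <- 17
raw <- data.frame(
  production  = 50 + cumsum(rnorm(years, mean = 1)),
  rainfall    = 800 + cumsum(rnorm(years, sd = 20)),
  temperature = 25 + cumsum(rnorm(years, sd = 0.3)),
  area        = 100 + cumsum(rnorm(years, sd = 2))
)

# First differences: present value minus previous value.
# diff() shortens each series by one, leaving 16 observations.
d <- as.data.frame(lapply(raw, diff))
names(d) <- paste0("d_", names(raw))

# Regress change in production on changes in the other variables.
fit <- lm(d_production ~ d_rainfall + d_temperature + d_area, data = d)
print(coef(fit))
```

With only 16 differenced observations and three predictors, the estimates will be imprecise, so the residuals are still worth checking for leftover autocorrelation.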

  • @shuvhamdigitalacademy3228 3 years ago +1

    Sir, in a multiple regression model, should we keep only the significant independent variables and then run the other tests (BP, DW, ad.test, BG, VIF, etc.) to check that the linear model is good, or should we include all variables, both significant and insignificant, in the further steps?
    Please help.❤️

    • @bkrai 3 years ago +1

      I would suggest checking for multicollinearity before removing non-significant variables.

    • @shuvhamdigitalacademy3228 3 years ago +1

      @@bkrai Thank you so much ❤️.
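
One common way to run the multicollinearity check mentioned in this thread is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. A minimal base-R sketch on simulated predictors follows; the data and the rule of thumb in the comments are assumptions, not part of the original reply.

```r
set.seed(7)

# Simulated predictors: x2 is nearly a copy of x1, x3 is independent.
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF for each predictor: regress it on the remaining predictors.
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

# A frequently quoted rule of thumb treats VIF above about 5 or 10 as a warning sign.
print(round(vif, 2))
```

Here `x1` and `x2` should show very large values while `x3` stays near 1, which is why a predictor can look non-significant only because it overlaps with another one.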

  • @84quaker 4 years ago +1

    Thanks for your work!

    • @bkrai 4 years ago

      Welcome!

  • @geetakhatri6049 4 years ago +1

    Great job! Thanks!

    • @bkrai 4 years ago

      Welcome!

  • @nishadseeraj7034 4 years ago +1

    Thank you for these videos, I really benefit from them. Can I ask a question? I was going through an example on kaggle and the author used the dummyVars function. Do you think you can explain how it works when applied to a dataset? Again I really appreciate these lessons

    • @bkrai 4 years ago

      Thanks! Do you remember what method they were using?

    • @nishadseeraj7034 4 years ago +1

      @@bkrai I'm sorry, I should have included the link to the sample code in my initial question: www.kaggle.com/virosky/the-only-way-to-handle-missing-values/notebook
      I am not sure what the function does when applied to a data frame, as in the example I am referring to. The code that uses the dummyVars function is towards the end of the "exploratory data analysis" section of the linked notebook. Thank you for the reply.

    • @bkrai 4 years ago +1

      They used xgboost. It is one of the must-know methods in the top 10 list linked below:
      ua-cam.com/play/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O.html

    • @nishadseeraj7034 4 years ago

      @@bkrai Thank you very much Sir
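
On the dummyVars question in this thread: that function comes from the caret package and turns factor columns into numeric indicator (one-hot) columns, which matrix-based methods such as xgboost require. A minimal sketch of the same idea using base R's `model.matrix` instead of caret, with a made-up data frame, so this is a stand-in technique rather than the notebook's code:

```r
# Hypothetical data frame with one categorical and one numeric column.
df <- data.frame(
  fuel    = factor(c("petrol", "diesel", "petrol", "electric")),
  mileage = c(10, 20, 15, 5)
)

# "- 1" drops the intercept so that every factor level gets its own 0/1 column.
dummies <- model.matrix(~ fuel - 1, data = df)

# Combine the indicator columns with the numeric column into one numeric matrix.
encoded <- cbind(dummies, mileage = df$mileage)
print(encoded)
```

Each row has exactly one 1 among the `fuel` columns, marking that row's category.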

  • @lalithalalitha3178 4 years ago +1

    Can we get the R files, sir?

    • @bkrai 4 years ago

      Added a link in the description.

  • @ebentee 4 years ago +1

    Whoever reads this, you'll be successful one day. Let's help grow this channel together for the future🤑❤

    • @bkrai 4 years ago

      Thanks for your comments!