Seasonality and Trend Forecasting using Multiple Linear Regression with Dummy Variables as Seasons

Поділитися
Вставка

КОМЕНТАРІ • 82

  • @XXBASSOON1STXX
    @XXBASSOON1STXX 2 роки тому +3

    You're an absolute life saver! I would have never even gotten close to completing my assignment without this guide!

  • @turkanseymurqizihasanzade6608

    Than you so so much! I had no idea how I was going to complete my assignment. This video saved my life.!

  • @tsibir3800
    @tsibir3800 Рік тому +1

    WELL DONE Mustafa! Amazing Work! Your Videos Helped me very much with my job as Invetory and a Buyer manager!! We are able to predict the futere to our inventory preety good! Thank's a lot. Love from Greece

  • @emilk4743
    @emilk4743 Рік тому

    thank you so much! had an assignment for this and couldn't wrap my head around it.

  • @warne232000
    @warne232000 3 роки тому +5

    Super helpful! Thank you 😊 Could you please take us through the regression statistics and how to interpret them? Adjusted R2, coefficients, Anova table etc. Thanks!!

    • @mcanbolat
      @mcanbolat  3 роки тому +1

      I already posted four videos on regression analysis. Please see my video list.

  • @marthacastrejon9688
    @marthacastrejon9688 3 роки тому +1

    Thank you so much for this. It was so helpful with a homework assignment.

  • @jthie6802
    @jthie6802 2 роки тому +1

    Thank you, this video was really helpful for my assigment

  • @YAlife721
    @YAlife721 3 роки тому +2

    I have questions.. can u make on inventory forecasting and budget planning

  • @adirayhan6379
    @adirayhan6379 2 роки тому

    Helpful. One question though! what if some of the coefficients are negative?

  • @fxcksalt1
    @fxcksalt1 3 роки тому +1

    Thank you! Subscribed

  • @LovinduWijesinghe
    @LovinduWijesinghe 3 роки тому +2

    Great Video..!

  • @ganeshn3800
    @ganeshn3800 Рік тому

    Hi Mustfa, thank you. How to do a forecasting when you have commitment months and Lead time. How to predict delays based on this? I don't have order date. Just have commit months, LT and order was on time Or delayed column. How to forecast delays based on LT?

  • @mohiuddinshojib2647
    @mohiuddinshojib2647 2 роки тому

    Hello Mustafa, Can you explain why you don't copy the December, while coping the month and make it transpose ?

  • @unseen1231
    @unseen1231 4 роки тому +2

    WOW! Thanks a bunch

  • @yaramostafa6319
    @yaramostafa6319 7 місяців тому

    Hello, does it matter which month we choose as baseline? Does it affect the results? Also, I dont get why you chose the average of the month and not the total demand per month. Also, for November the p-value is greater than 0.05, does it mean that November doesn't have a significant effect on December baseline? Do we need to remove it from the variables then? Because in other forms of variables, the insignificant variable on the "y" is usually removed. Thanks!

    • @mcanbolat
      @mcanbolat  7 місяців тому +1

      Hi, 1. No, any month could be a baseline. 2. Average is used because some months have more occurences than others. 3. All of those month dummy variables are part of a single variable. You don’t remove them. It means that November sales are not significantly different from December sales.

    • @yaramostafa6319
      @yaramostafa6319 7 місяців тому

      Thank you so much! Well explained.@@mcanbolat

  • @juanpablohorn6642
    @juanpablohorn6642 3 роки тому +1

    Hi Mustafa! I've watched your video and I found it very interesting. I have a question. Why do we not use December as a dummy variable? You said it is a baseline. What exactly do you mean by this? Since you are basically ignoring all the values for December arent you?

    • @mcanbolat
      @mcanbolat  3 роки тому +1

      Hi Juan, it is the way that dummy coding works here. We are using the December values but December is controlled by the intercept. If the month is December, the estimated mean of the first December is the intercept.

    • @juanpablohorn6642
      @juanpablohorn6642 3 роки тому

      @@mcanbolat so, the values for slope for December are the same as the intercept?

    • @mcanbolat
      @mcanbolat  3 роки тому

      No, there is no slope for December. The mean estimated value for December is Intercept + Coefficient of t x t

    • @hoaphanthi5267
      @hoaphanthi5267 3 роки тому +1

      1, When all of the month index = 0, the month must be December. It didn't disappear, it was just shown in a different way. Hence, the teacher didn't write it in the table.
      2, December_value = intercept + error. We always have error values when we estimate.
      I think so. If anyone has any other ideas, feel free to respond to me. Thank you.
      P/s: My English is not good, I hope you can understand my explanation.

  • @brianeasley9343
    @brianeasley9343 4 роки тому +1

    Fantastic video, thank you!

  • @patriario9379
    @patriario9379 4 роки тому +1

    Hello Mustafa, this is a great video. It's very helpful for my works. So thank you so much. However, I would like to ask you, why you don't include Dec on your dummy variable? if my data start from Jan, did i must to not include Jan for my dummy variable too or not? what happen if I include Jan as dummy variable? really appreciate your advice

    • @mcanbolat
      @mcanbolat  4 роки тому +4

      Thanks. Dec is already included in the model as the baseline. If you want to forecast for December, each of the other dummies will be equal to zero. You can choose any month as your baseline, you just need to use (n-1) dummy variables if there are (n) categories.

  • @danelleduardorapozoramirez1030
    @danelleduardorapozoramirez1030 3 роки тому +1

    Hi Mustafa. How would you forecast sales when there was an extreme value in the historical data and it's unlikely for that event to happen again. For example, sales completely stopped (went to 0) in april due to government lockdown because of covid. We believe that for next year that won't happen again. I think maybe that changing the real value of April (0) equal to march's vaule is a reasonable approach. Thanks in advance, great video
    Edit: what do you think about excel's forecast function using exponential triple smoothing? Is this a similar method?

    • @mcanbolat
      @mcanbolat  3 роки тому +3

      Hi Danell, you can put the previous year’s April value or the value of March as you suggested. You can also use the median or mean sales value. Excel’s forecasting tool is using a different approach and it may or may not be superior to the model here for different datasets. You need to check and compare the error measures. The advantage of constructing your own model is that you don’t use a black box approach like Excel’s and you may be able to generate further insights on your dataset.

    • @danelleduardorapozoramirez1030
      @danelleduardorapozoramirez1030 3 роки тому

      @@mcanbolat great. Thanks

    • @danelleduardorapozoramirez1030
      @danelleduardorapozoramirez1030 3 роки тому

      @@mcanbolat another question. If we were to make a risk analysis using this forecast (and i say this because the projection here is made based on how the number of corolla searches changes over time), how would you calculate the probability that the corrolla searches fall between a range? From the graph alone i can tell that the searches somehow fall into a normal distribution.

    • @mcanbolat
      @mcanbolat  3 роки тому +1

      You then just create a frequency chart-histogram for all values ignoring the time periods. The histogram will give you an idea about the probabilities for certain ranges.

    • @manhtuannguyen7375
      @manhtuannguyen7375 3 роки тому

      Hi @@mcanbolat, regarding Danell's question about extreme value during Covid. Can we add a factor called pandemic in the model and use binary coding (1 if the month is in Covid time and 0 if not)? I assume it might help explain the jump or dip of sales or consumption during the pandemic but not sure if there is any flaw with that approach

  • @Deniz-si8qu
    @Deniz-si8qu 3 роки тому +1

    teşekkürler, çok yararlı oldu.

  • @arielaweiner2338
    @arielaweiner2338 2 роки тому

    Hey, why did you use per month averages rather than the sum? If I have daily sales data and want to forecast demand, should I be looking at the average monthly sales or sum of sales in each month?

    • @mcanbolat
      @mcanbolat  2 роки тому

      You will use the sum of the sales in each month but you will average the sums (assuming you have multiple years) to find your seasonality indices.

  • @StockSpotlightPodcast
    @StockSpotlightPodcast 3 роки тому +1

    Great job! Love the MMULT function I haven’t thought of that. Also, I’m surprised you didn’t remove the Nov variable due to its high p-value and re-run the regression analysis. However, then you wouldn’t be able to forecast November. What do you recommend in such situations?

    • @mcanbolat
      @mcanbolat  3 роки тому +3

      When a categorical variable is used, you cannot remove the categories just because they have a high p-value. They are the part of the original variable, they are not variables themselves.The meaning of that p-value is quite different here. It means that the category which has a high p-value is not significantly different from the baseline category. So November sales here are not significantly different from December sales. Think it as an ANOVA model.

    • @StockSpotlightPodcast
      @StockSpotlightPodcast 3 роки тому +1

      @@mcanbolat Thank you this is a great lesson to takeaway for future regressions!

  • @dalalbackups5572
    @dalalbackups5572 2 роки тому

    Thank you for share this !!
    I have a question, if I have a data for one year can I find the seasonality index? Or can I find the trend ?

    • @mcanbolat
      @mcanbolat  2 роки тому

      I would not recommend using it for seasonality as you will have only one data value per season (month). But you can make your seasons as days instead of months if there is a daily seasonal pattern. You can also just look at the overall trend.

  • @pinkaura6983
    @pinkaura6983 3 роки тому +1

    Perfect!

  • @dayamathishetty7225
    @dayamathishetty7225 3 роки тому +1

    Sir how to use it for large number of materials ,company where I'm working has more than 5000 spares to forecast and analyse please help if u can . Thank u.

    • @mcanbolat
      @mcanbolat  3 роки тому +1

      If all those parts have some kind of a monthly seasonality then you can implement the following method: ua-cam.com/video/-WK0Db7hY2U/v-deo.html But the best way is to learn more about different forecasting methods and move to R or Python.

    • @dayamathishetty7225
      @dayamathishetty7225 3 роки тому

      @@mcanbolat Thank u for ur help , Yea I'm planning to learn python till then was in need of some method to forecast n analyse.

  • @littlecreator4838
    @littlecreator4838 2 роки тому

    Hi Mustafa. When there's a seasonality, we usually use triple exponential method.how is this different and what advantages does this have over triple exponential

    • @mcanbolat
      @mcanbolat  2 роки тому +2

      Holt-Winters (the triple exponential smoothing) may or may not be better than this method and it really depends on the dataset. The regression approach gives you a better understanding of average differences between seasons and it also quantifies the trend. I personally prefer the regression approach because of its interpretability.

  • @almota7
    @almota7 5 років тому +1

    Really nice! Well done.

  • @TheRohit9463
    @TheRohit9463 3 роки тому +1

    superb

  • @karimsebak5960
    @karimsebak5960 4 роки тому

    Hi Mustafa much thx for the great work it really worked fine with me, i have a question is this model can work with any type of analysis? ie, if there is no seasonality or non cyclical trends...etc?

    • @mcanbolat
      @mcanbolat  4 роки тому +2

      Thanks. For data with no seasonality or trend, you can use models that are suitable for stationary data. These include moving averages and exponential smoothing. I have videos covering them.

  • @IAKhan-km4ph
    @IAKhan-km4ph Рік тому +1

    VERY NICE. i NEED ITS MATHS EQUATION

  • @manishbhaskar2917
    @manishbhaskar2917 5 років тому

    Superb Explanation!

  • @xinyuetang9332
    @xinyuetang9332 3 роки тому

    Hello Mustafa, I don’t understand why you see Dec as baseline and choose not to put Dec as a variable. Could you please explain it in more detail?

    • @mcanbolat
      @mcanbolat  3 роки тому

      This is a typical way of modeling categorical variables. Think it in this way: if you have a gender variable you would not need to create two columns one for female and one for male. You will use a single column with 1 being female or male and 0 otherwise. If you code 1 as female, the intercept value will give you the mean for male. See also: stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-dummy-coding/

    • @xinyuetang9332
      @xinyuetang9332 3 роки тому +1

      @@mcanbolat Thank you so much, it’s very helpful!!

  • @estellanyimba4729
    @estellanyimba4729 2 роки тому

    Hi Dr. I noticed the November coefficient has an insignificant p value. What does this mean? Can we remove November from our data on this basis?

    • @mcanbolat
      @mcanbolat  2 роки тому

      Hi, it means that November is not significantly different from the baseline month (the month that we did not have a coefficient for). You won’t remove it as it is part of one variable (month).

    • @estellanyimba4729
      @estellanyimba4729 2 роки тому

      @@mcanbolat Thank you. So if i got you, we keep it in the equation even if the coefficient is insignificant? what would be the implication of dropping it as one of the variables

  • @robialmaynaufalmahdy5420
    @robialmaynaufalmahdy5420 4 роки тому

    Nice work!
    I have a question about time series with multiple variable. What i read from college book about time series, they introduce you to use exactly like you did (using a column of t as period, also other as dummy variables). But a lot of sources online says that it is incorrect to use multiple regression method as it doesn't count time component (or so they say, from my understanding that column t acts as time component). Also it seems that many sources directed me to use vector autoregression, or box jenkin, which are above my capability for now.
    Does those VAR and box jenkin superior to multiple regression method, maybe in term of forecast accuracy? Let me know what you think. Thank you, sorry for the long post.

    • @mcanbolat
      @mcanbolat  4 роки тому +1

      The time component is taken into consideration here with t being an independent variable. We cannot say one method is superior to another one as the performance really depends on the dataset you are using. I think for most cases MLR provides clear and practical results.

    • @robialmaynaufalmahdy5420
      @robialmaynaufalmahdy5420 4 роки тому

      How about adding another time series data that might be causing demand to fluctuate, for example foreign exchange rate or other economic data? How do i fit those that into this method?
      And can i use seasonality by including monthly period index (1 to 12 for first to last month and reapeating for the new year, sequential just like period t) as the independent variable?

    • @mcanbolat
      @mcanbolat  4 роки тому

      You can add other variables too. It would not be problem.
      It would be a mistake using months as numbers from one to twelve. You cannot assume a linear change in demand when you increase the month value.

  • @littlecreator4838
    @littlecreator4838 2 роки тому

    Sir would you also let me know,how do I adjust my forecast in case of positive bias. I am looking for the formula pls

    • @mcanbolat
      @mcanbolat  2 роки тому +1

      Regression based models should provide unbiased estimations. If you are using another model, you can find the average bias (average of all forecast errors) and then simply subtract it from each forecast value.

    • @littlecreator4838
      @littlecreator4838 2 роки тому

      Thank you for the the explanation. It was very useful. Sir it would also be very beneficial if you make videos on exploring different time series models in Python pls.

  • @yasserfatehy
    @yasserfatehy 4 роки тому +2

    Thank u

  • @Ruben-mx8gk
    @Ruben-mx8gk 3 роки тому

    Hello, how can I forecast seasonality and trend with daily data. I'm working on forecasting daily deposits and want to know day by day the forecast

    • @mcanbolat
      @mcanbolat  3 роки тому +1

      Your seasons will be days instead of months. You cannot use the regression model in Excel because there will be 364 dummies and Excel won’t let you use more than I believe 18 variables. An alternative could be using R or check my other video on seasonality and trend forecasting model.

  • @MohammedHawat
    @MohammedHawat 5 років тому +1

    thank you, very nice

  • @jimmybrooks9812
    @jimmybrooks9812 5 років тому

    Thank you for the help!

  • @andresvega3892
    @andresvega3892 5 років тому

    When I try to do the regression Excel shows an alert saying that the model contains non.numeric data. When I change the text to numbers it then says that X Range and Y Range must have the same number of rows (sample values), regardless of labels. What am I doing wrong?

    • @mcanbolat
      @mcanbolat  5 років тому

      If you are including the titles (the first row) make sure that you check the box "Labels".

    • @Templarium
      @Templarium 4 роки тому

      I'm getting the same error. I did check 'labels'. Did you figure out how to fix it?

    • @Templarium
      @Templarium 4 роки тому +1

      I figured out what the problem is. For the Y range the video makes it seem you're only selecting the top row with the months but you're actually selecting that and the whole dummy variable matrix. I did that and the problem was fixed.

  • @davidlangridge2246
    @davidlangridge2246 3 роки тому +1

    Do you not encounter problems of multicolinearity by not checking the box that runs constant = 0?

    • @mcanbolat
      @mcanbolat  3 роки тому +1

      Good question. Because we use (12-1) dummy variables to code 12 months, we do not need to remove the intercept. If we had 12 dummies we would have to remove the intercept.

    • @davidlangridge2246
      @davidlangridge2246 3 роки тому +1

      Dear @@mcanbolat , thank you your response, is much appreciated. So do we interpret the coefficients as deviations from the average values of the period of focus i.e. the month that you do not create a dummy for?

    • @mcanbolat
      @mcanbolat  3 роки тому +2

      Yes, but only if their p-values are significant.

  • @gabrielrochasouza8796
    @gabrielrochasouza8796 2 роки тому

    Thanks for the video Dr. Mustafa. I tried using the MMULT formula like you showed here, but receive a 0 amount for my forecast. Do you know what might be causing that to happen?
    Thank you

    • @mcanbolat
      @mcanbolat  2 роки тому

      I am not sure what the issue is but first make sure that you use Ctrl+Shift+Enter for the MMULT formula. If it still does not work, you can put both arrays in the same way (both horizontal) and use SUMPRODUCT(), or you can multiply them one by one :)

  • @ENSubscriptions
    @ENSubscriptions Рік тому +1

    That work is fantastic Dr. Mustafa,!! It was quite helpful.
    In one of your videos (Seasonality and Trend Forecasting), you forecasted using (Intercept & Slope functions) for a similar data set (ua-cam.com/video/RoKGvvMJAME/v-deo.html). However, when I applied both methods-(Intercept & Slope functions) and (The regression analysis tool as shown in this video-I got different results (the calculated intercept value is slightly different).
    My question is why the results are different, and which one should we use?
    Note: I applied both methods on the same dataset (Corolla).

    • @mcanbolat
      @mcanbolat  Рік тому

      If it is a slight difference that is okay. You may have not included the first data point, or other minor errors.