i wish my professor had explained it exactly like u just did
This was extremely helpful!! Between my 3 econometrics textbooks (Griffiths, Greene, and Wooldridge), the information on MA models was sparse. This really cleared up the mindset behind this model!
Thank you very much for making a vague concept so clear.
Never seen a better explanation of MA models. Immediate subscription!
Same here! I knew I would subscribe after 1 minute into the video. Very clear and very useful video. Thank you very much.
Oh damn!! This is wonderful, simplified and explained pretty nicely. Keep spreading your knowledge!!
Thank you! Will do!
This was the best video on MA. The crazy prof made our life easier 😂😂😂
Thank you so much for explaining this so well! My professor and textbook explain this concept very mathematically which is hard to understand for beginners, they should really give a simple example and then dive into the details as you did.
Glad it helped!
I was stuck on where the "error" term was coming from. Now I know... it is the error from the past. You explained it! I wish you were my professor.
Wow! Great explanation. The professor's example was very intuitive. Thanks for the content!
I really don't know how to thank you for that great demonstration! I've been trying to understand MA process for years!
Couldn't be expressed so handsomely! Thanks!
God Bless You! I needed a fast way to get some concepts on time series forecasting and you saved me.
Easy, Fast, Complete.
This man's explanation is way better than those profs at University.
You are spectacularly GOOD in the explanation of the ARIMA! Cheers
I appreciate that!
Thank you Sir. You have a great way of explaining things, something I sadly rarely find from my coding/statistics teachers.
Thank you so much for making this fun video! Makes so much more sense now (after struggling through my not-so-crazy professor's stats class)
Gemini 1.5 Pro: This video is about the moving average model in time series analysis. The speaker uses a cupcake example to explain the concept.
The moving average model is a statistical method used to forecast future values of a time series from past forecast errors. It is a technique commonly used in time series analysis.
The basic idea of the moving average model is to start from the mean of the series and adjust it using the errors from previous periods. This adjusted value is then used as the forecast for the next period. There are different orders of moving average models, and the speaker introduces the concept with the moving average one, MA(1), model.
In the video, a grad student is used as an example. The grad student needs to bring cupcakes to a professor's dinner party every month. The number of cupcakes the grad student should bring is the forecast. The professor is known to be crazy and will tell the grad student each month how far off the number of cupcakes was. This is the error term.
The moving average model is used to adjust the number of cupcakes the grad student brings based on the error term from the previous month. The coefficient is a weight given to the error term. In the example, the coefficient is 0.5, meaning the grad student will adjust the number of cupcakes he brings by half of the error term from the previous month.
For example, if the grad student brings 10 cupcakes in the first month, and the professor says the grad student brought 2 too many, then the grad student will bring 9 cupcakes in the second month (10 cupcakes - 0.5*2 error term).
The video shows how the moving average model works through a table and graph. The speaker also mentions that there are other variations of moving average models, such as moving average two (MA2) model, which would take into account the error terms from two previous months.
Thanks for existing in this world bro.
So nice of you
Thank you so much for your very intelligent explanation of this model!!! I felt so confused about this model before.
So simple yet easy to understand. Thank you!
Great explanation! I've learned everything that I looked for. Thank you.
Finally ❤️ a video with an applicable and relevant example ❤️🙏
Simple Explanation is a Talent - Thanks for this
ALWAYS GRATEFUL, THANK YOU FOR THE WONDERFUL CONTENT
Simple and clear explanation, thank you !
A year trying to understand this, and I've just needed 15 minutes, thx!!
This explanation gives a better understanding of why we need to avoid a unit root in time series predictions.
OMG, this is brilliant , amazing ,wonderful ,thank you
How is the average moving though? It was fixed for each prediction! Wouldn't it have to be recalculated each time for it to be moving?
Also we didn't seem to use anything related to the error being normally distributed... is there a reason for that? why was it mentioned in the first place?
Exactly right, I have the same question: the average is not moving.
Did you find any other source where this is explained clearly?
I was terrified for the mathematical symbols, but you made it so easy to understand! thank you!
Thank you so much, I have been reading this concept in an Econometric book...but this is easy to comprehend
Glad it was helpful!
Fantastic, got too caught up in the math in my macroeconometrics course and had no idea what these things actually were. Super helpful conceptually
Finally understood this, thank you so much. Highly recommend!
Hi, great explanation! One question, how do you guess the mu value (the average cupcake you bring) for the first time?
Great explanation. Keep up the good work!
I still don't think this makes sense to me. Why does incorporating past error somehow give us a better prediction in the future in this case? Since this crazy professor will randomly choose an acceptable # of cupcakes, your past error shouldn't help in better predicting the future.
I think the student naively believes the crazy professor will stick to his prior t-1 position (the student is unaware of the professor's craziness)
Everything in time series assumes that you can use past info to predict future info
Even though the professor selects a different number every time, in the end the average is stable. Assume you have a time series of images. Images, due to the unstable environment they're taken in or other factors that alter them, are not always the same, although they are taken from the same scene. So, what is the goal here? To find the mutual information in the images and ignore the noise. The noise here is how crazy the professor is, and the importance of the error is something we can handle through its coefficient. By handling these factors, we can get close to recognising the mutual information. Remember, these are unsupervised models. There are no labels to rely on.
Explained with the Cup Cakes it makes perfect sense, thumbs up!
How do we know what the "error" is if there is no "true value" given a random realization of data?
The idea is that you're trying to predict the next value. You get told what the next value is by the professor. If it's random then there is no signal in there and the results are still meaningless.
Does the MA model assume e_t (lagged residuals) are pure white noise? Mean = 0, constant variance, and no autocorrelation of residuals?
Nice example super easy to understand the concept!
perfect explanation. Thank you!
Thanks man. You're doing a superb job.
Thank you very much! Such a clear explanation!
I love this video, so simple but effective
Great videos, thank you! I have a question. The period 1 value is our mean value, but we don't know what the mean is since we just started from point 0. How do we calculate the residual then? We know the true observation but we don't know the mean. Is it just a guess? But when we use any statistical package, it does not ask us to input a guessed mean value.
Many thanks for your clear explanation of the mathematical moving average formula
of course!
How come some MA(1) formulas have x_t = mu + (phi1) error_t + (phi2) error_t-1? If you are predicting at time t, how would you know the error at time t (error_t)? Why are some formulas like this?
this is really helpful and so easy to understand!!!
Thank you for the video, how should we choose the 0.5 coefficient in front of the error term from last period in the regression model?
Great video! Thanks for sharing!
Great explanation! Third row shouldn't it be 9.5 rather than 10.5?
No, 10+1/2=10.5
@@wenzhang5879 Yeah, got it. Thanks
Thanks!!! Perfect explanation :)
Brilliant explanation, thank you!
Exceptionally useful videos for actuarial exams. Thanks for helping me pass🙂(hopefully)
So not natural.. it is why you are so good in teaching
Great video. I think the calculation of the 3rd row is wrong. It should've been 9+0.5 = 9.5
No.. Constant term is 10 not 9
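For anyone else stuck on the third row, here is a minimal sketch of the recursion. It assumes the mean is 10, the coefficient is 0.5, and the professor's errors for the first four months are -2, 1, 0, 2. Those errors are inferred from the forecast values quoted in these comments, not copied from the video, and the function name is made up for illustration.

```python
def ma1_forecasts(mu, theta, errors):
    """Return MA(1) forecasts: the first is mu, each later one is mu + theta * previous error."""
    forecasts = [mu]  # no past error exists before month 1, so start at the mean
    for previous_error in errors:
        forecasts.append(mu + theta * previous_error)
    return forecasts


# Errors inferred from the forecasts quoted in the comments (an assumption).
errors = [-2, 1, 0, 2]
forecasts = ma1_forecasts(mu=10.0, theta=0.5, errors=errors)
print(forecasts)  # [10.0, 9.0, 10.5, 10.0, 11.0]
```

The third value is 10 + 0.5 * 1 = 10.5 because each forecast restarts from the constant 10, not from the previous forecast of 9.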
Great video. Do you always start with the mean as your first guess for f hat? Also, how do you fit an MA(q) model?
Thank you. Love your video tutorials! Just one question: shouldn't the curve at 5'58'' be f_t? And c(10,9,10.5,10,11) be f_(t-1)?
Amazing explanation man
how do we find the coefficient for the moving average model?
Algorithms use the entire time series to get as close as possible to the true value of the coefficient (often with a maximum likelihood estimator).
Extremely well explained
Had I watched your series earlier would have saved me $3000 :(
Hello, thanks for this video, but I wonder about \theta_0. Could it be something different than 1?
Great Presentation...
Glad you liked it!
Really good explanation!
Maybe I'm stupid for asking this...
If one was to write an MA filter, how do you determine M?
Where does the noise in the equation come from? In our data we only have time on the x axis and Y as the target variable. There is no error term. What I mean to ask is does the MA model first regress y on y lag terms like the AR model and then calculate error between the actual and predicted y terms? Then regress y against the calculated error terms(residuals)?
The error is white noise coming from random shocks whose distribution is iid~(0,1). Fitting the MA estimates is more complicated than it is in autoregressive models (AR models), because the lagged error terms are not observable. This means that iterative non-linear fitting procedures need to be used in place of linear least squares. Hope this helps :).
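A rough sketch of what such an iterative fit can look like. It assumes a conditional sum of squares criterion with the error before the first observation set to zero, and a plain grid search over the coefficient. Statistical packages typically use maximum likelihood with a proper optimizer instead. The data is simulated and the name `css` is made up for illustration.

```python
import numpy as np


def css(theta, x, mu):
    """Conditional sum of squared errors for an MA(1) with a given theta and mu."""
    previous_error = 0.0  # assumption: the unobserved pre-sample error is zero
    total = 0.0
    for value in x:
        # Rebuild each error from the data, since errors are never observed directly.
        error = value - mu - theta * previous_error
        total += error * error
        previous_error = error
    return total


# Simulate an MA(1) series with known parameters so the fit can be checked.
rng = np.random.default_rng(0)
n = 2000
true_mu, true_theta = 10.0, 0.5
shocks = rng.normal(0.0, 1.0, size=n + 1)
x = true_mu + shocks[1:] + true_theta * shocks[:-1]

mu_hat = x.mean()  # estimate the mean from the data
grid = np.linspace(-0.95, 0.95, 191)  # candidate coefficients in steps of 0.01
theta_hat = min(grid, key=lambda theta: css(theta, x, mu_hat))
print(mu_hat, theta_hat)
```

The point of the sketch is that the errors have to be rebuilt from the data for every candidate coefficient, which is why ordinary least squares cannot be applied directly.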
Thanks, this is a really clear explanation. My only question is when you are calculating your f_t column, why are you including the error from the current time period? Shouldn't you only be including the 0.5*e_(t-1)?
Does mu have to be a constant? Can we use a rolling window to calculate the average? Will this yield better predictions?
LOVE IT. Thank you.
Of course!
Thank you so much.
Greatly explained!!! Thanks
What is the difference between taking the average of the first 3 values to calculate the centered average at time period 2 and this method (average + error at t + error at the previous time period)?
What you are describing is MA smoothing, which is used to describe the trend-cycle of past data
God Bless you.
Excellent explanation
Why is it that in some models the prediction (f hat) is the average of the previous f values, but in other models it is the errors of the previous predictions that determine f hat?
I have the same doubt. Sometimes he added half of the error to f, and sometimes to f-hat.
Let's use an example that is slightly more natural to us -- so here's this crazy professor. :D
This is a great explanation but in many equations they also add the current error (epsilon_t). I just don't get how we are supposed to know our current error if we are trying to forecast a value. Do we simply neglect that current error for forecasting?
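One way to reconcile the two forms, written in the video's f_t notation with \theta for the coefficient (0.5 in the cupcake example). This is the standard MA(1) setup as I understand it, not a transcription of the video:

```latex
\begin{aligned}
f_t &= \mu + \varepsilon_t + \theta\,\varepsilon_{t-1}
  && \text{(what actually happens at time } t\text{)} \\
\hat{f}_t &= \mathbb{E}\left[f_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \dots\right]
  = \mu + \theta\,\varepsilon_{t-1}
  && \text{(forecast, since } \mathbb{E}[\varepsilon_t] = 0\text{)} \\
f_t - \hat{f}_t &= \varepsilon_t
  && \text{(the error, known only after time } t\text{)}
\end{aligned}
```

So the current error appears in the equation for the series itself, but drops out of the forecast because its expected value is zero.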
Hi. The mean of et is not 0. For time interval 5, you need to write -1.
What does it mean when the MA(1) estimated parameter = 1? For AR(1) that would mean there's a unit root. Any particular corollary for MA models?
you are just amazing
Fantastic!
Are the mean 0 and SD 1 of error_t assumptions?
Sir please make videos on restricted Boltzmann machine
Amazing explanation
How can I use such a model for forecasting?? I can forecast for one day into the future but how about 2 or more days into the future?
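A small sketch of multi-step forecasting, assuming an MA(1) with a known mean and coefficient. Only the first step ahead can use the last observed error. Further out, every error in the formula is still in the future and has expected value zero, so the forecast falls back to the mean. The function name and the numbers are illustrative.

```python
def ma1_forecast_horizon(mu, theta, last_error, steps):
    """Forecast `steps` periods ahead from an MA(1) given the most recent error."""
    forecasts = []
    for step in range(1, steps + 1):
        if step == 1:
            # One step ahead: the last observed error still carries information.
            forecasts.append(mu + theta * last_error)
        else:
            # Two or more steps ahead: all relevant errors are unknown, expected value 0.
            forecasts.append(mu)
    return forecasts


print(ma1_forecast_horizon(mu=10.0, theta=0.5, last_error=2.0, steps=3))
# [11.0, 10.0, 10.0]
```

More generally an MA(q) model only has something to say for q steps ahead, after which the forecast is just the mean.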
Hi... I have one doubt.. shouldn't you have plotted the values for ft^ instead of ft in the graph?
P.S: Thank you for taking the time to make these videos. It's really helpful.
I was about to ask the same thing but I don't think the instructor responds to questions.
@@isabellaexeoulitze6544 yeah.. I kinda expected that since it's an old video.. nevertheless I commented my doubt, hoping that someone else watching the video might clarify...
I think he drew the ft line to show that the time series data is centered around the mean, but I also wonder why he didn't draw the predicted ft along with the real ft.
thanks! Really helpful
Should it be 9.5 instead of 10.5?
Hey amazing Content Bravo !
Can you add to that a video talking about random walk ?
That would be great .
Perfect!
How do you find the error terms for the last time period in a real-world univariate series?
If a physics student is reading this, just wanna share my intuition that this is exactly like a control system. Whatever error our model is getting, it is moving to cover it, a little bit like a PI controller in electrical engineering :) not sure if it clicks for anyone
Or a thermostat.
How is mean determined?
BTW, it was a great video! Thanks a lot!
I don't really get the model.
Let's say I have a non-crazy professor, that always wants 8 cupcakes. My mean is 10, so by default I always bring 10. So in order:
I bring 10, error = -2
I bring 8, error = 0
I bring 10, error = -2
I bring 8, error = 0
The model doesn't take into account that the error was based on the last base value, not the current. Wouldn't a good moving average mean I want to bring
mean(f(x - y) for y in 0...YS), where YS is the order of the moving average? Then I always bring the perfect amount for non-crazy professors, and for crazy ones I just increase YS to something meaningful.
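A quick sketch that traces this scenario, assuming the professor always wants 8 and the error is measured as wanted minus brought. A coefficient of 1 reproduces the 10, 8, 10, 8 pattern above, while the video's 0.5 gives a damped version. The function name is hypothetical.

```python
def trace_forecasts(mu, theta, demand, months):
    """Trace MA(1)-style forecasts when the true demand never changes."""
    forecasts = []
    previous_error = 0.0  # no error is known before the first month
    for _ in range(months):
        forecast = mu + theta * previous_error
        forecasts.append(forecast)
        previous_error = demand - forecast  # error = wanted minus brought
    return forecasts


oscillating = trace_forecasts(mu=10.0, theta=1.0, demand=8.0, months=4)
damped = trace_forecasts(mu=10.0, theta=0.5, demand=8.0, months=4)
matched = trace_forecasts(mu=8.0, theta=0.5, demand=8.0, months=4)
print(oscillating)  # [10.0, 8.0, 10.0, 8.0]
print(damped)       # [10.0, 9.0, 9.5, 9.25]
print(matched)      # [8.0, 8.0, 8.0, 8.0]
```

In this trace the forecasts only settle on 8 when mu itself is 8. When a model is fitted to data, mu is estimated from the series rather than fixed in advance, and for a professor who always wants 8 that estimate would be 8.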
I really like your videos. They work very well for me, someone without any background in time series. However, this one is somewhat confusing. You are demonstrating the concept of *moving average* with an example where the average stays the same. I get that the estimate moves around, but that is due to the error variance, right? The average itself is not moving anywhere. Both mu and mu_epsilon are assumed to be constant, so what's moving here?
damn u a real one for this
Wonderful example.
thanks!
THANK YOU SO MUCH
AR is also centered around its average, but it's not called moving average