Python ML #08: Sales Forecast Tutorial with Linear Regression Model

  • Published 27 Aug 2022
  • In this machine learning tutorial, you will learn how to forecast sales and compare actual and forecasted sales using metrics such as mean squared error, mean absolute error and R2 score, with a Linear Regression model.
    We are going to use sales data from different stores from 2013 to 2017
    [ items sold per day ].
    **Google Colab** is being used in this tutorial instead of VS Code.
    ✨Download the dataset file : github.com/BekBrace/Sales-For...
    ✨ GitHub Repo: github.com/BekBrace/Sales-For...
    🔗 Social Media
    --------------------------
    Facebook : / bekbrace
    Twitter : / bekbrace
    Instagram : / bek_brace
    Tech Blog : https://dev.to/bekbrace
    GitHub profile : github.com/BekBrace
    Website : bekbrace.com
    Join this channel to get access to perks:
    / @bekbrace
  • Science & Technology

COMMENTS • 135

  • @BekBrace
    @BekBrace 1 year ago +7

    The CSV file is available for you guys to download : github.com/BekBrace/Sales-Forecast-data-csv
    The Repo Link: github.com/BekBrace/Sales-Forecast/tree/main

  • @JanBaucke
    @JanBaucke 1 year ago +5

    This is great! The methodology I was searching for to solve my problem. Thanks a lot!

  • @paulocarneiro4947
    @paulocarneiro4947 1 year ago +2

    Thank you, Bek for this nice job, helping others!

    • @BekBrace
      @BekBrace 1 year ago +1

      Thank you for your support, Paulo 🙏

  • @danrobert7241
    @danrobert7241 7 months ago +11

    This is awesome! Just one quick note that you've probably caught already but around 22 minute mark you mistake the total for Feb 2013 for January. In fact January 2013's data was dropped after you created the sales_diff column and then dropped null values in the next line. January 2013 would have been null because it was the first row in the data (there would have been no value to call the difference). Anyway, not a big deal just wanted to point that out in case it tripped anyone up. Also, at 22:30 I think you meant to plot monthly_sales['sales_diff'] but you actually just re-plotted monthly sales. Regardless, still a great tutorial for figuring out the correct syntax.

    • @BekBrace
      @BekBrace 2 months ago +2

      Thank you for the kind words and for pointing that out! You're absolutely right: January 2013's data was dropped after creating the sales_diff column and dropping null values, because it was the first row and had no previous value to calculate the difference from. Also, good catch on the plotting mistake at 22:30; it should have been monthly_sales['sales_diff'] instead of re-plotting monthly_sales. Appreciate the feedback!
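
A minimal sketch of the point discussed above, using made-up monthly totals rather than the tutorial's real figures: `diff()` leaves the first row without a value, so dropping nulls removes January 2013 and February becomes the first row.

```python
import pandas as pd

# Hypothetical monthly totals for illustration (not the tutorial's real data).
monthly_sales = pd.DataFrame({
    "date": pd.date_range("2013-01-01", periods=4, freq="MS"),
    "sales": [100, 130, 120, 160],
})

# diff() subtracts the previous row; the first row has no predecessor, so it is NaN.
monthly_sales["sales_diff"] = monthly_sales["sales"].diff()

# Dropping NaN rows removes January 2013, so February becomes the first row.
monthly_sales = monthly_sales.dropna().reset_index(drop=True)

first_month = monthly_sales["date"].iloc[0].strftime("%Y-%m")
diffs = monthly_sales["sales_diff"].tolist()
print(first_month, diffs)
```

To plot the differences rather than the raw totals, the y-values passed to the plotting call would be `monthly_sales["sales_diff"]`.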

  • @snowturtle319
    @snowturtle319 9 months ago +1

    Awesome tutorial! Keep the videos coming!
    I wanted to ask whether these algorithms work on qualitative data. For example, what if the value in "store" is not "1" but "Amazon"?
    Thanks!

    • @BekBrace
      @BekBrace 9 months ago +1

      Thank you very much.
      I'm really not sure whether you would need to tweak the code for that. I will find out and let you know.

  • @afolaluhelenbosede4028
    @afolaluhelenbosede4028 1 year ago +8

    Awesome tutorial Bek.
    Just to point out, the second plot still showed the sales trend and not sales_diff. I could notice cos I was looking out for a change in the increasing trend after you performed the sales difference.

    • @BekBrace
      @BekBrace 1 year ago +2

      Heyyy! Oh, that must be an error on my side, sorry for that!

    • @nwabuezeprecious457
      @nwabuezeprecious457 1 year ago

      Hi Bek, was the ratio of your train set to your test set approximately 66:34?

    • @PaulDenman
      @PaulDenman 1 year ago +1

      Yup, I changed the code to:
      # Visualisation
      plt.figure(figsize=(15,5))
      plt.bar(monthly_sales['date'], monthly_sales['sales_diff'], width=12)
      plt.xlabel("Date")
      plt.ylabel("Sales")
      plt.title("Monthly Customer Sales Difference")
      plt.show()
      Notice how I changed the plot type to bar also; I prefer that vis :)

    • @PaulDenman
      @PaulDenman 1 year ago

      And the Actual vs Predicted to:
      # Vis the predictions vs the actual sales
      plt.figure(figsize=(15,5))
      # Actual Sales:
      plt.bar(monthly_sales['date'], monthly_sales['sales'], width=10)
      # Predicted Sales:
      plt.plot(predict_df['date'], predict_df['Linear Prediction'], color='red')
      plt.xlabel("Date")
      plt.ylabel("Sales")
      plt.title("Predictions vs Actual")
      plt.show()

    • @PaulDenman
      @PaulDenman 1 year ago

      Great vid @BekBrace 😀

  • @svetlana9699
    @svetlana9699 1 year ago +1

    Thanks Bek! 🔥

    • @BekBrace
      @BekBrace 1 year ago

      Thanks a lot for the support ☺️

  • @santiagomendez1787
    @santiagomendez1787 2 months ago

    Bek, thank you for the tutorial. I have one question: why do you take 13 rows for the actual sales and not 12?

  • @YMikay777
    @YMikay777 1 year ago

    Nice sample, but if I need to make predictions for 01-2019, 02-2019 … what do I need to change?

  • @fidelinojuls
    @fidelinojuls 1 year ago

    What changes should I apply to predict the next 3 years?

  • @DevBishwasBh
    @DevBishwasBh 1 year ago +2

    Thanks a lot Bek, I saw your video on React and Fast API (FARM Stack) in freeCodeCamp, thanks a lot for that video. I am here to request you a video on Next Js and Fast API authentication. I am really waiting for your video and reply on this topic. Have a great day :)

    • @BekBrace
      @BekBrace 1 year ago +1

      Thank you very much for your kind words.
      Your request is taken into consideration :)

    • @DevBishwasBh
      @DevBishwasBh 1 year ago +1

      @@BekBrace I'll be waiting.

  • @asmafathima7115
    @asmafathima7115 1 year ago

    Excellent tutorial! I have a couple of questions. In the graphs you presented, "Monthly Customer Sales" and "Monthly Customer Sales Difference," they appear to be identical. Shouldn't the second graph include the "Sales Difference" column instead of "Sales" on the y-axis? I apologize for the confusion, but I would greatly appreciate it if you could clarify this.

    • @BekBrace
      @BekBrace 1 year ago

      Hi, thank you so much for watching :) - Yes, and that was a mistake from my side

  • @senzaura6786
    @senzaura6786 7 months ago +1

    Bek, I want to ask: did you drop 'date' and 'sales' when you made supervised_data?

    • @BekBrace
      @BekBrace 2 months ago

      Yes, the 'date' and 'sales' columns were not used directly in the supervised data. Instead, the 'sales' column was transformed into 'sales_diff' to capture the monthly sales differences. The 'date' column was not included in the supervised data.

  • @SaqibAhmadca
    @SaqibAhmadca 7 months ago +1

    Hi Bek, it was a great tutorial, but I have a question: why did you calculate the lr_mse, lr_mae and lr_r2 variables if you are not using them anywhere?

    • @BekBrace
      @BekBrace 7 months ago

      Salam Saqib. Thank you for watching, brother. There was supposed to be a second part of the tutorial, but unfortunately I haven't had the chance to finish it; that's why. Hope you're not disappointed, and thank you for being a good friend of the channel 🙂🙏

  • @gurtejbains
    @gurtejbains 1 year ago +5

    Hi @Bek, great video.
    I see a few other people also asking the same question as mine. How can we use the fit model to predict sales for upcoming days? The sample data is at the day level so let's assume predicting daily sales for the upcoming month. Maybe you can record a new video as that will really add a lot of value.
    Thanks.

    • @BekBrace
      @BekBrace 1 year ago +4

      Hey Hey 👋 thank you
      I might create a follow-up video on this specific point / Thanks for the suggestion

    • @gurtejbains
      @gurtejbains 1 year ago

      But to complete the loop and get an answer to the question in hand, do you have any recommendations for how to predict the upcoming days/weeks? Thanks again.

    • @terencedimanche
      @terencedimanche 1 year ago

      @@gurtejbains Hey, I'm really interested to know how to forecast next days, months with this method as well !! Did you manage to find a solution?

    • @Silas_Wahome
      @Silas_Wahome 1 year ago +1

      @@BekBrace
      Hi, @Bek Brace
      lr_model=LinearRegression()
      lr_model.fit(x_train,y_train)
      I am getting this error after running that code. What could be the issue?
      ValueError: Found input variables with inconsistent numbers of samples: [1, 33]

    • @Vlapstone
      @Vlapstone 1 year ago +1

      @@BekBrace Hi there mate... thanks a lot for the video, it's amazing. What about another video showing how to make predictions for the upcoming days? That is actually what matters, as there is no sense in predicting something that has already passed. You are a great teacher and pass the info clearly; we'd love to have this video from you continuing the explanation, please.

  • @estherjokodola1187
    @estherjokodola1187 1 year ago +1

    Thank you ❤

  • @ForexPotatoe
    @ForexPotatoe 6 months ago

    Please, how come the grouping no longer uses the store sales df but just monthly sales? That got me confused. I also suggest that next time you let the code run so we can see what you are getting: you just run it but don't look at your results to check they are what you want before moving on to the next step.

  • @am0x01
    @am0x01 1 year ago +1

    Hello,
    Can I use the same method you used here, in yearly gross production data?
    Thanks in advance

    • @BekBrace
      @BekBrace 1 year ago +1

      Yes of course you can 👍

  • @danielholocsi440
    @danielholocsi440 6 months ago +1

    I wish there was a tutorial for forecasting demand by items and by stores with this same dataset.

  • @ehiztheo166
    @ehiztheo166 1 year ago +1

    Awesome video, thanks so much for putting this out. Please I’m working on a project to predict sales for 28 days for Walmart store. Is it possible to follow this code format?

    • @BekBrace
      @BekBrace 1 year ago +1

      Thanks man, I'd say yes 🙂

    • @ehiztheo166
      @ehiztheo166 1 year ago

      @@BekBrace Please, how do I do the forecast for just 28 days? What should I do?

    • @Vlapstone
      @Vlapstone 1 year ago

      @@ehiztheo166 Hi there bro... above, @SHUVRO AHMED said it's about the threshold for the loop; check it out and try decreasing yours as he pointed out. It seems @Bek Brace is too busy to reply to so many questions... hehehehe

  • @christopherdelgado1397
    @christopherdelgado1397 8 months ago

    I have a question about how to interpret the supervised data:
    I'm following the code and getting the same data as you, with no errors, but I'm confused about how supervised_data ended up with 47 rows. How do these rows represent the sales of each store number if we dropped the store number at the very start of the video?

  • @user-lw3yn7ty9y
    @user-lw3yn7ty9y 1 year ago

    Hi, I have a dataset whose data granularity is monthly: I receive data for multiple items and stores, but only monthly, i.e. on the 1st of each month. How can I adapt the code accordingly and forecast shares?

    • @notSOanonymousBD
      @notSOanonymousBD 1 year ago

      hey, how did you do it , cause I have the same issue

  • @nctkim8476
    @nctkim8476 1 year ago +2

    Hello, Bek. I am trying to follow your code with different data, and I have a problem in the 'preparing supervised data' step. When I run it, every row has NaN values, so they all get dropped and I am left with nothing. What should I do about that? Can you give some insight?
    Btw, awesome tutorial, Bek. Thank you for sharing this with us.
    # Sorry if you do not understand what I am saying; English is not my first language.

    • @BekBrace
      @BekBrace 1 year ago +1

      Hey hey 👋 your English is perfect 👍 and i understand your problem. Only one thing, when you try to clean the data from NaN, what do you get ?

    • @nctkim8476
      @nctkim8476 1 year ago

      @@BekBrace i got nothing just a column name like yours.
      btw, thank you for responding

    • @Vlapstone
      @Vlapstone 1 year ago

      @SHUVRO AHMED nice you replied him, otherwise he'd still be lost... do you also get the need to have the predictions for the upcoming days? as its not part of this tutorial... im kinda lost of what this is for without the prediction for the upcoming days... if you do, can you share with me?

  • @adin6429
    @adin6429 11 months ago

    First of all, thanks a lot for the awesome tutorial.
    Could you please explain how to apply the model to predict the next year, in this case 2019?

    • @BekBrace
      @BekBrace 10 months ago +1

      I will probably create a whole video to explain that, thanks for the suggestion my friend :)

    • @user-hl4cv6tb8q
      @user-hl4cv6tb8q 10 months ago

      @@BekBrace Is this video done already?

  • 1 year ago +4

    When you want to plot the sales_difference you forgot to write (at Y axis data) difference. So the plot is wrong at 23:09

    • @BekBrace
      @BekBrace 1 year ago +1

      Thanks for the heads up

  • @01marcosnunes
    @01marcosnunes 1 year ago +1

    When I go to plot the chart, I get an error: TypeError: float() argument must be a string or a real number, not 'Period'.
    Can you help me, please?

    • @BekBrace
      @BekBrace 1 year ago

      Sure. The error message indicates that the plotting code is trying to convert a 'Period' object to a float, which is not possible. To resolve this, convert the 'Period' column to timestamps before using it in your sales forecast chart, with the column's 'dt.to_timestamp()' method:
      monthly_sales['date'] = monthly_sales['date'].dt.to_timestamp()
      Make sure to check your code and ensure that this conversion runs before the plotting cell.

  • @luisalejandro9335
    @luisalejandro9335 1 year ago +4

    Hello, i'm trying to apply this for university project but i'm not sure about what the process would be to make the predictions for the following months that we don't have information, could you help me? Many thanks

    • @BekBrace
      @BekBrace 1 year ago +1

      Sure

    • @gurtejbains
      @gurtejbains 1 year ago +1

      Good question Luis. @bek, any answer for how to achieve this? Use the fit model to predict sales for the upcoming quarter?

    • @anjalipakmode5716
      @anjalipakmode5716 1 year ago +1

      Same question

    • @Vlapstone
      @Vlapstone 1 year ago +1

      good question mate... did you find out how to do it?

    • @moatzmaloo
      @moatzmaloo 1 month ago

      You need time series analysis for this, or you can put in your values for the next quarters manually, but take into consideration that extrapolation is not always accurate.

  • @senzaura6786
    @senzaura6786 7 months ago +1

    Hello, Bek, thank you for making this video, it helps me a lot.
    But I want to ask something. When I first create the linear regression and pass x_train and y_train to the model's fit, it says, "Found input variables with inconsistent numbers of samples." Any clue?

    • @senzaura6786
      @senzaura6786 7 months ago +1

      I notice that when I look at the range 1-13 for the sales differences of each store each month, my result is different from yours.

    • @BekBrace
      @BekBrace 7 months ago

      This is odd. I have got to find the time to check out the code, but please feel free to ask the friends on the channel, they might be able to answer you quicker

    • @senzaura6786
      @senzaura6786 7 months ago +1

      @@BekBrace I tried re-running and re-checking everything, then found something odd: my supervised_data for sales_diff is totally different from yours. Mine starts with 3130, while yours starts from a negative value. Any clue?

    • @BekBrace
      @BekBrace 2 months ago

      I’m glad to hear the video has been helpful! Regarding the error you encountered, “Found input variables with inconsistent numbers of samples”: this typically occurs when the X_train and y_train datasets do not have the same number of rows. Here’s how you can address this issue:
      * Check Lengths: Make sure that both X_train and y_train have the same number of rows. You can check this by printing their shapes:
      print(X_train.shape)
      print(y_train.shape)
      * Synchronize Data: Ensure that during your data preparation phase, when you split the data or create features, you keep the dataset synchronized. For example, if you're creating lagged features or handling missing values, make sure each operation maintains alignment between your features (X_train) and targets (y_train).
      * Handling Missing Data: If your preprocessing steps (like calculating differences or dropping rows) introduce missing values, ensure that you handle these consistently across both feature and target datasets. For instance, if you drop rows with NaN values in X_train, do the same for y_train:
      # Assuming you've identified rows with NaNs in X_train
      X_train = X_train.dropna()
      y_train = y_train.loc[X_train.index] # Align y_train with X_train
      * Review Data Preparation: Go back and review the steps where you prepare X_train and y_train. There might be a step where the data gets out of sync, such as when splitting the data or creating features.
      By ensuring that both X_train and y_train are correctly aligned and contain the same number of samples, you should be able to resolve this error. If the issue persists, feel free to share more details about how you are preparing your datasets, and I’ll help you debug further!
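
A small sketch of the shape check described above, on randomly generated data (the sizes 33 and 12 are assumptions chosen to echo the error message quoted earlier in the thread). It shows one way the `[1, 33]` variant of the error can be reproduced: passing an X whose first dimension is 1 instead of the number of samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: 33 training samples with 12 features each.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(33, 12))
y_train = x_train @ np.arange(1, 13) + 5.0  # exact linear target, for illustration

# An X of shape (1, 33) is read as ONE sample with 33 features,
# which does not match the 33 target values.
bad_x = x_train[:, 0].reshape(1, -1)
try:
    LinearRegression().fit(bad_x, y_train)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

# With matching first dimensions, the fit succeeds.
assert x_train.shape[0] == y_train.shape[0]
lr_model = LinearRegression().fit(x_train, y_train)
print(error_message)
print(x_train.shape, y_train.shape)
```

Printing both shapes right before `fit` is usually the quickest way to see which of the two arrays is off.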

  • @shilpashingari4111
    @shilpashingari4111 9 months ago

    In the step that splits the data into train and test:
    The MinMaxScaler code with feature range (-1,1) shows the error "Found array with 0 sample(s) while a minimum of 1 is required by MinMaxScaler". How do I fix this ValueError?

    • @BekBrace
      @BekBrace 7 months ago

      Hey friend!
      The error you're encountering means that the array passed to MinMaxScaler is empty: it has zero rows (samples), and the scaler requires at least one row to determine the scaling parameters.
      To fix this issue, check the shape of your data before applying the MinMaxScaler. An empty array often means that an earlier step, such as dropping NaN values or slicing the train and test sets, removed every row.
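
One way an empty array can arise, sketched on a made-up series: if the data has fewer rows than the number of lags, every row contains a NaN after shifting, `dropna()` removes them all, and the scaler receives zero samples. The column names `month_1` … `month_12` are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical short series: only 6 monthly differences, but 12 lags requested.
supervised = pd.DataFrame({"sales_diff": [5.0, -3.0, 8.0, 2.0, -1.0, 4.0]})
for lag in range(1, 13):
    supervised[f"month_{lag}"] = supervised["sales_diff"].shift(lag)

# Every row now has at least one NaN, so nothing survives dropna().
supervised = supervised.dropna()
rows_left = len(supervised)

scaler = MinMaxScaler(feature_range=(-1, 1))
try:
    scaler.fit(supervised.to_numpy())
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(rows_left, error_message)
```

Checking `len(supervised)` after `dropna()` shows whether there is anything left to scale.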

  • @archangelYtube
    @archangelYtube 1 year ago

    Hi Bek! How do I predict the future with this method, for example the next 3 or 6 months?

    • @BekBrace
      @BekBrace 1 year ago +1

      Depends

    • @Vlapstone
      @Vlapstone 1 year ago

      Hi there mate... did you find out how to do it?

  • @fidelinojuls
    @fidelinojuls 1 year ago

    Hi, Bek! How could I predict for the next months using the same methodology?

    • @BekBrace
      @BekBrace 1 year ago +3

      Good question! That may trigger a future video to explain it in detail.

    • @Zirea.eya69
      @Zirea.eya69 1 year ago +1

      @@BekBrace i need this too Sir

  • @arshadsyed3653
    @arshadsyed3653 1 year ago

    Sir, what is the accuracy percentage of this project? I mean, how accurate is the prediction?

    • @BekBrace
      @BekBrace 1 year ago

      Hi there.
      The accuracy of a sales forecast in this tutorial depends on various factors, including the quality and relevance of the data, the appropriateness of the assumptions made, and the complexity of the sales patterns being modeled. As you saw, I used linear regression, which is a commonly used technique for sales forecasting because it provides a straightforward way to model the relationship between independent variables (e.g., time, marketing spend) and sales.
      However, the accuracy of the predictions produced by a linear regression model can vary. It is important to evaluate the model's performance using appropriate metrics such as mean squared error (MSE), mean absolute error (MAE), or R-squared (coefficient of determination). These metrics quantify the difference between the predicted sales values and the actual sales values.
      I hope this answers your question. Stay tuned to the channel, as I am soon going to do a Credit Card Fraud detection analysis tutorial.

  • @presidenthotsauce
    @presidenthotsauce 1 year ago +1

    Hi Bek, could you also share the test dataset, please? Thank you.

    • @BekBrace
      @BekBrace 1 year ago

      Yeah about that, unfortunately i cannot do it for the moment 😔 but i promise to do that later today

  • @aliyilbasi8587
    @aliyilbasi8587 4 months ago

    Thanks for your effort, I really appreciate it, but I am stuck trying to figure out the logic behind the supervised data. Can someone please explain it?

    • @BekBrace
      @BekBrace 2 months ago

      Thank you :)
      The supervised data creation is about structuring the dataset for supervised learning. We transform the time series data into a supervised learning problem by creating input-output pairs :)
      Shift the Data: We use the shift method to create lagged versions of the data. For example, if you want to predict sales based on the previous month, you shift the sales data by one month.
      Concatenate the Data: Combine these lagged features with the original data, aligning them properly to ensure each row contains the data for the current and previous months.
      Drop NaNs: Any rows that have NaN values (which occur because of shifting) are dropped to maintain a consistent dataset.
      This results in a dataset where each row can be used as an input-output pair for training a model. The input features are the lagged values, and the output is the value for the current month.
      Here's a small code snippet to illustrate this (i = 0 is the current month, i = 1..n are the lags):
      supervised_data = pd.concat([monthly_sales['sales_diff'].shift(i) for i in range(0, n + 1)], axis=1)
      supervised_data = supervised_data.dropna()
      Beware, my friend: n is the number of lagged months you want to use as input features.
      It's a long answer, but hopefully this clears up any mysteries for you :)
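
The shift, concatenate and drop steps can be sketched on a tiny made-up series. This is an illustration under assumptions, not the tutorial's exact code: `N_LAGS = 3` is chosen to keep the table small (the thread suggests 12 lags in the video), and the `lag_k` column names are invented here.

```python
import pandas as pd

N_LAGS = 3  # assumption for readability; the thread suggests 12 in the video

# Hypothetical monthly differences for illustration.
sales_diff = pd.Series([10.0, -4.0, 7.0, 3.0, -2.0, 5.0], name="sales_diff")

# Column 0 is the target (this month's difference); the rest are its lags.
columns = {"sales_diff": sales_diff}
for lag in range(1, N_LAGS + 1):
    columns[f"lag_{lag}"] = sales_diff.shift(lag)

# Rows that lack a full history contain NaN and are dropped.
supervised_data = pd.concat(columns, axis=1).dropna().reset_index(drop=True)

X = supervised_data.drop(columns="sales_diff")  # inputs: previous months
y = supervised_data["sales_diff"]               # output: current month
print(supervised_data)
```

With 6 values and 3 lags, 3 rows remain; the same arithmetic gives 47 rows for 59 monthly differences and 12 lags.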

  • @fidelinojuls
    @fidelinojuls 1 year ago

    Hi! If in my dataset, I've gotten a negative R2 score what does it mean?

    • @BekBrace
      @BekBrace 1 year ago

      Hi Julian :) - well, the R-squared (R2) score is a statistical measure that indicates the proportion of the variance in the dependent variable that can be explained by the independent variables in a regression model. Its maximum is 1, with higher values indicating a better fit of the model to the data; a value of 0 corresponds to always predicting the mean, and the score can go below 0.
      If you obtained a negative R2 score, it means that the regression model you used performed worse than a horizontal line (i.e., a constant model that ignores the independent variables) in explaining the variance in the dependent variable. In other words, the model's predictions are even worse than simply using the mean value of the dependent variable as a constant.
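
A worked example of the definition R2 = 1 - SS_res / SS_tot, on made-up numbers, showing the three cases: a perfect model (1), the mean baseline (0), and a model worse than the mean (negative). The helper `r2` is written here for illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

actual = [10.0, 12.0, 14.0, 16.0]      # mean is 13, SS_tot = 20
mean_baseline = [13.0] * 4             # always predict the mean
bad_model = [16.0, 14.0, 12.0, 10.0]   # predicts the trend backwards, SS_res = 80

print(r2(actual, actual))         # perfect predictions
print(r2(actual, mean_baseline))  # no better than the mean
print(r2(actual, bad_model))      # worse than the mean
```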

  • @funwithfriends6601
    @funwithfriends6601 1 year ago +5

    Your voice is awsm

    • @BekBrace
      @BekBrace 1 year ago

      Thank you 😊

    • @Vlapstone
      @Vlapstone 1 year ago

      indeed... really smooth and you know how to explain it well. Congrats... just wish you could have a video showing how to predict the upcoming sales for the next 3 months.

  • @nelsonstropa8352
    @nelsonstropa8352 1 year ago

    I can't read the csv file,
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 17: invalid start byte
    What am i doing wrong?

    • @BekBrace
      @BekBrace 1 year ago

      Hey friend.
      This error occurs when the CSV file you're trying to read contains bytes that are not valid in the UTF-8 character encoding, and I think one way to solve it is to read the file with a different encoding, as follows:
      import codecs
      with codecs.open('file.csv', 'r', encoding='iso-8859-1') as f:
          data = f.read()  # read the file here
      Try that and let me know
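
The same idea can be sketched with the `encoding` parameter of `pandas.read_csv`. The file below is a made-up stand-in written to a temporary directory; whether ISO-8859-1 is the right encoding for the actual download is an assumption that would need checking.

```python
import os
import tempfile

import pandas as pd

# Write a small hypothetical CSV containing byte 0xb2 ("²" in ISO-8859-1),
# which cannot start a valid UTF-8 sequence.
raw = b"date,store,item,note\n2013-01-01,1,1,m\xb2\n"
path = os.path.join(tempfile.mkdtemp(), "train.csv")
with open(path, "wb") as f:
    f.write(raw)

# The default encoding is UTF-8, so this read fails.
try:
    pd.read_csv(path)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Naming the encoding explicitly lets the file load.
df = pd.read_csv(path, encoding="iso-8859-1")
print(utf8_failed, df.loc[0, "note"])
```

ISO-8859-1 maps every byte to some character, so a wrong guess tends to show up as odd characters in the data rather than as an error.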

  • @amanpiyushsharma
    @amanpiyushsharma 1 year ago

    At 16:31 I am getting an error: 'DatetimeProperties' object has no attribute 'to_timestamp'
    Help me, please.

    • @BekBrace
      @BekBrace 1 year ago

      The error message you are seeing, "'DatetimeProperties' object has no attribute 'to_timestamp'", means that the column you are calling .dt.to_timestamp() on has a datetime dtype rather than a period dtype.
      In pandas, the .dt accessor of a datetime column offers to_period() but not to_timestamp(); to_timestamp() is only available on the .dt accessor of a Period column.
      So either the column already holds timestamps and the conversion can be skipped, or the earlier step that turns the dates into monthly periods, for example .dt.to_period('M'), has not been run on it yet.
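
A short sketch of where `to_timestamp` is and is not available in pandas, on two made-up dates: the `.dt` accessor of a datetime64 column has `to_period` but no `to_timestamp`, while the `.dt` accessor of a period column has `to_timestamp`.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2013-01-15", "2013-02-03"]))

# On a datetime64 column, .dt has no to_timestamp, which gives the error above.
try:
    dates.dt.to_timestamp()
    raised = False
except AttributeError:
    raised = True

# Converting to monthly periods first makes to_timestamp available.
periods = dates.dt.to_period("M")   # dtype: period[M]
back = periods.dt.to_timestamp()    # datetime64 again, first day of each month

print(raised, periods.dtype, back.tolist())
```

Checking `df["date"].dtype` before the failing cell shows which of the two situations applies.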

  • @ivankalarasati
    @ivankalarasati 5 months ago

    Why does mine say 'DataFrame' object has no attribute 'reset'? How do I solve it?

    • @BekBrace
      @BekBrace 2 months ago

      The error "'DataFrame' object has no attribute 'reset'" likely occurs due to a typo. The correct method is reset_index(). Ensure you use:
      df.reset_index(drop=True)
      Double-check your DataFrame's name to avoid referencing errors.

  • @akshayshinde6641
    @akshayshinde6641 3 months ago

    How can I use this model to predict a 2018 forecast?

    • @BekBrace
      @BekBrace 2 months ago

      # Assuming 'model' is your trained linear regression model and 'scaler' is a Min-Max scaler
      # fitted on the same feature columns the model was trained on
      # Monthly sales leading up to Jan 2018 (use as many values as the model has input features)
      input_features = np.array([Dec_2016_sales, Jan_2017_sales, ..., Nov_2017_sales, Dec_2017_sales])
      # Scale the features as the model expects scaled input (2-D: one row per sample)
      scaled_features = scaler.transform(input_features.reshape(1, -1))
      # Make prediction (scaled_features is already 2-D, so no extra brackets)
      predicted_sales_Jan_2018 = model.predict(scaled_features)
      # Inverse scale if the output was scaled; this needs a scaler that was fitted on the target
      predicted_sales_Jan_2018 = scaler.inverse_transform(predicted_sales_Jan_2018.reshape(1, -1))
      # Use predicted_sales_Jan_2018 to update your dataset for the next prediction if necessary

  • @sikiruyusuff1246
    @sikiruyusuff1246 5 months ago +1

    how could the sum of sales for 2013-02-01 be 459417?

    • @BekBrace
      @BekBrace 2 months ago

      This total is the aggregated sales across all stores and items for the whole of February 2013, not January. January 2013 no longer appears because it was the first row of the monthly data: it had no previous month to compute sales_diff from, so it was removed when the null values were dropped. That is why 2013-02-01 is the first date shown. Always ensure that the date handling aligns with how your data is structured and aggregated.

  • @ForexPotatoe
    @ForexPotatoe 6 months ago

    I don't get the logic behind monthly_Sales = df.groupby('Date').sum().reset_index()
    Why group monthly_Sales by month when you convert the date back to a timestamp later on?

    • @naren2412
      @naren2412 6 months ago

      This just gives you the total sales for a particular month: the rows are first grouped by month, and then all the sales in that month are summed. He changes the data type of 'date' for the sake of the time series plot.
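
The grouping described above, sketched on a few made-up daily rows laid out with the column names used in the thread (`date`, `store`, `item`, `sales`). Summing only the `sales` column is a choice made here so that `store` and `item` are not added up as well.

```python
import pandas as pd

# Hypothetical daily rows, several stores per day.
store_sales = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02", "2013-02-01", "2013-02-02"],
    "store": [1, 2, 1, 2],
    "item": [1, 1, 1, 1],
    "sales": [13, 11, 14, 9],
})

# Parse the dates, then collapse each day to its month (a Period such as 2013-01).
store_sales["date"] = pd.to_datetime(store_sales["date"])
store_sales["date"] = store_sales["date"].dt.to_period("M")

# One row per month: the total of 'sales' over every store and item in that month.
monthly_sales = store_sales.groupby("date")["sales"].sum().reset_index()

# Convert the Period back to a timestamp so the column can be used on a plot axis.
monthly_sales["date"] = monthly_sales["date"].dt.to_timestamp()
print(monthly_sales)
```

Because the sum runs over all stores, each monthly row no longer refers to any single store.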

    • @BekBrace
      @BekBrace 6 months ago

      Thank you Naren for the answer

  • @tf015_nihlatilmaula2
    @tf015_nihlatilmaula2 2 months ago

    is this code applicable for multiple linear regression?

    • @BekBrace
      @BekBrace 2 months ago

      Yes, the code can be adapted for multiple linear regression. Multiple linear regression simply involves more input features.
      from sklearn.linear_model import LinearRegression
      import numpy as np
      # Assuming 'train_data' is your training data with multiple features
      X_train = train_data.drop('target', axis=1)
      y_train = train_data['target']
      # Initialize and train the model
      model = LinearRegression()
      model.fit(X_train, y_train)
      # Prepare the last 12 months of features for prediction (one row, 2-D)
      last_12_months_features = np.array([last_12_months_data])
      # Prepare list to store predictions
      future_predictions = []
      for _ in range(12):
          # Predict next month's target
          next_month_prediction = model.predict(last_12_months_features)

          # Append the prediction to future_predictions
          future_predictions.append(next_month_prediction[0])

          # Update last_12_months_features for next prediction
          # ('update_features' is a placeholder helper you need to write yourself)
          last_12_months_features = update_features(last_12_months_features, next_month_prediction)
      # future_predictions now contains the forecast for the next 12 months

  • @khushbugupta6970
    @khushbugupta6970 5 months ago

    We didn't use XGBoost and Random Forest as we first intended.

  • @shreyagoyal2847
    @shreyagoyal2847 1 year ago

    I am not able to download this train dataset from github, if anyone could please guide me…

    • @BekBrace
      @BekBrace 1 year ago

      Click on the file, then click view raw, then copy the data and paste it into an excel file saved under csv file extension

    • @shreyagoyal2847
      @shreyagoyal2847 1 year ago

      Done, now while writing the code, I am facing issues while downloading library for tensorflow

    • @BekBrace
      @BekBrace 11 months ago

      @@shreyagoyal2847 What's the error?

  • @tf015_nihlatilmaula2
    @tf015_nihlatilmaula2 2 months ago

    just wanna add, if you face some error with this code line
    monthly_sales = store_sales.groupby('date').sum().reset_index()
    change to this
    monthly_sales = store_sales.groupby('date').agg({'sales':'sum'}).reset_index()

  • @shwetalpatil4461
    @shwetalpatil4461 1 year ago +1

    Hi, nice explanation. Can you share the Google Colab file with me?

    • @BekBrace
      @BekBrace 1 year ago

      Unfortunately it was lost and I didn't keep a copy of it. I'll look through my old files though and keep you posted. Thank you for watching 🙂

  • @harrisjunejanani5887
    @harrisjunejanani5887 4 months ago

    can you post the codes that you are using?

  • @NagaManojG
    @NagaManojG 2 months ago

    Colab link?

  • @DipanSadekeen
    @DipanSadekeen 3 months ago

    where can I find the code?

  • @user-tv1ir5fc3x
    @user-tv1ir5fc3x 6 months ago

    Awesome, sir. Can I ask for the code?

  • @prashantipenumatsa8814
    @prashantipenumatsa8814 1 year ago +1

    Can you provide the total code if possible

    • @BekBrace
      @BekBrace 11 months ago +1

      Sorry, but I lost the code somehow

  • @ivankalarasati
    @ivankalarasati 2 months ago

    how to predict 1 year in the future after this?

    • @BekBrace
      @BekBrace 2 months ago

      import numpy as np
      # Assuming 'model' is your trained model and 'scaler' is a Min-Max scaler fitted on the model's input features
      # Last 12 months sales data, oldest first, as a 1-D array
      input_features = np.array(last_12_months_sales, dtype=float)
      # Prepare list to store predictions
      future_predictions = []
      for _ in range(12):
          # Scale features (reshape to 2-D: one row, 12 columns)
          scaled_features = scaler.transform(input_features.reshape(1, -1))

          # Predict next month's sales
          next_month_prediction = model.predict(scaled_features)

          # Inverse scale the prediction; 'target_scaler' is a placeholder for a scaler fitted on the target column
          next_month_sales = target_scaler.inverse_transform(next_month_prediction.reshape(1, -1))[0, 0]

          # Store the prediction
          future_predictions.append(next_month_sales)

          # Update input features for next prediction: drop the oldest month, append the new one
          input_features = np.append(input_features[1:], next_month_sales)
      # future_predictions now contains the sales forecast for the next 12 months

    • @ivankalarasati
      @ivankalarasati 21 days ago

      @@BekBrace The future predictions come out like this. Is that correct?
      future_predictions
      [array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706]),
      array([0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706,
      0.123706, 0.123706, 0.123706, 0.123706, 0.123706, 0.123706])]
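
Identical values in every position, as in the output above, may indicate that the model inputs never changed between iterations. The following is a minimal, self-contained sketch of a recursive forecast on synthetic data, under stated assumptions: the sales series is invented (trend plus yearly seasonality), the model is trained on 12 lagged monthly differences as the thread describes, and feature scaling is left out to keep the loop short, since an ordinary least squares fit with an intercept does not depend on it. It is an illustration of the technique, not the video's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

N_LAGS = 12
HORIZON = 12

# Synthetic monthly sales for 2013-2017 (illustrative only).
dates = pd.date_range("2013-01-01", periods=60, freq="MS")
months = np.arange(60)
sales = 500_000 + 3_000 * months + 80_000 * np.sin(2 * np.pi * months / 12)
monthly_sales = pd.DataFrame({"date": dates, "sales": sales})

# Difference the series, then build lag features of the differences.
monthly_sales["sales_diff"] = monthly_sales["sales"].diff()
supervised = monthly_sales[["sales_diff"]].copy()
for lag in range(1, N_LAGS + 1):
    supervised[f"lag_{lag}"] = supervised["sales_diff"].shift(lag)
supervised = supervised.dropna()

lag_cols = [f"lag_{lag}" for lag in range(1, N_LAGS + 1)]
model = LinearRegression().fit(supervised[lag_cols].to_numpy(),
                               supervised["sales_diff"].to_numpy())

# Start from the 12 most recent differences, newest first (matches lag_1..lag_12).
window = list(monthly_sales["sales_diff"].to_numpy()[-N_LAGS:][::-1])
last_sales = float(monthly_sales["sales"].iloc[-1])

future_sales = []
for _ in range(HORIZON):
    # Predict the next month's difference from the current window.
    next_diff = float(model.predict(np.array(window).reshape(1, -1))[0])
    # Undo the differencing: add the predicted change to the last known level.
    last_sales += next_diff
    future_sales.append(last_sales)
    # Slide the window: the prediction becomes lag_1, the oldest lag falls off.
    window = [next_diff] + window[:-1]

future_dates = pd.date_range(dates[-1] + pd.offsets.MonthBegin(1),
                             periods=HORIZON, freq="MS")
forecast = pd.DataFrame({"date": future_dates, "forecast": future_sales})
print(forecast)
```

Each predicted difference is fed back in as an input for the following month, so errors can accumulate the further ahead the forecast goes.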

  • @DineshR-uu8rq
    @DineshR-uu8rq 1 year ago +2

    lr_mse = np.sqrt(mean_squared_error(predict_df['Linear Prediction'], monthly_sales['sales'][-12:]))
    lr_mae = mean_absolute_error(predict_df['Linear Prediction'], monthly_sales['sales'][-12:])
    lr_r2 = r2_score = (predict_df['Linear Prediction'], monthly_sales['sales'][-12:])
    print("Linear Regression MSE", lr_mse)
    print("Linear Regression MAE", lr_mae)
    print("Linear Regression R2", lr_r2)
    Bro, I have run this code but the accuracy is not displayed.
    I followed what you said: I cut this specific code, ran everything, then pasted it back and ran it again, but it still does not show the accuracy. What should I do now?

    • @saloualakhdar6659
      @saloualakhdar6659 1 year ago +2

      i have the same problem as yours

    • @PaulDenman
      @PaulDenman 1 year ago +1

      Hi @user-ev8cs6yw4r and @saloualakhdar6659
      You need to change this line from:
      lr_r2 = r2_score = (predict_df['Linear Prediction'], monthly_sales['sales'][-12:])
      To:
      lr_r2 = r2_score(predict_df['Linear Prediction'], monthly_sales['sales'][-12:])
      The change in the video happens at 52:24 -> 52:25, but it isn't mentioned ;)
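
A small sketch of the three metrics on made-up numbers, to show the difference between the two lines quoted above. Two hedged notes: scikit-learn documents the argument order as `(y_true, y_pred)`, so the actual values are passed first here, and the square root of the mean squared error is the RMSE, which is why the variable is named `lr_rmse` in this sketch.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted monthly sales.
actual = np.array([100.0, 120.0, 140.0, 160.0])
predicted = np.array([110.0, 115.0, 150.0, 155.0])

# Writing `name = (a, b)` only builds a tuple; no score is computed.
not_a_score = (predicted, actual)

# Calling the functions computes the metrics.
lr_rmse = np.sqrt(mean_squared_error(actual, predicted))
lr_mae = mean_absolute_error(actual, predicted)
lr_r2 = r2_score(actual, predicted)

print("Linear Regression RMSE", lr_rmse)
print("Linear Regression MAE", lr_mae)
print("Linear Regression R2", lr_r2)
```

Note that `lr_r2 = r2_score = (...)` also rebinds the name `r2_score` to the tuple, so the function has to be imported again before the corrected line will run in the same session.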

  • @spitfirelast8761
    @spitfirelast8761 1 year ago +1

    How can you make this into a website feature?

    • @BekBrace
      @BekBrace 1 year ago

      That is something to look into more deeply. I do not have a ready answer now, but I suspect it is very possible to convert the algorithms into an interactive web app for deployment.