Linear Regression vs Decision Trees

  • Published 14 Oct 2024

COMMENTS • 30

  • @chymoney1
    @chymoney1 2 years ago +13

    This was great, Dimitri. You should do more technical stuff.

  • @philipnye5947
    @philipnye5947 2 years ago +9

    Great video! I’d love to see more content discussing model selection.

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      I'll see if I can create a few. Videos like this one often come from questions someone asked me.

  • @fahmyboy1
    @fahmyboy1 2 years ago +1

    LOVED this video
    I’m all for anything related to practical modeling

  • @willgriffin9793
    @willgriffin9793 2 years ago +2

    Great video, hope we see more model comparisons in the future.

  • @manuelangelsuarezalvarez3355
    @manuelangelsuarezalvarez3355 2 years ago +2

    Love these more practical videos. Thank you very much for sharing this!
    Greetings from Spain.

  • @Apuryo
    @Apuryo 1 year ago +1

    I am new to this, but I have a question. Shouldn't x^2 be a parabola? Why does the plot look like x = y^2? Isn't the plot showing y = x^(1/2)?
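
The distinction raised in this question can be checked numerically. The sketch below is not taken from the video; the grid of x values is a hypothetical choice made only for illustration. It shows that the upper branch of x = y^2 is the same curve as y = x^(1/2), which flattens out as x grows, whereas y = x^2 steepens.

```python
import math

# Hypothetical grid of x values, chosen only for illustration.
xs = [0, 1, 4, 9, 16]

for x in xs:
    y_square = x ** 2       # y = x^2: an upward parabola that steepens as x grows
    y_root = math.sqrt(x)   # y = x^(1/2): rises quickly at first, then flattens
    # Every point (x, y_root) also satisfies x = y^2, so "x = y^2" (upper
    # branch) and "y = sqrt(x)" describe the same curve.
    assert math.isclose(y_root ** 2, x)
    print(f"x={x:>2}  x^2={y_square:>3}  sqrt(x)={y_root:.1f}")
```

So a curve that looks like a parabola lying on its side is consistent with a square-root relationship rather than with y = x^2.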

  • @georgez.7278
    @georgez.7278 1 year ago

    very nice and helpful video, thank you

  • @andresrossi9
    @andresrossi9 2 years ago +1

    Ok I'm going to trade off some precious sleep to see this❤️

  • @Isaiah_McIntosh
    @Isaiah_McIntosh 2 years ago +2

    Not on the topic of the video at all, but if I am making any methodology mistakes, any advice would be appreciated, as would knowing whether it's possible to deal with the serial correlation and instability at the end without changing the variables. I'm taking my first econometrics class, which is currently working on OLS, but my international trade lecturer wants me to examine the effects of the competitive export threat from China, Balassa-Samuelson, REER and natural resource rents on the domestic manufacturing sector. I'm trying to determine the impact of the export threat (China) on our domestic manufacturing sector, with the controls based on the productivity ratio (Balassa-Samuelson) and natural resource reliance / Dutch disease (oil and gas revenues). Below is my reasoning so far. Honestly, I'm out of ideas besides changing the variables.
    Given that manufacturing value added (as a percentage of GDP) depends on lags of itself and other variables in the model, we're looking at autoregressive models: VAR, VECM, ARDL or ECM. First, when testing all variables for stationarity using the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, we found a mix of I(0) and I(1) data, which encouraged the use of an ARDL model. KPSS was used instead of the standard ADF or PP (Phillips-Perron) tests because the standard methods gave clearly incorrect stationarity results for the Balassa-Samuelson data, which the KPSS test did not. The data had previously been logged to remove an I(2) result.
    We then checked the lag order selection criteria. All criteria recommended a lag length of 1 for the model. A long-run form and bounds test was then estimated using log(manufacturing value added) as the dependent variable; this confirmed no cointegration, as the F-test statistic was less than the I(1) bound. An ARDL model was then estimated at lag length 1, using manufacturing value added as the dependent variable. Tests were then performed for serial correlation and stability. We found serial correlation and apparent evidence of a structural break. I thought it might be possible to deal with the serial correlation by increasing the lag length to 2, but I don't have a theoretical basis for that change, so my tutor rejected it as an option.
    I'm completely out of ideas short of returning to variable selection, but I am hoping it's just a model specification error that I can fix. The variables currently being used are: log(manufacturing value added, % of GDP) = f(log(real effective exchange rate), log(Balassa-Samuelson), log(static index of competitive threat with respect to manufacturing), log(oil and natural gas revenues))
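
The I(0) / I(1) distinction used in this comment can be illustrated with a small simulation. This is only a sketch on made-up data and is not a KPSS, ADF or PP test. The series, the seed and the helper names `difference` and `lag1_autocorr` are all assumptions introduced here. A random walk is I(1): its level is highly persistent, but one round of differencing recovers the stationary shocks.

```python
import random

def difference(series):
    """Return the first difference: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation, a rough indicator of persistence."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return numer / denom

random.seed(0)  # arbitrary seed, only for reproducibility
noise = [random.gauss(0, 1) for _ in range(500)]  # stationary shocks: I(0)

# A random walk accumulates the shocks, which makes it I(1).
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

diffed = difference(walk)  # differencing once undoes the accumulation

print("lag-1 autocorrelation of the walk:      ", round(lag1_autocorr(walk), 3))
print("lag-1 autocorrelation of the difference:", round(lag1_autocorr(diffed), 3))
```

A formal unit-root or stationarity test is still needed to classify real series; the sketch only shows why differencing changes the order of integration.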

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      With any model, the residuals will tell you what's wrong. Have you run an ACF and PACF test on the residuals of your current model? This will tell you if there are serial correlation issues that have not been correctly addressed. I'm certain you'll have serial correlation remaining.
      Are you specifying any AR terms? If so, try using a seasonal lag, meaning just that single lag. For example, Y_{t-2} instead of lags 1 and 2 for an AR(2). Also plot the cross-correlation between the dependent and independent variables. Sometimes you need to lag your independent variables for them to have significance.
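
The residual checks suggested in this reply can be sketched in plain Python. This is a minimal illustration, not the commenter's workflow. The residual series is simulated, the helper names `sample_acf` and `lagged_corr` are hypothetical, and the PACF is omitted for brevity. The band of ±1.96/√n is the usual approximate 95% range for white noise.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def lagged_corr(y, x, lag):
    """Pearson correlation between y[t] and x[t - lag], for lag >= 0.

    Assumes y and x have the same length.
    """
    ys, xs = y[lag:], x[:len(x) - lag]
    my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
    cov = sum((a - my) * (b - mx) for a, b in zip(ys, xs))
    var_y = sum((a - my) ** 2 for a in ys)
    var_x = sum((b - mx) ** 2 for b in xs)
    return cov / math.sqrt(var_y * var_x)

# Hypothetical residuals with AR(1) structure left in them (coefficient 0.7).
random.seed(1)
resid = [0.0]
for _ in range(199):
    resid.append(0.7 * resid[-1] + random.gauss(0, 1))

band = 1.96 / math.sqrt(len(resid))  # approximate 95% white-noise band
for lag, r in enumerate(sample_acf(resid, 5)):
    if lag > 0:
        flag = "outside band" if abs(r) > band else "inside band"
        print(f"lag {lag}: acf={r:+.3f} ({flag})")
```

Autocorrelations well outside the band at low lags suggest serial correlation is still present in the residuals. `lagged_corr(y, x, lag)` can be scanned over several lags in the same way to see whether an independent variable lines up better with the dependent variable after a delay.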

    • @Isaiah_McIntosh
      @Isaiah_McIntosh 2 years ago +1

      @DimitriBianco I managed to get a serviceable model. I needed to drop the Balassa-Samuelson exchange rate variable; it was the I(0) series, so I was able to run a VAR after removing it. I also dropped the real effective exchange rate, which was highly correlated with oil and natural gas revenues, leading to multicollinearity issues, and which showed serial correlation on its own. I can capture the effect I wanted to investigate from those variables more readily in a separate model.
      I also had to replace the static index of competitive threat with the dynamic index of competitive threat. I Box-Cox transformed my remaining variables, rather than blindly logging as I was doing before, to improve the normality of the residuals. Between the transformation and the variable reselection, I managed to get a model with normal residuals according to the Jarque-Bera test, no serial correlation according to the Breusch-Godfrey LM test, and AR roots all within the unit circle, so no stability/structural break issues. I really hope I didn't botch this and won't have to start over again.
      Real-life data is a headache.
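
The two tools named in this comment can be sketched as follows. This is only an illustration under stated assumptions: the data values are made up, and the Box-Cox parameter is fixed by hand here, whereas in practice it is usually estimated (for example by maximum likelihood). The Jarque-Bera statistic is n/6 · (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis. It is compared with a chi-squared distribution with 2 degrees of freedom, whose 5% critical value is about 5.99.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]  # lambda = 0 is the log transform
    return [(v ** lam - 1) / lam for v in values]

def jarque_bera(values):
    """Jarque-Bera statistic built from sample skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Hypothetical right-skewed data, invented for this sketch.
raw = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 8.0, 15.0, 40.0]

print("JB on raw data:        ", round(jarque_bera(raw), 3))
print("JB after lambda = 0:   ", round(jarque_bera(box_cox(raw, 0)), 3))
print("JB after lambda = 0.5: ", round(jarque_bera(box_cox(raw, 0.5)), 3))
```

In the setting described in the comment, the statistic would be computed on the model residuals rather than on a raw variable, and a small sample like this one makes the chi-squared approximation rough.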

  • @rosaevee274
    @rosaevee274 1 year ago

    I would be interested to see a comparison between trees and polynomial regression. Or between trees and lasso/ridge.

  • @charlesmcdowell9436
    @charlesmcdowell9436 1 year ago +1

    Great video.

  • @daanialahmad1759
    @daanialahmad1759 2 years ago +1

    Great video Dimitri

  • @umanggarg970
    @umanggarg970 2 years ago +1

    Really, statistics videos will help a lot!

  • @mimi-kv6qu
    @mimi-kv6qu 2 years ago

    Great content as always. Do you think it would be a good idea to make a long video about how to do algorithmic trading, covering the programming, the data processing, the trading logic and the backtest? I think it would be very valuable for showing the bigger picture of the work involved in quant finance!

    • @DimitriBianco
      @DimitriBianco 2 years ago

      It is a hard area for me to cover especially because I'm not involved in it. To do it right you also need a lot of data which others don't have.

    • @mimi-kv6qu
      @mimi-kv6qu 2 years ago +1

      @DimitriBianco I thought quants understand trading so that they can build trading strategies with mathematical models. Am I misunderstanding the quant job? Thank you.

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      @mimi-kv6qu Quants build models for finance as a whole industry. For example, some price stocks (trading) and others price loans (banking). We also build a wide range of models to solve problems like optimizing portfolios, detecting fraud in credit card transactions, modeling volatility, determining how often to call customers, predicting counterparty risk, or predicting GDP.

    • @DimitriBianco
      @DimitriBianco 2 years ago

      For me I just like building models to solve interesting problems.

  • @desaint9469
    @desaint9469 2 years ago +1

    Great Video, sir !!

  • @vfxvision723
    @vfxvision723 11 months ago

  • @sentralorigin
    @sentralorigin 2 years ago

    Why are decision trees specifically mentioned here, as opposed to other techniques such as neural networks, SVMs, Bayes, nearest neighbor, etc.?

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      Because the video needs to be short and concise. There are many other statistical methods that could have been used as well.

    • @sentralorigin
      @sentralorigin 2 years ago +1

      @DimitriBianco Ah OK, I thought there was a specific reason for the choice, like some special relationship between linear regression and decision trees.

  • @daanialahmad1759
    @daanialahmad1759 2 years ago

    Dimitri, if a decision tree has a large number of nodes, it can overfit.
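
The overfitting point in this last comment can be sketched with a tiny one-dimensional regression tree. This assumes the comment refers to a tree grown with many nodes. The code is a minimal illustration written for this page, not the model from the video. The data (y = x^2 plus noise), the seed, and the helpers `fit_tree`, `predict` and `mse` are all hypothetical.

```python
import random

def sse(ys):
    """Sum of squared errors around the mean of ys."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def fit_tree(xs, ys, max_depth):
    """Greedy 1-D regression tree: each split minimises the total SSE."""
    pairs = sorted(zip(xs, ys))
    if max_depth == 0 or pairs[0][0] == pairs[-1][0]:
        return {"leaf": sum(ys) / len(ys)}
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    _, i = best
    left_x, left_y = zip(*pairs[:i])
    right_x, right_y = zip(*pairs[i:])
    return {
        "threshold": pairs[i - 1][0],  # x <= threshold goes to the left child
        "left": fit_tree(left_x, left_y, max_depth - 1),
        "right": fit_tree(right_x, right_y, max_depth - 1),
    }

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def mse(tree, xs, ys):
    """Mean squared prediction error of the tree on (xs, ys)."""
    return sum((predict(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n):
    """Hypothetical data: y = x^2 plus Gaussian noise."""
    xs = [random.uniform(-3, 3) for _ in range(n)]
    return xs, [x ** 2 + random.gauss(0, 1) for x in xs]

random.seed(42)  # arbitrary seed, only for reproducibility
train_x, train_y = make_data(40)
test_x, test_y = make_data(40)

shallow = fit_tree(train_x, train_y, max_depth=2)  # at most 4 leaves
deep = fit_tree(train_x, train_y, max_depth=50)    # grown until leaves are pure

train_shallow, train_deep = mse(shallow, train_x, train_y), mse(deep, train_x, train_y)
print("train MSE  shallow:", round(train_shallow, 3), " deep:", round(train_deep, 3))
print("test MSE   shallow:", round(mse(shallow, test_x, test_y), 3),
      " deep:", round(mse(deep, test_x, test_y), 3))
```

The fully grown tree ends with one training point per leaf, so its training error is zero: it has memorised the noise. Its error on new data will typically be well above that, which is the gap that limits on depth or minimum leaf size are meant to control.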