All Machine Learning Beginner Mistakes explained in 17 Min

  • Published Dec 29, 2024

COMMENTS • 66

  • @divyanshpandey8355 · 3 days ago · +10

    I was actually gonna watch this while having some pasta, but 2 minutes later I realised I needed to get my notebook and pen ASAP! Golden content my guy, pure GOLD.

  • @nikunjdeeep · 8 days ago · +20

    First 3 minutes in and I added this to my liked videos... pure gem

  • @hellblazer7 · 2 days ago · +3

    This video is insane. It's so good that it should be included as a synopsis in any ML-based academic book.

  • @TinkerRaw · 2 days ago · +1

    I'm submitting an abstract on the 7th featuring my first-ever ML work in my field, and I've been very nervous about making simple errors or not presenting the research in a way that would satisfy ML people. This was super helpful, thank you.

  • @mogaolimpiu7190 · 4 hours ago

    This is great. It's something I wish I'd had when I was banging my head against the wall because my models weren't behaving how I thought they would. This video mentions all the issues I spent weeks working on, and more! Great tool.

  • @damori3604 · 4 days ago · +4

    I don't understand anything but I'm still hooked because my brain tells me it's helpful

  • @petersilie9702 · 3 days ago · +1

    As an amateur data analyst/scientist, I think this is insanely useful information. Thanks for sharing!

  • @tonycincera3353 · 2 days ago

    Having recently retired after working as a data scientist for over 3 decades, I can say this is a very, very good summary of the issues and fixes, not just for ML but for any predictive modeling project.

  • @prison9865 · 7 days ago · +6

    Seriously good refresher. I like this type of video. Quick and to the point. Good job

  • @PotatoMan1491 · 2 days ago

    I am just a new hobbyist, this content is awesome, I find it very helpful.

  • @anrichvanderwalt1108 · 2 days ago

    Great video! All the lessons I had to learn the hard way in my first 2 years!

  • @cycla · 3 days ago

    I just love this kind of organized information; it's the data scientist's way

  • @michaelmcallister9519 · 3 days ago

    This is a good video, way better than others I’ve seen.

  • @Blooper1980 · 9 days ago · +6

    THIS IS AWESOME!

  • @NilasScweisthal · 6 days ago · +1

    This video is a perfect checklist😂. Thanks🙏

  • @narangfamily7668 · 9 days ago · +2

    Thanks! I try to avoid most of these

  • @newbie8051 · 5 days ago

    2:20 ahaaaa, I was asked this in an interview as well
    6:07 I read about annealing learning rates; will try to implement that as well
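The 6:07 timestamp mentions annealing learning rates. One common variant is cosine annealing, where the learning rate decays from a maximum to a minimum along half a cosine wave. The sketch below is a plain-Python illustration under that assumption; the function name `cosine_annealing_lr` and its parameters are hypothetical, not taken from the video or from any library.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Hypothetical helper: cosine-annealed learning rate for a given step.

    Starts at lr_max when step == 0 and decays smoothly to lr_min when
    step == total_steps, following half a cosine period.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    progress = step / total_steps  # fraction of training completed
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the schedule at a few checkpoints of a 100-step run.
    for step in (0, 25, 50, 75, 100):
        print(step, round(cosine_annealing_lr(step, 100, lr_max=0.1), 4))
```

Deep learning frameworks ship their own schedulers (for example, PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`), so in practice you would usually use those rather than a hand-rolled function.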

  • @alin50248 · 4 days ago · +1

    Very good content!

  • @bitcode_ · 7 days ago

    Amazing content, thank you for helping out a noob

  • @olavictor6286 · 9 days ago

    Your video is top-notch as always. I'm just diving into the world of ML.

  • @gnkartha · 7 days ago · +1

    Great summary!

  • @Riku-pv5dc · 9 days ago

    Great and informative video! I'm sure I'll rewatch it many times.
    I started learning programming just a month and a half ago (through Udemy courses), and I'm already building my first dataset on EV chargers installed in Europe (I have a dataframe with over a million parameters!). Once I clean it up, I'll move on to running ML algorithms on it.
    Thank you for the effort you're investing in future generations, Mr. Nobody!

    • @Riku-pv5dc · 9 days ago

      During my university years (I'm finishing my degree in Engineering Management this semester), I studied statistics, linear algebra, calculus (ended somewhere around Hessian matrices), and optimization over the past 2-3 years. It feels like a dream come true to now apply these concepts in a programming environment, which I previously only worked on theoretically with pen and paper.

  • @ardgeorge4175 · 6 days ago

    Common sense, but very good refresher. Thanks!

  • @RolandoLopezNieto · 7 days ago · +1

    Great content, thanks

  • @eduardo.z6909 · 9 days ago · +1

    Thanks for the video!

  • @anatolsonntag857 · 9 days ago · +1

    You make high-quality videos.
    If you keep it up, you will be very successful.
    Keep up the good work.
    I'll bet you'll reach 100k subscribers within 6 months.

  • @andrelim2428 · 1 day ago

    Thank you for creating this video! Quick question about not shuffling data (9:58): for time series data, wouldn't shuffling introduce train/test set contamination? Also, isn't order important for time series data, so that shuffling would destroy the arrow of time? Thank you!
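The question above touches on a real pitfall: with time series, a random shuffle can put future observations in the training set and past ones in the test set. A common alternative is a walk-forward (expanding-window) split, similar in spirit to scikit-learn's `TimeSeriesSplit`. The plain-Python sketch below assumes the rows are already sorted by time; `walk_forward_splits` is a hypothetical helper, not the method from the video.

```python
def walk_forward_splits(n_samples: int, n_splits: int):
    """Hypothetical helper: yield (train_idx, test_idx) for time-ordered rows.

    Assumes row i happened before row i + 1. Every test fold lies strictly
    after its training window, so no future row leaks into training.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for this many splits")
    # Rows that do not divide evenly go into the first training window.
    offset = n_samples - fold_size * (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = offset + k * fold_size
        yield list(range(train_end)), list(range(train_end, train_end + fold_size))


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(10, 4):
        print("train:", train_idx, "test:", test_idx)
```

The training window grows with each fold while the test fold always sits just after it, which mimics how the model would be used on genuinely future data.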

  • @notVinodRodrigo · 9 days ago

    Wow man! This is gold!

  • @bathalamallikarjuna2316 · 8 days ago

    Absolutely fantastic

  • @ramayuda6578 · 8 days ago · +5

    What about dishonestly created data? I'm not an IT programmer, but I'm learning data science. As a practitioner, I've occasionally created or reported dishonest data, and since we're all human, I think others might do the same. Can this affect the accuracy of the model in general?

    • @joshuadavid6810 · 7 days ago · +4

      Yeah definitely. If the data is wrong, no model can save it.

  • @hanyanglee9018 · 3 days ago

    Version control and docs are very good points. I never shuffle data.

  • @piggybox · 3 days ago

    This is years of experience and lessons learned, packaged for "beginners"

  • @bluesbucker70 · 3 days ago · +2

    Ignoring domain knowledge is the worst of these by far. If you don't understand the domain, you will generate trivial, weak or useless solutions even if you do everything else right.

  • @ozgurdenizcelik · 9 days ago

    It's a lot to take notes on from one session. I believe you have more detailed videos on your channel; I'll check them out later. Thanks

  • @maitreyimandal8910 · 5 days ago

    Very good

  • @carlosandrescastromarin7775 · 2 days ago · +2

    I stopped the video right away when SMOTE was suggested as a solution to class imbalance.

    • @jamestriveri304 · 2 days ago

      Same!

    • @somnath3986 · 1 day ago · +1

      Why? Is it not the actual solution?

    • @carlosandrescastromarin7775 · 1 day ago · +2

      @@somnath3986 SMOTE can create synthetic minority samples that land in regions dominated by the majority class. To address this issue, under-sampling is often preferred to oversampling. However, class imbalance is usually not a problem in itself but the nature of the data, so the best solution would be to use a loss function that penalizes errors on the minority class more heavily.
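The reply above suggests handling imbalance through the loss function rather than resampling. A minimal sketch of that idea, assuming binary labels 0/1: weight each class by `n_samples / (n_classes * class_count)` (the same "balanced" rule scikit-learn uses for `class_weight="balanced"`) so that errors on the rare class cost more. The helper names are hypothetical, and this is one option among several, not the video's recommendation.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(y_true, p_positive, weights):
    """Binary cross-entropy where each example is scaled by its class weight.

    The sum is normalized by the total weight, so the result stays on the
    same scale as an unweighted mean log loss.
    """
    eps = 1e-12  # clip probabilities to avoid log(0)
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1.0 - eps)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # 9:1 imbalance
    weights = balanced_class_weights(labels)
    print(weights)  # the minority class gets the larger weight
    # A model that always predicts 10% probability for class 1:
    preds = [0.1] * len(labels)
    print(round(weighted_log_loss(labels, preds, weights), 4))
```

Many scikit-learn classifiers expose this directly through their `class_weight` parameter, so a hand-written loss is mainly useful for seeing what the weighting does.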

  • @abhishekdas8807 · 7 days ago

    This is good stuff

  • @emont · 16 hours ago

    Now I understand why Splunk makes sense in the AI world: instead of just creating AI, you must first make sure you have a better dataset.

  • @Onlyhas99 · 7 days ago

    instant subscribe

  • @prison9865 · 7 days ago

    Actually, could you explain when to scale features? Is it needed only for distance-based algorithms, and how do you deploy a model if you did the scaling?
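On the scaling question: scaling generally matters for distance-based methods (k-NN, k-means, SVMs with RBF kernels) and for gradient-based training, while tree-based models are largely insensitive to it. Either way, the key for deployment is to learn the scaling parameters from the training split only, store them with the model, and apply the same transform at inference. Below is a minimal plain-Python sketch of z-score standardization for one numeric feature; the helper names and the JSON storage are illustrative assumptions.

```python
import json
import statistics


def fit_standardizer(train_values):
    """Hypothetical helper: learn mean and std from TRAINING data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against a constant feature, which would otherwise divide by zero.
    return {"mean": mean, "std": std if std > 0 else 1.0}


def standardize(values, params):
    """Apply the stored training parameters to any split or to live data."""
    return [(v - params["mean"]) / params["std"] for v in values]


if __name__ == "__main__":
    train, test = [2.0, 4.0, 6.0], [4.0, 10.0]
    params = fit_standardizer(train)  # never fit on the test set
    saved = json.dumps(params)        # ship this alongside the model
    # At inference time: load the same parameters, do not refit.
    loaded = json.loads(saved)
    print(standardize(test, loaded))
```

Refitting the scaler on live data would shift the feature distribution the model was trained on, which is why the parameters travel with the model.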

  • @Glorytroly9092 · 6 days ago

    Hey, please make the same kind of videos for statistics and other foundational concepts...

  • @Glorytroly9092 · 6 days ago

    Will you create similar content for statistics concepts like your previous videos ?

  • @klane3514 · 5 days ago

    Well, I would love to do hyperparameter optimization and use cross-validation, but each epoch takes 16 h and we need to publish a paper, so :(
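For readers less familiar with cross-validation: k-fold CV splits the data into k folds and trains k times, each time holding out one fold for evaluation, which is why it gets expensive when a single training run is already long. Below is a minimal plain-Python sketch of the index bookkeeping. It assumes i.i.d. rows, because shuffled folds are not appropriate for time-ordered data. `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Hypothetical helper: yield (train_idx, test_idx) for k-fold CV.

    Indices are shuffled once with a fixed seed, then dealt round-robin into
    k folds whose sizes differ by at most one. Each fold is the test set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, sorted(test_fold)


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(10, 5):
        print("train:", train_idx, "test:", test_idx)
```

The model's validation score is then averaged over the k held-out folds.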

  • @alexandrodisla6285 · 3 days ago

    Ohhh, there are such things as model validation.

  • @Qris_7711 · 7 days ago · +1

    The video misses non-stationary data.

    • @acasualviewer5861 · 6 days ago

      What do you mean by this? Can you explain?

    • @FedeAlbertini · 5 days ago

      @@acasualviewer5861 Essentially, unless you have a degree in stats or maths, avoid time series data.

    • @FedeAlbertini · 5 days ago

      @@acasualviewer5861 Stationarity is a property of some time series data. Essentially, it means that the distribution from which the time series is generated does not change over time. That is strict stationarity; for weak stationarity, only the mean and variance must stay constant, and the autocovariance between two time points may depend only on how many time steps h apart they are. But yeah, unless you know what you are doing, stay away from time series.

    • @acasualviewer5861 · 5 days ago

      @@FedeAlbertini You mean like temperatures tend to be cooler in the winter than in the summer? So you could say they're non-stationary?

    • @FedeAlbertini · 5 days ago

      @@acasualviewer5861 Yes, temperatures are non-stationary. They have a trend (global warming) and seasonal components. Most real-world processes are in fact non-stationary, but many classical time series modelling techniques assume stationarity. Therefore, to model them correctly, you need to know how to turn non-stationary processes into stationary ones. Common techniques are differencing, detrending, and log transformations.
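The thread above lists differencing and log transformations as ways to bring a series closer to stationarity in the weak sense: constant mean and variance, with autocovariance depending only on the lag h. Below is a minimal plain-Python sketch of both, applied to a made-up toy series that has a linear trend and a period-4 seasonal pattern. The helper names are hypothetical. In real work you would also test stationarity formally, for example with an augmented Dickey-Fuller test.

```python
import math


def log_transform(series):
    """Damp variance that grows with the level of the series (values must be > 0)."""
    return [math.log(x) for x in series]


def difference(series, lag: int = 1):
    """Return y[t] - y[t - lag].

    lag=1 turns a linear trend into a constant; lag equal to the season
    length removes a seasonal pattern that repeats exactly.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


if __name__ == "__main__":
    # Toy series: linear trend (2 per step) plus a repeating 4-step season.
    season = [0, 5, 0, -5]
    y = [2 * t + season[t % 4] for t in range(12)]
    print(difference(y, lag=4))  # season removed: a constant 8 (four trend steps)
    print(difference(difference(y, lag=4)))  # then a first difference: all zeros
    print([round(v, 3) for v in log_transform([1, 10, 100])])
```

Seasonal differencing first and ordinary differencing second is one common ordering; which combination a series needs depends on the data.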

  • @Nino21370 · 8 days ago

    🔥

  • @__krossell__ · 4 days ago

    Using SMOTE is a beginner mistake

  • @Itz_Akashi.134 · 9 days ago · +4

    First 😂

  • @init_yeah · 9 days ago · +1

    Duo

  • @jamesrosicky2912 · 2 days ago

    Amazing, thanks!