I was actually gonna watch this while having some pasta, but 2 minutes later I realised I needed to get my notebook and pen ASAP! Golden content my guy, pure GOLD.
First 3 minutes in and I added this to my liked videos... pure gem.
This video is insane. It's so good that it should be included in any ML-based academic book as a synopsis.
I'm submitting an abstract on the 7th featuring my first ever ML work in my field. I've been very nervous about making simple errors or not presenting the research in a way that ML people would feel satisfied by. This was super helpful, thank you!
This is great. It's something I wish I had when I was banging my head against the wall because my models weren't behaving how I thought they would. This video mentions all the issues I spent weeks working on, and more! Great tool.
I don't understand anything but I'm still hooked because my brain tells me it's helpful
As an amateur data analyst/scientist, I think this is insanely useful information. Thanks for sharing.
Having recently retired after working as a Data Scientist for over three decades, I can say this is a very, very good summary of the issues and fixes, not just for ML but for any predictive modeling project.
Seriously good refresher. I like this type of video. Quick and to the point. Good job.
I'm just a new hobbyist; this content is awesome and I find it very helpful.
Great video! All the lessons I had to learn the hard way in my first 2 years!
I just love this kind of organized information; it's the data scientist's way.
This is a good video, way better than others I’ve seen.
THIS IS AWESOME!
This video is a perfect checklist😂. Thanks🙏
Thanks! I try to avoid most of these
2:20 ahaaaa, had this asked in an interview as well
6:07 read about annealing learning rates, will try to implement that as well
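For anyone curious, a minimal sketch of one annealing scheme, assuming PyTorch and a cosine schedule (the video may use a different scheme; the tiny model here is purely illustrative):

```python
# Minimal sketch: cosine learning-rate annealing in PyTorch (illustrative only)
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# The learning rate decays from 0.1 toward ~0 over 100 epochs along a cosine curve
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... forward pass, loss computation and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # update the learning rate once per epoch
```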
Very good content!
Amazing content, thank you for helping out a noob
Your video is top notch as always, just diving into the world of ML
Great summary!
Great and informative video! I'm sure I'll rewatch it many times.
I started learning programming just a month and a half ago (through Udemy courses), and I'm already building my first dataset on EV chargers installed in Europe (I have a dataframe with over a million entries!). Once I clean it up, I'll move on to running ML algorithms on it.
Thank you for the effort you're investing in future generations, Mr. Nobody!
During my university years (I'm finishing my degree in Engineering Management this semester), I studied statistics, linear algebra, calculus (ended somewhere around Hessian matrices), and optimization over the past 2-3 years. It feels like a dream come true to now apply these concepts in a programming environment, which I previously only worked on theoretically with pen and paper.
Common sense, but very good refresher. Thanks!
Great content, thanks
Thanks for the video!
You have high-quality videos.
If you keep making them like this, you will be very successful.
Keep up the good work.
I'll bet you'll reach 100k subscribers within 6 months.
Thank you for creating this video! Quick question re: not shuffling data (9:58). For time series data, wouldn't shuffling introduce train/test contamination? Also, isn't the order important for time series data, so shuffling would ruin the time arrow? Thank you!
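A minimal sketch of the difference, assuming scikit-learn is used: TimeSeriesSplit keeps every validation fold strictly after its training fold, while a shuffled split lets future observations leak into the training set.

```python
# Minimal sketch: chronological vs. shuffled splits for time series data
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

X = np.arange(100).reshape(-1, 1)  # pretend these are 100 ordered time steps
y = np.arange(100)

# Chronological split: validation indices always come after training indices
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()  # no look-ahead leakage

# Shuffled split: future points end up in training (leakage for time series)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=True, random_state=0)
```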
Wow man! This is gold!
Absolutely fantastic
What about data created dishonestly? Basically, I’m not an IT programmer, but I’m learning data science. As a practitioner, I’ve occasionally created or reported dishonest data. I think, as a human, others might do the same. Can this affect the accuracy of the model in general?
Yeah definitely. If the data is wrong, no model can save it.
Version control and docs are very good. I never shuffle data.
This is years of experience and lessons learned, packed into a "beginner" video.
Ignoring domain knowledge is the worst of these by far. If you don't understand the domain, you will generate trivial, weak or useless solutions even if you do everything else right.
It's a lot to take in from one session. I believe you have more detailed videos on your channel; I'll check them out later. Thanks.
Very good
I stopped the video right away when SMOTE was suggested as a solution to class imbalance.
Same!
Why? Is it not the actual solution?
@@somnath3986 SMOTE generates synthetic minority-class samples by interpolating between existing ones, and those synthetic points can land in regions dominated by the majority class, adding noise rather than signal. To address this, under-sampling is often preferred to oversampling. However, class imbalance is usually not a problem in itself but the nature of the data, so the best solution is often a class-weighted loss function that penalizes errors on the minority class more heavily.
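A minimal sketch of the class-weighting idea, assuming scikit-learn (class_weight="balanced" scales each class's contribution to the loss inversely to its frequency):

```python
# Minimal sketch: handling imbalance with a class-weighted loss
# instead of resampling (assumes scikit-learn; toy data only)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 95% / 5% imbalanced toy dataset
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# 'balanced' reweights each class by 1 / class frequency,
# so mistakes on the rare class cost more during training
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```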
This is good stuff
Now I understand why Splunk makes sense in the AI world: before you create AI, you must make sure you have a good dataset first.
instant subscribe
Actually, could you make an explanation of when to do feature scaling? Is it needed only for distance-based algorithms, and how do you deploy a model if you scaled the features?
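One common pattern, sketched here under the assumption that scikit-learn is used: fit the scaler on the training data only and bundle it with the model in a Pipeline, so the exact same transformation ships with the model at deployment.

```python
# Minimal sketch: scaling inside a Pipeline so it travels with the model
import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fit on the training data only, as part of the pipeline
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

# At deployment, persist the whole pipeline; new data gets scaled identically
joblib.dump(model, "model.joblib")
print(model.score(X_test, y_test))
```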
Hey, please do the same kind of videos for statistics and other foundational concepts...
Will you create similar content for statistics concepts, like your previous videos?
Well, I would love to do hyperparameter optimization and use cross-validation, but each epoch takes 16 hours and we need to publish a paper, so :(
Ohhh, there are such things as model validation.
Non-stationary data is missing from the list.
What do you mean by this? Can you explain?
@@acasualviewer5861 Essentially, unless you have a degree in stats or maths, avoid time series data.
@@acasualviewer5861 Stationarity is a property of some time series data. It essentially means that the distribution out of which the time series is generated does not change over time (that's strict stationarity; for weak stationarity, only the first two moments and the autocovariance need to stay the same when analysing two time points that are h steps apart). But yeah, unless you know what you are doing, stay away from time series.
@@FedeAlbertini you mean like temperatures tend to be cooler in the winter vs the summer? So you could say its non-stationary?
@@acasualviewer5861 Yes, temperatures are non-stationary. They have a trend (global warming) and seasonal components. Most processes are in fact non-stationary, but pretty much all time series modelling techniques assume stationarity. Therefore, to model correctly, you need to know how to turn non-stationary processes into stationary ones. Common techniques are differencing, de-trending and log transformations.
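A minimal sketch of the differencing idea, assuming pandas and statsmodels are available (a falling ADF test p-value suggests the differenced series looks stationary; the trending toy series here is made up for illustration):

```python
# Minimal sketch: turning a trending (non-stationary) series into a
# stationary one by differencing, checked with the ADF test
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
# Linear trend + noise: clearly non-stationary (the mean drifts upward over time)
series = pd.Series(0.5 * np.arange(200) + rng.normal(size=200))

print("raw p-value:", adfuller(series)[1])            # high -> looks non-stationary
diffed = series.diff().dropna()                       # first difference removes the linear trend
print("differenced p-value:", adfuller(diffed)[1])    # low -> looks stationary
```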
🔥
Using SMOTE is a beginner mistake
pls elaborate
First 😂
Makes me feel like a rock star :-D
You are to me🎉
Duo
Amazing, thanks!