Variable Length Features and Deep Learning

  • Published 21 Jul 2024
  • If you're like me, you don't really need to train self-driving car algorithms or build cat-image detectors. Instead, you're likely dealing with practical problems and normal-looking data.
    The focus of this series is to help the practitioner develop intuition about when and how to use Deep Learning (DL) models in normal situations with normal data, e.g. structured data (i.e. something you can read into pandas). I will teach you the fundamentals: the building blocks of DL.
    There are many courses that teach DL for computer vision, NLP, etc. This is not that. This series is about teaching the practitioner how to transform normal machine learning (ML) models into DL models, and I have a lot of experience doing just that.
    Some existing DL courses are overly theoretical (not useful to practitioners), overly simplistic (glossing over the real sophistication), or even overly practical (giving the practitioner a false sense of security). DL is hard. Real data science is hard. We want to steer you away from the most common mistakes.
    By starting with tabular data, we can introduce you to the DL toolbox in a more intuitive way. Note: this series is not about the underlying math of neural networks or the like.
    This series is aimed most directly at intermediate-level users.
    Helpful links:
    Link to Deep Learning Building Blocks Series:
    • Python Keras - Deep Le...
    Link to GitHub repo including tabular data lesson:
    github.com/knathanieltucker/d...

COMMENTS • 14

  • @beagle989 2 years ago

    Adding the context back in with the original data is such a pro move.

  • @MrJak3d 4 years ago +1

    Really enjoying this series, thanks for doing it. It is right around my level of understanding.

  • @PanMarhewa 4 years ago

    I just found your channel and I expect it to be my most-watched for some time. Good job!

  • @Mrchungcc1 11 months ago

    Thank you for sharing this video. I am still not clear about your dataset: class1_points and class2_points have a shape of (5000, 10, 30). Does that mean you created a dataset of 5000 people, 10 credit cards per person, and 30 features per credit card?

  • @helenjude3746 4 years ago

    Hi Nate! Could you explain what RepeatVector does? Did you use it to learn 10 different features by the same conv1d operation?

    • @DataTalks 4 years ago +1

      Great question!!! This has to do with how a convolution layer works. A convolution layer applies 1 learned function (think a mini linear regression) to X sets of inputs (and X can vary from example to example). So in the example above I had a set of 10 credit card records, each with 20 features (think current balance, date opened, etc.). So my convolution was 1 learned function applied to each set of 20 features. The same function each time.
      When that function is applied, it has no knowledge of anything but the 20 features it runs on. In fact, when running on the first credit card, it does not know about the second credit card (different from RNNs). So how can we provide a convolution with knowledge about things other than those 20 features? The natural way is to add to those 20 features! But those 20 features don't just happen once, they happen X times; in our case they happen 10 times! So I had to add the same info onto the original 20 features 10 times, and thus I used a RepeatVector!
      Great question, and I hope this clarifies some things for you! A minimal sketch of the wiring is below.
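      A minimal Keras sketch of the idea described above, assuming hypothetical shapes (10 cards, 20 per-card features, 5 customer-level context features); the names and sizes are illustrative, not taken from the video:

      ```python
      import tensorflow as tf
      from tensorflow.keras import layers, Model

      # Per-example inputs: 10 credit cards with 20 features each, plus
      # 5 customer-level context features (all sizes are assumptions).
      cards = layers.Input(shape=(10, 20), name="card_features")
      context = layers.Input(shape=(5,), name="customer_context")

      # RepeatVector tiles the 5 context features 10 times -> (10, 5),
      # so every card position sees the same customer-level info.
      tiled = layers.RepeatVector(10)(context)

      # Each card now carries 20 + 5 = 25 features.
      combined = layers.Concatenate(axis=-1)([cards, tiled])

      # kernel_size=1: one learned function applied independently to
      # each of the 10 cards, with the same weights every time.
      x = layers.Conv1D(filters=16, kernel_size=1, activation="relu")(combined)
      x = layers.GlobalMaxPooling1D()(x)  # order-insensitive aggregation
      output = layers.Dense(1, activation="sigmoid")(x)

      model = Model(inputs=[cards, context], outputs=output)
      model.compile(optimizer="adam", loss="binary_crossentropy")
      ```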

  • @emifo6018 3 years ago

    Thank you!

  • @mahery_ranaivoson 3 years ago

    Hi Nate, initially you created a dataset with 4 classes; how did you end up with 'binary_crossentropy' at the end?

    • @DataTalks 3 years ago

      Great question! In this synthetic dataset there are 4 types of credit cards: 4 classes. And 2 types of customers: those who will pay the loan back and those who won't. The two customer types have different distributions over the card types, and the binary target is the customer type! A tiny sketch of that setup is below.
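      A tiny numpy sketch of that setup; the class mixes are made up for illustration, not taken from the video:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n_customers, n_cards = 5000, 10

      # Two customer classes, each with its own distribution over 4 card types.
      card_type_probs = {
          0: [0.4, 0.3, 0.2, 0.1],  # will pay the loan back
          1: [0.1, 0.2, 0.3, 0.4],  # won't pay the loan back
      }

      y = rng.integers(0, 2, size=n_customers)  # the binary target
      card_types = np.stack([
          rng.choice(4, size=n_cards, p=card_type_probs[label]) for label in y
      ])  # shape (5000, 10): 4 classes of card, but only 2 classes of customer

      # Per-card features would then be generated conditioned on card_types,
      # and the model ends in one sigmoid unit with binary_crossentropy.
      ```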

  • @Andrea-cg6im 2 years ago

    I was just thinking about how to apply a Conv1D layer to aggregate variable-length (non-temporal) features, and this video confirmed I'm not a total freak haha.
    My concern is the following:
    - The features I want to aggregate are multimodal (so different from your approach, where you aggregate the same features for different cards) and each modality has multiple features; also, not all modalities are available for all samples.
    Would this approach still be feasible? E.g. I could use a kernel size which treats all modalities independently, and I could pad with zeros for missing modalities. Great video!

    • @DataTalks 2 years ago

      I would certainly use a kernel that treats each modality differently. For almost all data you won't need to worry about the kernel size (you are generally saving so much model memory by using a conv anyways!).
      Do think hard about zero padding for your example, though: if your features naturally have 0 as an input, you'd be feeding seemingly meaningful info into the model by zero padding. One common workaround is sketched below.
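      One common workaround, sketched under assumed shapes (6 modality slots, 8 features per slot; none of this is from the video): add an explicit presence indicator so a padded slot is distinguishable from a real zero.

      ```python
      import numpy as np

      n_slots, n_feats = 6, 8  # hypothetical modality slots / features per slot
      rng = np.random.default_rng(0)

      x = np.zeros((n_slots, n_feats), dtype=np.float32)
      present = np.zeros((n_slots, 1), dtype=np.float32)

      # Fill in only the modalities this sample actually has (say, slots 0 and 2).
      x[0], present[0] = rng.random(n_feats), 1.0
      x[2], present[2] = rng.random(n_feats), 1.0

      # Concatenating the indicator makes a real 0 distinguishable from padding:
      # wherever present == 0, the model can learn to ignore the slot.
      model_input = np.concatenate([x, present], axis=-1)  # shape (6, 9)
      ```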

  • @sanjaykrish8719 4 years ago

    Why can't a dense layer be used in this case? Is it just for reducing the number of parameters?

    • @DataTalks 4 years ago

      Exactly! I think there are two arguments against a dense layer: 1) it has many more parameters, and 2) it does not fit the problem, i.e. it is the wrong inductive bias.
      With a dense layer you will have separate parameters for the first, second, ... and last credit card. But ultimately there is no inherent order to the cards. This makes the learning much harder and can give you odd results if the order of the cards is scrambled or lost later on. (The parameter gap is easy to check; see the quick comparison below.)
      Great intuition and keep up the deep learning :)
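      A quick comparison under the hypothetical 10-cards-by-20-features shape from the earlier replies (sizes assumed for illustration):

      ```python
      import tensorflow as tf
      from tensorflow.keras import layers, Sequential

      # Dense: separate weights for every card position (10 * 20 inputs).
      dense_model = Sequential([
          layers.Input(shape=(10, 20)),
          layers.Flatten(),
          layers.Dense(16),
      ])

      # Conv1D, kernel_size=1: one set of weights shared across all cards.
      conv_model = Sequential([
          layers.Input(shape=(10, 20)),
          layers.Conv1D(16, kernel_size=1),
      ])

      print(dense_model.count_params())  # 10*20*16 + 16 = 3216
      print(conv_model.count_params())   # 20*16 + 16 = 336
      ```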

  • @Izzy-ve3xz 1 year ago

    lol this video could have been 1 sentence long based on the method you showed: padding is how to deal with variable-length features in deep learning