Dear Abhishek, I can't thank you enough for making time for these videos. A lot of newbies like me have a hard time finding material this succinct yet thorough, and you've done it for us by making these videos. I can only hope you keep more of them coming.
This labor of love, delivered with your level of skill, is something so many will find an enduring resource.
Thank you so much, man.
OMG, this is the first and only video I've understood. Thank you, sir, for the clear explanation!!
Sir, I tried to modify your code for multi-class classification, but I got `Shapes (None, 1) and (None, 8) are incompatible`.
Apart from the output layer, which I changed to
`y = layers.Dense(8, activation='sigmoid')(X)  # I have eight classes`
and the loss function, which I changed to
`loss='categorical_crossentropy'  # for the categorical classification`
could you please point me to where else I should change the code? Thank you very much.
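For what it's worth, that shape error usually means the labels are still integers of shape (None, 1) while the network now outputs (None, 8). A minimal sketch of the usual fixes, assuming eight mutually exclusive classes; `X`, `model`, and `train` refer to the code from the question and the video, so treat them as placeholders:

```python
from tensorflow.keras import layers

# Use softmax (not sigmoid) when exactly one of the eight classes is correct.
y = layers.Dense(8, activation="softmax")(X)

# Option 1: keep integer labels 0..7 and switch to the sparse loss.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Option 2: one-hot encode the labels and keep categorical_crossentropy.
# from tensorflow.keras.utils import to_categorical
# y_onehot = to_categorical(train.target.values, num_classes=8)
# model.compile(optimizer="adam", loss="categorical_crossentropy")
```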
Thanks a lot for explaining very clearly the idea behind embeddings with code!
Very nice video, sir! I was looking for a reference on entity embeddings for a retail time-series forecasting competition, and this provided a nice starting point. Thanks!
I was going through the list of your videos. This is great content !
Very useful, thanks for sharing.
Can you share code showing how we can add numerical features to the model as well?
Also, is there a reason why LabelEncoder was used instead of OrdinalEncoder?
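A hedged sketch of one way to add numerical features (not the video's exact code): give the numeric columns their own input tensor and concatenate it with the flattened embedding outputs. The `train` dataframe and `features` list are assumed from the video's notebook; `num_features` is a hypothetical list of numeric column names.

```python
import numpy as np
from tensorflow.keras import layers, Model

cat_inputs, cat_embeddings = [], []
for f in features:  # categorical feature names, as in the video
    num_unique_vals = int(train[f].nunique())
    embed_dim = int(min(np.ceil(num_unique_vals / 2), 50))
    inp = layers.Input(shape=(1,))
    emb = layers.Embedding(num_unique_vals + 1, embed_dim)(inp)
    emb = layers.Flatten()(emb)
    cat_inputs.append(inp)
    cat_embeddings.append(emb)

# Hypothetical numeric columns; scale them beforehand if needed.
num_features = ["num_col_1", "num_col_2"]
num_inp = layers.Input(shape=(len(num_features),))

# Concatenate all embedding vectors with the numeric block.
x = layers.Concatenate()(cat_embeddings + [num_inp])
x = layers.Dense(300, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)
model = Model(inputs=cat_inputs + [num_inp], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Fit with the numeric block appended to the list of categorical arrays:
# model.fit([train[f].values for f in features] + [train[num_features].values],
#           train.target.values)
```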
really liked this notebook you made
I love your videos, keep it up!
This has been an amazing series. Top notch! :D
That's what I was waiting for! Thankfully my bell works fine xd
Thanks for taking the time to do this. Great video. I was a bit lost in the create-model part. Have you explained the use of the Keras layers in a prior video?
Can you show us how to extract the entity embedding vectors for each class, or link to a code file? I want to extract them and use them in XGBoost.
`feature_weights = [layer.get_weights()[0] for layer in model.layers if isinstance(layer, Embedding)]`. Something along those lines.
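Expanding that one-liner into a hedged sketch, assuming the trained `model`, the label-encoded `train` dataframe, and the `features` list from the video's notebook are in scope: each embedding matrix is pulled out of the model and used as a lookup table, so every categorical column becomes its learned vectors before going into XGBoost.

```python
import numpy as np
from tensorflow.keras.layers import Embedding

embedding_matrices = [
    layer.get_weights()[0]  # shape: (num_unique_vals + 1, embed_dim)
    for layer in model.layers
    if isinstance(layer, Embedding)
]

# Assumes `features` lists the categorical columns in the same order
# as the embedding layers, and the columns are already label encoded.
blocks = [
    emb[train[f].values]  # row lookup: one vector per example
    for f, emb in zip(features, embedding_matrices)
]
X_xgb = np.hstack(blocks)

# import xgboost as xgb
# clf = xgb.XGBClassifier()
# clf.fit(X_xgb, train.target.values)
```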
Awesome video, thank you for the clear explanation. I was hoping to find out: with entity embeddings, can we still retain the feature-interaction importance of categorical variables?
Abhishek, thanks a ton for your tutorials. Your videos are really unique, and I really like your beard 🙂
Thanks for this great video 🙏
Glad you liked it!
Shouldn't it be just num_unique_vals instead of num_unique_vals + 1 in the embedding layer? Why the +1?
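One common reading of the +1 (conventions vary, so treat this as a hedged note rather than the video's official rationale): LabelEncoder produces indices 0..n-1, so n rows would technically suffice, and the extra row reserves a spare index that can later absorb unseen or rare categories.

```python
from tensorflow.keras import layers

n = 13  # num_unique_vals in the training data, indices 0..12 after encoding
# input_dim must be strictly greater than the largest index you feed in.
# With indices 0..12, input_dim=13 is enough; the +1 reserves a spare row
# (index 13) that can be pointed at unseen/"rare" categories at test time.
emb = layers.Embedding(input_dim=n + 1, output_dim=4)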
Thank you so much for sharing.
Thanks for watching!
Thank you for the video! Can anyone explain how to use this with LSTM models?
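Not covered in the video, but a hedged sketch of the usual pattern: when each sample is a sequence of categorical tokens (for example, event codes over time), keep the 3-D output of the embedding and feed it to an LSTM instead of flattening it. All sizes below are assumptions.

```python
from tensorflow.keras import layers, Model

seq_len = 20       # assumed sequence length
vocab_size = 1000  # assumed number of distinct tokens

inp = layers.Input(shape=(seq_len,))
x = layers.Embedding(vocab_size + 1, 32)(inp)  # (batch, seq_len, 32)
x = layers.LSTM(64)(x)                         # one summary vector per sequence
out = layers.Dense(1, activation="sigmoid")(x)

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```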
How can I do entity embeddings when it is an unsupervised learning problem?
Thank you for this Video
Thank you!
It would be appreciated if you (or someone else) could answer the following questions:
Q1: When we deploy this model, what should the embedding layer output for a value it has never seen during training?
Q2: Do we add the embedding layer to the neural network we build for, let's say, a classification problem, or should we train two separate models: one for embedding learning and one for prediction (the main problem)?
How do we handle unseen values in the test dataset in a better way, given that we might not have access to the test dataset beforehand? Can you elaborate a little more on the rare-value concept you mentioned in the video?
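On the rare-value idea (a paraphrase, not the video's exact code), one common recipe that also answers Q1 above: collapse infrequent training categories into a single `__rare__` token, fit the encoder on train only, and map anything unseen at prediction time to that same token, so the embedding always receives a valid index. `train`, `test`, and the `"city"` column are placeholders.

```python
from sklearn.preprocessing import LabelEncoder

MIN_COUNT = 10  # assumed frequency threshold

counts = train["city"].value_counts()
keep = set(counts[counts >= MIN_COUNT].index)

# Collapse infrequent train values and any unseen test values to one token.
train["city"] = train["city"].where(train["city"].isin(keep), "__rare__")
test["city"] = test["city"].where(test["city"].isin(keep), "__rare__")

# Fit on train values only (plus the sentinel), so there is no test leakage.
enc = LabelEncoder().fit(train["city"].tolist() + ["__rare__"])
train["city"] = enc.transform(train["city"])
test["city"] = enc.transform(test["city"])
```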
I was waiting for this thank you
Hope you enjoyed it!
Doesn't it introduce data leakage when you merge train and test and label encode the features together? The test set is there for submission and only that. In normal circumstances you wouldn't have access to it, so using it for category labeling is peeking into the future.
It is like having "sunny" and "rain" in training and an additional "snow" in the test set only. You can't possibly know about "snow" until some time in the future, when you see it for the first time in your data. But the model will know about it and expect it if you combine the two sets.
Otherwise, great video, keep it up ;)
Depends. For Kaggle (or wherever you know the test data) you can do this, and it becomes a form of semi-supervised learning. For industrial purposes you cannot :)
Great video, I learnt something new from it. I have a question: I was applying the technique to another problem and I am stuck, as I am not able to improve my model's performance. Can you suggest how to improve the neural network's performance?
Once again you've managed to make an interesting video. I was wondering if you might be interested in making a video about stacking (to be more precise, multi-level stacking).
Yes, it's on my list.
How do you know what size of the embeddings to choose to represent each categorical variable?
Obviously the embedding size will be equal to or less than the number of unique categories for each variable, but it's not obvious whether to represent a variable that has 1000 unique categories with an embedding of size 128 or 64 or 256. How to choose? I suppose you could technically grid-search over every embedding size from 1 to n, but that seems inefficient and would take forever.
Is there an analytical way to determine embedding size, or is it more of an empirical/heuristic approach? Thanks Abhishek.
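As far as I know there is no analytical answer; it's empirical. Two widely used rules of thumb, shown below as a hedged sketch (the constants are conventions, not theory):

```python
def embed_dim_half(n_cat: int) -> int:
    # "Half the cardinality, capped at 50" rule of thumb.
    return int(min((n_cat + 1) // 2, 50))

def embed_dim_fastai(n_cat: int) -> int:
    # fastai's default rule: min(600, round(1.6 * n_cat ** 0.56)).
    return int(min(600, round(1.6 * n_cat ** 0.56)))

print(embed_dim_half(1000))    # 50
print(embed_dim_fastai(1000))  # 77
```

Either one gives a starting point; from there, treating the size as a hyperparameter and tuning a few values around it is the usual practice.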
I got a syntax error (invalid syntax) with inp and out in get_model. Do you know where it might come from?
Hello! Thanks for the great video. I would like to ask: how can I avoid overfitting with categorical embeddings? I tried extracting the weights from the embedding layer to use them in LightGBM, but it's causing my Kaggle score to decrease even though my cross-validation score increases considerably. I am using SpatialDropout1D and dropout after the dense layers too. I also checked that the distributions of the categorical variables in train and test are very similar.
Hi Abhishek, I've read your book, and you mentioned that there's a minimum size for emb_dim depending on the number of unique values. Is that true?
Hi Abhishek, can you make a video on custom NER using BERT?
By the way, which framework do you use most often for Kaggle competitions (Keras or PyTorch)?
Both :) Nowadays, mostly PyTorch.
Hi Abhishek, I was following the video. At model.fit I got an error:

```
InvalidArgumentError                      Traceback (most recent call last)
      1 # X_train = [train.loc[:, f].values for f in features]
----> 2 model.fit([train.loc[:, f].values for f in features], train.target.values)
```

Can you suggest how to overcome it?
Please please please make videos on audio data, like WaveNet or speech-to-text.
Hopefully soon :)
@@abhishekkrthakur Thanks! For text-to-speech I understand there is Tacotron 2, but for speech-to-text I couldn't find anything other than spectrograms with 1D conv units, which have some problems with Indian English.
Hi Abhishek, thank you for your video.
I have an issue I hope you can help me with: I have values in my test set that don't appear in my train set, and I got this error for them:
`InvalidArgumentError: indices[3,0] = 18 is not in [0, 13)`
What can I do when the test set has values that are different from my train set?
Thank you in advance :)
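That message says the embedding table has 13 rows (valid indices 0..12) but the encoded test data contains index 18, i.e., values the training-time table never allocated. A hedged diagnostic/stopgap sketch, assuming each Embedding was built with `train[f].nunique() + 1` as in the video; the `__rare__` remapping shown earlier in the thread is the cleaner fix:

```python
for f in features:  # label-encoded categorical columns, as in the video
    input_dim = int(train[f].nunique()) + 1  # what the Embedding was built with
    max_test_idx = int(test[f].max())
    if max_test_idx >= input_dim:
        print(f"{f}: test index {max_test_idx} >= input_dim {input_dim}")
        # Stopgap: clip out-of-range values onto the reserved last row.
        test[f] = test[f].clip(upper=input_dim - 1)
```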
What is the difference between binary and one-hot encoding?
I've explained it here: ua-cam.com/video/vkXEHpuu03A/v-deo.html
It is tough to reach the level of understanding shown in your videos.
SOLID
😍😍😍😍
Who is watching this for a Kaggle competition?