Multi-class Text Classification using Tensorflow - Imbalanced dataset

AIEngineering

Published on Oct 11, 2024
#datascience #textclassification #nlp
Link to the video where I show end-to-end multi-class text classification using traditional algorithms - • End to End Text Classi...
In this video we will create an end-to-end NLP pipeline, starting from cleaning the text data and setting up the NLP pipeline, through model selection and model evaluation, to handling an imbalanced dataset, among other things.
In the next set of videos we will use more complex models to see how we can improve the performance of this model.

COMMENTS • 74

@AIEngineeringLife 4 years ago ⁺⁶
If you want to add an LSTM layer instead of a regular ANN, you can use this as the model and adjust the layer sizes:
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Reshape(target_shape=(128, 1)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)))
for units in [128, 64]:
    model.add(tf.keras.layers.Dense(units, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.Dense(6, activation='softmax'))
model.summary()
@majdoubwided6666 4 years ago
Thanks again and again
@majdoubwided6666 4 years ago
Please, how can we improve the model accuracy? I got 0.4!!
@AIEngineeringLife 4 years ago ⁺²
If you are using a neural network, one reason you might get low accuracy is a low volume of input data. If your dataset is not huge, it is better to stay with normal ML techniques as shown in my previous video. To raise an accuracy that low, you will mostly have to collect more data, or keep the layers to a minimum and see if that helps.
@majdoubwided6666 4 years ago
@@AIEngineeringLife Yes, that's it, thank you
@sandeepgupta2 4 years ago
@@AIEngineeringLife Can we try language models when we have a small amount of data?
@ijeffking 4 years ago ⁺³
This is HUGE..... I am indebted to you in terms of gratitude. Appreciate your efforts!
@faraza5161 3 years ago ⁺²
Awesome tutorial!! Really appreciate the hard work you put into making this tutorial
@ajithshenoy5566 4 years ago ⁺²
Thanks a ton. Please conclude this series with an industrial deployment demo.
@AIEngineeringLife 4 years ago ⁺¹
Ajith.. I already have a TensorFlow NLP classifier deployment, along with scaling it using k8s - ua-cam.com/play/PL3N9eeOlCrP4VXtFJTjmGsqI-Emk2keVL.html
Are you looking for anything specific?
@ajithshenoy5566 4 years ago
No no, exactly what I was looking for. Thanks
@rohansohani5570 2 years ago
Hi, thanks for all your content and knowledge sharing. Very informative.
@anweshapal8339 3 years ago ⁺¹
Very nice and detailed video
@vijaykarthikeyan5794 3 years ago ⁺¹
Such a nice video. Very well explained bro!!
@madhavimehta6010 3 years ago
Indeed the right one for me to work with. Thanks a ton for making such videos
@bigbena23 3 years ago ⁺¹
Hi - great video.
Any chance you can share the Jupyter notebook file?
Also - assuming I have a (relatively) small dataset (~17000 samples), can I still use this approach?
@sharatchandra2045 3 years ago ⁺¹
Excellent tutorial
@ashwinideshmukh2513 1 year ago
Sir, your video has been great for me. I have one question: now that the model is trained, I want to pass real data and check the output. Can you please tell me the code for it? Since we used train_data_f and not just x_train, how do I write the code for checking the model? Waiting for a reply.
@utkar1 4 years ago ⁺¹
Thank you so much for this!
@adwaitanand1470 3 years ago ⁺¹
Very nice video, just one doubt... what should the code be if we want to test a single complaint and get its product/target... like a function where we pass a single complaint string and it returns the product/label
@AIEngineeringLife 3 years ago ⁺¹
Adwait, you can pass a single complaint here as well. If you are looking to expose it through Flask, you can load the model and run prediction on a single instance too. I have done that for another model on the same data, but not this one - ua-cam.com/video/-F0CRcaNeao/v-deo.html
@adwaitanand1470 3 years ago
@@AIEngineeringLife Thanks, but I want to test it before deploying. What should the code be? When I pass a single string to model.predict(), it gives an error saying list index out of range
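A possible cause of the error above is that the model receives a bare string where it expects a batch. The sketch below shows one way to wrap a single complaint in a batch of size 1 and map the highest probability back to a label. The names predict_label, fake_predict and LABELS are hypothetical and not from the video; fake_predict only stands in for a trained model so the sketch runs without TensorFlow.

```python
from typing import Callable, Sequence


def predict_label(
    predict_fn: Callable[[list], Sequence[Sequence[float]]],
    text: str,
    labels: Sequence[str],
) -> str:
    """Classify one complaint string by sending it as a batch of size 1."""
    batch = [text]                # a list with one element, not a bare string
    probs = predict_fn(batch)[0]  # row 0 holds the class probabilities
    # Index of the largest probability (a plain-Python argmax).
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


# Stand-in for a trained model (assumption): returns fixed probabilities.
def fake_predict(batch):
    return [[0.1, 0.7, 0.2] for _ in batch]


# Hypothetical label names; their order must match the training encoding.
LABELS = ["credit_card", "mortgage", "student_loan"]

result = predict_label(fake_predict, "My mortgage payment was lost", LABELS)
print(result)
```

With the Keras model from the notebook, predict_fn could be something like `lambda batch: model.predict(tf.constant(batch))`, assuming the hub layer accepts raw strings as it does in the video. The label list has to be in the same order as the one-hot columns used during training.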
@utku_yucel 4 years ago ⁺²
Thanks!
@ariouathanane 3 years ago ⁺¹
Hello, thank you very much for your video. I have a question, if you can help me: how do I handle multiple text features and one output label?
@AIEngineeringLife 3 years ago ⁺¹
You can feed in both text features by creating an embedding of the other feature as well
@srikanth1107262 4 years ago ⁺¹
Thanks a lot. Great explanation. Could you share the notebook and dataset link?
@AIEngineeringLife 4 years ago ⁺³
Here you go - github.com/srivatsan88/Natural-Language-Processing/blob/master/Text_classification_Tensorflow_Multiclass.ipynb
The dataset link is within the notebook
@srikanth1107262 4 years ago ⁺¹
Thank you for sharing. I am looking for a small example of the end-to-end process of an ML project (including deployment). That would help me a lot. If you already have it, please share the link.
@AIEngineeringLife 4 years ago ⁺¹
@@srikanth1107262 I have several; this is one of them, where I cover it end to end
ua-cam.com/play/PL3N9eeOlCrP4VXtFJTjmGsqI-Emk2keVL.html
@valerysalov8208 4 years ago ⁺¹
Why have you not removed the XXXXX values that are there? How to deal with such values? In which cases should we use upsampling and in which downsampling? When to use synthetic points?
@AIEngineeringLife 4 years ago
You can tokenize outside, like I did in my previous video, to clean the text and then use that to create the embedding before feeding it into the TF model. My previous video is here - ua-cam.com/video/EHt_x8r1exU/v-deo.html
I am not so sure about upsampling, as in a high-dimensional space it might also add noise to the data. Downsampling is good when your data is highly skewed, say 99% to 1%
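As a rough illustration of the downsampling mentioned above, the sketch below caps every class at a fixed number of randomly chosen rows. It is a minimal example under stated assumptions, not the approach used in the video: the function name downsample, the max_per_class cap and the toy 99-to-1 data are all hypothetical.

```python
import random
from collections import defaultdict


def downsample(rows, label_of, max_per_class, seed=42):
    """Randomly keep at most max_per_class rows for every label."""
    # Group the rows by their label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    kept = []
    for group in by_label.values():
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)  # sample without replacement
        kept.extend(group)
    rng.shuffle(kept)  # avoid leaving the classes in contiguous blocks
    return kept


# Toy data with a 99-to-1 skew (hypothetical).
data = [("text %d" % i, "majority") for i in range(99)] + [("rare text", "minority")]
balanced = downsample(data, label_of=lambda r: r[1], max_per_class=5)

counts = {}
for _, label in balanced:
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

Classes that are already below the cap are left untouched, so the rare class keeps every example it has.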
@madhavimehta6010 3 years ago
Can you make videos on keyword extraction that produces more meaningful 3-gram keyphrases?
@Dirkster___ 3 years ago
Can I also pass in a string tensor object when using a GloVe embedding layer, or is the ability to consume strings directly (without a tokenizer) a special feature of the hub embedding layer?
Is there reference documentation for this direct way of consuming text without having to tokenize it first?
@AIEngineeringLife 3 years ago
It is a special feature of hub models. For others you might have to encode the text before passing it in
@devarshraval6668 3 years ago
Where do I find the syntax to convert an imported tf dataset (from slices) into a list of text to tokenize it? In your From Scratch video you tokenized a tfds built-in dataset, but how do I tokenize a custom imported dataset like the one in this video? I could not find it in the docs.
@AIEngineeringLife 3 years ago
You can still do the same, as in the end both the tfds and tf dataset data types are the same. Check this function that can help you with it - www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer
@rishabhpatel7588 3 years ago ⁺¹
class_weight error
`class_weight` not supported for 3+ dimensional targets.
When you are running:
history = model.fit(train_data_f,
                    epochs=4,
                    validation_data=test_data_f,
                    verbose=1,
                    class_weight=weights)
@greysonnewton6284 2 years ago
I get the same error, but when I rerun the previous 5 blocks the error disappears. You need to implement sample weights to fix this issue properly, though
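The sample-weight idea mentioned above can be sketched as follows: each one-hot label row is mapped to the weight of its class, giving one weight per example. This is only an illustration under assumptions; the function name sample_weights_from_one_hot and the toy labels and weights are hypothetical, and plain lists stand in for tensors.

```python
def sample_weights_from_one_hot(one_hot_rows, class_weights):
    """Return one weight per example, looked up from its one-hot label."""
    weights = []
    for row in one_hot_rows:
        # Position of the 1 in the one-hot row is the class index.
        class_index = max(range(len(row)), key=lambda i: row[i])
        weights.append(class_weights[class_index])
    return weights


# Toy one-hot labels for three classes (hypothetical).
labels = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
# Hypothetical class weights keyed by class index.
class_weights = {0: 0.5, 1: 2.0, 2: 2.0}

weights = sample_weights_from_one_hot(labels, class_weights)
print(weights)
```

Keras `fit` accepts dataset elements of the form (inputs, targets, sample_weights), so weights computed this way could be attached to each example in the tf.data pipeline instead of passing `class_weight`.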
@ahanchatterjee8311 3 years ago
@AIEngineering
Getting this error while running show batch: TypeError: Signature mismatch. Keys must be dtype , got .
Can you please solve this?
@iqrayousaf260 2 years ago
Can you share the code for using a CNN instead of a regular NN, like your LSTM model code?
@dipinpaul5894 4 years ago ⁺¹
Excellent!
Can you share the GitHub code path?
@AIEngineeringLife 4 years ago
Here you go - github.com/srivatsan88/Natural-Language-Processing/blob/master/Text_classification_Tensorflow_Multiclass.ipynb
@Reddie23 2 years ago
I'm missing how to do the tokenizing and embedding, since I am working with a Spanish dataset and I can't use the google reviews dataset...
@ahmedfahmyaee 2 years ago
I need input_shape to pass to the build() function to build this model when using the LSTM model ..... what is input_shape?
@hariprasad1744 4 years ago
Hi sir.. Can you please give us insights and models that can be used to categorize transactions in real time, with an example? To be more precise, I have the transaction description and category code in the train data.
@AIEngineeringLife 4 years ago
Sorry Hari.. I did not follow the question completely. Are you asking about real-time inference with this model? If so, I have similar videos in this playlist - ua-cam.com/play/PL3N9eeOlCrP5PlN1jwOB3jVZE6nYTVswk.html
@hariprasad1744 4 years ago
AIEngineering I am not referring to this video/lecture actually... I am asking, in general, how can transaction categorisation be done in ML/DL?
@JiminPark-ld2xx 2 years ago
When I type this,
class_weights = list(class_weight.compute_class_weight('balanced', np.unique(df['Priority']),df['Priority']))
It gives me this error,
TypeError Traceback (most recent call last)
in ()
----> 1 class_weights = list(class_weight.compute_class_weight('balanced', np.unique(df['Priority']),df['Priority']))
TypeError: compute_class_weight() takes 1 positional argument but 3 were given
Why is that?
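Recent scikit-learn releases accept these arguments by keyword only, so the call would need to be written as compute_class_weight(class_weight='balanced', classes=np.unique(df['Priority']), y=df['Priority']). For reference, the sketch below reproduces the 'balanced' formula, n_samples / (n_classes * count), in plain Python. The function name balanced_class_weights and the toy priorities list are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Compute weight(c) = n_samples / (n_classes * count(c)) for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Toy label column (hypothetical): 6 low, 3 medium, 1 high.
priorities = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1

weights = balanced_class_weights(priorities)
for label, weight in weights.items():
    print(label, round(weight, 3))
```

Rare classes get proportionally larger weights, which is what lets the loss pay more attention to them during training.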
@abhishekthombre9318 3 years ago
Hi, I am getting the following error when passing class weights: ValueError: `class_weight` not supported for 3+ dimensional targets. Any help on this?
@AIEngineeringLife 3 years ago
The dimensions of your input data seem to be problematic somewhere. Here is my code; have you checked against it? - github.com/srivatsan88/Natural-Language-Processing/blob/master/Text_classification_Tensorflow_Multiclass.ipynb
@reenaramachandralokare3123 2 years ago
For the same model I am getting 99% accuracy. Is it overfitting? Please answer
@sujankumar215 2 years ago
Please let me know where I can find this code file, sir
@edddw1601 3 years ago
How to handle a highly imbalanced multi-class (64 classes) image dataset in which some classes don't occur at all and some occur only once?
@AIEngineeringLife 3 years ago
Eddy.. Very low-occurrence classes are difficult to handle in multi-class classification. Ideally it is good to collect more data. Since this is text data here, upsampling and the like might not be efficient
@stimes2210 1 year ago
Can I get a link to this Colab file?
@shubhamsingh8578 3 years ago
Can we leverage the LSTM model for binary classification?
@AIEngineeringLife 3 years ago ⁺¹
Yes, just change the model loss function
@shubhamsingh8578 3 years ago
@@AIEngineeringLife Is there any social media handle where I can reach you directly?
@AIEngineeringLife 3 years ago
@@shubhamsingh8578 I am on LinkedIn. Check my channel home page for my LinkedIn profile link
@prakashkafle454 3 years ago
How to do BERT implementation and fine-tuning?
@cx4917 4 years ago ⁺¹
Could you explain why we add @tf.function?
@AIEngineeringLife 4 years ago
Basically, all tf functions compile into a TensorFlow graph when we call model save. If they were regular Python functions, we would have to re-import them separately, apart from the model itself
@cx4917 3 years ago
@@AIEngineeringLife Thanks. Yes, I think I get the idea.
@girrajjangid4681 2 years ago
Can you please provide the notebook link?
@debojitmandal8670 1 year ago
And why haven't you preprocessed the data?
@naveenmami7438 2 years ago
Can you share the code, sir?
@malikrumi1206 3 years ago ⁺¹
Why are you talking so fast? Breathe!
@AIEngineeringLife 3 years ago
Sure Malik.. I try to, but once I get started it kind of gets too fast. I will try to slow down in future. 0.9x speed has helped others who have watched my old videos
@malikrumi1206 3 years ago
@@AIEngineeringLife Well, 0.9 speed doesn't help *you*. We can't learn anything if you pass out! ;-)
