Multi-Class Language Classification With BERT in TensorFlow

  • Published 11 Oct 2024

COMMENTS • 75

  • @MaryamYassi
    @MaryamYassi 1 year ago +2

    I wanted to express my sincere appreciation for your videos on UA-cam. They have been immensely helpful to me in my Ph.D. thesis, particularly in understanding how to pre-train using MLM and fine-tune the BERT model.
    I thoroughly enjoy watching your videos, and they have provided valuable insights and guidance for my research. Thank you for creating such informative and engaging content.

  • @aditya_01
    @aditya_01 2 years ago +2

    The best video on how to use BERT in TensorFlow, thank you!

  • @tildo64
    @tildo64 7 months ago

    I don't comment on videos, but your video is so clear and easy to understand I had to just say thank you! I have been trying to solve a multi class problem with an LLM for months without significant progress. Using your video, I was able to make more progress by training a BERT model in a few days than I have in months! Please keep posting. It's immensely helpful for the rest of us.

  • @kennethnavarro3496
    @kennethnavarro3496 2 years ago +1

    Thank you so much for this tutorial. Most tutorials really piss me off because they always refer back to other videos they made regarding why things work, but you explained each step as you did it, and this is super good for someone with a temperament like mine. Appreciate it, you're a beast!

    • @jamesbriggs
      @jamesbriggs  2 years ago

      haha thanks Kenneth, I try to assume we're starting at the start for every video :)

  • @achrafoukouhou1016
    @achrafoukouhou1016 3 years ago +3

    This video is excellent, sir. I had been looking for a video like this for 2 straight days.

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      That's awesome to hear, happy you found it, thanks!

  • @meredithhurston
    @meredithhurston 2 years ago +2

    Thanks so much, James. On my 1st attempt I was able to get to ~51% accuracy. I will need to make some tweaks, but I'm so excited about this! Woohoo!

  • @krishnanvs5946
    @krishnanvs5946 3 years ago

    Very crisp and nicely structured, with the objective of the exercise stated right at the start

    • @jamesbriggs
      @jamesbriggs  3 years ago

      thanks, useful to know stating the objective helps!

  • @anityagangurde5329
    @anityagangurde5329 2 years ago

    Thank you so much!! I was really stuck with the prediction part for a very long time. This will help me a lot.

  • @chrisp.784
    @chrisp.784 2 years ago

    Thank you so much, sir! Best video I've seen on UA-cam, clearly explaining each step.

  • @alexaskills3447
    @alexaskills3447 2 years ago +1

    This was great! One question: what if you wanted to use additional features besides the BERT embeddings in the training dataset? What would be the best approach? Some type of model stacking, where you take the output of the sentiment model and use that combined with other features as input to another model? Or is there a better way to merge/concatenate the additional features onto the BERT word-vector training data?

  • @asimsultan8191
    @asimsultan8191 3 years ago +1

    Thank you for such an amazing collection :) Just one question: while loading the model, I get this error: ValueError: Cannot assign to variable bert/embeddings/token_type_embeddings/embeddings:0 due to variable shape (2, 768) and value shape (512, 768) are incompatible.
    Can you let me know why that is? Thank you so much in advance.

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      Hey Asim, I would double-check that you are tokenizing everything correctly. The 512 you see is the maximum sequence length consumed by BERT, which we set when encoding our text with the tokenizer :)
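
      A minimal sketch of that encoding step (this snippet is an illustration, assuming a bert-base-cased tokenizer and the 512-token sequence length used in the video; the example text is hypothetical):
      from transformers import BertTokenizer
      tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
      tokens = tokenizer.encode_plus(
          'hello world',              # hypothetical example text
          max_length=512,             # BERT's maximum sequence length
          truncation=True,
          padding='max_length',       # pad every sample out to 512 tokens
          add_special_tokens=True,
          return_tensors='tf'
      )
      print(tokens['input_ids'].shape)       # (1, 512)
      print(tokens['attention_mask'].shape)  # (1, 512)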

    • @asimsultan8191
      @asimsultan8191 3 years ago

      @@jamesbriggs I got it and solved the problem. Thank you so much :)

  • @simonlindgren
    @simonlindgren 2 years ago +1

    This is a fantastic tutorial! Excellent stuff, even for non-experts. I wonder how one would go about it, should one want to add (domain-specific) tokens to the BERT tokenizer before training. Where in the workflow can that be done?

    • @jamesbriggs
      @jamesbriggs  2 years ago +1

      Hi Simon, there are two approaches: you can train a tokenizer from scratch (obviously this takes some time), OR you can add tokens to the existing one. I want to cover this soon, but here's an example: github.com/huggingface/transformers/issues/1413#issuecomment-538083512
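
      A rough sketch of the add-tokens approach (the token strings and checkpoint name here are assumptions for illustration, not from the video):
      from transformers import BertTokenizer, TFAutoModel
      tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
      bert = TFAutoModel.from_pretrained('bert-base-cased')
      # register the new domain-specific tokens with the tokenizer
      num_added = tokenizer.add_tokens(['mydomainterm', 'anotherterm'])
      # grow the embedding matrix so the new token IDs have rows to map to
      bert.resize_token_embeddings(len(tokenizer))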

    • @simonlindgren
      @simonlindgren 2 years ago

      @@jamesbriggs Great! So add tokens to tokenizer before training on the labeled data, right?

  • @luiscao7241
    @luiscao7241 3 years ago

    Hi James Briggs, I found that with this way of dividing the validation/train data, the validation and train sets vary every time. When I save the trained model and load it to evaluate on the validation data again, I get different results on each run. Should I split the train and validation data once at the beginning rather than using SPLIT = 0.9 each time? Does it compromise the accuracy of the trained model? Thanks

  • @luiscao7241
    @luiscao7241 3 years ago

    Great tutorial! Thanks

  • @serhatkalkan2339
    @serhatkalkan2339 2 years ago

    Great tutorial! I wonder if the seq_length has to be that long if we work with short phrases?

  • @plashless3406
    @plashless3406 1 year ago

    This is awesome.

  • @manuadd192
    @manuadd192 1 year ago

    Hey, great video! Just got a question: in my dataset some texts have multiple labels. Can I just set multiple labels to 1 in the labels[] array at 13:47?

  • @dhivyasubburaman8828
    @dhivyasubburaman8828 3 years ago

    Really good tutorial! Thank you so much, an awesome teacher... you made the model easy and simple to understand. Is there any similar tutorial for bertformultilabelsequenceclassification, or can the same code be used for multi-label classification?

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Thanks! You should be able to use the same code, just change the output layer dimensions to align with your new number of output labels :)
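
      A sketch of the classifier head with a different output size (the architecture roughly mirrors the one built in the video, but the 512 sequence length, 1024 hidden units and NUM_LABELS value here are assumptions; for true multi-label outputs you would also typically swap softmax for sigmoid and train with binary cross-entropy):
      import tensorflow as tf
      from transformers import TFAutoModel

      NUM_LABELS = 7  # hypothetical new number of output classes

      bert = TFAutoModel.from_pretrained('bert-base-cased')
      input_ids = tf.keras.layers.Input(shape=(512,), name='input_ids', dtype='int32')
      mask = tf.keras.layers.Input(shape=(512,), name='attention_mask', dtype='int32')

      embeddings = bert.bert(input_ids, attention_mask=mask)[1]  # pooled [CLS] output
      x = tf.keras.layers.Dense(1024, activation='relu')(embeddings)
      y = tf.keras.layers.Dense(NUM_LABELS, activation='softmax', name='outputs')(x)

      model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)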

  • @meylyssa3666
    @meylyssa3666 3 years ago

    Great tutorial, like always, thanks!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Thanks I appreciate these comments a lot! :)

  • @adityanjsg99
    @adityanjsg99 3 years ago

    This video helped, thanks. Using BERT does need a GPU subscription, though.

  • @faressayah9897
    @faressayah9897 3 years ago +1

    Amazing tutorial 👏👏👏.
    If you are going to use your model on another machine, it's better to save it in h5 format.
    # Saving the model
    model.save("your_model.h5")
    # Loading the model on another machine
    import tensorflow as tf
    import transformers
    model = tf.keras.models.load_model(
        'your_model.h5',
        custom_objects={'TFBertMainLayer': transformers.TFBertMainLayer}
    )

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      hey Fares, thanks and appreciate the info - I assume you recommend this because we then only have a single file to transfer, rather than several?

    • @faressayah9897
      @faressayah9897 3 years ago +1

      @@jamesbriggs
      I am working on a hate speech detection project. I trained the model on Kaggle and, after saving it, it worked in the same notebook but not on my local machine. Saving it directly requires saving the configuration as well; I didn't find how to do that, so I saved the model in h5 format.

  • @salmanshaikh4866
    @salmanshaikh4866 2 years ago

    Hi there, I am trying to generate a confusion matrix, but due to the dataset being shuffled I'm not able to, and it's giving me random values. Any ideas what to do? (The accuracy and loss are pretty good whilst training the model.)

  • @maxhuttmann4760
    @maxhuttmann4760 2 years ago

    James, thank you! I had been stuck on extracting the BERT embeddings for a TF layer, as almost everyone now shows this part using other libraries like TensorFlow Hub, text, etc., and I cannot use them in my project due to limitations.
    Will try your approach. Thanks a lot.

  • @Moxgusa
    @Moxgusa 3 years ago

    Hi James, first of all, good tutorial!
    I tried implementing the same architecture with a different dataset, but the model training time is insane, 50+ hours. Do you have any clue why it takes so much time?
    Thank you!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      It can be a long time; it will depend on the hardware setup you have. I'm using a 3090 GPU so it is reasonably fast. I would double-check that you are using the GPU (if you have a compatible GPU). If you search something like 'tensorflow GPU setup' you should find some good explanations - hope that helps!

  • @panophobia8527
    @panophobia8527 2 years ago

    After training I get around 60% accuracy. When I try to predict I never get the model to predict Sentiment 0 or 4. Do you have any idea why the model has problems with these?

  • @agahyucel4502
    @agahyucel4502 2 years ago

    Hi, first of all, thank you for this nice video. How can we make a confusion matrix and classification report here?

  • @henkhbit5748
    @henkhbit5748 3 years ago

    Nice example! Could you also use the same technique if you want to classify text into more than 5 categories, for example 10 or 20? And what if the classes are not perfectly balanced and it is NOT English text? 😉

    • @jamesbriggs
      @jamesbriggs  3 years ago

      haha yes you could. There are pretrained BERT models for different languages - if your language wasn't available, we'd want to train from scratch on the new language (mentioned in the last comment). As for training with more categories, yes, we can do that using the same code we use here: we just switch our training data to the new 10-20 class data and update the classifier layer output size to match :)
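
      For a non-English dataset, a sketch of swapping in one of the standard pretrained multilingual checkpoints (everything else in the pipeline would stay as in the video):
      from transformers import BertTokenizer, TFAutoModel
      tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
      bert = TFAutoModel.from_pretrained('bert-base-multilingual-cased')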

  • @gloriaabuka5644
    @gloriaabuka5644 2 years ago

    Thank you for this very explanatory video. I tried following along with another dataset, but each time I try to one-hot-encode my labels with these 3 lines of code
    arr = df['rating'].values
    labels = np.zeros((num_samples, arr.max()))  # (my label values are from 1-10)
    labels[np.arange(num_samples), arr] = 1
    I get the error: "numpy.float64 object cannot be interpreted as an integer".

  • @MdSaeemHossainShanto
    @MdSaeemHossainShanto 1 year ago

    At 42:00, in cell 9, it returns an array of what? What do those numbers mean?

  • @datascientist7802
    @datascientist7802 2 years ago +1

    Hi Sir, great explanation. I followed along to implement the same, but I got this error when training the model:
    InvalidArgumentError: Data type mismatch at component 0: expected double but got int32.
    [[node IteratorGetNext (defined at :1) ]] [Op:__inference_train_function_20701]

    • @jamesbriggs
      @jamesbriggs  2 years ago

      Seems like one of the datatypes for (probably) your inputs is wrong; you will need to add something like dtype=float32 to your input layer definitions,
      OR it may be that your data must be converted to float first before being processed by the model

    • @abhishekchack8065
      @abhishekchack8065 2 years ago

      Xids = np.float64(Xids)
      Xmask = np.float64(Xmask)
      dataset = tf.data.Dataset.from_tensor_slices((Xids, Xmask, labels))
      Just convert Xids and Xmask to float64 before creating the pipeline.

  • @Mrwheelsful
    @Mrwheelsful 3 years ago

    Hi James, at the very end, when you predicted on your new sentiment data with your model, you assigned it to:
    probs = model.predict(test)
    I would like to know how to export the predicted data into CSV format so that one can submit it on Kaggle.
    test['sentiment'] = model.predict(test['phrase'])
    submission = test[['tweetid', 'sentiment']]
    submission.to_csv('bertmodel.csv', index=False)
    Is this the correct way of going about it? :) Because I want sentiment values when exported.

    • @jamesbriggs
      @jamesbriggs  3 years ago

      I think you might need to perform a np.argmax() operation on the model.predict output, to convert from output logits to predicted labels, but otherwise it looks good :)
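
      Pulling that together, a sketch of the submission step (assuming test is the encoded model input and test_df is the matching dataframe - the dataframe name is hypothetical; the tweetid/sentiment column names come from the comment above):
      import numpy as np
      probs = model.predict(test)                      # shape: (num_samples, num_classes)
      test_df['sentiment'] = np.argmax(probs, axis=1)  # probabilities -> predicted class labels
      submission = test_df[['tweetid', 'sentiment']]
      submission.to_csv('bertmodel.csv', index=False)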

  • @harveenchadha
    @harveenchadha 3 years ago +1

    Excellent! Where can I find the code used in the video?

    • @jamesbriggs
      @jamesbriggs  3 years ago +2

      Code is split between a few different notebooks on Github - they're all in this repo folder: github.com/jamescalam/transformers/tree/main/course/project_build_tf_sentiment_model - hope it helps :)

    • @harveenchadha
      @harveenchadha 3 years ago

      @@jamesbriggs Thanks. That surely helps! Keep up the good work James, I see you are working on a Transformers course. Will be looking forward to it!

  • @marwamiimi1935
    @marwamiimi1935 2 years ago

    Hello, thank you for this great video.
    I followed the steps but I have an error.
    Can you help me, please?

  • @minhajulislamchowdhury1101
    @minhajulislamchowdhury1101 2 years ago

    How can I build a confusion matrix for this kind of dataset?

  • @gloriaabuka9129
    @gloriaabuka9129 2 years ago

    Thank you for this great video. I tried following along with another dataset, but each time I try to one-hot-encode my labels I keep getting an error that says "numpy.float64 object cannot be interpreted as an integer". Any idea how to fix this? Thank you.

    • @abAbhi105
      @abAbhi105 2 years ago

      Same here, did you find any solution?

    • @gloriaabuka9129
      @gloriaabuka9129 2 years ago

      @@abAbhi105 Yes, I did. I cast my array elements to integer:
      arr = arr.astype(int)
      labels[np.arange(num_samples), arr - 1] = 1

  • @gokulgupta1021
    @gokulgupta1021 3 years ago

    Nice, informative video. It would be nice if you could help me understand how to change this to PyTorch:
    # create the dataset object
    dataset = tf.data.Dataset.from_tensor_slices((Xids, Xmask, labels))
    def map_func(input_ids, masks, labels):
        # we convert our three-item tuple into a two-item tuple where the input item is a dictionary
        return {'input_ids': input_ids, 'attention_mask': masks}, labels
    # then we use the dataset map method to apply this transformation
    dataset = dataset.map(map_func)

    • @jamesbriggs
      @jamesbriggs  3 years ago

      I'm not using PyTorch for sentiment analysis in this example, but I do for masked language modeling, and the dataset build logic is very similar - see this video at ~14:57:
      ua-cam.com/video/R6hcxMMOrPE/v-deo.html
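
      For reference, a rough PyTorch equivalent of that mapping step might look like the sketch below (an illustration, not code from the video; it assumes Xids, Xmask and labels are the already-built numpy arrays from earlier):
      import torch
      from torch.utils.data import Dataset, DataLoader

      class SentimentDataset(Dataset):
          def __init__(self, Xids, Xmask, labels):
              self.Xids = torch.tensor(Xids, dtype=torch.long)
              self.Xmask = torch.tensor(Xmask, dtype=torch.long)
              self.labels = torch.tensor(labels, dtype=torch.float)

          def __len__(self):
              return len(self.labels)

          def __getitem__(self, i):
              # same ({'input_ids', 'attention_mask'}, labels) structure as the tf.data map
              return {'input_ids': self.Xids[i], 'attention_mask': self.Xmask[i]}, self.labels[i]

      loader = DataLoader(SentimentDataset(Xids, Xmask, labels), batch_size=16, shuffle=True)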

  • @lasimanazrin6212
    @lasimanazrin6212 2 years ago

    Getting this error: Unknown layer: Custom>TFBertMainLayer. Please ensure this object is passed to the `custom_objects`
    Anybody have any idea?

  • @soysasu
    @soysasu 3 years ago

    Hi sir, I'm following along step by step on Google Colab, but it's running out of RAM. They give me 12.69 GB; in most cases that happens due to code problems. Any idea? Thank you!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Google Colab can be difficult with the amount of memory you're given, transformers use *a lot* - one thing that can help is loading your data in batches (so you're not storing it all in memory), one of my recent videos covers this, it might help: ua-cam.com/video/r-zQQ16wTCA/v-deo.html
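
      One way to do that, sketched under the assumption of a tab-separated train.tsv file with a Phrase column and an already-loaded Hugging Face tokenizer (the file and column names are placeholders): read and tokenize the data chunk by chunk instead of holding everything in memory at once.
      import pandas as pd
      for chunk in pd.read_csv('train.tsv', sep='\t', chunksize=10_000):
          tokens = tokenizer(
              chunk['Phrase'].tolist(), max_length=512,
              truncation=True, padding='max_length', return_tensors='np'
          )
          # ...write tokens['input_ids'] / tokens['attention_mask'] to disk here,
          # or feed them straight into the next training step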

    • @soysasu
      @soysasu 3 years ago

      @@jamesbriggs Okay, I'll check it out. Thank you!

  • @amitjaiswar8593
    @amitjaiswar8593 3 years ago

    Is this an implementation from scratch, or fine-tuning?
    #model.layers[2].trainable = False

    • @jamesbriggs
      @jamesbriggs  3 years ago

      hey Amit, this sets the internal BERT layers to not train, but still allows us to train the classifier layers (which are layers 3, 4, etc), we can actually train the BERT layer too by removing that line, but training time will be much longer
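
      In code terms (a small sketch; the exact layer index depends on how the model was assembled, so it's worth checking model.summary() first):
      model.summary()                      # confirm which layer holds the BERT main layer
      model.layers[2].trainable = False    # freeze BERT, train only the classifier head
      # removing/commenting the line above also fine-tunes BERT itself,
      # which usually improves accuracy but makes training much slower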

  • @digvijayyadav4168
    @digvijayyadav4168 2 years ago

    Hi there, please can you share the notebook?

    • @jamesbriggs
      @jamesbriggs  2 years ago

      Hey it's not necessarily exactly the same, but you will find very similar code here github.com/jamescalam/transformers/tree/main/course/project_build_tf_sentiment_model

  • @faisalq4092
    @faisalq4092 1 year ago

    I want something from scratch

  • @vidopulos
    @vidopulos 3 years ago +1

    Hi. Excellent tutorial! I have a problem: when I try to replicate your code, in the part where I'm using tokenizer.encode_plus() I get ValueError: could not broadcast input array from shape (15) into shape (512). It says that the error is here - Xids[i, :] = tokens['input_ids']. Thanks.

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Does it work if you write Xids[:, i] = tokens['input_ids']? Otherwise, double-check the Xids dimensionality with Xids.shape and make sure it lines up to what we would expect (eg num_samples and 512)

    • @francesniu
      @francesniu 3 years ago

      I had the same issue, and I solved it by putting pad_to_max_length = True instead of padding = 'max_length'.

  • @madhavimourya1157
    @madhavimourya1157 3 years ago

    Hi James, great explanation. I followed along to implement the same, but I got this error:
    InvalidArgumentError: indices[2,2] = 29200 is not in [0, 28996)
    [[node model/bert/embeddings/Gather (defined at /usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_tf_bert.py:188) ]] [Op:__inference_train_function_488497]
    I know it's related to the embedding token IDs. Can you help me with how to resolve this?

    • @madhavimourya1157
      @madhavimourya1157 3 years ago

      Luckily, I got the solution :)

    • @jamesbriggs
      @jamesbriggs  3 years ago

      @@madhavimourya1157 Oh good to hear, was it in your dataset definition?