Today Unsupervised Sentence Transformers, Tomorrow Skynet (how TSDAE works)

  • Published 1 Feb 2025

COMMENTS •

  • @charmz973 · 3 years ago · +1

    This is a masterpiece given my research hindrances in unsupervised training. Thanks bro

    • @jamesbriggs · 3 years ago · +2

      Haha I'm glad to hear it helps, definitely going to do more on unsupervised methods in the future

  • @liolaeezan5927 · 3 years ago · +2

    Thanks for the consistency and clear explanations!

  • @guanglilu2182 · 2 years ago

    Super clear as always! Thank you so much!

  • @ismailashraq9697 · 3 years ago

    Wow. Great explanation, going to attempt this soon. Thanks for the video😊

    • @jamesbriggs · 3 years ago

      Happy to see your comment, I hope you get it working!

  • @shanborwillwarjri · 1 year ago

    Thank you James, wonderfully explained. Can we use this on a new language if we have a pretrained BERT-based language model for it? Thanks in advance.
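
For context on the question above: TSDAE only needs unlabeled sentences in the target language plus a pretrained encoder, so a language-specific BERT checkpoint should slot into the same recipe. A minimal sketch using the classic sentence-transformers `model.fit` API; `your-language-bert-base` is a placeholder checkpoint name and the training sentences are stand-ins:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Placeholder: swap in a real BERT checkpoint for the target language
model_name = "your-language-bert-base"

# Encoder = transformer + CLS pooling, as in the standard TSDAE recipe
word_embedding_model = models.Transformer(model_name)
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling])

# Unlabeled sentences in the target language (replace with real text)
train_sentences = [f"Example sentence number {i}." for i in range(100)]

# DenoisingAutoEncoderDataset applies the deletion noise TSDAE trains on
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)

# The decoder reconstructs the original sentence from the noisy encoding
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path=model_name, tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
```

The decoder is only used during training; afterwards the fitted `model` encodes sentences in the new language as usual.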

  • @malikrumi1206 · 2 years ago

    Around 16:48, looking at Table 5, there are fewer than 6 points between the top and the bottom. Is this difference actually meaningful or significant? Would a human domain expert, handed a sample result, be able to notice these differences?

  • @ax5344 · 2 years ago

    At 19:31 the training data is from "oscar" ("train" split); at 37:32 the evaluation set is "stsb" ("validation" split). Did you say you chose the "stsb" "validation" split because the model is trained on the "train" split? How so? Are the datasets "oscar" and "stsb" the same dataset?

    • @jamesbriggs · 2 years ago · +1

      It's just a precaution: I tend to avoid testing on training splits *just in case* there is any overlap between the two datasets, as some datasets have been built by merging several others. In this case there's no overlap (afaik).
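
For context: "oscar" and "stsb" are unrelated datasets, loaded separately. A rough sketch with the Hugging Face `datasets` library (the OSCAR config name here is an assumption and may differ from the one used in the video):

```python
from datasets import load_dataset

# Unlabeled training text: OSCAR only ships a "train" split
# (config name assumed; the video may use a different one)
oscar = load_dataset("oscar", "unshuffled_deduplicated_en", split="train", streaming=True)

# Evaluation: the STS-B validation split from GLUE, an entirely separate benchmark
stsb = load_dataset("glue", "stsb", split="validation")

print(next(iter(oscar))["text"][:100])  # raw web text used for TSDAE training
print(stsb[0])                          # a sentence pair with a similarity score
```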

  • @d_b_ · 1 year ago

    Have any other SOTA unsupervised fine-tuning techniques overtaken this in the past year? Is this the best choice for creating sentence embeddings on custom documents with rare words?

  • @elahehsalehirizi618 · 1 year ago

    Hi, I have a question please. Could we use a model fine-tuned on the source dataset (which is already trained on the target data) as the pretrained model for SetFit?

  • @brianhance4337 · 3 years ago

    Thanks for the vid! Do you think applying TSDAE to the raw transformer models behind the more advanced SBERT models will yield even better results compared to those advanced SBERT models without TSDAE training on their base model?

  • @DJNed12 · 3 years ago

    Brilliant video, thank you! I know it's possible to "continue" training an existing model from the sentence-transformers library on a different dataset, and the docs seem to suggest this continuation of training can take effectively any training objective/loss function, even if it differs from what was used in the initial pre-training or fine-tuning. Is it possible to take advantage of this to fine-tune an existing pre-trained sentence-transformers model with TSDAE or another unsupervised technique, rather than just a transformers model as you did in the video? If so, I'm wondering whether that has the potential to produce even better results on a particular dataset.

    • @jamesbriggs · 3 years ago · +1

      It's not something I've tried, but one of the primary use cases for TSDAE is 'domain adaptation', which I understood as taking an existing sentence transformer and fine-tuning it with TSDAE so that it can be applied to another domain.
      If the model has already been trained on that domain, though, I'm not sure TSDAE can be used to improve performance further; I haven't seen anyone try this, however.
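
A rough sketch of what that kind of domain adaptation could look like, i.e. continuing to train an already pre-trained sentence-transformers model with the TSDAE objective. This is not from the video: the model name is just an example, and tying the decoder to the encoder assumes the decoder checkpoint matches the encoder architecture:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses

# Start from an existing pre-trained sentence transformer (example model name)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Unlabeled sentences from the new domain (replace with real in-domain text)
domain_sentences = [f"Example in-domain sentence number {i}." for i in range(100)]

# Same denoising dataset + loss as the standard TSDAE recipe
train_dataset = datasets.DenoisingAutoEncoderDataset(domain_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)

# Decoder initialised from the encoder's checkpoint so weights can be tied
train_loss = losses.DenoisingAutoEncoderLoss(
    model,
    decoder_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    tie_encoder_decoder=True,
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
```

As in the standard TSDAE recipe, the decoder is discarded after training and only the adapted encoder is kept.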

  • @nicolaithomsen7005 · 2 years ago

    Hi James, thank you so much for this awesome guide. I'm trying to use this technique, almost 1:1, with 100K+ new training instances. However, this results in the following error during training:
    - RuntimeError: CUDA error: device-side assert triggered
    which, when explored a bit, seems to relate to
    - RuntimeError: CUDA out of memory. Tried to allocate 512 MiB (GPU 0; 2.00 GiB total capacity; 584.97 MiB already allocated; 13.81 MiB free; 590.00 MiB reserved in total by PyTorch) - with the actual numbers being specific to my setup, of course.
    Is this process just very memory intensive? And is there any way to get around this?
    Thanks again!

    • @jamesbriggs · 2 years ago · +1

      It's relatively memory intensive; a GPU with 2 GB capacity won't be enough to run this on any model I know of. I think you need something like 10-15 GB for BERT-base. You can reduce the training batch size to lower the memory needed, but even with a batch size of one you will unfortunately need a larger GPU. I think there is a paid version of Google Colab that is large enough.
      I hope you manage to find something!
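
For anyone hitting the same limit, the two easiest levers with this training loop are a smaller batch size and mixed precision (`use_amp`). A sketch under the same BERT-base TSDAE setup the video describes (the sentence list is a placeholder); as noted above, neither lever will fit BERT-base into 2 GB:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Same TSDAE setup as before, repeated only so the memory settings have context
encoder = models.Transformer("bert-base-uncased")
pooling = models.Pooling(encoder.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[encoder, pooling])

train_sentences = [f"Example sentence number {i}." for i in range(100)]  # your unlabeled text
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

# 1) Smaller batches -> lower peak GPU memory (at the cost of slower training)
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True, drop_last=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    use_amp=True,  # 2) mixed precision reduces activation memory on CUDA GPUs
    show_progress_bar=True,
)
```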