Language Learning with BERT - TensorFlow and Deep Learning Singapore

  • Published 22 Aug 2024

COMMENTS • 19

  • @rohitdhankar360
    @rohitdhankar360 5 years ago +3

    @10:05 - Excellent explanation of Byte-Pair Encoding, thanks.
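
    For readers who want a concrete picture of the byte-pair-encoding idea praised here, a minimal sketch (illustrative only, not the talk's code; the toy vocabulary is made up): count adjacent symbol pairs across the vocabulary and repeatedly merge the most frequent one.

    from collections import Counter

    # toy vocabulary: word (as a tuple of symbols) -> frequency
    vocab = {("l", "o", "w"): 5,
             ("l", "o", "w", "e", "r"): 2,
             ("n", "e", "w", "e", "s", "t"): 6}

    def merge_most_frequent(vocab):
        # count every adjacent symbol pair, weighted by word frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        best = pairs.most_common(1)[0][0]
        # rewrite every word with the chosen pair fused into a single symbol
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        return best, merged

    for _ in range(3):  # a few merges; real BPE runs until a target vocabulary size
        best, vocab = merge_most_frequent(vocab)
        print(best, vocab)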

  • @daewonyoon
    @daewonyoon 5 years ago +5

    Thank you. This summary/introduction is very very helpful.

  • @autripat
    @autripat 5 years ago +8

    The presenter says these models "do not use RNNs" (correct) and instead "use CNNs" (incorrect; no convolution kernels are used). They use simple linear transformations of the form XWᵀ + b.

    • @jaspreetsahota1840
      @jaspreetsahota1840 5 years ago +2

      You can model convolution operations with transformers.

    • @mouduge
      @mouduge 4 years ago +8

      IMHO, that's debatable. Indeed, think of what happens when you apply the same dense layer to each input in a sequence: you're effectively running a 1D convolutional layer with kernel size 1. If you're familiar with Keras, try building a model with:
      TimeDistributed(Dense(10, activation="relu"))
      then replace it with this:
      Conv1D(10, kernel_size=1, activation="relu")
      You'll see that it gives precisely the same result (assuming you use the same random seeds).
      Since the Transformer architecture applies the same dense layers across all time steps, you can think of the whole architecture as a stack of 1D-Convolutional layers with kernel size 1 (then of course there's the important Multihead attention part, which is a different beast altogether).
      Granted, it's not the most typical CNN architecture, which usually uses fairly few convolutional layers with kernel size 1, but still, it's not really an error to say the Transformer is based on convolutions (a runnable check is sketched below). I think Martin's goal was mostly to highlight the fact that, contrary to RNNs, every time step gets processed in parallel.
      Just my $.02! :))
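
      A runnable version of that check (a minimal sketch assuming TensorFlow 2.x; the shapes and layer sizes are arbitrary). Instead of relying on matching random seeds, it copies the Dense weights into the Conv1D kernel so the two outputs match exactly:

      import numpy as np
      import tensorflow as tf

      x = np.random.rand(2, 7, 16).astype("float32")        # (batch, time, features)

      dense = tf.keras.layers.TimeDistributed(
          tf.keras.layers.Dense(10, activation="relu"))
      conv = tf.keras.layers.Conv1D(10, kernel_size=1, activation="relu")

      dense(x); conv(x)                                      # build both layers so weights exist

      # Dense kernel is (16, 10); Conv1D kernel is (1, 16, 10) -- same weights, one extra axis.
      w, b = dense.get_weights()
      conv.set_weights([w[np.newaxis, ...], b])

      print(np.allclose(dense(x).numpy(), conv(x).numpy()))  # True: identical outputs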

  • @archywillhe1379
    @archywillhe1379 4 years ago

    wow, Engineers SG sure has come a long way ha! great talk

  • @chirpieful
    @chirpieful 5 years ago +2

    Very good updates for NLP enthusiasts

  • @MegaBlizzardman
    @MegaBlizzardman 4 years ago +1

    Very clear and helpful talk

  • @zingg7203
    @zingg7203 4 years ago

    BERT uses WordPiece; ALBERT uses SentencePiece.
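
    For anyone curious what that difference looks like, a minimal sketch using the Hugging Face transformers library (the checkpoint names are the standard published ones, not something from the talk; exact token splits may vary):

    from transformers import AutoTokenizer

    # BERT ships a WordPiece tokenizer: continuation pieces are prefixed with "##".
    bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    print(bert_tok.tokenize("unaffable"))    # e.g. ['un', '##aff', '##able']

    # ALBERT ships a SentencePiece tokenizer: pieces carry a leading "▁" word-boundary marker.
    albert_tok = AutoTokenizer.from_pretrained("albert-base-v2")
    print(albert_tok.tokenize("unaffable"))  # e.g. ['▁un', 'aff', 'able'] (model-dependent)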

  • @hiyassat
    @hiyassat 5 years ago +3

    Can we have a link to the slides, please?

  • @prakashsharma-uv4pj
    @prakashsharma-uv4pj 5 years ago +1

    Very Informative.

  • @monart4210
    @monart4210 4 years ago

    Could I extract word embeddings from BERT and use them for unsupervised learning, e.g. topic modeling? :)
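
    One common approach, as a minimal sketch with the Hugging Face transformers TensorFlow classes (whether the vectors work well for topic modeling depends on the corpus, and the pooling choice here is just one reasonable option): mean-pool BERT's last hidden states into one vector per document, then feed those vectors to any clustering or topic model.

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = TFAutoModel.from_pretrained("bert-base-uncased")

    docs = ["transformers replace recurrence with attention",
            "topic models group documents by word co-occurrence"]

    enc = tokenizer(docs, padding=True, truncation=True, return_tensors="tf")
    out = bert(**enc)                          # last_hidden_state: (batch, seq_len, 768)

    # mean-pool over real (non-padding) tokens to get one 768-d vector per document
    mask = tf.cast(enc["attention_mask"][..., tf.newaxis], tf.float32)
    doc_vecs = tf.reduce_sum(out.last_hidden_state * mask, axis=1) / tf.reduce_sum(mask, axis=1)

    print(doc_vecs.shape)                      # (2, 768) -- ready for clustering, topic models, etc.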

  • @revolutionarybitnche
    @revolutionarybitnche 5 years ago

    thank you!

  • @mkpandey4909
    @mkpandey4909 4 years ago +1

    Where can I get this PPT? Please share the link.

  • @janekou2482
    @janekou2482 5 years ago

    Does BPE also work well for non-English languages like Chinese and French?

  • @xiaochengjin6478
    @xiaochengjin6478 5 years ago

    very nice speech!

  • @zingg7203
    @zingg7203 4 years ago +1

    How is it CNN-based?

  • @ishishir
    @ishishir 5 years ago

    Nice!

  • @chriscannon303
    @chriscannon303 4 years ago

    what in God's name are you talking about?? what is an LSTM chain?? I came here because I need to know I'm writing the correct content for my website, and I haven't a fucking clue what the hell you are on about.