Please note, with the automatic dubbing from YouTube/Google you hear a synthetic voice in your regional language. To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.
Byte-level LLMs are obviously the way forward for that first round of training where you're predicting 1..n tokens given the prefix, particularly for multi-language models. Tokenization is clearly a hack, like in the dark ages of image neural networks, where we would hand-craft feature detection kernels.
Thank you so much for covering this paper! I had been thinking about this specific implementation for a year, and I believe it's a significant step towards a truly general learning architecture that minimizes hand-crafted human priors.
BLT seems the way to go in an ideal world, but there are definitely problems with it. I think tokenizers have accomplished tremendous work, and we got to this state thanks to improvements in vocab size and tokenization mechanisms. But from this point we may have the technology and resources to try BLT on a model (I still don't think it would work that much better).
Can you clarify whether the pre-training has to use the BLT embeddings? I.e., unless models pre-trained using BLT start appearing on Hugging Face or elsewhere, we mere mortals will not be able to take advantage of this new method?
I'm having a plant-based BLT right now
very very cool
Does the small transformer use BPE then? And in H(x_i), is it computing the cross-entropy? 26:13
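For context on that question: as I understand the paper, the small byte-level transformer is trained as an ordinary next-byte language model (so no BPE), and H(x_i) is the Shannon entropy of its predicted next-byte distribution, computed from the same quantities as the training cross-entropy but taken over the model's own distribution rather than against the true byte. A minimal sketch of entropy-based patching under that reading; the function names and the threshold value 1.5 are illustrative assumptions, not taken from the paper:

import math

def next_byte_entropy(probs):
    # Shannon entropy (in nats) of a next-byte distribution over 256 byte values.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_patch_starts(per_step_probs, theta=1.5):
    # Start a new patch at byte i when the small byte LM's next-byte entropy
    # H(x_i) exceeds a global threshold (theta here is an illustrative value).
    return [i for i, probs in enumerate(per_step_probs)
            if next_byte_entropy(probs) > theta]

# Example: a near-uniform distribution (high entropy -> likely patch boundary)
# vs. a peaked one (low entropy -> continue the current patch).
uniform = [1 / 256] * 256
peaked = [0.99] + [0.01 / 255] * 255
print(entropy_patch_starts([peaked, uniform, peaked]))  # -> [1]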
Bacon Lettuce Tomato