Advantages of Letter-Based Tokenization for Machine Learning

  • Published Feb 9, 2025
  • Let's chat about letter-based tokenization in machine learning models. TikTok folks asked about the advantages of using letters for tokenization, especially when dealing with the attention mechanism. Well, there are several.
    Letter-based tokenization gives us fine granularity. Because it looks at each individual character, it captures details that matter for rare words and nuanced meanings, and it handles misspelled words more gracefully.
    Instead of needing a huge dictionary of words, we keep the vocabulary tiny: just letters, numbers, and some punctuation, which keeps the embedding and output layers small and flexible. Plus, the token unit stays consistent at one character, which helps when word lengths vary a lot. With character-level cues, models generalize better across different but similar words and pick up some noise robustness, like handling typos and slang.
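    To make the small-vocabulary idea concrete, here is a minimal sketch of a character-level tokenizer in plain Python. The CharTokenizer class and its method names are illustrative, not taken from any particular library.
    ```python
    import string

    class CharTokenizer:
        """Toy character-level tokenizer: the whole vocabulary is letters,
        digits, punctuation, and a space, plus one <unk> slot."""

        def __init__(self):
            chars = string.ascii_letters + string.digits + string.punctuation + " "
            self.unk_id = 0
            self.char_to_id = {c: i + 1 for i, c in enumerate(chars)}
            self.id_to_char = {i: c for c, i in self.char_to_id.items()}

        def encode(self, text):
            # Every character maps to an id; a misspelled word still tokenizes
            # cleanly because no word-level dictionary lookup is involved.
            return [self.char_to_id.get(c, self.unk_id) for c in text]

        def decode(self, ids):
            return "".join(self.id_to_char.get(i, "?") for i in ids)

    if __name__ == "__main__":
        tok = CharTokenizer()
        ids = tok.encode("tokenizaton")  # note the typo: still no out-of-vocabulary word
        print(ids)
        print(tok.decode(ids))
    ```
    The entire vocabulary here is under 100 entries, versus tens of thousands of entries for a word-level dictionary.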
    This method works great with various models: RNNs, LSTMs, CNNs, and transformers. Now, speaking of models, you might wonder whether big ones like GPT-3 use letter-based tokenization. GPT-3 actually uses subword tokenization (byte-pair encoding), which starts from characters or bytes and merges them into larger units, but earlier models like Char-CNN and DeepMoji relied heavily on pure character-level input.
    Newer ones like Charformer are also exploring it. Other tokenization methods include word-based, subword, sentence-based, and n-gram tokenization. Each has its perks and drawbacks, but a mix often works best, especially in large-scale models.
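    As a quick, toy illustration of those different granularities (plain Python string operations, no specific NLP library implied), the same misspelled sentence splits very differently under word-based, letter-based, and n-gram tokenization:
    ```python
    # Toy comparison of word-, letter-, and character-n-gram tokenization.
    sentence = "tokenizaton handles typos"  # deliberate misspelling of "tokenization"

    word_tokens = sentence.split()    # word-based: the typo becomes an out-of-vocabulary word
    char_tokens = list(sentence)      # letter-based: every symbol is in a tiny, fixed vocabulary
    ngram_tokens = [sentence[i:i + 3] for i in range(len(sentence) - 2)]  # character 3-grams

    print(word_tokens)       # ['tokenizaton', 'handles', 'typos']
    print(char_tokens[:10])  # ['t', 'o', 'k', 'e', 'n', 'i', 'z', 'a', 't', 'o']
    print(ngram_tokens[:5])  # ['tok', 'oke', 'ken', 'eni', 'niz']
    ```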
