how the tokenizer for gpt-4 (tiktoken) works and why it can't reverse strings

  • Published 16 Jan 2024
  • chris breaks down the chatgpt (gpt-4) tokenizer and shows why large language models such as gpt, llama-2 and mistral struggle to reverse words. he looks at how words, programming languages, other natural languages and even morse code are tokenized, and shows how tokenizers tend to be biased towards english and programming languages (a quick sketch of the reversal problem follows below).
  • Science & Technology
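
a rough sketch of the failure mode the video demonstrates, using the tiktoken library (the example word and printed output are illustrative, not taken from the video): gpt-4's tokenizer maps a word to a few multi-character chunks, so the model never sees individual letters, and reversing the token sequence does not reverse the string.

    import tiktoken

    # load the tokenizer used by gpt-4 (cl100k_base under the hood)
    enc = tiktoken.encoding_for_model("gpt-4")

    word = "strawberry"                      # illustrative example word
    token_ids = enc.encode(word)

    # the model sees a handful of integer ids, not ten separate letters
    print(token_ids)
    print([enc.decode([t]) for t in token_ids])   # multi-character chunks; exact split may vary

    # reversing the token sequence does not reverse the characters
    print(enc.decode(list(reversed(token_ids))))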

COMMENTS • 6

  • @ernestuz
    @ernestuz A month ago +1

    The funny thing is that the more complete the vocabulary, the less pressure on the upper layers, so it's not only cheaper because of fewer tokens but also cheaper in processing. I wonder if somebody has prepared a semi-handcrafted tokenizer where, let's say, the first 30K tokens come from a dictionary and the rest are generated (see the sketch after this thread).

    • @chrishayuk
      @chrishayuk  10 days ago

      exactly. tbh, i wouldn't be surprised if someone goes that direction
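
a minimal sketch of that idea, assuming a hypothetical HybridTokenizer (not an existing library): whole words from a hand-picked dictionary take the low ids, and anything outside the dictionary falls back to a learned bpe (tiktoken here) with its ids shifted above the dictionary range.

    import tiktoken

    class HybridTokenizer:
        # hypothetical: dictionary words -> ids 0..len(dictionary)-1,
        # everything else -> bpe ids offset by the dictionary size
        def __init__(self, dictionary, bpe_name="cl100k_base"):
            self.word_to_id = {w: i for i, w in enumerate(dictionary)}
            self.id_to_word = dict(enumerate(dictionary))
            self.bpe = tiktoken.get_encoding(bpe_name)
            self.offset = len(dictionary)    # e.g. 30_000 in the comment above

        def encode(self, text):
            ids = []
            for word in text.split():        # naive whitespace split, enough for a sketch
                if word in self.word_to_id:
                    ids.append(self.word_to_id[word])
                else:
                    ids.extend(t + self.offset for t in self.bpe.encode(" " + word))
            return ids

        def decode(self, ids):
            pieces = []
            for i in ids:
                if i < self.offset:
                    pieces.append(" " + self.id_to_word[i])
                else:
                    pieces.append(self.bpe.decode([i - self.offset]))
            return "".join(pieces).lstrip()

    tok = HybridTokenizer(["the", "cat", "sat"])
    print(tok.encode("the cat sat quietly"))              # three dictionary ids + bpe ids for "quietly"
    print(tok.decode(tok.encode("the cat sat quietly")))  # round-trips back to the input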

  • @feniyuli
    @feniyuli 2 months ago +1

    It is very helpful to understand how tokenization works. Thanks! Do you think the data we encode using tiktoken will be sent to the AI?

    • @chrishayuk
      @chrishayuk  10 days ago

      definitely not, it's all local (see the note below)
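
this is easy to verify yourself: tiktoken encodes entirely in-process, with no api key and no request carrying your text (it may fetch and cache the public vocabulary file once on first use).

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    secret = "this text never leaves the machine"
    ids = enc.encode(secret)                 # plain integers computed locally
    assert enc.decode(ids) == secret         # round-trips without sending the text anywhere
    print(ids)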

  • @ilyanemihin6029
    @ilyanemihin6029 3 months ago +2

    Thanks, very interesting information