Expanding the UTF-8 Character Set to Infinity

Mashpoe

4,034 views

Published Mar 2, 2019
Expanding the UTF-8 character set to infinity
Science & Technology

COMMENTS • 16

@ybungalobill 2 years ago ⁺²⁹
The proposed scheme breaks another genius property of UTF-8: that it's self-synchronizing. You can always determine if a byte is the beginning of a character just by looking at it. This is crucial not only for iterating back and forth through the string, but also for being able to search for substrings using a simple strstr. You can fix your scheme by filling in those ones into the x'es of 10xxxxxx bytes. Eg:
11111111 10111111 10111111 10111111 10110xxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx ...
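A minimal Python sketch of the self-synchronizing property described in this comment, for standard UTF-8 only. The helper names `is_char_start` and `char_start_before` are made up for illustration; the only fact relied on is that continuation bytes have the form 10xxxxxx.

```python
def is_char_start(byte: int) -> bool:
    """Return True unless the byte is a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) != 0x80


def char_start_before(data: bytes, index: int) -> int:
    """Walk backwards from any byte offset to the start of its character."""
    while index > 0 and not is_char_start(data[index]):
        index -= 1
    return index


# One character each of 1, 2, 3 and 4 bytes.
data = "aé€😀".encode("utf-8")

# Character boundaries found by looking at each byte in isolation.
starts = [i for i, b in enumerate(data) if is_char_start(b)]
print(starts)

# Landing in the middle of the 4-byte character still finds its first byte.
print(char_start_before(data, 8))
```

Because no byte can be both a lead byte and a continuation byte, a byte-wise substring search can only match on character boundaries, which is why a plain `strstr` is safe on valid UTF-8.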
@MatheusAugustoGames 3 years ago ⁺¹⁹
Ok I just want to point out the genius that was the creation of UTF-8. Old computers, if they found 8 bits set to 0 in a byte, would interpret the string as finished. The bit pattern in UTF-8 guarantees that a zero byte will never show up accidentally inside a character.
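A rough way to check this claim in Python: encode every Unicode scalar value and look for a zero byte. The helper names are hypothetical, and surrogates are skipped because they cannot be encoded as UTF-8. The loop covers about 1.1 million code points, so it may take a moment.

```python
def utf8_has_nul(code_point: int) -> bool:
    """Return True if the UTF-8 encoding of the code point contains a 0x00 byte."""
    return 0 in chr(code_point).encode("utf-8")


def is_surrogate(code_point: int) -> bool:
    """Surrogate code points are not valid in UTF-8 and are skipped."""
    return 0xD800 <= code_point <= 0xDFFF


# Collect every code point whose encoding contains a zero byte.
offenders = [
    cp for cp in range(0x110000)
    if not is_surrogate(cp) and utf8_has_nul(cp)
]
print(offenders)
```

The only code point expected in `offenders` is U+0000 itself, so a NUL-terminated C string is only cut short by a real NUL character.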
@lelouchvibritannia69yearsa78 2 years ago ⁺¹²
The beginning of a Legendary Game Developer's journey!
@lelouchvibritannia69yearsa78 2 years ago ⁺⁵
Ayo I got a heart from the creator! Les gooooooooo
@sarahdehart1027 5 years ago ⁺¹⁸
Lol! That ending was epic! Loved it!
@halftwins 2 years ago ⁺³
I see a couple of problems with this. Mainly, there's no way to tell whether a byte has just started a character or is preceded by 11111111. Maybe there's something I'm not noticing, but it seems like for it to really last forever an ending sequence of some kind would be needed(?) Anyway, the video was great and early congrats on 1k!
@Magnogen 2 years ago ⁺³
That's a good point, I was half expecting him to say that if the byte started with 0, then _that_ would be the terminating byte. Something like
*1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx*
would then be the corresponding utf-infinity code, and ascii would be the base case of just 0xxxxxxx. Backwards compatibility and all.
But hey, that's just a thought. I'm not sure how feasible it'd be in practice, as I don't tend to work with memory allocation, but I'd like to know how well it'd work/if it'd work at all.
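The idea in this comment can be sketched in Python as a continuation-bit varint: the high bit of every byte says whether more bytes follow, and the other 7 bits carry payload. The function names and the most-significant-group-first order are assumptions for illustration; the comment only gives the bit layout.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as 1xxxxxxx ... 1xxxxxxx 0xxxxxxx."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [value & 0x7F]          # least significant 7-bit group first
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                 # assumed order: most significant group first
    # Set the high bit on every byte except the terminating one.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_varint(data: bytes, offset: int = 0):
    """Decode one value; return (value, offset of the next unread byte)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:              # high bit clear: this was the last byte
            return value, offset


print(encode_varint(0x41))           # ASCII stays a single byte
print(encode_varint(128).hex())
print(decode_varint(encode_varint(0x1F600)))
```

This has no upper limit and keeps ASCII as the one-byte case. The trade-off is that the final byte of a long sequence looks exactly like an ASCII byte, and a decoder dropped into the middle of a stream cannot tell where a character began.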
@BGBTech 2 years ago
@Magnogen That scheme is actually used for encoding numbers in some file formats.
One other scheme I had used in some of my formats is:
0xxxxxxx (0-127), 10xxxxxx-xxxxxxxx (0-16K), 110xxxxx-xxxxxxxx-xxxxxxxx (0-2M), ...
A lot depends on what properties one wants. There are also various ways these schemes can be extended for signed numbers, to encode variable-length floating point values, ...
OTOH, while UTF-8 doesn't have the most efficient representation, it does allow re-synchronizing, and in a few odd cases non-standard encodings are possible. For example, I had used "transposed UTF-8" values in string tables to encode string length prefixes, since it is possible to unambiguously differentiate between normally coded and transposed encodings. In some cases it might be preferable to have some way to encode an explicit string length, without needing to count characters until the NUL byte.
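A small Python sketch of the prefix-length layout listed in this comment, covering only the three forms it spells out. The function names are hypothetical, and the big-endian order of the payload bytes is an assumption, since the comment does not say which order it used.

```python
def encode_prefixed(value: int) -> bytes:
    """Encode using 0xxxxxxx, 10xxxxxx-xxxxxxxx or 110xxxxx-xxxxxxxx-xxxxxxxx."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    if value < 1 << 7:               # 7 payload bits
        return bytes([value])
    if value < 1 << 14:              # 6 + 8 = 14 payload bits
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 1 << 21:              # 5 + 16 = 21 payload bits
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    raise ValueError("longer forms are elided in the comment and not sketched here")


def decode_prefixed(data: bytes, offset: int = 0):
    """Decode one value; return (value, offset of the next unread byte)."""
    first = data[offset]
    if first < 0x80:                 # 0xxxxxxx
        return first, offset + 1
    if first < 0xC0:                 # 10xxxxxx
        return ((first & 0x3F) << 8) | data[offset + 1], offset + 2
    if first < 0xE0:                 # 110xxxxx
        value = ((first & 0x1F) << 16) | (data[offset + 1] << 8) | data[offset + 2]
        return value, offset + 3
    raise ValueError("longer forms are elided in the comment and not sketched here")


print(encode_prefixed(300).hex())
print(decode_prefixed(encode_prefixed(100000)))
```

Unlike UTF-8, the trailing bytes here use all 8 bits for payload, which is denser but means a trailing byte can look like a lead byte, so the stream cannot be re-synchronized from an arbitrary position.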
@luca__3044 2 years ago ⁺¹
Can't wait to express my feelings in a 420-bit alien language!
@PC_YouTube_Channel 2 years ago
lmao amazing ending. your channel really gives off some Tom 7 vibes.
@sullivanbarnett6904 5 years ago ⁺¹
Thank you jacob!
@TimJSwan 2 years ago ⁺¹
lol 256 bits enough? more than all the Planck lengths in the universe represented...
@bored_person 2 years ago ⁺¹
Patents expire after 20 years.
@robloxxer593 2 years ago
Wait why tf are they adding four entire 1's? Two characters already had 4 combinations, and wouldn't you know when it ends from the bits that told you how long it is? What's the point of the bits in the front of the byte?
@decare696 2 years ago ⁺³
it's so that a byte that's in the middle of some character can't be mistaken for a correct ascii byte by old or bad/lazy software
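A short Python illustration of this answer, using standard UTF-8: every byte of a multi-byte character has its high bit set, so none of them can collide with a 7-bit ASCII value. The helper name `ascii_bytes_in` is made up for the example.

```python
def ascii_bytes_in(text: str) -> list:
    """Return the bytes of the UTF-8 encoding that fall in the ASCII range."""
    return [b for b in text.encode("utf-8") if b < 0x80]


# Multi-byte characters only: no byte looks like ASCII.
print(ascii_bytes_in("é€😀"))

# Mixed text: only the genuine ASCII letters show up.
print(bytes(ascii_bytes_in("naïve")))

# Lead and continuation bytes of a 3-byte character, in binary.
print([format(b, "08b") for b in "€".encode("utf-8")])
```

Software that only understands ASCII can therefore scan for bytes such as `/` or a newline without being fooled by the inside of a longer character.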
@robloxxer593 2 years ago
@decare696 stupid lazy old software
