That is exactly what I want to know, showing some great tools. Thank you for sharing, and have a great day:-)
Thank you!!
That was great explanation!
Thanks!
Thank you for this very useful video!
Thank you for a great talk with useful visuals! I make a Unix program (durdraw) for drawing Unicode and other text art, and find myself working with different character encodings regularly. Perhaps I missed it, but UTF-8's backwards compatibility with ASCII is worth considering when choosing an encoding scheme. I also liked the useful "od" syntax. I rarely encounter UTF-16, but thanks to your video I will now be able to recognize it in a hex dump.
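To illustrate the ASCII compatibility point, here is a minimal Python sketch. The `hex_dump` helper is a hypothetical name made up for this example, not the `od` tool from the video. It shows that pure ASCII text yields identical bytes in ASCII and UTF-8, while UTF-16 shows the telltale zero bytes in a hex dump.

```python
def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


text = "Hi!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16le_bytes = text.encode("utf-16-le")  # little endian, no byte order mark

# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
print(ascii_bytes == utf8_bytes)
print(hex_dump(utf8_bytes))     # 48 69 21
print(hex_dump(utf16le_bytes))  # 48 00 69 00 21 00
```

The alternating `00` bytes are usually the quickest way to spot UTF-16 in a dump of mostly Latin text.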
Great explanation. Much appreciated.
Thanks a lot, @qeetcode!
great and informative video ; thanks
@@brm901, thanks for the kind words!
Excellent visual explanation! Couldn't be clearer! I didn't know that it would choose the correct length for each character. I thought it always had a fixed length. I really would like to know more about this in general... Headers, BO, LE, etc. I also find it very interesting and very useful for working with ETL in data engineering. If you think of something else besides the links you already shared in the description, please let me know. Thank you for making this video.
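As a small illustration of the variable length mentioned above, this Python sketch counts the bytes each encoding needs for a few sample characters. The samples are arbitrary picks for this example, not taken from the video.

```python
# Sample characters, written as escapes so the code points are unambiguous.
samples = {
    "A": "\u0041",               # LATIN CAPITAL LETTER A
    "e-acute": "\u00e9",         # LATIN SMALL LETTER E WITH ACUTE
    "euro": "\u20ac",            # EURO SIGN
    "grinning": "\U0001f600",    # GRINNING FACE (outside the Basic Multilingual Plane)
}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
utf16_lengths = {name: len(ch.encode("utf-16-le")) for name, ch in samples.items()}

for name in samples:
    print(f"{name:10} UTF-8: {utf8_lengths[name]} bytes, UTF-16: {utf16_lengths[name]} bytes")
```

UTF-8 uses one to four bytes per character here, while UTF-16 uses two bytes, or four bytes (a surrogate pair) for the emoji. Neither encoding is fixed length.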
Sir, Very well explained. Thanks.
@@VishwaMukh , thanks for the kind words!
very good explanation thanks.
Hi Erik, congratulations on the video and thanks for sharing your knowledge. I am migrating an Oracle database on Solaris Sparc that is using UTF-16BE, while the destination uses UTF-8. In your opinion, what would be the best approach to converting the data source?
Whatever migration tool you are using should really give you that option. If it does not give you that option I would look for a different tool.
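If the data ends up exported as plain files rather than going through a migration tool, the conversion itself is simple. The following is only a sketch under that assumption: `convert_utf16be_to_utf8` is a hypothetical helper name, and it says nothing about how Oracle or any specific migration tool handles character sets.

```python
import codecs
import io
from typing import BinaryIO


def convert_utf16be_to_utf8(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Stream UTF-16BE bytes from src and write them to dst as UTF-8."""
    # The incremental decoder buffers code units that are split across chunks.
    decoder = codecs.getincrementaldecoder("utf-16-be")()
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # final=True raises UnicodeDecodeError if the input ended mid-character.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# Usage with in-memory streams; real code would open files in binary mode.
source = io.BytesIO("Gr\u00fc\u00dfe \u20ac".encode("utf-16-be"))
target = io.BytesIO()
convert_utf16be_to_utf8(source, target, chunk_size=3)
print(target.getvalue().decode("utf-8"))
```

The odd chunk size in the usage example is deliberate: it splits 16-bit code units across reads to show that the incremental decoder copes with that.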
Thanks. Very informative.
If you have a CPU where every address is 16 bits wide, you may as well use UTF-16 as default. If memory is 8 bits wide, use UTF-8.
For 32 bit (or 64 bit) you can store multiple characters per RAM address, no matter what system you choose.
In the end, if you care about memory efficiency, UTF-8 may be the best choice if you mostly use ASCII characters. But there (sadly) is no generally best default choice.
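To make the memory efficiency trade-off concrete, here is a quick Python sketch that compares encoded sizes. The sample strings are arbitrary choices for this example, and `encoded_sizes` is a made-up helper name.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes needed for text in each encoding (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


english = "Hello, world"                      # 12 ASCII characters
japanese = "\u3053\u3093\u306b\u3061\u306f"   # "konnichiwa" in hiragana, 5 characters

print(encoded_sizes(english))   # {'utf-8': 12, 'utf-16-le': 24, 'utf-32-le': 48}
print(encoded_sizes(japanese))  # {'utf-8': 15, 'utf-16-le': 10, 'utf-32-le': 20}
```

UTF-8 wins for the ASCII text and UTF-16 wins for the Japanese text, which matches the point that no encoding is the best default for every kind of content.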
thanks, very interesting video
Thanks for sharing that!
6:29 Please go into the details of the "byte order mark" in UTF-16,
or more generally into additional byte info, for example in txt files with a BOM or without a BOM, and maybe how to add additional info to a JPG file (without damaging it).
A byte order mark depends on the format you are using. Specifically in Unicode, the byte order mark indicates the byte order in UTF-16. How to add extra data to other formats is a very different question. For UTF-16, the byte order mark signals whether the file uses big-endian or little-endian format.
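A minimal Python sketch of checking for a byte order mark at the start of some data follows. `sniff_bom` is a hypothetical helper written for this example; the BOM constants come from Python's standard `codecs` module. A missing BOM proves nothing about the encoding, so this is a heuristic at best.

```python
import codecs
from typing import Optional

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# starts with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding suggested by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xff\xfeH\x00i\x00"))  # utf-16-le
print(sniff_bom(b"\xfe\xff\x00H\x00i"))  # utf-16-be
print(sniff_bom(b"Hi"))                  # None
```

In UTF-16 the BOM is the code point U+FEFF written in the file's own byte order, so reading it as `FE FF` means big endian and `FF FE` means little endian.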
Please, how do I find the encoding of my file?
you have a linkedin handle?
I find this very interesting
www.linkedin.com/in/erikwilde
Thank you very much!
Hi thanks man, thanks for the video
Thanks!
It seems that Windows has switched to UTF-8 as well, speaking of Win10 21H2 and later.
Nobody can escape globalization, sooner or later you have to support more than just ASCII or the fragmented ISO 8859 character sets. At that point, Unicode and very likely UTF-8 become your best friends.
@@ErikWilde I'm having a hard time grasping the fact that the native Windows API has only accepted UTF-16 encoded strings to this day, such a rubbish decision!
This explains why Windows takes up so much RAM, not to mention the completely redundant CPU cost for the sake of UTF transformation.
I wonder whether Microsoft Office still can't open files in UTF-8 😳
UTF-16 should be abandoned because it is so problematical.
Maybe it's problematic, but be prepared to have to deal with it for many years to come.