What are UTF-8 and UTF-16? Working with Unicode encodings

  • Published 24 Dec 2024

COMMENTS • 37

  • @lalpremi (11 months ago, +10)

    That is exactly what I wanted to know, and you showed some great tools. Thank you for sharing, and have a great day :-)

  • @vladislavkaras491 (1 month ago, +1)

    That was a great explanation!
    Thanks!

  • @higiniofuentes2551 (11 months ago, +2)

    Thank you for this very useful video!

  • @IndianaJoenz (4 months ago, +1)

    Thank you for a great talk with useful visuals! I make a Unix program (durdraw) for drawing Unicode and other text art, and find myself working with different character encodings regularly. Perhaps I missed it, but UTF-8's backward compatibility with ASCII is worth considering when choosing an encoding scheme. I also liked the useful "od" syntax. I rarely encounter UTF-16, but thanks to your video I will now be able to recognize it in a hex dump.
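
The ASCII compatibility and the hex-dump signature mentioned above can be checked with a minimal Python sketch (assuming Python 3.8+ for `bytes.hex` with a separator; the helper name `hex_dump` is hypothetical). Pure ASCII text encodes to identical bytes in ASCII and UTF-8, while UTF-16 pairs every ASCII byte with a zero byte, which is what makes it easy to spot in a dump.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")

sample = "Hi"
for enc in ("ascii", "utf-8", "utf-16-le", "utf-16-be"):
    print(f"{enc:>10}: {hex_dump(sample, enc)}")
# ascii and utf-8 both print: 48 69
# utf-16-le prints:           48 00 69 00
# utf-16-be prints:           00 48 00 69

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert sample.encode("ascii") == sample.encode("utf-8")
```

The interleaved `00` bytes, before or after each ASCII byte depending on byte order, are the visual cue for UTF-16 in `od` or similar tools.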

  • @qeetcode (1 year ago, +2)

    Great explanation. Much appreciated.

  • @brm901 (2 years ago, +7)

    Great and informative video; thanks.

    • @ErikWilde (2 months ago)

      @brm901, thanks for the kind words!

  • @nervocalm (1 year ago, +1)

    Excellent visual explanation! Couldn't be clearer! I didn't know that it chooses the right length for each character; I thought it always used a fixed length. I would really like to know more about this in general: headers, BO, LE, etc. I also find this very interesting and very useful for ETL work in data engineering. If you think of anything else besides the links you already shared in the description, please let me know. Thank you for making this video.
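
How UTF-8 picks a length per character can be shown with a minimal hand-written encoder in Python. The function name `utf8_encode_code_point` is hypothetical and only for illustration; real code would simply call `str.encode("utf-8")`. The byte count depends only on which range the code point falls into.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# "A", "é", "€" and an emoji need 1, 2, 3 and 4 bytes respectively.
for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode_code_point(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's built-in codec
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The leading bits of the first byte (`0`, `110`, `1110`, `11110`) tell a decoder how many bytes belong to the character, and every continuation byte starts with `10`.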

  • @VishwaMukh (2 months ago, +1)

    Sir, very well explained. Thanks.

    • @ErikWilde (2 months ago)

      @VishwaMukh, thanks for the kind words!

  • @2bitsbyab (1 month ago, +1)

    Very good explanation, thanks.

  • @flaviomelo7893 (1 year ago, +1)

    Hi Erik, congratulations on the video and thanks for sharing your knowledge. I am migrating an Oracle database on Solaris SPARC that uses UTF-16BE, while the destination uses UTF-8. In your opinion, what would be the best approach to converting the source data?

    • @ErikWilde (1 year ago, +1)

      Whatever migration tool you are using should really give you that option. If it does not, I would look for a different tool.
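
A database migration tool remains the right place to handle this. For data that has already been exported to flat text files, though, the conversion itself is just decode-then-encode. A minimal Python sketch, assuming the export is plain text in UTF-16BE without a byte order mark (the function name `transcode_file` and the file names are hypothetical). If the files do start with a byte order mark, the generic `utf-16` codec, which reads and strips the mark, is the better source encoding.

```python
import tempfile
from pathlib import Path

def transcode_file(src: Path, dst: Path,
                   src_encoding: str = "utf-16-be",
                   dst_encoding: str = "utf-8",
                   chunk_chars: int = 64 * 1024) -> None:
    """Re-encode a text file chunk by chunk without loading it all into memory."""
    # newline="" disables newline translation, so line endings are copied unchanged.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, newline="") as fout:
        while True:
            # The incremental decoder behind read() handles code units or
            # surrogate pairs that are split across chunk boundaries.
            chunk = fin.read(chunk_chars)
            if not chunk:
                break
            fout.write(chunk)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "export_utf16be.txt"  # hypothetical file names
        dst = Path(tmp) / "export_utf8.txt"
        src.write_bytes("Gr\u00fc\u00dfe, \u65e5\u672c\r\n".encode("utf-16-be"))
        transcode_file(src, dst)
        print(dst.read_bytes())
```

Streaming in chunks keeps memory use flat for large exports; a strict codec (the default) raises an error on malformed input instead of silently corrupting data.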

  • @nournote (1 year ago, +1)

    Thanks. Very informative.

  • @Soupie62 (6 months ago)

    If you have a CPU where every address is 16 bits wide, you may as well use UTF-16 as the default. If memory is 8 bits wide, use UTF-8.
    For 32-bit (or 64-bit) memory, you can store multiple characters per RAM address, no matter which encoding you choose.

    • @ErikWilde (6 months ago)

      In the end, if you care about memory efficiency, UTF-8 may be the best choice if you mostly use ASCII characters. But there is (sadly) no generally best default choice.
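
Which encoding is more compact depends on the text, as a minimal Python sketch comparing encoded sizes shows (the sample strings are arbitrary; UTF-16 is measured with the `utf-16-le` codec so that no byte order mark is counted). ASCII text is half the size in UTF-8, Japanese text is smaller in UTF-16, and characters outside the Basic Multilingual Plane, such as emoji, take four bytes in both.

```python
from typing import Tuple

def encoded_sizes(text: str) -> Tuple[int, int]:
    """Return (UTF-8 size, UTF-16 size) in bytes; UTF-16 is measured without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "ASCII": "hello, world",        # 12 vs 24 bytes
    "Accented Latin": "déjà vu",    # 9 vs 14 bytes
    "Japanese": "日本語のテキスト",    # 24 vs 16 bytes
    "Emoji": "😀😃😄",               # 12 vs 12 bytes
}

for label, text in samples.items():
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{label:<15} UTF-8: {utf8_size:3d} bytes   UTF-16: {utf16_size:3d} bytes")
```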

  • @pazaresosset6348 (4 months ago, +1)

    Thanks, very interesting video.

  • @gersoncjunior (5 months ago)

    Thanks for sharing that!

  • @parsifal8232 (1 year ago, +1)

    6:29: please go into the details of the "byte order mark" in UTF-16.

    • @parsifal8232 (1 year ago)

      Or more generally into additional byte information, for example in .txt files (with BOM vs. without BOM), and maybe how to add additional information to a JPG file (without damaging it).

    • @ErikWilde (1 year ago, +2)

      A byte order mark depends on the format you are using. In Unicode specifically, the byte order mark refers to byte order in UTF-16. How to add extra information to other formats, such as JPG files, is a very different question. For UTF-16, the byte order mark signals whether the file uses big-endian or little-endian byte order.
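
In UTF-16 the byte order mark is the code point U+FEFF written at the start of the data. Read back in the wrong byte order it would appear as U+FFFE, which is a noncharacter, and that asymmetry is what lets a reader tell the two orders apart. A minimal Python sketch using the standard `codecs` module:

```python
import codecs

# U+FEFF serialized in each byte order.
print(codecs.BOM_UTF16_BE.hex(" "))  # fe ff  (big-endian)
print(codecs.BOM_UTF16_LE.hex(" "))  # ff fe  (little-endian)

# The same text "A", prefixed with the matching BOM.
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")     # fe ff 00 41
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")  # ff fe 41 00

# The generic "utf-16" codec reads the BOM, picks the byte order, and strips the mark.
assert big.decode("utf-16") == "A"
assert little.decode("utf-16") == "A"

# Forcing the wrong explicit byte order silently yields different characters.
assert big.decode("utf-16-le") != "A"
```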

  • @akshardrashti (5 months ago)

    Please, how do I find the encoding of my file?
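
In general the encoding of a file can only be guessed from its bytes, not determined with certainty. A minimal Python sketch of one common heuristic (the function name `guess_encoding` is hypothetical): trust a byte order mark if one is present, otherwise check whether the data is valid UTF-8. The returned names are codecs that consume the BOM when decoding.

```python
import codecs
from typing import Optional

# UTF-32 is checked first: its little-endian BOM (ff fe 00 00)
# begins with the UTF-16 little-endian BOM (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(data: bytes) -> Optional[str]:
    """Heuristic guess: a BOM wins; otherwise test for valid UTF-8; else None."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    try:
        data.decode("utf-8")  # strict mode raises on any invalid byte sequence
    except UnicodeDecodeError:
        # Unknown: could be UTF-16 without a BOM, a legacy code page, binary data, ...
        return None
    return "utf-8"  # pure ASCII also lands here, since ASCII is a subset of UTF-8
```

For a file on disk this would be called as `guess_encoding(Path("notes.txt").read_bytes())` (hypothetical file name). On Unix systems, the `file` utility applies similar heuristics from the command line.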

  • @AshisRout-b4q (1 year ago, +1)

    Do you have a LinkedIn handle?
    I find this very interesting.

    • @ErikWilde (1 year ago)

      www.linkedin.com/in/erikwilde

  • @LuisHernandez-dv4xu (2 years ago, +1)

    Thank you very much!

  • @human4566vv (1 year ago)

    Hi, thanks man, thanks for the video.

  • @gt10i (7 months ago)

    Thanks!

  • @sabitkondakc9147 (1 year ago)

    It seems that Windows has switched to UTF-8 as well, at least Windows 10 21H2 and later.

    • @ErikWilde (1 year ago, +3)

      Nobody can escape globalization, sooner or later you have to support more than just ASCII or the fragmented ISO 8859 character sets. At that point, Unicode and very likely UTF-8 become your best friends.

    • @sabitkondakc9147 (1 year ago)

      @ErikWilde I'm having a hard time grasping the fact that the native Windows API has accepted only UTF-16-encoded strings up to today; such a rubbish decision!
      This explains why Windows takes up so much RAM, not to mention the completely redundant CPU cost of converting between UTF encodings.

  • @MrJloa (1 year ago)

    I wonder why Microsoft Office still can't open files in UTF-8 😳

  • @صالحمحمد-ص2ك1ك (1 year ago)

    Hi utf8.46

  • @RobertHernandez-t5q (2 months ago)

    Johnson Eric Thomas Jose Perez Elizabeth

  • @Tapajara (1 year ago)

    UTF-16 should be abandoned because it is so problematic.

    • @ErikWilde (1 year ago, +1)

      Maybe it's problematic, but be prepared to have to deal with it for many years to come.