I feel like this talk painted a fairly unfavourable picture of Han unification. Although the limited space in Unicode could perhaps be seen as one of the driving forces behind it, in reality what was unified and what wasn't was decided mostly on the basis of what existing East Asian encodings already did. And those East Asian encodings based their unification rules on fairly sound principles. You can't encode every minor glyph variation separately, or else you end up with a mess of a system where the same text gets encoded inconsistently because of the large number of duplicates. Even if Unicode were designed from scratch today without any space limits, Han unification would end up very similar to how it is now.
TBH, the most pragmatically useful content is at the beginning and end. IMO, the content in the middle, while nice background information, is a little light on utility.
@ ~ 1:20 Ms. Manning points out the difference between string.len() & string.chars().count()
@ ~ 15:40 Ms. Manning goes over UTF-8, UTF-16, & UTF-32
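
For anyone who wants to see both timestamps in action, here is a rough sketch in Rust. This is not code from the talk; the helper name `encoded_sizes` and the sample strings are my own. It assumes the point at 1:20 is that len() counts UTF-8 bytes while chars().count() counts Unicode scalar values, and it also shows how the same text sizes up in UTF-16 and UTF-32.

```rust
/// Hypothetical helper (not from the talk).
/// Returns (UTF-8 bytes, Unicode scalar values, UTF-16 code units, UTF-32 bytes).
fn encoded_sizes(s: &str) -> (usize, usize, usize, usize) {
    // len() counts bytes of the UTF-8 encoding, not characters.
    let utf8_bytes = s.len();
    // chars() iterates over Unicode scalar values.
    let scalars = s.chars().count();
    // encode_utf16() yields 16-bit code units; a surrogate pair counts as two.
    let utf16_units = s.encode_utf16().count();
    // UTF-32 is fixed-width: four bytes per scalar value.
    let utf32_bytes = scalars * 4;
    (utf8_bytes, scalars, utf16_units, utf32_bytes)
}

fn main() {
    // "\u{e9}" is a precomposed e-acute, "\u{1F600}" is an emoji outside the BMP.
    for s in ["hello", "h\u{e9}llo", "\u{1F600}"] {
        println!("{:?} -> {:?}", s, encoded_sizes(s));
    }
}
```

Note that chars().count() still isn't "what a user sees as one character": it counts scalar values, not grapheme clusters, so an "e" followed by a combining accent counts as two.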