I don't know Chinese, but I guess some words are made up of more than one character? Then wouldn't you need to know the meaning of combination in addition to the single characters or would you be able to guess?
Right, so a character in Chinese is a morpheme: it can be a word on its own or part of a compound word. A native speaker can kind of guess even if some characters are missing, because the characters carry meaning even by themselves.
Agreed, it's the number of words, not the number of characters, that is important! Sometimes you can guess the meaning, say 冰茶 (ice tea); sometimes you are partly right, like 红茶 ("red tea", which in English is black tea); and sometimes you don't have a clue, like 清淡 (literally "clear weak"), which describes food that is not spicy or greasy.
hey there! quick question, do you think it's possible to learn how to read chinese, but not actually know the language ? learn the meaning of the characters and group of characters, but never know how to pronounce them ? or is there something I'm missing out that would prevent this ?
Pareto sends their regards 😎 But the thing with Chinese is that characters do not equal words. You need to know both the characters and their combined meaning in 2+ character words. I know 1500 characters, but I can’t read many things even when I know all the characters in a text, because I don’t know the words in the text yet. Obviously Grace knows a lot of words as well as characters. Edit: Finished the video, and I see you made exactly the same points 😂
have you completed the experiment? I know some 700-800 characters and can read simple texts in Japanese trivially and mostly extract the vital information from intermediate texts, but stumble pretty hard on any slightly advanced or specialised topics. I'd be interested to hear your experiences.
Realistic version: The Chinese teacher deducts 1 mark for every word you don't know or mispronounce, and then you fail your Chinese oral, and you get put in Chinese remedial class and have to stay back after school. Then your parents look at your grades and say "Never mind boy I also scored F9 for Chinese in my time, I will send you for Chinese tuition class."
Interesting experiment, but for the reasons you mentioned in the video, it tells you absolutely nothing about how much a learner would understand. That would depend mostly on how many WORDS the learner knows. I bet if you found a learner who has only learned about 500 characters and gave him the test, he would understand very little. Of course, it also depends very heavily on what the text is. I have no idea how many characters I know, but I think at least 2000. (But there are probably also a lot that I wouldn’t know individually, but know in context.) And it is still not enough to understand everything I read. A learner just can’t compare to a native speaker in an experiment like this. Also, I never understood the focus on characters instead of words. Like the example you gave, if you see the word 存在, just knowing the character 在 would be of no help to you at all. This gets much worse at slightly higher levels. Also, what does “knowing a character” even mean? Does it mean you know how to pronounce it and its basic meaning? I think if you don’t know all the various meanings it could have in different contexts, you don’t really know it. You’ve only started to come to know it.
I completely agree with you and had the same confusion on why people don't count words. I actually contacted Dr. Jun Da and asked him that question in passing. He told me the reason people don't count words is because it's almost impossible with the nature of Chinese and current technology. Since Chinese doesn't use spaces, and each character serves as a morpheme (that also CAN be a word), it's very difficult for machines to identify "words." There's a formula he used in his study to estimate the number of words in his corpus, but it's only an estimate. Who knows, maybe when AI takes over, we'll finally be able to count Chinese words!
@@ABChinese Chinese parsing is actually quite accurate. I've used a python library called 'jieba' in some projects. Even if the accuracy isn't perfect, it's more than enough to get a solid estimate at a word count.
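For anyone wondering what "parsing" means here, this is a rough sketch of forward maximum matching, a much simpler dictionary-based technique than what a real segmenter such as jieba does. The `segment` function and the tiny `lexicon` are made up for illustration; in an actual project you would call a library instead.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that fits; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical mini-dictionary, not a real word list.
lexicon = {"双胞胎", "出生", "存在", "我们"}
result = segment("我们的双胞胎出生了", lexicon)
print(result, len(result), "words")
```

Even this naive approach gives a usable word count on easy text; it breaks down on ambiguous strings and words missing from the dictionary, which is where the estimate gets fuzzy.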
Based on an approx. trend (the inverse difference formula with R^2=.9996) you get the number of characters compared to understanding:
Number    Understand %
150       23.5% (20% she said)
500       79.6% (80% she said)
1000      90.0% (90% she said)
1969      95.0%
2000      95.1%
2663      96.3% (HSK 6)
3916      97.5%
9756      99.0%
97350     99.9%
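The formula itself isn't given in the comment, so the following is only a guess at its shape: a curve of the form U(n) = 100 - a / (n - b), pinned to the 150 and 500 rows. The functional form and the names `fit_inverse` and `predict` are assumptions; the only data used are the numbers listed above.

```python
def fit_inverse(p1, p2):
    """Solve U(n) = 100 - a / (n - b) exactly through two (n, percent) points."""
    (n1, u1), (n2, u2) = p1, p2
    d1, d2 = 100 - u1, 100 - u2          # percent NOT understood at each point
    b = (d1 * n1 - d2 * n2) / (d1 - d2)
    a = d1 * (n1 - b)
    return a, b


def predict(n, a, b):
    """Predicted understanding (percent) for n known characters."""
    return 100 - a / (n - b)


# Fit on the first two rows of the comment's table.
a, b = fit_inverse((150, 23.5), (500, 79.6))

# Remaining rows from the comment, used only to compare against the curve.
listed = {1000: 90.0, 1969: 95.0, 2000: 95.1, 2663: 96.3,
          3916: 97.5, 9756: 99.0, 97350: 99.9}
for n, pct in listed.items():
    print(n, pct, round(predict(n, a, b), 1))
```

If this guess is close to what the commenter used, the curve says the missing percentage roughly halves each time the character count doubles, which is why the last few percent are so expensive.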
As a native speaker this is intriguing, especially the difference between 150 and 500. Just goes to show that just because you can read 50% of the words doesn’t mean you can understand 50% of the content
I'm interested in learning to read some Chinese mostly so I can more quickly scan through electronics vendor websites and part datasheets, anyone have any tips? Should I be focusing on studying that kind of technical content using tools like the Zhongwen browser plugin, or would it be more effective to start with more traditional "beginner" type content? Output, writing/speaking, isn't as important to my goal as input, but would it be substantially beneficial to practice anyway?
It was a really cool experiment! I had a lot of fun😆 Thank you for having me!
So good to have you on my channel!! Thanks for collaborating with me❤
Thank you @ABChinese and @GraceMandarinChinese. I look forward to possibly seeing more collaborations with you two in the future.
@@ABChinese I absolutely agree with you on DuChinese: flashcards haven't helped me much, except in the beginning when I studied the basic characters, however, reading content has helped me rank up very fast, as I learned more than 1000 characters in 6-8 months. Then I took a few months break, but now I can still read content.
Reading content is the key in my opinion. I'm subscribed to Mandarin Bean and Du Chinese
omg you're my fav youtuber thank you so much 😭
I see Grace I click. Great to hear her casually speaking Chinese 😂
Something I realize is how much the source text can influence how easily one can understand. An example: 滚 is #1572 on the frequency chart you used, yet it is one of about 80 symbols I currently recognise because it gets used so frequently in Cdrama to represent "scram", "get out of here" or other less polite invitations to exit the scene... lol ... Meanwhile frequently used words like 国 [#20] are not on my list because Cdrama characters don't use them as much.
This is a fascinating experiment ... thanks to your dad and Grace for making it possible!
That's a good point about 滚. I just watched a movie today where ppl kept saying 'gun' (didn't see the character), and your comment reminded me to look it up-- lo and behold, the very same word!
❤ love that she did this experiment!
what was the frequency chart he used??
@@brunocardoso7132 He posted a link to it in the description.
The thing is, with any frequency list you are basing it on some corpus, but word frequencies actually depend a lot on the corpus. Drama, chat, technical document, news report, youtube comment, workplace, etc all demand different vocabularies. The only characters that really are used in every context at high frequency are pronouns, numbers, prepositions, auxiliary verbs and grammatical particles.
It confirms my fear. Like, maybe you don't need to know how "ice" is said in any language. Yet, if you have to deal with it, even though it's so rare, you'd miss your target completely. Rare words get their revenge by providing a lot of context that you otherwise will never be able to manage to understand.
Ice is a rare word?
@@xuexizhongwen ranks 2096 according to Wiktionary, so yeah, it's rare
This was a really fun video! I'm a big fan of Grace's videos, so it was great to see this collab. I'm also a DuChinese premium user, I love that app. I think a video about the details that went into your experiment would be pretty interesting
You got all the best resources;)
This was a very interesting experiment! Feeling more motivated to continue my studies, thinking I won't need a ton of characters to understand Chinese content. Also, just this past week I started looking for a better language learning app, since I paid for Duolingo last year and I think it didn't do much to improve my knowledge. I just got Du Chinese and it's very interesting. I will stick with this one this year! Thank you for the very helpful content! Hopefully this year I make it to HSK 3 :')
There are 2663 individual characters in the hsk 6 vocab (5000 words assembled by 9662 characters). You can have a very big vocab with 500 individual characters. I guess 1000 is like a hsk 5 level. //edit: Thought it's quite interesting so I did some coding.
You are able to form:
-1175 words with 500 chars
-2413 with 1000
-3520 with 1500
The wordlist is the hsk 6 one. Ofc a native speaker will be able to make up even more words with the given chars.
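A count like this takes only a few lines of Python. The script behind the numbers above isn't shown, so this is just a sketch of the idea: `ranked_chars` and `words` are tiny made-up stand-ins for a real frequency list and the HSK 6 word list, and `count_formable_words` is a hypothetical name.

```python
def count_formable_words(ranked_chars, words, top_n):
    """Count words whose characters all fall within the top_n most frequent characters."""
    known = set(ranked_chars[:top_n])
    return sum(1 for word in words if all(ch in known for ch in word))


# Toy stand-ins (hypothetical data, not the real lists).
ranked_chars = list("的一是不了人我在有他")
words = ["我", "不是", "有人", "他们", "我在", "人民"]

for n in (3, 6, 10):
    print(n, "chars ->", count_formable_words(ranked_chars, words, n), "words")
```

With real lists you would just swap in the frequency-ranked characters and the word list and call it with 500, 1000 and 1500.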
I like you! This is a very interesting video.
I have studied Chinese for 8 years, about 20 years ago. I guess we learned about 1500 to 2000 characters, but I am still not fluent at all. However, when I travel to China, I can travel around without any problems. Chinese is such an interesting language!
Thanks!
Thank you!
I have experienced being barely literate in this way, and can confirm you can enjoy reading at this level of vocabulary.
Also a big fan of Grace, nice to see her here. :+ )
I have made pieces of software like your father did, I have been cleaning up *word* lists rather than character lists as well.
Awesome experiment. We’ve all heard that we need to know a certain number of characters, but this is the first time I’ve seen it actually tested out. I know you’ve been working hard on this video and it turned out great. I enjoyed watching🙂Thank you.
btw that’s a very cool transition at 6:43!
Thanks for watching Josh!
the last test would have been better at 300 words for context
Great vlog. This experiment makes me much more optimistic about the possibility of learning to read Chinese. Great insights!
Your editing improves every upload, it’s definitely noticeable. Keep up the good work!
Thank you Erin~~
I signed up for duchinese after your first video about it but I totally forgot to use your signup code. Sorry about that! But I really appreciate the recommendation, it totally rules. I only wish they had things listed using HSK 3.0 levels, but that's a minor complaint.
Hahaha no worries
Thank you for this, and all your videos, they are very encouraging. I've just started (trying to) read the Mandarin Companion breakthrough book "我的老师是火星人". I still have some way to go😊
As a Chinese learner I enjoyed this video a lot. This is the collab that I didn't know I needed 😄
This is one of the most interesting videos about the Chinese language I've ever seen on YouTube. I didn't know about that program you mentioned at the beginning of the video before, so I copied the 现代汉语常用字表 into Word, then deleted the characters I didn't know so Word would count the number of characters I can read😂. The result was around 2100 characters, but I would say I can recognize only around 2000 because some of them are not very clear to me. Even with this number of characters, I still regularly see new ones which I can't recognize.
I'm South Korean and Korean has some Chinese loan words.
I think I'll be able to read Chinese IF I learn those 500 characters!
Don't South Koreans learn hanja at school as part of the curriculum?
@@ruffhakes7419 but they read content, their school material, etc. in hangul form... It's very different for Chinese and Japanese people, who apply their Chinese character knowledge in real situations
super interesting to visually see how much the number of characters you know affects reading.
feels like a Chinese reading Japanese articles. You don't understand the pure Japanese part but you can still understand the meaning from the remaining Chinese characters.
Having the plot presented to you and then having native chinese speech be in this plot is really fantastic for getting in the groove of the language.
So much better than if she had given her thoughts in English
There is a big problem with this test: native speakers already know which characters a word consists of, so if they see a word with only one character visible they can fill in the missing characters with the help of context. But this won't be possible for non-native speakers, who won't know or understand words that contain characters outside the 500 they know. E.g. 双xx, like in the video, could easily be filled in as 双胞胎 by a native speaker, but a non-native speaker who doesn't know which characters the Chinese word for twins consists of probably couldn't guess the word.
Yeah, I realized that was one problem, but after reading your comment, I suddenly thought of a way I could've fixed it. maybe? If I make every string of unknown characters just one "X" then she wouldn't be able to tell how many characters there are and can't guess based on length of words. So like 双胞胎 would be 双X instead of 双XX.
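The two masking styles can be sketched like this. It is not the actual program from the video, just an illustration: `mask_unknown` is a made-up name, the `known` set is a toy example, and only characters in the basic CJK Unified Ideographs range are treated as maskable.

```python
import re


def mask_unknown(text, known, collapse=False):
    """Replace Chinese characters outside `known` with X.

    With collapse=True, every run of X becomes a single X, so the reader
    can no longer count how many characters are hidden.
    """
    masked = "".join(
        # Keep known characters and anything that isn't a CJK ideograph
        # (punctuation, digits, Latin letters).
        ch if ch in known or not ("\u4e00" <= ch <= "\u9fff") else "X"
        for ch in text
    )
    if collapse:
        masked = re.sub(r"X+", "X", masked)
    return masked


known = set("双是的")          # toy "known characters" set
sentence = "她们是双胞胎。"
print(mask_unknown(sentence, known))                 # one X per hidden character
print(mask_unknown(sentence, known, collapse=True))  # one X per hidden run
```

One caveat with the collapsed version: a text that already contains a literal "X" would get merged into the mask too, so a real tool would probably use a placeholder that can't occur in the input.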
Man! You're a genius. ^^ Thanks~
That's it! I'm adding the top 500 most frequent Chinese characters to my Anki deck alongside my HSK 3!
Thanks!
Thank you~
This was super interesting, thanks for putting in the hours to create this video!
Great video! I really loved seeing you two collab, my fav two chinese language teachers
I can read over 4,000 characters. But that doesn't mean that I know all the vocabulary. At the museums in Beijing and Shanghai, I read several documents written in older script out loud to a local Chinese person so they could tell me what I was reading. It was so much fun!
Heyyyy it's Grace. I'm one of her subs too. 欢迎来到频道!
Mandarin is a beautiful language
This is great! Many thanks to you and to whoever made this computer program! It is superb❤!
Amazing experiment! You rule!
I'm a new subscriber and this is like my second vid, and I really like the content and ur personality. The way u explain things is a gift, buddy, god bless u. I'm about to finish HSK 2 and I'm glad that I watched this video. I'm already using Anki to memorize characters and doing lots of graded reading. It's really encouraging to know I'm on the right track, thanks a lot ❤
i think it makes a lot of sense. the most frequent 150 convey almost no meaning because they're almost just grammar, and grammar is necessary in every single text so of course they're the most frequent ones.
after those grammar bases are set, the next 350-650 characters are all going to start being super important for conveying meaning, and like grace said eventually they all become rarer and more specific.
Fascinating experiment! I think the 3rd grader texts were probably quite difficult and not all that common hahaa I would have tried also some novels and simple stories and longer chats.
Grade school textbooks have different vocabulary than HSK, since HSK focuses more on functional vocabulary and textbooks tend to focus more on literature, including old literature. I would definitely have tried with more texts, but this video was already 20 minutes long with 4 texts, so... I don't think people would watch a longer video.
Thanks for this Andrew.
DuChinese is fantastic!
You are an excellent content creator! Keep up the great work and energy!
Thank you! I'll keep trying:)
great video. very encouraging for a learner! 非常谢谢
I currently know probably about 650-700 characters. If I can read that much from knowing 1,000, I can't wait til I can read at least 1,000.
The way Chinese characters use 偏旁 to build more complex characters must really help too. Even if one does not know a character, one might know a part of it and therefore make some sense of it
although some characters have very different pronunciation with different parts
I love to see unexpected collabs between two channels I follow 😁😁😁😁
She can’t explain it without more than 500 characters if she didn’t know more than 500
Great video!
Indeed, the most important take away is that 500 characters is NOWHERE near enough for understanding texts. That native girl simply knows pretty much all Chinese words, so she can guess what it could mean. I'm currently doing HSK2 and would have trouble with these texts even though I know far more than 500 characters.
please make a video with a random japanese article and try to translate it in english and then tell how much you understood in percentage after translating the real meaning. would be more cool if you could repeat the vice versa with a native japanese speaker about chinese
I'm wondering if characters is the best way to select what's readable as opposed to words like is generally done for other languages. I'd like to see this same experiment but with words rather than characters. For example, take the characters that are in the top 500 words and just allow those. The number of characters certainly won't match and may even be less than 500, but the set might be different from the 500 most common characters. I'm curious how that affects readability.
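Comparing the two selections is basically a set difference. Here is a small sketch with invented rankings (`ranked_words` and `ranked_chars` are hypothetical, not real frequency data, and `chars_in_top_words` is a made-up helper):

```python
def chars_in_top_words(ranked_words, top_n):
    """Collect every character that appears in the top_n most frequent words."""
    return {ch for word in ranked_words[:top_n] for ch in word}


# Invented toy rankings, most frequent first.
ranked_words = ["的", "我们", "中国", "知道", "可以"]
ranked_chars = list("的我是不了们中国")

from_words = chars_in_top_words(ranked_words, 3)   # characters needed for the top 3 words
from_chars = set(ranked_chars[:5])                 # the top 5 characters

print("only via words:", sorted(from_words - from_chars))
print("only via chars:", sorted(from_chars - from_words))
print("shared:", sorted(from_words & from_chars))
```

Run on real lists with 500 in place of the toy cutoffs, the two difference sets would show exactly which characters each selection method adds or drops.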
Great video! Really loved seeing Grace again here! Yay! Keep these videos coming.
Jun Da's website showing all his research is absolutely amazing for Chinese beginners like me. Your channel never stops surprising me
This is interesting to me as a native Japanese speaker (our Daily Use characters number 2000-2100) who is learning Cantonese. I often find the biggest hurdles come when the characters are used differently from their original meaning, and then I sorta get stuck and overloaded like overtaxed RAM (if that weren't a reasonable metaphor, 電腦 wouldn't be a word)
it might be easier to start with Mandarin, because Cantonese created their own characters, then you can go on
@@danielzhang1916 Probably all other dialects also
as a non-native, i think it's worth mentioning that even the words we don't understand can sometimes have clues for us to guess their meaning (ex: two characters words in which we know one of the characters, radicals, certain clues of whether it's a noun, adverb, adjective etc). so yeah we can probably read even better than you expect with only 500 characters
Was rooting for the wechat dude
that actually freaks me out that u used an article abt twins being born 87 days apart because i just read an article about that earlier. and it wasn't new, i thought about it and searched for it LOL
Great experiment, well done to both of you! Everybody says input, input, input: yeah, you just need to read, and it's good there are some online materials to start from (DuChinese, which you pointed out, MandarinBean, some phone apps, etc.). Hope to be able to read normal books soon :)
Awesome video top content as always!! - KS
Pareto distribution?
In the real world.
It's amazing how your work exists and you're just super awesome!!!!
1000/10 rating %
Your videos are great ❤
Your videos are always entertaining and to the point. Keep up the great work. 😊
Thank you for watching! I try my best;)
@@ABChinese you do a great job 👍🏻👍🏻
You've got a new subscriber, thanks for the awesome content! 你的视频非常有趣
So, ~100 characters account for things like numbers, grammatical particles, and common affixes. ~500 characters account for those plus very common basic nouns and verbs. Together, these account for ~75% of the text you encounter. This is true, but mostly it isn't helpful, because you will still be missing the most important (and, statistically speaking, much rarer) nouns and verbs that are the key content. Every text will be different; for example, the very basic, more or less artificial language in textbooks or graded readers is designed to make sense with a low character count. Social media also tends to be pretty basic. Anything more, though, and you will quickly discover that you *need* those rare content words to understand what you are reading or hearing! Studies of reading comprehension show that if you understand less than 90% of the text, you will not understand it and will be frustrated. If you understand between 93% and 97% of what you read or hear, you can get the message with a lot of work. If you understand about 98% of the message, you can figure out the meaning of any words you don't know. So, if you have a text of 100 characters and you understand only 90, odds are that you are missing those key rare content words. If you assume an upper conversational-level text that uses about 2000 characters, then you need to be able to understand ~1,800 of them to potentially muddle through towards understanding the message. This is all complicated by the fact that every text differs, not only in its level, but in how many words (as opposed to characters) there are in the text, along with how much other context like pictures, tables, or graphs there is. In my experience, social media is really easy to get, but even trying to read the front page of a newspaper is really challenging at my level.
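The coverage percentages in this comment are easy to compute for any text. A minimal sketch, assuming coverage means "share of Chinese character tokens that are in the known set" (`char_coverage` is a made-up name, the example sentence and `known` set are invented, and only the basic CJK Unified Ideographs range is counted):

```python
def char_coverage(text, known):
    """Percent of Chinese character tokens in `text` that appear in `known`."""
    chars = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]  # skip punctuation etc.
    if not chars:
        return 0.0
    return 100 * sum(ch in known for ch in chars) / len(chars)


known = set("我是学生很喜欢")      # toy set of known characters
text = "我是学生,我很喜欢冰茶。"
unknown = [ch for ch in text if "\u4e00" <= ch <= "\u9fff" and ch not in known]

print(round(char_coverage(text, known), 1), "% covered; unknown:", unknown)
```

In this toy sentence the reader covers 8 of 10 characters, yet the two missing ones (冰茶) are the whole point of the sentence, which is the problem the comment describes.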
MTSU REPRESENT! I graduated from there and work with Dr. Jun Da in the Foreign Language Computer Lab.
Great content 🎉
This was so interesting, thanks! I am still scared to learn, but I feel braver. A bit 🤣
It was very cool to see visually how important characters are (of course!). But honestly, my problem isn't the characters themselves (I love learning them), it's the words!
Yeah of course I know the character 的 for example, but I may not know the WORDS 目的、的確、地款等
Keep it up!
As a beginner Chinese learner with good English comprehension, I wonder how this experiment would go with English texts.
It was nice to see the 150 since that's about where I'm at, just a third of the way to 500 😂
if you are a native speaker, you can sometimes even mess up the order of the words in a sentence and you can still understand its meaning.
wow i have hope now thx
13:51 it felt like all she read was "from the ... of the ... to the ... of ..."
0:40 Where is that setting? It looks like a middle school library
It is a public library haha
@@ABChinese Ohh ok lol
I need that software so I can find the commonly used characters I don't know on my own
Check my latest post under the community tab! You can download it there
@@ABChinese tysm!!
I love this video
With 500 characters, you can't even read kindergarten books.
I feel you should recruit an actual learner for the experiment. Grace knows far more characters than the tested quantity, so even when some characters are missing she can fill them in from her deep knowledge, which skews the comprehension higher. But great video. Very informative nonetheless.
13:36 Now she knows how I feel when I try to read something in chinese. hahaha
That is pretty impressive with only 500 characters.
Hey, that software is really needed.
When can we get it publicly? Cos I like
Language Software (not enough for PC)
Check the post I made in my community tab!
Can you make it again with a Chinese learner from a different level?
A very creative show
A longtime fan~~ thanks for watching
Nice programming!
I wish there was that list of those common characters - I was waiting for it and it isn’t there…
Hi~ here it is: lingua.mtsu.edu/chinese-computing/statistics/
I used the "Modern Chinese" list and the link is also in the description
I don't know Chinese, but I guess some words are made up of more than one character? Then wouldn't you need to know the meaning of combination in addition to the single characters or would you be able to guess?
Right, so a character in Chinese is a morpheme: it can be a word or it can be part of a compound word. A native speaker can kind of guess even if some characters are missing, because the characters carry meaning even by themselves.
So Thing Explainer by Randall Munroe is possible in Chinese with relatively few concessions?!
Agreed, it's the number of words, not the number of characters, that is important! Sometimes you can guess the meaning, say 冰茶 (ice tea); be partly right, as with 红茶 (literally "red tea", which English calls black tea); or not have a clue, as with 清淡 (literally "clear, weak"), which describes food that is not spicy or greasy.
I wish every chinese video on the internet would use the same type of subtitles.
Hey there! Quick question: do you think it's possible to learn how to read Chinese but not actually know the language? To learn the meaning of the characters and groups of characters, but never know how to pronounce them? Or is there something I'm missing that would prevent this?
Pareto sends their regards 😎
But the thing with Chinese is that characters do not equal words. You need to know both the characters and their combined meaning in 2+ character words.
I know 1500 characters, but I can’t read many things even when I know all the characters in a text, because I don’t know the words in the text yet. Obviously Grace knows a lot of words as well as characters.
Edit:
Finished the video, and I see you made exactly the same points 😂
Do you have a frequency list for vocabulary?
This is amazingly insightful! Is there anywhere I can download the software? I want to try the same experiment with Japanese.
I can ask my dad if we can make it a downloadable thing
@@ABChinese Yes please! (;^ω^)
have you completed the experiment? I know some 700-800 characters and can read simple texts in Japanese trivially and mostly extract the vital information from intermediate texts, but stumble pretty hard on any slightly advanced or specialised topics. I'd be interested to hear your experiences.
Wow! Very interesting! Thanks... Where can I obtain that filtering software?
Realistic version: The Chinese teacher deducts 1 mark for every word you don't know or mispronounce, and then you fail your Chinese oral, and you get put in Chinese remedial class and have to stay back after school. Then your parents look at your grades and say "Never mind boy I also scored F9 for Chinese in my time, I will send you for Chinese tuition class."
Interesting experiment, but for the reasons you mentioned in the video, it tells you absolutely nothing about how much a learner would understand. That would depend mostly on how many WORDS the learner knows. I bet if you found a learner who has only learned about 500 characters and gave him the test, he would understand very little. Of course, it also depends very heavily on what the text is.
I have no idea how many characters I know, but I think at least 2000. (But there are probably also a lot that I wouldn’t know individually, but know in context.) And it is still not enough to understand everything I read. A learner just can’t compare to a native speaker in an experiment like this. Also, I never understood the focus on characters instead of words. Like the example you gave, if you see the word 存在, just knowing the character 在 would be of no help to you at all. This gets much worse at slightly higher levels. Also, what does “knowing a character” even mean? Does it mean you know how to pronounce it and its basic meaning? I think if you don’t know all the various meanings it could have in different contexts, you don’t really know it. You’ve only started to come to know it.
I completely agree with you and had the same confusion on why people don't count words. I actually contacted Dr. Jun Da and asked him that question in passing. He told me the reason people don't count words is because it's almost impossible with the nature of Chinese and current technology. Since Chinese doesn't use spaces, and each character serves as a morpheme (that also CAN be a word), it's very difficult for machines to identify "words." There's a formula he used in his study to estimate the number of words in his corpus, but it's only an estimate. Who knows, maybe when AI takes over, we'll finally be able to count Chinese words!
@@ABChinese Chinese parsing is actually quite accurate. I've used a Python library called 'jieba' in some projects. Even if the accuracy isn't perfect, it's more than enough to get a solid estimate of a word count.
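For anyone curious what that looks like, here is a rough sketch of counting words rather than characters. It assumes the third-party jieba package is installed (`pip install jieba`) and uses its `lcut` function to split text into a list of words; the helper names and the sample sentence are made up for illustration, and the segmentation is only as good as jieba's dictionary.

```python
from collections import Counter


def segment(text):
    """Split Chinese text into words using jieba (third-party package)."""
    import jieba  # imported lazily so count_words() works without jieba installed
    return jieba.lcut(text)


def count_words(tokens):
    """Count word tokens, skipping tokens that contain no Chinese characters."""
    return Counter(
        tok for tok in tokens
        if any("\u4e00" <= ch <= "\u9fff" for ch in tok)
    )


if __name__ == "__main__":
    tokens = segment("我来到北京清华大学")  # sample sentence, an assumption
    print(count_words(tokens).most_common(5))
```

Sorting the resulting counter by frequency gives a word frequency list for whatever corpus you feed in, which is the same idea as the character list but at the word level.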
13:10 Timestamp for my future list
13:38 me at HSK-1 level trying to read Chinese texts 🙈
Based on an approx. trend (the inverse difference formula with R^2=.9996) you get the number of characters compared to understanding:
Characters   Understanding
150          23.5% (20% she said)
500          79.6% (80% she said)
1000         90.0% (90% she said)
1969         95.0%
2000         95.1%
2663         96.3% (HSK 6)
3916         97.5%
9756         99.0%
97350        99.9%
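The comment does not spell out its formula, so the following is only a guess at the kind of curve involved, not a reconstruction of it. As an assumption, take a hyperbola u(n) = 100 - a / (n - b), where u is the percentage understood and n is the number of characters known, and solve for a and b using just the 500 and 1000 rows. The sketch then prints its predictions next to the listed values so the two can be compared.

```python
# Rows copied from the list above: (characters known, listed understanding in %)
ROWS = [
    (150, 23.5), (500, 79.6), (1000, 90.0), (1969, 95.0), (2000, 95.1),
    (2663, 96.3), (3916, 97.5), (9756, 99.0), (97350, 99.9),
]


def fit_hyperbola(p1, p2):
    """Solve u(n) = 100 - a / (n - b) exactly through two (n, u) points."""
    (n1, u1), (n2, u2) = p1, p2
    m1, m2 = 100 - u1, 100 - u2          # percent NOT understood at each point
    b = (m1 * n1 - m2 * n2) / (m1 - m2)  # from m1 * (n1 - b) == m2 * (n2 - b)
    a = m1 * (n1 - b)
    return a, b


def predict(n, a, b):
    """Predicted understanding (%) for n known characters under the assumed form."""
    return 100 - a / (n - b)


if __name__ == "__main__":
    a, b = fit_hyperbola(ROWS[1], ROWS[2])  # the 500 and 1000 rows
    for n, listed in ROWS:
        print(f"{n:>6}  listed {listed:5.1f}%  predicted {predict(n, a, b):6.2f}%")
```

Under this assumed form the share you miss shrinks roughly in proportion to 1/n, so each further halving of the gap costs about twice as many characters as the one before.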
As a native speaker this is intriguing, especially the difference between 150 and 500. It just goes to show that being able to read 50% of the words doesn’t mean you can understand 50% of the content
the last test should have been 300 to make sense of the context
I'm interested in learning to read some Chinese, mostly so I can more quickly scan through electronics vendor websites and part datasheets. Does anyone have any tips? Should I be focusing on studying that kind of technical content using tools like the Zhongwen browser plugin, or would it be more effective to start with more traditional "beginner" type content? Output (writing/speaking) isn't as important to my goal as input, but would it be substantially beneficial to practice anyway?
The beginner stuff will be useful to start, because you'll have to learn the foundational concepts no matter which route you eventually go
6:57
哈喽同学
图书馆
学生证掉了
7:10
谢谢啦
(idk)
两杯起送
12:00
那么
晚上
考试
(next page)
复习
那好的
考试重要
大题
不用了
17:50
居然在饭堂也能遇到
那么特别的缘分
(in Sticker)男大学生
有点巧
(idk)
下次约你
(next page)
下次是什么时候?
不是吧
好吧
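Chinese character coverage:
1. Core characters: 67%, 300 characters ❗️👍
2. Basic (high-frequency) characters: 80%, 600 characters! (including the 300 core characters)
3. Mid-frequency characters: 400! High + mid = 1,000, and coverage reaches 90%!
4. Low-frequency characters: 1,500! High + mid + low = 2,500, and coverage reaches 99% ❗️👍
5. Ultra-low-frequency characters: 2,500! They cover only 1%! 😂 All 5,000 together reach 99.99% coverage (including Classical Chinese)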
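Tiers like these come from ranking characters by how often they occur in some corpus and adding up the share of text covered by the top N. Here is a minimal sketch of that calculation; the function name and the tiny sample string are assumptions, and the numbers you get depend entirely on the corpus you feed in, so they will not match the tiers above unless the corpus is comparable.

```python
from collections import Counter


def cumulative_coverage(corpus, cutoffs):
    """Share of all Chinese characters in `corpus` covered by the top-N most frequent ones."""
    counts = Counter(ch for ch in corpus if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    ranked = [count for _, count in counts.most_common()]  # counts, most frequent first
    return {n: (sum(ranked[:n]) / total if total else 0.0) for n in cutoffs}


if __name__ == "__main__":
    sample = "的的的的一一一是是了"  # toy corpus; a real one would be millions of characters
    for n, share in cumulative_coverage(sample, [1, 2, 4]).items():
        print(f"top {n}: {share:.0%}")
```

With a real corpus you would pass cutoffs such as 300, 600, 1000, 2500, and 5000 to reproduce this kind of tier list.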
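I can speak Mandarin but can barely read, which surprises a lot of my Chinese friends. Even though I'm a university student who is busy around the clock, I still find time to improve my Mandarin. So it really matters that parents send their kids to Chinese school!!
Are you Taiwanese? Hahahahaha
@@郭毅-x3y No, a Taiwanese person wouldn't be using simplified characters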
Can you do this experiment with Japanese?