This was a very cool thing to visually see how important characters are (of course!). But honestly my problem isn't the characters themselves, I love learning them, but the words! Yeah of course I know the character 的 for example, but I may not know the WORDS 目的、的確、地款等
The way Chinese characters use 偏旁 (components) to build more complex characters must really help too. Even if one does not know a character, one might know a part of it and therefore make some sense of it.
Interesting experiment, but for the reasons you mentioned in the video, it tells you absolutely nothing about how much a learner would understand. That would depend mostly on how many WORDS the learner knows. I bet if you found a learner who has only learned about 500 characters and gave him the test, he would understand very little. Of course, it also depends very heavily on what the text is. I have no idea how many characters I know, but I think at least 2000. (But there are probably also a lot that I wouldn’t know individually, but know in context.) And it is still not enough to understand everything I read. A learner just can’t compare to a native speaker in an experiment like this. Also, I never understood the focus on characters instead of words. Like the example you gave, if you see the word 存在, just knowing the character 在 would be of no help to you at all. This gets much worse at slightly higher levels. Also, what does “knowing a character” even mean? Does it mean you know how to pronounce it and its basic meaning? I think if you don’t know all the various meanings it could have in different contexts, you don’t really know it. You’ve only started to come to know it.
I completely agree with you and had the same confusion on why people don't count words. I actually contacted Dr. Jun Da and asked him that question in passing. He told me the reason people don't count words is because it's almost impossible with the nature of Chinese and current technology. Since Chinese doesn't use spaces, and each character serves as a morpheme (that also CAN be a word), it's very difficult for machines to identify "words." There's a formula he used in his study to estimate the number of words in his corpus, but it's only an estimate. Who knows, maybe when AI takes over, we'll finally be able to count Chinese words!
@ABChinese Chinese parsing is actually quite accurate. I've used a Python library called 'jieba' in some projects. Even if the accuracy isn't perfect, it's more than enough to get a solid estimate of a word count.
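For anyone curious what dictionary-based segmentation looks like without installing anything, here is a minimal sketch of greedy forward maximum matching. It is a much simpler technique than a real segmenter such as jieba and is not that library's method; the `segment` function and the tiny `dictionary` are made up for illustration.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical mini dictionary; a real one would hold tens of thousands of words.
dictionary = {"双胞胎", "存在", "我们", "知道"}
print(segment("我们知道双胞胎存在", dictionary))
```

Counting the pieces this returns gives a rough word count for a text, which is the kind of estimate the comment is talking about.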
I feel you should recruit an actual learner for the experiment. The reason is that Grace knows way, way more than the test quantity of characters, so even if some characters are missing, she can fill them in from her deep knowledge and skew the comprehension higher. But great video. Very informative nonetheless.
Realistic version: The Chinese teacher deducts 1 mark for every word you don't know or mispronounce, and then you fail your Chinese oral, and you get put in Chinese remedial class and have to stay back after school. Then your parents look at your grades and say "Never mind boy I also scored F9 for Chinese in my time, I will send you for Chinese tuition class."
Based on an approximate trend (the inverse difference formula, with R^2 = .9996), you get the number of characters compared to understanding:
Number    Understand %
150       23.5% (20% she said)
500       79.6% (80% she said)
1000      90.0% (90% she said)
1969      95.0%
2000      95.1%
2663      96.3% (HSK 6)
3916      97.5%
9756      99.0%
97350     99.9%
I'm interested in learning to read some Chinese mostly so I can more quickly scan through electronics vendor websites and part datasheets, anyone have any tips? Should I be focusing on studying that kind of technical content using tools like the Zhongwen browser plugin, or would it be more effective to start with more traditional "beginner" type content? Output, writing/speaking, isn't as important to my goal as input, but would it be substantially beneficial to practice anyway?
have you completed the experiment? I know some 700-800 characters and can read simple texts in Japanese trivially and mostly extract the vital information from intermediate texts, but stumble pretty hard on any slightly advanced or specialised topics. I'd be interested to hear your experiences.
Pareto sends their regards 😎 But the thing with Chinese is that characters do not equal words. You need to know both the characters and their combined meaning in 2+ character words. I know 1500 characters, but I can't read many things even when I know all the characters in a text, because I don't know the words in the text yet. Obviously Grace knows a lot of words as well as characters. Edit: Finished the video; I see you made exactly the same points 😂
Let's imagine you do something like this with English vocabulary: 'if you find this video interesting please subscribe' turns into 'if you find this -- -- please -- ' - And that's way more than 50% of the sentence ;) BUT - what's so powerful about conversations is you can rephrase it 'if you like what I do, please follow my work' - and this is the same meaning using only the most basic vocabulary. When it comes to reading - there's no person on the other side to rephrase anything - that's why it's much harder ;)
I don't know Chinese, but I guess some words are made up of more than one character? Then wouldn't you need to know the meaning of combination in addition to the single characters or would you be able to guess?
Right, so a character in Chinese is a morpheme, it can be a word or can be part of a compounded word. A native speaker can kind of guess even if some characters are missing because the characters carry meaning even by themselves.
It was a really cool experiment! I had a lot of fun😆 Thank you for having me!
So good to have you on my channel!! Thanks for collaborating with me❤
Thank you @ABChinese and @GraceMandarinChinese. I look forward to possibly seeing more collaborations with you two in the future.
@ABChinese I absolutely agree with you on DuChinese: flashcards haven't helped me much, except in the beginning when I studied the basic characters. Reading content, however, has helped me rank up very fast, as I learned more than 1000 characters in 6-8 months. Then I took a few months' break, but now I can still read content.
Reading content is the key in my opinion. I'm subscribed to Mandarin Bean and Du Chinese
omg your my fav youtuber thank you so much 😭
I see Grace I click. Great to hear her casually speaking Chinese 😂
Something I realize is how much the source text can influence how easily one can understand. An example: 滚 is #1572 on the frequency chart you used, yet it is one of about 80 symbols I currently recognise because it gets used so frequently in Cdrama to represent "scram", "get out of here" or other less polite invitations to exit the scene... lol ... Meanwhile frequently used words like 国 [#20] are not on my list because Cdrama characters don't use them as much.
This is a fascinating experiment ... thanks to your dad and Grace for making it possible!
That's a good point about 滚. I just watched a movie today where ppl kept saying 'gun' (didn't see the character), and your comment reminded me to look it up-- lo and behold, the very same word!
❤ love that she did this experiment!
what was the frequency chart he used??
@brunocardoso7132 He posted a link to it in the description.
The thing is, with any frequency list you are basing it on some corpus, but word frequencies actually depend a lot on the corpus. Drama, chat, technical document, news report, youtube comment, workplace, etc all demand different vocabularies. The only characters that really are used in every context at high frequency are pronouns, numbers, prepositions, auxiliary verbs and grammatical particles.
This was a really fun video! I'm a big fan of Grace's videos, so it was great to see this collab. I'm also a DuChinese premium user, I love that app. I think a video about the details that went into your experiment would be pretty interesting
You got all the best resources;)
It confirms my fear. Like, maybe you don't need to know how "ice" is said in any language. Yet, if you have to deal with it, even though it's so rare, you'd miss your target completely. Rare words get their revenge by providing a lot of context that you otherwise will never be able to manage to understand.
Ice is a rare word?
@xuexizhongwen It ranks 2096 according to Wiktionary, so yeah, it's rare
Thanks!
Thank you!
This was a very interesting experiment! Feeling more motivated to continue my studies, thinking I won't need a ton of characters to understand Chinese content. Also, just this past week I started looking for a better language-learning app, since I paid for Duolingo last year and I think it didn't do much to improve my knowledge. I just got Du Chinese and it's very interesting. I will stick with this one this year! Thank you for the very helpful content! Hopefully this year I make it to HSK 3 :')
There are 2663 individual characters in the hsk 6 vocab (5000 words assembled by 9662 characters). You can have a very big vocab with 500 individual characters. I guess 1000 is like a hsk 5 level. //edit: Thought it's quite interesting so I did some coding.
You are able to form:
-1175 words with 500 chars
-2413 with 1000
-3520 with 1500
The wordlist is the hsk 6 one. Ofc a native speaker will be able to make up even more words with the given chars.
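A minimal sketch of how a count like this could be reproduced, assuming you have a word list and a frequency-ranked character list. The commenter's actual code isn't shown, so `formable_words` and the toy data below are illustrative stand-ins, not the HSK 6 list.

```python
def formable_words(words, ranked_chars, n):
    """Return the words whose characters all fall within the n most frequent characters."""
    known = set(ranked_chars[:n])
    return [w for w in words if all(ch in known for ch in w)]


# Toy stand-ins: a real run would load the HSK 6 word list and a
# frequency-ranked character list (both assumed, not included here).
ranked_chars = list("的一是不了人我在有他")
words = ["我", "不是", "有人", "他的", "人民", "在家"]

for n in (3, 6, 10):
    print(n, len(formable_words(words, ranked_chars, n)))
```

With real data, calling it with n = 500, 1000 and 1500 would give the kind of word counts listed in the comment.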
I'm South Korean and Korean has some Chinese loan words.
I think I'll be able to read Chinese IF I learn those 500 characters!
Don't South Koreans learn hanja at school as part of the curriculum?
@ruffhakes7419 But they read content, school material, etc. in Hangul... It's very different from Chinese and Japanese people, who apply their Chinese character knowledge in real situations.
I like you! This is a very interesting video.
I have studied Chinese for 8 years, about 20 years ago. I guess we learned about 1500 to 2000 characters, but I am still not fluent at all. However, when I travel to China, I can travel around without any problems. Chinese is such an interesting language!
There is a big problem with this test: native speakers already know which characters a word consists of, so if they see a word with only one character visible they can fill in the missing characters with the help of context. But this won't be possible for non-native speakers, who won't know or understand words that contain characters outside the 500 characters they know. E.g. 双xx, like in the video, could easily be filled in as 双胞胎 by a native speaker, but a non-native speaker who doesn't know which characters the Chinese word for twins consists of probably couldn't guess the word.
Yeah, I realized that was one problem, but after reading your comment, I suddenly thought of a way I could've fixed it. maybe? If I make every string of unknown characters just one "X" then she wouldn't be able to tell how many characters there are and can't guess based on length of words. So like 双胞胎 would be 双X instead of 双XX.
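The masking idea can be sketched roughly like this. The program used in the video isn't shown, so `mask_unknown` and its `collapse` flag are hypothetical names, and the sketch assumes only CJK ideographs get hidden while punctuation stays visible.

```python
def mask_unknown(text, known, collapse=True):
    """Replace Chinese characters outside `known` with X.

    With collapse=True, each run of unknown characters becomes a single X,
    so the reader cannot guess a word from its length.
    """
    def is_hidden(ch):
        # Only hide CJK ideographs; punctuation, digits, and Latin letters stay visible.
        return "\u4e00" <= ch <= "\u9fff" and ch not in known

    out = []
    prev_hidden = False
    for ch in text:
        hidden = is_hidden(ch)
        if hidden:
            # Skip repeated X's inside one run when collapsing.
            if not (collapse and prev_hidden):
                out.append("X")
        else:
            out.append(ch)
        prev_hidden = hidden
    return "".join(out)


known = set("双他们是")
print(mask_unknown("他们是双胞胎。", known, collapse=False))  # one X per unknown character
print(mask_unknown("他们是双胞胎。", known))                  # one X per unknown run
```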
Great vlog. This experiment makes me much more optimistic about the possibility of learning to read Chinese. Great insights!
Awesome experiment. We’ve all heard that we need to know a certain number of characters, but this is the first time I’ve seen it actually tested out. I know you’ve been working hard on this video and it turned out great. I enjoyed watching🙂Thank you.
btw that’s a very cool transition at 6:43!
Thanks for watching Josh!
the last test would have been better at 300 words for context
I have experienced being barely literate in this way, and can confirm you can enjoy reading at this level of vocabulary.
Also a big fan of Grace, nice to see her here. :+ )
I have made pieces of software like your father did, I have been cleaning up *word* lists rather than character lists as well.
Thank you for this, and all your videos, they are very encouraging. I've just started (trying to) read the mandarin companion breakthrough book "我的老师是火星人”. I still have some way to go😊
Your editing improves every upload, it’s definitely noticeable keep up the good work
Thank you Erin~~
I signed up for duchinese after your first video about it but I totally forgot to use your signup code. Sorry about that! But I really appreciate the recommendation, it totally rules. I only wish they had things listed using HSK 3.0 levels, but that's a minor complaint.
Hahaha no worries
feels like a Chinese reading Japanese articles. You don't understand the pure Japanese part but you can still understand the meaning from the remaining Chinese characters.
Mandarin is a beautiful language
I can read over 4,000 characters. But that doesn't mean that I know all the vocabulary. I read several documents out loud at the museums in Beijing and Shanghai that were written in older script to a local Chinese to tell me what I was reading. It was too fun!
This is one of the most interesting videos about the Chinese language I've ever seen on YouTube. I didn't know about the program you mentioned at the beginning of the video, so I copied the 现代汉语常用字表 into Word, then deleted the characters I didn't know so Word would count the number of characters I can read😂. The result was around 2100 characters, but I would say I can recognize only around 2000 because some of them are not very clear to me. Even with this number of characters, I still regularly see new ones which I can't recognize.
Great video! I really loved seeing you two collab, my fav two chinese language teachers
Thanks!
Thank you~
As a Chinese learner I enjoyed this video a lot. This is the collab that I didn't know I needed 😄
Man! You're a genius. ^^ Thanks~
She can’t explain it without more than 500 characters if she didn’t know more than 500
I'm wondering if characters is the best way to select what's readable as opposed to words like is generally done for other languages. I'd like to see this same experiment but with words rather than characters. For example, take the characters that are in the top 500 words and just allow those. The number of characters certainly won't match and may even be less than 500, but the set might be different from the 500 most common characters. I'm curious how that affects readability.
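One way to check how the two sets would differ, as a rough sketch: it assumes you have a frequency-ranked word list and a frequency-ranked character list. The helper `chars_in_top_words` and the toy rankings below are made up for illustration and are not real frequency data.

```python
def chars_in_top_words(ranked_words, n):
    """Set of distinct characters used by the n most frequent words."""
    return {ch for w in ranked_words[:n] for ch in w}


# Toy rankings (hypothetical); real lists would come from a corpus.
ranked_words = ["的", "我们", "是", "中国", "不", "国家"]
ranked_chars = list("的是不我国们中家")

from_words = chars_in_top_words(ranked_words, 4)  # characters needed for the top 4 words
from_chars = set(ranked_chars[:6])                # the 6 most frequent characters

print(sorted(from_words - from_chars))  # needed by top words, missing from top characters
print(sorted(from_chars - from_words))  # frequent characters the top words never use
```

Running the same comparison with the top 500 words against the top 500 characters would show how much the two selections actually overlap.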
This is great! Many thanks to you and who made this computer programm! It is superb❤!
So, ~100 characters account for things like numbers, grammatical particles, and common affixes. ~500 characters account for those plus very common basic nouns and verbs. Together, these account for ~75% of the text you encounter. This is true, but mostly it isn't helpful, because you will still be missing the most important (and, statistically speaking, much rarer) nouns and verbs that are the key content. Every text is different: for example, the very basic, more or less artificial language in textbooks or graded readers is designed to make sense with a low character count. Social media also tends to be pretty basic. Anything more, though, and you will quickly discover that you *need* those rare content words to understand what you are reading or hearing!
Studies of reading comprehension show that if you understand less than 90% of the text, you will not understand it and will be frustrated. If you understand between 93% and 97% of what you read or hear, you can get the message with a lot of work. If you understand about 98% of the message, you can figure out the meaning of any words you don't know. So, if you have a text of 100 characters and you understand only 90, odds are that you are missing those key rare content words. If you assume an upper conversational-level text that uses about 2000 characters, then you need to understand ~1,800 of them to potentially muddle through towards understanding the message.
This is all complicated by the fact that every text differs, not only in its level, but in how many words (as opposed to characters) there are in the text, along with how much other context like pictures, tables, or graphs there is. In my experience, social media is really easy to get, but even trying to read the front page of a newspaper is really challenging at my level.
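To put a number on "how much of this text do I know", here is a small sketch that measures character coverage of a text against a set of known characters. The function name `coverage` and the sample sentence are assumptions for illustration, and it counts characters, not words, so it says nothing about whether you know the words those characters form.

```python
def coverage(text, known):
    """Fraction of the Chinese characters in `text` (counting repeats) found in `known`."""
    chars = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not chars:
        return 0.0
    return sum(ch in known for ch in chars) / len(chars)


known = set("我是在学")
text = "我是学生,我在学校。"
print(f"{coverage(text, known):.0%}")  # share of running characters you recognise
```

Comparing the result against the 90% and 98% thresholds mentioned above gives a quick feel for whether a text is within reach.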
super interesting to visually see what the amount of characters you know do in regards to reading.
Thats it! Im adding the top 500 most frequent chinese characters to my anki deck along side my hsk3!
This was super interesting, thanks for putting in the hours to create this video!
i think it makes a lot of sense. the most frequent 150 convey almost no meaning because theyre almost just grammar, and grammar is necessary in every single text so of course theyre the most frequent ones.
after those grammar bases are set, the first 350-650 characters are all going to start being super important to start conveying meaning, and like grace said eventually they all become rarer and more specific.
Having the plot presented to you and then having native chinese speech be in this plot is really fantastic for getting in the groove of the language.
So much better than if she had given her thoughts in English
This is interesting to me as a native Japanese speaker (our Daily Use characters number 2000-2100) who is learning Cantonese. I often find the biggest hurdles are when the characters are used differently from their original meaning, and then I sorta get stuck and overloaded like overtaxed RAM (if that weren't a reasonable metaphor, 電腦 wouldn't be a word).
it might be easier to start with Mandarin, because Cantonese created their own characters, then you can go on
@danielzhang1916 Probably all other dialects also
.
Heyyyy it's Grace . I'm one of her subs too. 欢迎来到频道!
Amazing experiment! You rule!
Jun Da's website showing all his research is absolutely amazing for Chinese beginners like me. Your channel never stops surprising me.
DuChinese is fantastic!
I'm a new subscriber and this is like my second video. I really like the content and your personality; the way you explain things is a gift, buddy, God bless you. I'm about to finish HSK 2 and I'm glad that I watched this video. I'm already using Anki to memorize characters and doing lots of graded reading, so it's really encouraging to know I'm on the right track. Thanks a lot ❤
MTSU REPRESENT! I graduated from there and work with Dr. Jun Da in the Foreign Language Computer Lab.
Fascinating experiment! I think the 3rd grader texts were probably quite difficult and not all that common hahaa I would have tried also some novels and simple stories and longer chats.
Grade school textbooks have different vocabulary than HSK, since HSK focuses more on functional vocabulary and textbooks tend to focus more on literature, including old literature. I would definitely have tried with more texts, but this video was already 20 minutes long with 4 texts, so... I don't think people would watch a longer video.
please make a video with a random japanese article and try to translate it in english and then tell how much you understood in percentage after translating the real meaning. would be more cool if you could repeat the vice versa with a native japanese speaker about chinese
I currently know probably about 650-700 characters. If I can read that much from knowing 1,000, I can't wait til I can read at least 1,000.
Great video! Really loved seeing Grace again here! Yay! Keep these videos coming.
Great experiment, well done to both of you! Everybody says - input input input, yeah you just need to read, good there are some online materials to start from (DuChinese you pointed, MandarinBean, some phone apps etc). Hope to be able to read normal books any time soon :)
You are an excellent content creator! Keep up the great work and energy!
Thank you! I'll keep trying:)
as a non-native, i think it's worth mentioning that even the words we don't understand can sometimes have clues for us to guess their meaning (ex: two-character words in which we know one of the characters, radicals, certain clues of whether it's a noun, adverb, adjective etc). so yeah we can probably read even better than you expect with only 500 characters
Thanks for this Andrew.
great video. very encouraging for a learner! 非常谢谢
I love to see unexpected collabs between two channels I follow 😁😁😁😁
if you are a native speaker, you can sometimes even mess up the order of the words in a sentence and you can still understand its meaning.
13:51 it felt like all she read was "from the ... of the ... to the ... of ..."
Pareto distribution?
In the real world.
It's amazing how your work exists and you're just super awesome!!!!
1000/10 rating %
Great video!
Indeed, the most important take away is that 500 characters is NOWHERE near enough for understanding texts. That native girl simply knows pretty much all Chinese words, so she can guess what it could mean. I'm currently doing HSK2 and would have trouble with these texts even though I know far more than 500 characters.
that actually freaks me out that u used an article abt twins being born 87 days apart because i just read an article about that earlier. and it wasn't new, i thought about it and searched for it LOL
Agreed, it's the number of words, not the number of characters, that is important! Sometimes you can guess the meaning (冰茶, "ice tea"), be partly right (红茶, literally "red tea", which in English is black tea), or not have a clue (清淡, literally "clear, weak", which describes food that is not spicy or greasy).
You've got a new subscriber, thanks for the awesome content! 你的视频非常有趣
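Chinese character coverage:
1. Core characters: 300 characters, 67% coverage ❗️👍
2. Basic (high-frequency) characters: 600 characters, 80% coverage! (This includes the 300 core characters.)
3. Mid-frequency characters: 400 more! High + mid = 1,000 characters, with coverage as high as 90%!
4. Low-frequency characters: 1,500 more! High + mid + low = 2,500 characters, with coverage reaching 99% ❗️👍
5. Ultra-low-frequency characters: 2,500 more, covering only 1%! 😂 All 5,000 together reach 99.99% coverage (including Classical Chinese).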
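For anyone who wants to check coverage figures like these on their own texts, here is a minimal sketch of the calculation. It is not the software from the video. The function names, the sample sentence, and the choice to count only characters in the basic CJK Unified Ideographs range are all assumptions made for illustration.

```python
from collections import Counter


def is_han(ch):
    """True for characters in the basic CJK Unified Ideographs block (an assumed cutoff)."""
    return "\u4e00" <= ch <= "\u9fff"


def coverage_at(text, cutoffs):
    """Share of the text covered by its k most frequent characters, for each k in cutoffs."""
    counts = Counter(ch for ch in text if is_han(ch))
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in cutoffs}
    # Occurrence counts sorted from most to least frequent character.
    ranked = [n for _, n in counts.most_common()]
    return {k: sum(ranked[:k]) / total for k in cutoffs}


# Hypothetical sample text; a real measurement needs a large corpus.
SAMPLE = "我的书在我的家,我的家在山的那边。"
coverage = coverage_at(SAMPLE, [1, 2, 4, 8])

for k, share in coverage.items():
    print(f"top {k} characters cover {share:.0%} of the text")
```

With a corpus of realistic size, the same loop run at cutoffs such as 300, 600, 1000 and 2500 would show whether the percentages quoted above hold for that kind of text.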
Awesome video top content as always!! - KS
wow i have hope now thx
This was a very cool thing to visually see how important characters are (of course!). But honestly my problem isn't the characters themselves, I love learning them, but the words!
Yeah of course I know the character 的 for example, but I may not know the WORDS 目的、的確、地款等
Keep it up! (加油)
哇! 非常有意思!谢谢。。。where can I obtain that filtering software?
As a beginner Chinese learner with good English comprehension, I wonder how this experiment would go with English texts.
The way Chinese characters use 偏旁 (components) to build more complex characters must really help too. Even if one does not know a character, one might know a part of it and therefore make some sense of it
although some characters have very different pronunciations with different parts
Your videos are always entertaining and to the point. Keep up the great work. 😊
Thank you for watching! I try my best;)
@@ABChinese you do a great job 👍🏻👍🏻
Interesting experiment, but for the reasons you mentioned in the video, it tells you absolutely nothing about how much a learner would understand. That would depend mostly on how many WORDS the learner knows. I bet if you found a learner who has only learned about 500 characters and gave him the test, he would understand very little. Of course, it also depends very heavily on what the text is.
I have no idea how many characters I know, but I think at least 2000. (But there are probably also a lot that I wouldn’t know individually, but know in context.) And it is still not enough to understand everything I read. A learner just can’t compare to a native speaker in an experiment like this. Also, I never understood the focus on characters instead of words. Like the example you gave, if you see the word 存在, just knowing the character 在 would be of no help to you at all. This gets much worse at slightly higher levels. Also, what does “knowing a character” even mean? Does it mean you know how to pronounce it and its basic meaning? I think if you don’t know all the various meanings it could have in different contexts, you don’t really know it. You’ve only started to come to know it.
I completely agree with you and had the same confusion on why people don't count words. I actually contacted Dr. Jun Da and asked him that question in passing. He told me the reason people don't count words is because it's almost impossible with the nature of Chinese and current technology. Since Chinese doesn't use spaces, and each character serves as a morpheme (that also CAN be a word), it's very difficult for machines to identify "words." There's a formula he used in his study to estimate the number of words in his corpus, but it's only an estimate. Who knows, maybe when AI takes over, we'll finally be able to count Chinese words!
@@ABChinese Chinese parsing is actually quite accurate. I've used a Python library called 'jieba' in some projects. Even if the accuracy isn't perfect, it's more than enough to get a solid estimate of a word count.
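To make the idea of machine word segmentation concrete, here is a rough sketch of forward maximum matching, one of the simplest dictionary-based approaches. This is not how jieba works internally and it does not call jieba. The tiny vocabulary and the sample sentence are hypothetical, and a real segmenter relies on a far larger dictionary plus statistical modelling.

```python
def max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_VOCAB = {"存在", "的确", "红茶"}

sample = "红茶的确存在"
words = max_match(sample, TOY_VOCAB)

print(words)
print(len(sample), "characters,", len(words), "words")
```

Even this toy version shows why character counts and word counts diverge: the six characters in the sample come out as three words.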
I feel you should recruit an actual learner for the experiment. The reason is that Grace knows way, way more characters than the test quantity, so even if some characters are missing, she can fill them in from her deep knowledge and skew the comprehension higher. But great video. Very informative nonetheless.
Hey, that software is really needed,
When can we get it publicly? Cos I like
Language Software (not enough for PC)
Check the post I made in my community tab!
Realistic version: The Chinese teacher deducts 1 mark for every word you don't know or mispronounce, and then you fail your Chinese oral, and you get put in Chinese remedial class and have to stay back after school. Then your parents look at your grades and say "Never mind boy I also scored F9 for Chinese in my time, I will send you for Chinese tuition class."
Your videos are great ❤
Great content 🎉
Do you have a frequency list for vocabulary?
It was nice to see the 150 since that's about where I'm at, just a third of the way to 500 😂
13:36 Now she knows how I feel when I try to read something in chinese. hahaha
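I can speak Mandarin but can hardly read, which surprises a lot of my Chinese friends. Even though I'm a university student who is busy 24 hours a day, I still find time to improve my Mandarin. So it's really important for parents to send their kids to Chinese school!!
Are you Taiwanese? Hahahahaha
@@郭毅-x3y No, a Taiwanese person wouldn't be using simplified characters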
This was so interesting, thanks! I am still scared to learn, but I feel braver. A bit 🤣
I love this video
Based on an approximate trend (the inverse difference formula, with R^2 = .9996), you get the number of characters compared to understanding:
Characters   Understanding
150          23.5% (20% she said)
500          79.6% (80% she said)
1000         90.0% (90% she said)
1969         95.0%
2000         95.1%
2663         96.3% (HSK 6)
3916         97.5%
9756         99.0%
97350        99.9%
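The comment does not spell out the "inverse difference formula", so the following is only a guess at its shape. It assumes a curve of the form u(n) = 100 - a / (n - b) and pins it to two rows of the table (1000 and 9756 characters). The function names and the choice of anchor rows are assumptions, not the commenter's method.

```python
def fit_two_points(p1, p2):
    """Solve u = 100 - a / (n - b) so that the curve passes through two (n, u) points."""
    (n1, u1), (n2, u2) = p1, p2
    g1, g2 = 100 - u1, 100 - u2  # remaining gap to 100% at each point
    # g1 * (n1 - b) = a = g2 * (n2 - b)  =>  b = (g1*n1 - g2*n2) / (g1 - g2)
    b = (g1 * n1 - g2 * n2) / (g1 - g2)
    a = g1 * (n1 - b)
    return a, b


def predict(n, a, b):
    """Estimated understanding (in percent) for n known characters."""
    return 100 - a / (n - b)


# Rows copied from the comment's table: (characters, understanding %).
TABLE = [(150, 23.5), (500, 79.6), (1000, 90.0), (1969, 95.0), (2000, 95.1),
         (2663, 96.3), (3916, 97.5), (9756, 99.0), (97350, 99.9)]

a, b = fit_two_points((1000, 90.0), (9756, 99.0))

for n, listed in TABLE:
    print(f"{n:>6} characters: listed {listed:.1f}%, curve {predict(n, a, b):.1f}%")
```

Under this assumed form the curve stays within a few tenths of a percentage point of the listed values from 500 characters upward, and drifts by a couple of points at 150, so the original fit presumably used a slightly different formula or a least-squares fit over all rows.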
So Thing Explainer by Randall Munroe is possible in Chinese with relatively few concessions?!
0:40 Where is that setting? It looks like a middle school library
It is a public library haha
@@ABChinese Ohh ok lol
13:10 Timestamp for my future list
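Learning Chinese through pinyin and typing with input methods are both things that have only existed for the past few decades. Because input methods are so convenient, with autocorrect and even word banks that adapt to the user's habits, many people rarely write by hand once they leave school, and they gradually forget how to write characters when they pick up a pen. That's perfectly normal.
You're also right to feel that it has no logic, because Chinese already existed in various forms before the English word "logic" ever evolved. Language and writing are like tools: when they are used badly, the problem is often the person. Imprecise expression and piles of filler are likewise the writer's fault. Chinese can be concise and efficient; it depends on who is using it.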
I'm interested in learning to read some Chinese mostly so I can more quickly scan through electronics vendor websites and part datasheets, anyone have any tips? Should I be focusing on studying that kind of technical content using tools like the Zhongwen browser plugin, or would it be more effective to start with more traditional "beginner" type content? Output, writing/speaking, isn't as important to my goal as input, but would it be substantially beneficial to practice anyway?
The beginner stuff will be useful to start, because you'll have to learn the foundational concepts no matter which route you eventually go
With 500 characters, you can't even read kindergarten books.
6:57
哈喽同学
图书馆
学生证掉了
7:10
谢谢啦
(idk)
两杯起送
12:00
那么
晚上
考试
(next page)
复习
那好的
考试重要
大题
不用了
17:50
居然在饭堂也能遇到
那么特别的缘分
(in Sticker)男大学生
有点巧
(idk)
下次约你
(next page)
下次是什么时候?
不是吧
好吧
Can you make it again with a Chinese learner from a different level?
13:38 me at HSK-1 level trying to read Chinese texts 🙈
This is amazingly insightful! Is there anywhere I can download the software? I want to try the same experiment with Japanese.
I can ask my dad if we can make it a downloadable thing
@@ABChinese Yes please! (;^ω^)
have you completed the experiment? I know some 700-800 characters and can read simple texts in Japanese trivially and mostly extract the vital information from intermediate texts, but stumble pretty hard on any slightly advanced or specialised topics. I'd be interested to hear your experiences.
Pareto sends their regards 😎
But the thing with Chinese is that characters do not equal words. You need to know both the characters and their combined meaning in 2+ character words.
I know 1500 characters, but I can't read many things even when I know all the characters in a text, because I don't know the words in the text yet. Obviously Grace knows a lot of words as well as characters.
Edit:
Finished the video, and I see you made exactly the same points 😂
damn the 6:38 shot is so cool 🤯
Poor A is trying so hard with B lmao
A very creative show
A long-time fan~~ thanks for watching
Let's imagine you do something like this with English vocabulary: 'if you find this video interesting please subscribe' turns into 'if you find this -- -- please -- ' - And that's way more than 50% of the sentence ;)
BUT - what's so powerful about conversations is you can rephrase it: 'if you like what I do, please follow my work' - and this has the same meaning using only the most basic vocabulary. When it comes to reading - there's no person on the other side to rephrase anything - that's why it's much harder ;)
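The thought experiment above is easy to try out. Here is a small sketch that blanks every word outside a known-word list. The word list is a hypothetical stand-in for "the most basic vocabulary", and this is not the filtering software used in the video.

```python
def mask_unknown(sentence, known_words, blank="--"):
    """Replace every word that is not in known_words with a blank marker."""
    masked = []
    for word in sentence.split():
        # Ignore surrounding punctuation and case when looking the word up.
        key = word.strip(",.!?;:'\"").lower()
        masked.append(word if key in known_words else blank)
    return " ".join(masked)


# Hypothetical "basic vocabulary" list, for illustration only.
KNOWN = {"if", "you", "find", "this", "please",
         "like", "what", "i", "do", "follow", "my", "work"}

original = mask_unknown("if you find this video interesting please subscribe", KNOWN)
rephrased = mask_unknown("if you like what I do, please follow my work", KNOWN)

print(original)
print(rephrased)
```

The first sentence loses its three content words while the rephrased one survives intact, which is the asymmetry between reading and conversation that the comment describes.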
I wish there was that list of those common characters - I was waiting for it and it isn’t there…
Hi~ here it is: lingua.mtsu.edu/chinese-computing/statistics/
I used the "Modern Chinese" list and the link is also in the description
That is pretty impressive with only 500 characters.
I need that software so I can find the commonly used characters I don't know on my own
Check my latest post under the community tab! You can download it there
@@ABChinese tysm!!
I don't know Chinese, but I guess some words are made up of more than one character? Then wouldn't you need to know the meaning of combination in addition to the single characters or would you be able to guess?
Right, so a character in Chinese is a morpheme, it can be a word or can be part of a compounded word. A native speaker can kind of guess even if some characters are missing because the characters carry meaning even by themselves.
Chinese is like program code: 20% of the characters can carry 80% of the usage, or the split can be even more extreme.
Nice programming!