i saw someone say that car horn is technically a language with only one phoneme which is used to communicate ideas like frustration, alarm, or a polite reminder based on repetition, timing, and context (for example: sitting at a red light, and giving someone two short honks can be a polite reminder that the light has changed, while one long honk can portray impatience and frustration)
Car horn can also communicate celebration, usually by repeating a pattern consistently, with different patterns used for different celebrations, like a soccer team winning a match
@@dayalasingh5853 I'm sure some sign languages do, in some sense or other, but most of the ones I know don't. Sign languages are more similar to Chinese in that every sign functions as a logogram. Of course, Chinese still has phonemes, because Chinese is a spoken language, so each logogram has associated pronunciations, but this clearly doesn't apply here, since there is no concept of "pronouncing" a sign in sign language, for obvious reasons. Therefore, there is no notion of phoneme. Now, there might be some sign languages out there which build signs from a set of "basic signs," and all other signs are just signed sequences of those basic signs, but I don't know any sign languages actually used in practice by people who use sign languages which have this construction.
I mean... people HAVE wrote entire essays about non-Euclidean geometry... not saying this means its best suited for everyday use or it could convey the majority of complex topics with ease, but it IS possible (edit I'm referring to toki pona if that wasn't clear idk how I forgot to mention that initially)
Here's my attempt at a more mathematical solution to the problem. Let there be c consonants and v vowels in the language, and with a (C)V syllable structure the total number of possible syllables would be # of possible syllables = s = (c + 1) v. If the max number of syllables in a lemma is n, then the total number of possible distinct lemmas would be # of possible lemmas = s + s^2 + ... + s^n = (s^n - 1) / (1 - 1/s) since you can have lemmas of length 1 up to n syllables. If we want this to be at least some lemma count L (we assume L = 2000), then the inequality would be (s^n - 1) / (1 - 1/s) ≥ L To then solve for the lowest max syllable count given s and L, we would get n_lowest = ceil(log_s(L (1 - 1/s) + 1)). (This can be approximated as simply n_lowest_approx ≈ ceil(log_s(L)), and as long as L ≥ s, we have that n_lowest_approx ≥ n_lowest) So for example the Central Rotokas language (with long vowels) has c = 6 consonants and v = 10 vowels, so s = 70, and so n_lowest = ceil(log_70(2000 * (1 - 1/70) + 1)) = ceil(1.786) = 2, i.e. the max syllable count per lemma can be as low as 2. If the language is analyzed as only having short vowels, then s = 35 and n_lowest = ceil(2.130) = 3. Note that this doesn't filter out any unreasonable vowel combinations like aaa, iii, uuu, but to filter them out the math would likely get really hairy and I can't be bothered to work it out.
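A minimal Python sketch of the calculation above. It assumes a plain (C)V inventory with no phonotactic filtering, and the function names are illustrative rather than taken from any library. It computes s = (c + 1)v, the lemma count s + s^2 + ... + s^n, and n_lowest from the closed form. Because the logarithm is floating point, the result is then confirmed with exact integer arithmetic:

```python
import math


def syllable_count(c: int, v: int) -> int:
    """(C)V syllables: c consonants or no onset, times v vowels."""
    return (c + 1) * v


def lemma_count(s: int, n: int) -> int:
    """Lemmas of 1..n syllables when every syllable sequence is allowed."""
    return sum(s**k for k in range(1, n + 1))


def n_lowest(s: int, target: int) -> int:
    """Smallest max syllable count n with lemma_count(s, n) >= target (needs s >= 2)."""
    n = max(1, math.ceil(math.log(target * (1 - 1 / s) + 1, s)))
    # The log is floating point, so confirm the answer with exact integer arithmetic.
    while n > 1 and lemma_count(s, n - 1) >= target:
        n -= 1
    while lemma_count(s, n) < target:
        n += 1
    return n


# Central Rotokas: 6 consonants; 10 vowels counting long ones, 5 without.
for vowels in (10, 5):
    s = syllable_count(6, vowels)
    print(s, n_lowest(s, 2000))  # prints "70 2", then "35 3"
```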
Rather than using Σs^i, you could use Σs^i-v(c+1)Σ(2j-1)s^(j+f-1) where i→n, j→(n-f+1), v = |V|, c = |C|, and f is the shortest vowel repeat forbidden by the filter, I believe. The first term is as you wrote, the sum of possible words of up to n syllables. But if we want to filter out triples of identical vowels, then for words shorter than 3 syllables, we have no change. For a 3 syllable word, each vowel has c+1 possible first syllables which must then be followed by the syllable consisting of the bare vowel; for a 4 syllable word, the consonant can be in one of three spots (kaaaS, Skaaa, aaaCV), for a 5 syllable in one of 5, etc. For f = 3, this simplifies to, wlog, Σs^i-sΣ(2j-1)s^(j+2) = s(s^n-1)/(s-1) - s(s^3(1+s-(2n+1)s^n + (2n-1)s^(n+1)))/(s-1)^2 = (s (s^4 + s^3 + s - 1) (s^n - 1)/(s - 1)^2) - (2ns^(n+4)/(s-1)). While this isn't super pretty, it's still just a difference of rational functions over integers, so it wouldn't end up being too bad to work with. There's always a good chance that I missed something or screwed up somewhere (it's 00.30 and I've been awake since 03.30 yesterday 😅), but something like that should do the trick, I believe?
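Closed forms like this are easy to get wrong, so a brute-force count is a handy cross-check for small inventories. This is a minimal Python sketch that assumes a (C)V structure, single-character phonemes, and a filter that bans any run of three identical vowels in the spelled-out word. `count_filtered` is a hypothetical helper, not an existing tool. Because a (C)V string splits into syllables in only one way, each syllable sequence is a distinct word:

```python
import itertools
import re


def count_filtered(consonants: str, vowels: str, max_syll: int, run: int = 3) -> tuple[int, int]:
    """Return (all words, words with no `run` identical vowels in a row)
    for (C)V words of 1..max_syll syllables."""
    onsets = [""] + list(consonants)  # "" = no onset
    syllables = [o + v for o in onsets for v in vowels]
    banned = re.compile("|".join(re.escape(v * run) for v in vowels))
    total = kept = 0
    for n in range(1, max_syll + 1):
        # Exponential in max_syll: only meant for checking small cases.
        for combo in itertools.product(syllables, repeat=n):
            total += 1
            if not banned.search("".join(combo)):
                kept += 1
    return total, kept


print(count_filtered("t", "a", 3))    # (14, 12): "taaa" and "aaa" are dropped
print(count_filtered("ptk", "eo", 3))  # (584, 576)
```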
Calling it already: one bilabial, one alveolar, one velar, maybe a glottal, one open vowel, one front vowel and one back vowel. In lay terms: M, N, G, H, A, I, U.
It's crazy how quickly you get enough words. I just did a very conservative calculation on toki pona and, rules applied, 3 syllables max, there are way over 200k possible words
Okay. So to bring in a little bit of math here, you've given some pretty specific numbers, so it's totally possible to optimize the hell out of this. To sum up the theoretical limit under the rules you set out: For a little bit of background, imagine you have a language with 5 possible syllables. That means, obviously, that there are 5 different one-syllable words. But how many 2-syllable words would there be? Well, you have 5 choices for the first syllable, and 5 for the second, so it's 5*5 = 25. Similarly, for 3-syllable words, it's 5*5*5 or 5^3, which is equal to 125. So this means that if you're allowing ANY number of syllables from 1-8, the total number of words would be x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8, where x is the number of distinct syllables that can be produced by your language. So, by solving for the smallest value of x that gives a number over 2000, you find it has to be at least about 2.4, so you round up to 3. As long as there are 3 different possible syllables, you're golden. (This actually overshoots a ton: it turns out to be 9840 possible words. If you limit yourself to 7 syllables, you get 3279 words.) So what's the minimum number of phonemes you need to make 3 different syllables? Well, using the (C)V structure you laid out, you pick one consonant and one vowel (plus one extra option for picking NO consonant). This means the number of syllables is (C+1)*V, where C is the number of consonants (the +1 is the option of having no consonant) and V is the number of vowels. Playing around with this should convince you that you'll need at least 3 phonemes, and there are three ways to do it: three vowels, getting you three possible syllables; two consonants and one vowel, also getting you three; or one consonant and two vowels, getting you 4. For the sake of example, let's say that the consonant is T and the vowels are A and O. This would make a, o, ta, and to. This actually means we're overshooting the mark again, by a ton.
Using the original 8-syllable rule, we have 87380 different possible words. We can cut this down to 6 syllables and still have 5460 words, which is all we need and plenty more. Now, what if we are willing to mess around with the syllable structure? Say, (C)V(C)? Well, this increases the number of possible syllables to (C+1)^2 * V, and so we can reach that same inventory of 4 syllables with just 2 phonemes. The four syllables would be a, ta, tat, and at. Obviously, this language isn't realistic or... reasonable. But that's not a result of really pushing any one thing too far: the syllable count isn't insane, and neither is the syllable structure. It meets the number you put as a goal and nearly triples it. But all the elements put together end up making something really ridiculous. But yeah, with 2 phonemes, up to 6 syllables, and (C)V(C) it's totally possible to hit that 2000 number without much difficulty.
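The counts in this comment are quick to reproduce. This is a small Python sketch that assumes every sequence of syllables counts as a word, with no filtering. It also ignores the fact that, under (C)V(C), different sequences such as "at" + "a" and "a" + "ta" can spell the same string. The helper names are made up for illustration:

```python
from itertools import count


def cv_syllables(c: int, v: int) -> int:
    """(C)V: an optional onset (c consonants or none) times v vowels."""
    return (c + 1) * v


def cvc_syllables(c: int, v: int) -> int:
    """(C)V(C): optional onset and optional coda, each c consonants or none."""
    return (c + 1) ** 2 * v


def words_up_to(x: int, n: int) -> int:
    """x + x^2 + ... + x^n: sequences of 1..n syllables from x distinct syllables."""
    return sum(x**k for k in range(1, n + 1))


# Smallest syllable inventory that clears 2000 words with up to 8 syllables.
print(next(x for x in count(1) if words_up_to(x, 8) > 2000))  # 3
print(words_up_to(3, 8), words_up_to(3, 7))                    # 9840 3279

# One consonant (T) and two vowels (A, O) under (C)V: a, o, ta, to.
print(words_up_to(cv_syllables(1, 2), 8))  # 87380
print(words_up_to(cv_syllables(1, 2), 6))  # 5460
print(words_up_to(cv_syllables(1, 2), 5))  # 1364, not enough

# (C)V(C) with one consonant and one vowel: a, ta, tat, at.
print(cvc_syllables(1, 1))  # 4
```

Since a, ta, tat, at is also an inventory of four syllables, the (C)V(C) version gives exactly the same totals as the T/A/O one.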
I also thought about this approach initially, because I'm a scientist, not at all a linguist. However, I think a language that uses 2000 possible sounds to communicate exactly 2000 ideas is an unnatural thing. English words have a sort of vibe to them, with certain phonemes used more in certain contexts, so I think densely packing semantics into the bare minimum number of syllables with no gaps makes it more of a code than an actual language that humans would invent together for general use.
Also some syllable combos need to be banned for obvious reasons, like how agma bans "aaa" in his video. A language where "a aa aa" means something different from "aaaaa" or "aaa aa" is going to be a total nightmare to actually use.
I think we can go smaller if we make it (C)V(C). With (C)V and the final inventory you have 8 options for syllables. We can drop a consonant and with (C)V(C) have 18 syllables, or also drop another vowel and still have 9 options. For another short calculation, before putting on a filter for weird repeats, the total number of possible words is x^n + x^(n-1) + ... + x^1, where x is the total number of syllables and n is the maximum number of syllables in a word. For the results in the video, the total ends up being 4680. For (C)V(C) with only two vowels and two consonants we can get 6174 with a max of three syllables. Edit: I recognized a small error in my calculations: I forgot to include the standalone vowels as possible syllables. This raised it from 16 options to 18.
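Spelling out the inventory is an easy way to catch a missed syllable type, such as standalone vowels. This minimal Python sketch assumes two hypothetical consonants, p and t, and vowels e and o, since the comment doesn't say which phonemes are dropped. It also counts distinct spellings. With a coda slot, different syllable sequences can produce the same string (e.g. "ep" + "o" and "e" + "po"), so a figure like 6174 counts syllable sequences rather than distinct-sounding words, unless syllable boundaries are audible:

```python
from itertools import product


def cvc_inventory(consonants: list[str], vowels: list[str]) -> list[str]:
    """Every (C)V(C) syllable; "" stands for an empty onset or coda."""
    slots = [""] + consonants
    return [on + nu + co for on, nu, co in product(slots, vowels, slots)]


def sequence_count(x: int, n: int) -> int:
    """x^1 + x^2 + ... + x^n syllable sequences."""
    return sum(x**k for k in range(1, n + 1))


two_by_two = cvc_inventory(["p", "t"], ["e", "o"])
print(len(two_by_two))                        # 18 syllables
print(len(cvc_inventory(["p", "t"], ["e"])))  # 9 syllables with one vowel
print(sequence_count(len(two_by_two), 3))     # 6174 sequences

# Distinct strings once the syllable boundaries are erased.
spellings = {"".join(w) for n in range(1, 4) for w in product(two_by_two, repeat=n)}
print(len(spellings))
```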
Just write lmao *flips the bird* *cartwheels into a limo* *blasts a hole in the wall with an explosive charge* *fingerguns* *points at the hole in the wall and signals the chauffeur to floor it* *blows the imaginary fingergun smoke* *disappears into the distant horizon leaving a trail of fire*
Personally I think this doesn't take redundancy into account, since English, for example, doesn't make use of every single possible syllable or combination of syllables for its base lemmas. And it doesn't seem realistic to have so many near homophones that could be mistaken for one another. What we need to find out is the minimum level of redundancy required in a language (basically the percentage of possible but unused syllables). I know Mandarin has loads of homophones, but people use clarifying adjectives and set phrases to deal with that, so would the individual root be considered a lexeme, or would the whole phrase with a lexical meaning be a lexeme?
phonetic redundancies (constraints on syllable and/or word structure) help in noisy environments, and grammatical redundancies (multiple marking, agreement, the whole plethora of synthetic or rigid morphosyntax) help with producing or identifying a coherent utterance
"And it doesn't seem realistic to have so many near homophones that could be mistaken for one another." Wrong. English has literally thousands of them. And many of them are used in common language. This is why you always seen misspelling "than" as "then" and vice versa, or "you're" and "your." It's not only realistic, it's extremely common. So common, entire books have been written on this.
All languages have at least 3 places of articulation for consonants and at least two heights for vowels. In addition all languages have stops, but not all languages have other kinds of sounds. So [ptk aə] is the simplest naturalistic inventory. Təkə Pəta is my favourite minimalist conlang.
Lol it's so funny that I just found this vid, cause I started a conlang that has only 6 phonemes (which is now the protolang for a bunch of dialects) on the back of my biology test about 2 weeks ago and now it's like a full conlang (not finished but yeah)
we once started making a conlang with 9 phonemes (7 consonants and 2 vowels), with a frankly silly amount of allophony and a "(C | FS | SF) V (C) (C) (C) (C)" syllable structure. it was called "iu", and the phrase "i love my friends" is translated as //ip nis ipts psimnk//
And then there was Ancient Barohsawkies, my conlang which was the predecessor to Old, Middle, New and Modern Barohsawkies. Ancient Barohsawkies probably has only 2 phonemes: 1 consonant, which is m, and 1 vowel, a. That’s it!
i would do, for consonants: /ŋ/, /x/, /ʄ/, /ɦ/. vowels are: /ə/, /ɤ/. you could also get rid of some, it might be too much, but i think you could make some really beautiful words with them. it could still be made shorter though, just with really long words to say exactly what you mean, like 5 or more syllables. it could be a contender for the cursed conlang circus as well, but i don't like using common phonemes, as i really think this language should have really unique sounds and really few of them so it just sounds amazing. p.s. i will probably use these phonemes on a conlang for a cursed conlang circus in the future.
@@flyingduck91 they're pretty fucking funny to have in a conlang though, like i think it would be hilarious for the whole conlang to just sound like a train wreck, again it would be better for like the cursed conlang circus
I tried fiddling with the syllable structure (C)(C)V(C)(C) with consonants t and s and vowels a and i, as well as restrictions on "aa, ii, tt, ss" and a maximum of 3 syllables, and I was able to make GenGo generate 2009 words! Four phonemes with no long sounds, three syllables and (C)(C)V(C)(C), not bad :D It did generate some atrocities such as "saststatsats" but eh :P
Of course that many vowels in a row happen in reality. Just look at Dutch words like koeieuier, iaë, ooi, kraaieeieren, and zaaiui. Granted, most of those "i"s are semivowels and not true vowels, but it's still very cursed both phonologically and orthographically. Especially so when you consider that they break up into syllables like koei-e-ui-er, i-a-ë, kraai-e-ei-e-ren, and zaai-ui respectively.
@@swagmundfreud666 they're still the same sound, just non-syllabic. Though not all of them. There are some syllabic/non-syllabic /i/ pairs there that are just written as a single i.
Ok like from what I understand, a word like Zaaiui would be pronounced something like /tsa:juj/ (sorry if my IPA characters are off, I don't know much about Dutch vowels). @@Gamesaucer
Ah yes, iaë. Of course. We say it every day. The well known (in modern Dutch for almost all verbs definitely not long extinct) subjunctive of a verbalized form of the onomatopoeia "ia", the sound of a donkey. Which, as onomatopoeia surely do, adheres to the language's phonology. Nice find 🧠🧠🧠🧠🧠
sina wile ala wile toki e sona linja? jan sama mi li jan sona la ona li ken pana e sona tawa mi, taso ona li sona ala e toki pona. ni la, sina o pana e sona sina pi sona linja tawa mi kepeken toki pona taso a!!
@@thomas4841 sona linja li sona nanpa. sona linja e ni: ijo ale li linja lili mute. linja ni li kalama, li kama wawa. jan sona pilin e ni: sona linja li pana sona sin pi mun mute. tenpo pini la sona linja li pana sona pi lupa pimeja! sona linja pona mute tawa jan sona a! .. .. .. What I think you said: Will you teach me about string theory? People like me are scientists, they can teach me knowledge, but they don't know toki pona. Therefore, teach me your understanding of string theory using only toki pona! What I think I said: String theory is mathematics. String theory is this: everything is lots of little strings. These strings vibrate, and become strong. Scientists feel that string theory will give new information about the stars. Already, string theory has given information about black holes! String theory is very good for scientists! And I got that info from the intro on the wikipedia page for string theory lol. I don't understand much of it myself but wanted to give it a shot.
I'm currently making (or at least thinking about completing) a conlang which only has these phonemes: /p/ /k/ /s/ /l/ /m/ /n/ /a/ /i/ /u/, and a (C)V(N) syllable structure, with a maximum of two non-identical vowels in a row and no consonant clusters. It's intended to be some sort of basic pidgin that can be easily learnt by anyone, sort of like Russenorsk. I think that I will easily accomplish this goal if I try, since the number of possible disyllabic words in this language is 42 × 42 - 7 × 2 × 3 - 21 × 6 = 1596, and I plan to have around 2000 words.
Minimum number of phonetically distinct syllables needed to get at least 2000 phonetically distinct words, assuming a maximum syllable count per word of:
1 -> 2000 (considerably fewer than the number allowed by English phonology)
2 -> 45 (though you can get 1980 with 44)
3 -> 13 (you can get up to 1884 with 12)
4 -> 7
5 -> 5
6 -> 4
7/8/9 -> 3
10 - 1999 -> 2
2000+ -> 1
This is assuming any string of valid syllables shorter than or equal to the maximum length is allowed, which shouldn't be all that difficult to pull off with such a tiny syllable inventory.
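The list can be regenerated by searching for the smallest inventory x with x + x^2 + ... + x^n >= 2000. This is a short Python sketch under the same assumption, that every syllable string up to the maximum length is a word. `min_inventory` is an illustrative name:

```python
def words_up_to(x: int, n: int) -> int:
    """Number of strings of 1..n syllables drawn from x syllables."""
    return sum(x**k for k in range(1, n + 1))


def min_inventory(n: int, target: int = 2000) -> int:
    """Smallest syllable inventory giving at least `target` words of up to n syllables."""
    x = 1
    while words_up_to(x, n) < target:
        x += 1
    return x


for n in [*range(1, 11), 1999, 2000]:
    print(n, "->", min_inventory(n))
```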
If you want to know the absolute maximum number of syllables in a language, you can just multiply the number of possible onsets by the number of possible nuclei and the number of possible codas. Then, to figure out the number of words for a given syllable length, take that number and raise it to the power of the length. Add up all of those to get the total number of possible words. For example, using [p, t, k] and [a, i, u] with (C)V you get 4*3 = 12 possible syllables. 12 * 12 = 144 for two syllables. 12^3 = 1728, for a total of 1884 words with a max of three syllables. The only problem is that a lot of those syllables will be very similar, making distinctions harder to realize. You can correct for certain rules by figuring out the number of combinations that violate those rules and subtracting them from the total (again just using multiplication, though it isn't always intuitive or obvious how to do this).
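This is the multiply-then-sum recipe sketched in Python. The subtraction step uses one example rule, assumed here for illustration: no syllable may repeat immediately, so "tata" is out. For that rule the allowed n-syllable words number s(s - 1)^(n - 1), so the violations are s^n minus that:

```python
def syllable_count(onsets: int, nuclei: int, codas: int) -> int:
    """Onset options x nucleus options x coda options (count "none" as an option)."""
    return onsets * nuclei * codas


def word_count(s: int, max_len: int) -> int:
    """All words of 1..max_len syllables: s + s^2 + ... + s^max_len."""
    return sum(s**n for n in range(1, max_len + 1))


def violations(s: int, n: int) -> int:
    """n-syllable words in which some syllable is immediately repeated (hypothetical rule)."""
    return s**n - s * (s - 1) ** (n - 1)


def word_count_filtered(s: int, max_len: int) -> int:
    """Total words minus the rule violations at each length."""
    return sum(s**n - violations(s, n) for n in range(1, max_len + 1))


# [p, t, k] or no onset = 4 onsets, [a, i, u] = 3 nuclei, no coda slot = 1 option.
s = syllable_count(4, 3, 1)
print(s, s**2, s**3, word_count(s, 3))  # 12 144 1728 1884
print(word_count_filtered(s, 3))        # 1596
```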
What a great video; so many things struck me as completely strange, such as: why would a source ever claim that a language had infinite words??? Why does the calculator not give a consistent word count???
In a language like Inuktitut or Nahuatl you can incorporate whole nouns into the verb forms, and construct a fairly complicated sentence as a single very long word. In theory there's no limit to how many different ways you could do that, so that's one way infinite "words" are possible. But then you get into the sticky question of how to define "word" vs "morpheme"
i think it generates all 9999 words, then throws away the duplicates and that's the final count and because the word generation is random the number is gonna be random too
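That explanation can be illustrated with a toy simulation, under one loud assumption: the sketch treats the generator as drawing whole words uniformly at random from a fixed pool. A real generator, picking phonemes and lengths one at a time, almost certainly does not work that way, so this only shows the general effect. Under uniform draws, the expected number of distinct words after k draws from N possibilities is N(1 - (1 - 1/N)^k), which falls well short of k once k is comparable to N. The pool size below is the 4680 (C)V words of up to 4 syllables from p, t, k, e, o:

```python
import random


def distinct_after_draws(pool_size: int, draws: int, seed: int = 0) -> int:
    """Draw `draws` words uniformly from a pool of `pool_size`, keep only unique ones."""
    rng = random.Random(seed)
    return len({rng.randrange(pool_size) for _ in range(draws)})


def expected_distinct(pool_size: int, draws: int) -> float:
    """Expected unique count for uniform draws: N * (1 - (1 - 1/N)^k)."""
    return pool_size * (1 - (1 - 1 / pool_size) ** draws)


# Hypothetical run: 9999 draws from a pool of 4680 words.
# Different seeds give slightly different totals, as with the real calculator.
for seed in range(3):
    print(distinct_after_draws(4680, 9999, seed))
print(round(expected_distinct(4680, 9999)))
```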
The weird thing is this language sounds awesome, could totally see using something like that on a game or movie 🤯 (maybe with a little more work on grammar)
I once tried to make a conlang with the phoneme inventory of k, b, s, n, l, o. (one voiced plosive, one unvoiced plosive, one fricative, one nasal, one liquid, one vowel). But I ran into the same problem, I quickly ran out of combinations for syllables.
I was trying to get close to that 2000 and came up with this phoneme inventory that, with obligatory CV structure, has only 13 unique syllables: C: p t k b d g m n l r s w ʔ, V: ə. This has 13*13*13 = 2197 3-syllable words. A bit too many, considering there are also 13² disyllabic words and 13 monosyllabic ones. To make it more pronounceable, use word-initial stress and realize word-initial schwa as [eɪ] except in monosyllabic words. Other allophony to make the words more memorable could be things like lowering the vowel to [a] after b, d, g and raising it to [u] after w. Conveniently, it could be written in an abjad without losing any information. It does make me wonder what a realistic minimal number of syllables would be, since the words do not sound all that weird and I don't get a feeling that syllables are missing. I wouldn't be surprised if there are languages out there with a much more restrictive syllable structure than the three you mentioned.
String theory in toki pona: ale li tan linja lili. linja ni li lili mute, li lon insa kiwen pi kipisi ala. and no I would not want to sit through a lecture that's like this.
@@MURDERPILLOW. Literal translation: all [verb specifier] because line small. line this [verb specifier] small much. [verb specifier] located inside non-divided solid material
16:54 with such a simple syllable rule, you can math out exactly the number of possible syllables: (C)V has (consonants+1) * vowels number of 1-syllable words. then just take that to various powers for the number of syllables. the (ptkeo) language (we stan horizontal vowels here) with (C)V has 8 possible syllables (pe te ke e po to ko o) and therefore 64 bisyllables, 512 trisyllables, and 4096 quadrisyllables. taking half of all quadrisyllables, all trisyllables, and all bisyllables to be the language (with monosyllables reserved for articles and decimal modifiers like -ty / -hundred), that's 2624 words.
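The tally can be checked in a few lines of Python. This sketch just lists the (C)V syllables of p, t, k, e, o and applies the split described: all bisyllables and trisyllables, half of the quadrisyllables, and monosyllables held back. Which half of the four-syllable words to keep is left open:

```python
from itertools import product

syllables = [c + v for c, v in product(["", "p", "t", "k"], ["e", "o"])]
print(syllables)  # ['e', 'o', 'pe', 'po', 'te', 'to', 'ke', 'ko']

per_length = {n: len(syllables) ** n for n in range(1, 5)}  # 8, 64, 512, 4096
lexicon = per_length[2] + per_length[3] + per_length[4] // 2
print(lexicon)  # 2624
```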
Once you get this low in phoneme count, the pronunciation of the phonemes should start to vary a lot. In this case, a would probably include o and u, and i would include e and y. So to the ear, the words would sound more varied, but technically a speaker could use any substitute vowel and be understood, although it probably wouldn't sound right to a native speaker. A similar thing would and does happen to consonants.
While (C)V *seems* like the simplest syllable structure, it actually represents a reasonable complexity. A more restrictive phonotactic structure would result in a simpler (easier to pronounce, more naturalistic) language.
it wouldn't be that tough to write out the R script to manually calculate some permutations on a set of vowels and consonants, right? Intro stats kind of escapes me in the summer's dank haze, but those kinds of combinatorics are definitely baby shit. wicked video. I miss iddubbbbbbzzzz too D:
this is really interesting but i think the experiment would be a bit better if u limited it to no diphthongs or any vowels next to each other whatsoever, id be curious to see how that would turn out
That would be sick as hell. I believe there are actually Pokémon sounds which are slightly modified versions of other Pokémon sounds... interesting to look into. and VERY grating to try to use.
while it's nowhere near as minimal as 5 phonemes, I've been working on a vowel-only conlang that I'd say is pretty minimal. if you don't include tone, diphthongs, or triphthongs, it comes out to just 8 vowels: i, I (but short), 3 (but backwards and short), æ, a, e (but shwa), u, and o. with tone, they can either be high or low, so it's really 16. personally, I like to think of the diphthongs and triphthongs as being like the syllable structure, but with a few specific restrictions (no same sound-same tone immediately next to each other, and no "high-high-low" or "low-low-high" tonal patterns for triphthongs (h-l-l and l-h-h tones in triphthongs are falls and rises, respectively, and the only way I could conceive h-h-l and l-l-h were just sharp falls and rises, but they sounded too much like separate syllables. thinking more, I think I should remove two of the same sound being next to each other in rise and fall within triphthongs, regardless of tone, to cut out confusion with diphthongs, because there wouldn't really be a good way of distinguishing them)), because then I can write the syllable structure as (V)V(V), which is both funny and vaguely resembles a sleeping owl, which is what I've codenamed the language (sleeping owl).
In the final creation of words, you forgot to filter out 4 or more of the same letter in a row; this would decrease the maximum by 2, if I counted correctly. It would still not matter, but it is there: in the second row, there is the word paaaa.
Here's my take on the math behind this. Let's say we have CV(C) and 3 consonants (which can be in initial or closing position) and 3 vowels. There are 3 possibilities for C, 3 for V and 4 for (C) (as the fourth choice is none). The number of basic syllables is just the product, in this case 36. But we will call it s. How many two-syllable words are possible allowing repeats? It's just s times s, or s^2. So without repeats, it's s less than that, or s^2 - s. Adding the one-syllable possibilities to that, the number of one- and two-syllable words with no repeated syllables is s^2. So in our case we actually have 36^2 = 1296 unique one- and two-syllable words already. Three syllables with no repeats: s(s-1)(s-2) = s^3 - 3s^2 + 2s. Adding in the s^2 possible one- and two-syllable words, you'll get s^3 - 2s^2 + 2s. With s = 36 we now have 44136 unique words. Let's do CV and use 5 consonants and 3 vowels. So s = 15. We get 2955 unique words. So the moral of the story is: if you have anything more than a truly minimal phoneme set with a truly restrictive syllable structure, you'll get your 2000 words easily.
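The closed form is easy to cross-check by brute force. In this sketch, syllables are just numbered 0..s-1, since the inventory itself doesn't matter. "No repeats" is read the way the derivation uses it: no syllable appears twice anywhere in the word, so something like ta-ki-ta is excluded too:

```python
from itertools import permutations


def closed_form(s: int) -> int:
    """s + s(s-1) + s(s-1)(s-2), simplified."""
    return s**3 - 2 * s**2 + 2 * s


def brute_force(s: int) -> int:
    """Count 1-, 2- and 3-syllable words whose syllables are all different."""
    return sum(1 for n in (1, 2, 3) for _ in permutations(range(s), n))


for s in (15, 36):  # CV with 5C x 3V, and CV(C) with 3C x 3V x 4 coda options
    print(s, closed_form(s), brute_force(s))  # 15 2955 2955, then 36 44136 44136
```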
Depends on the language and how it's whistled. Some whistle languages' tonogenesis is the first layer of it, and therefore the "tones" are more like "consonant contours". In a strictly technical sense. I'm not saying you're wrong in any way, just trying to add a shade of context. Even if you think of it as a contouring consonant, it's acoustically and articulatorily a tone.
Okay this comment I made awhile back has inspired an idea that I've been ruminating on: for a cursed language (for the cursed conlang circus of course) you could make a language whose only sound is "O" but with various tones. No, not whistled... *spoken.* So it would go something like Ooôo ǒo᷄oo᷈ooo᷇ oo᷉o᷉oô o᷅oooo᷆o ooǒ Sure, you'd sound like a chimpanzee having a panic attack... but it would be a form of communication at least. Unfortunately I am nowhere near competent enough to even approach making a conlang but man it would be funny if someone made this work.
i think the people closest to knowing 100% of English words would be like champion Scrabble players. even then there's a big gap between Nigel Richards and everyone else, so he probably knows the most words
With a living language, is knowing all the words even technically possible? Like, at what point do you start considering sesquipedalian a word, and skrunkly not? Moreover, I wonder if loan words count toward these languages' word counts. Technically sushi is a Japanese word and an English word simultaneously, with the same meaning, so which language does it count towards? And then how do you judge Tycoon, which is technically both a Japanese and an English word, and a loan word, but the loan word has a different meaning compared to the word from which it derives?
The thing is, English is funny because it has a habit of stealing root words from Latin and Greek and then mashing them together in its own grammatical structure to create new English words
I'd argue the language with the fewest sounds might be the honking of cars, morse code if it was used as a conlang, or a language that is not spoken (0 sounds).
This reminds me of this time when i was little (and before i was into linguistics), when me and my sister tried to make a language with "a" and "b". This was its """"""""""sounds"""""""""": A /æ̆/ AA /æ/ AAA /æ:::/ B /b/ BB /bʊ:/ BBB /bʊ:::/ In case you noticed, I used IPA Notation, I'm just putting my current linguistics knowledge into describing this. Here's the syllable structure: (C)(V) Words can be made like: BA, ABB, AAABB, ABBBABAAB, etc. Word order is English. This language that I just randomly remembered I made 2 years ago makes me cringe-
a while ago i was lurking the (new?) cbb forum and stumbled upon a scratchpad that some guy made where the protolang had the phonemes /t k d g i u e a/ with some of the most deranged seeming diachronic sound changes i had ever seen at that point and it stuck with me for years
If you had a language with a syllable structure of CV (so that any combination of syllables is a valid word) with c consonants and v vowels, then you'll have c^n*v^n = (v*c)^n words with n syllables and [(v*c)^(n+1) - 1]/(v*c - 1) words with up to n syllables (this sum starts at zero syllables, so it includes the empty word; subtract 1 to leave it out). For that language with the inventory of {p, t, k, a, i} you'd get 1555 words with up to 4 syllables and 9331 words with up to 5 syllables. Anyways, I did a little spreadsheet and here are a few nice limits (note that swapping the consonant and vowel amounts gives the same result):
2c,1v -> 10 syllables = 2047 words
8c,6v -> 2 syllables = 2353 words
5c,3v -> 3 syllables = 3616 words
7c,2v -> 3 syllables = 2955 words
4c,2v -> 4 syllables = 4681 words
7c,1v -> 4 syllables = 2801 words
5c,1v -> 5 syllables = 3906 words
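This Python sketch recomputes those spreadsheet totals from the geometric-series formula. It makes the zero-syllable (empty) word explicit, because [(vc)^(n+1) - 1]/(vc - 1) = 1 + vc + ... + (vc)^n includes it, and without it each total drops by one. The 2353 and 3616 totals correspond to 8c,6v and 5c,3v:

```python
def words_up_to(c: int, v: int, n: int, include_empty: bool = True) -> int:
    """CV words of up to n syllables: 1 + s + ... + s^n with s = c*v."""
    s = c * v
    total = n + 1 if s == 1 else (s ** (n + 1) - 1) // (s - 1)
    return total if include_empty else total - 1


rows = [(2, 1, 10), (8, 6, 2), (5, 3, 3), (7, 2, 3), (4, 2, 4), (7, 1, 4), (5, 1, 5)]
for c, v, n in rows:
    print(f"{c}c,{v}v -> {n} syllables = {words_up_to(c, v, n)} words "
          f"({words_up_to(c, v, n, include_empty=False)} without the empty word)")

print(words_up_to(3, 2, 4), words_up_to(3, 2, 5))  # {p, t, k, a, i}: 1555 9331
```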
C: m n p t k V: a i u structure: CV syllables: 1-3 These phonotactics give 3615 possible words. Excluding "ti", a sequence absent from multiple Polynesian languages as well as a certain minimalist conlang mentioned in the video, this still results in a maximum lexicon of 2954 words. I admit this isn't perfect and has some ambiguity; for instance, how would one differentiate between "kama pinaku" and "kamapi naku" phonemically? Still, it's a decent starting point.
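The two totals can be reproduced by enumerating the CV syllables and dropping banned ones. In this small Python sketch, `banned` is just a set of excluded syllables. Like the comment, it counts syllable sequences, so it ignores the word-boundary ambiguity raised at the end:

```python
from itertools import product

CONSONANTS = ["m", "n", "p", "t", "k"]
VOWELS = ["a", "i", "u"]


def lexicon_size(banned: frozenset[str] = frozenset(), max_syll: int = 3) -> int:
    """Words of 1..max_syll CV syllables, skipping any banned syllable."""
    syllables = [c + v for c, v in product(CONSONANTS, VOWELS) if c + v not in banned]
    s = len(syllables)
    return sum(s**n for n in range(1, max_syll + 1))


print(lexicon_size())                   # 3615
print(lexicon_size(frozenset({"ti"})))  # 2954
```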
Solresol gets by with only 7 syllables. A mere 3 possible syllables already gets you well over 6,000 phonologically distinct words of 8 syllables or less, so by those criteria you could get by with only 4 phonemes (3 consonants and a vowel, or vice versa). Or you could go with 4 syllables (2 consonants and 2 vowels) and get over 65,000 words of 8 syllables or less.
I've decided to try making a realistic conlang with the consonants [m p k j w] and the vowels [i u a]; there are exactly 16 valid syllables but I honestly think it'll work
For Toki Pona, you could almost consider it an agglutinative language. jan is the word for person. pona is the word for good. jan pona literally means good person, but is usually translated as friend. So is jan pona a different word?
It's more efficient to reduce consonants than vowels with (C)V, but it's not *much* more efficient. With 3 consonants and 2 vowels, you're sometimes above and sometimes below 2,000 words; with 2 consonants and 3 vowels, you're sometimes above and sometimes below 2,500 words. With 1 consonant and 3 vowels, you're in the 1,3XX range; with 3 consonants and 1 vowel, you're between 320 and 340 words.
In one of my conlangs, there is an estimate of 63840 possible syllables (127680 if you take into account stress), and currently no upper bound on how many syllables a word can have, which makes the number of potential words infinite.
no natural language has a limit on how many syllables a word is allowed to have; it's only a matter of whether a word is too short to be distinct or too long to be practical
To add to a hypothetical low-phoneme lang, what's the largest amount of sounds you could have in a language while still having a low phoneme count? How allophonic can you get a language with like 9 phonemes and make it sound way more complex than it is?
I don't quite understand how the generator comes up with those numbers. With the phonology you presented, there are six consonants and five vowels, so 30 CV syllables, 35 (C)V syllables, and therefore 35^2 = 1225 two-syllable and 35^3 = 42,875 three-syllable words. While the two-syllable count kinda makes sense, I don't get why the three-syllable count isn't maxed out at 9999, unless it's using some kind of secret phonotactic rules not explained here. Btw, if you just go combinatorially, 13 syllables is the minimum to get over 2000 for three-syllable words, and 45 for two-syllable words. So for two-syllable words it's actually quite reasonable even for a small phonology (and you could argue Mandarin does that, for example). For the three-syllable case, a (C)V structure seems quite nice, and you can simplify quite a bit: 3 vowels and 4 consonants would be enough, for example a, i, o and t, k, s, n (which would actually be quite nice for an auxlang, since those sounds appear in most phonologies). And with four-syllable words considered... 7 syllables are enough. So p, t, k with a, u would actually be okay, as that's 8 syllables; you can even prohibit one, so your selection actually wasn't minimal, but very close.
Five phonemes is maximum. That makes me wonder what those five common phonemes are. They're very common in a lot of languages for some reason, and for some reason I can't remember them hmmmmm anyway Great Vid
I still have to watch the video, but 9x5 with CV (not even "(C)V") and up to 2 syllables gives 9*5 + (9*5)*(9*5) = 2070 words. If you add stress differentiation, doubling only 5% of the 2-syllable words, you get around 100 more words. If you add an optional N ending at the end, Japanese-style, and allow optional consonants even if you don't allow vowel clusters, you end up with (V+VN+CV+CVN) * (CV+CVN); even without stress you can go down to 5 consonants and 3 vowels, giving you almost 2.5k words. With "diphthongs" you can get to 4 consonants and 2 vowels, also without stress, and you get nearly 3k words (because you are effectively using 4 vowels). If you only use 1 diphthong per word, and only on the second syllable, and as a further constraint don't allow vowels or diphthongs to touch each other, then you still get around 1.5k (if you allow the diphthong on the first syllable instead you get exactly 1.5k, which is a bit more, but I suspect you'd need more phonemes to make the difference worth it)
One of my small-inventory conlangs had P T K V S H R N A E I O U (8*5), with a "word" (longest syllable vs. the maximum ending syllables) structure of (C)(R)V(V)(N) * C(R)V(N) = a * b, where R is R, (V) is strictly I or U, N is R, N or S, and repeated letters are not allowed: a = V + VV + VN + VVN + CV + CVV + CVN + CVVN + CRV + CRVN + CRVV + CRVVN = 5 + 12 + 15 + 36 + 40 + 96 + 120 + 288 + 35 + 105 + 84 + 252 = 1088; b = CV + CVN + CRV + CRVN + CRVV = 40 + 120 + 35 + 105 = 300. Meaning that, if my math was correct, there are 1088 + (1088 × 300) = 327,488 possible short, somewhat aesthetically pleasant words, which is more than enough for me (though I generally try to fall in the 12-15 range for consonants)
12:45 Ah, but with puupuua as a word, we need to know whether the grammar has reduplication. And to keep the language interesting, you could make one of the consonants a click.
With only 45 possible syllables, you could make 2025 two-syllable words; with only 13 possible syllables, you could make 2197 3-syllable words; and with only 7 possible syllables, you could make 2401 4-syllable words. So apparently you could have a serviceable language where the syllables are all just saying "Steve" in 7 different tones of voice (i.e. intrigued, delighted, perplexed, playful, scared, disappointed, apologetic), even if those words are limited to 4 syllables each.
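The same counting, written out as a tiny Python sketch (the function name is invented for illustration). With s syllables there are s^n words of exactly n syllables, so the sketch searches for the smallest s that clears 2000.

```python
def min_syllables_exact(word_length: int, target: int = 2000) -> int:
    """Smallest syllable inventory s with s ** word_length >= target,
    i.e. enough distinct words of exactly `word_length` syllables."""
    s = 1
    while s ** word_length < target:
        s += 1
    return s


for n in (2, 3, 4):
    s = min_syllables_exact(n)
    print(f"{n}-syllable words: {s} syllables -> {s ** n} words")
```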
further idea: if every possible syllable doesn't require movement of lips or jaws (leave out the w and m, for example), then it would be difficult to tell if people were talking without hearing their words, it would look like they were just staring at each other with their mouths slightly open, this would limit the vowels (I think only schwa would be allowed) but there would still be ample consonants, it would sound like "the nun hugged a hung young stud, an unsung hun's son shunned a stun gun".
is linguist audio the new leftist audio? is there overlap? i find it funny that linguists don't care about mic quality; I have been struggling to find one who does. Klein has the best I've heard, and even he needs a pop filter (with much love, as I no longer have a good setup myself, and my humble mic isn't what I had before)
lmao, certainly overlap. I actually had my good mic recording during this one to try and sync it up but I had it positioned stupidly and it sounded even worse, hahaha
Not entirely sure what that 6-ish-minute GenGo segment was supposed to demonstrate. That it only managed to generate ~3300 words out of a possible >22K for C=ptk V=aiu (C)V*{1,4} words is hardly impressive or relevant.
Seems interesting that Rotokas, Hawaiian, and Piraha are languages with no consonant clusters. I am surprised that a language with such a small phonetic inventory has all its words ending in vowels: no syllable ends in a consonant. ts and dz are considered separate sounds in their own right, but could they actually be considered akin to vowel diphthongs?
I noticed that when you dropped the long vowels from the Rotokas phonology, your word generator still created things that looked like they had them (oo uu etc.). By allowing words with V syllables succeeding V or CV syllables, you've accidentally re-introduced the vowel length distinction. Oops.
I was learning Toki Pona and asked on the Toki Pona official Discord how they would translate "The bathroom is the third door on the left." It resulted in a three day discussion that came up with a sentence of about 24 "words". Really dampened my enthusiasm on the language.
the language isn't really meant to be a whole practical everyday language tho, so it does make sense that it's like that. I'm currently learning it and loving it
@@rainbowrotcod What the whole discussion really highlighted for me was that as much as the community is loath to introduce new words, it would be well served by some concrete directional words and a whole rework of the number system.
Two consonants, two vowels, and let's make CV the only allowed syllable structure to keep things interesting. This gives us four possible syllables and 65536 words of 8 syllables. Done.
I don't know why you had to go with only CV syllables, though. I know you were sort of basing it on Rotokas, but plenty of languages allow doubled consonants and even consonant clusters in the middle of words. If you do that, allowing only CVC(C)V(C) and a maximum of three syllables per morpheme, with that sound inventory of [p, t, k, a, i, u], you get a pretty naturalistic-sounding language, even without adding things like tone and vowel length. Patka appa itkupa, etc. You could even have some phonotactic rules (like allowing the consonant cluster /kt/ but not /tk/) and lengthen or fuse two vowels when they touch each other, and it would still sound naturalistic. For example, instead of "appa atati ukaku", the vowels fuse or lengthen to become "appaa 'tati kaku". A language like this could easily be a real language, but it would struggle to have enough DISTINCT-sounding words. Languages with really small phonemic inventories also struggle to form compound words and words with "sound symbolism", which is why we tend to see these kinds of languages develop things like tone to compensate. But the question of whether a language is "naturalistic" or not is only about whether it sounds like it could be a real language, not whether it is a feasible language for a community of speakers who have to interact with the outside world and invent words for abstract philosophical ideas. A language like Toki Pona, which you mentioned, works extremely well at communicating a limited number of concepts, which is great for a creole or auxiliary language, but if it ever had a group of native speakers, you can bet they wouldn't keep their vocabulary to only 2,000 words.
There is an issue with this method. This sort of experiment does not have any parameters for redundancy. Sure single syllable words like "stat", and "spat" are distinguished by a single consonant, but imagine actually trying to converse with words like "kitatakapatika" and "kitatapakatika", and "kitataapakatika". I would suggest putting in some kind of filter to account for near-duplicates over 3-4 syllables
So it’s a little subjective, but I typically pick from a different Swadesh list when I speedrun a language just to get the first hundred or so out of the way. though I usually end up deleting the pronouns and prepositions from the Swadesh list because I often have them created already or need to make sure they’re simple or irregular.
i saw someone say that car horn is technically a language with only one phoneme which is used to communicate ideas like frustration, alarm, or a polite reminder based on repetition, timing, and context (for example: sitting at a red light, and giving someone two short honks can be a polite reminder that the light has changed, while one long honk can portray impatience and frustration)
/pikha/
welcome to semiotics
Car horn can also communicate celebration, usually by repeating a pattern consistently, with different patterns used for different celebrations, like a soccer team winning a match
Beeeep beeeep beep beep beeeeeep beep bebebeep beeeeeep beep beep
Beebeebeebeebeeeep 🤬
Agma Schwa: this is the fewest sounds a language can have.
Sign language:
Sign language is just a version of different spoken languages
I mean from my understanding they still have phonemes
@@dayalasingh5853 Not really.
@@angelmendez-rivera351 no? My intro to phonology and phonetics prof explained them as having phonemes.
@@dayalasingh5853 I'm sure some sign languages do, in some sense or other, but most of the ones I know don't. Sign languages are more similar to Chinese in that every sign functions as a logogram. Of course, Chinese still has phonemes, because Chinese is a spoken language, so each logogram has associated pronunciations, but this clearly doesn't apply here, since there is no concept of "pronouncing" a sign in sign language, for obvious reasons. Therefore, there is no notion of phoneme. Now, there might be some sign languages out there which build signs from a set of "basic signs," and all other signs are just signed sequences of those basic signs, but I don't know any sign languages actually used in practice by people who use sign languages which have this construction.
16:37 i love how it generated "aaa" even though it was supposed to filter that out
not to mention "iiii" on line 3
he forgot to add commas between each filtered string
it was screaming from all of the stress
I mean... people HAVE written entire essays about non-Euclidean geometry... not saying this means it's best suited for everyday use or that it could convey the majority of complex topics with ease, but it IS possible
(edit: I'm referring to toki pona, if that wasn't clear; idk how I forgot to mention that initially)
WHAT THATS INSANE DO YOU GOT A LINK
@@thepanplate ua-cam.com/video/tL1WBUOqE48/v-deo.html
@@thepanplate youtube hates links but look up "nasin pi sitelen ma pi jan Ekite ala"
@@thepanplate probably this one by jan Telakoman: ua-cam.com/video/tL1WBUOqE48/v-deo.html
I want to see what happens if you show one of these essays to a random Toki Pona speaker, and ask them what it was about.
Here's my attempt at a more mathematical solution to the problem.
Let there be c consonants and v vowels in the language, and with a (C)V syllable structure the total number of possible syllables would be
# of possible syllables = s = (c + 1) v.
If the max number of syllables in a lemma is n, then the total number of possible distinct lemmas would be
# of possible lemmas = s + s^2 + ... + s^n = (s^n - 1) / (1 - 1/s)
since you can have lemmas of length 1 up to n syllables. If we want this to be at least some lemma count L (we assume L = 2000), then the inequality would be
(s^n - 1) / (1 - 1/s) ≥ L
To then solve for the lowest max syllable count given s and L, we would get
n_lowest = ceil(log_s(L (1 - 1/s) + 1)).
(This can be approximated as simply n_lowest_approx ≈ ceil(log_s(L)), and as long as L ≥ s, we have that n_lowest_approx ≥ n_lowest)
So for example the Central Rotokas language (with long vowels) has c = 6 consonants and v = 10 vowels, so s = 70, and so n_lowest = ceil(log_70(2000 * (1 - 1/70) + 1)) = ceil(1.786) = 2, i.e. the max syllable count per lemma can be as low as 2. If the language is analyzed as only having short vowels, then s = 35 and n_lowest = ceil(2.130) = 3.
Note that this doesn't filter out any unreasonable vowel combinations like aaa, iii, uuu, but to filter them out the math would likely get really hairy and I can't be bothered to work it out.
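A minimal Python sketch of the closed form above, with a direct search alongside it as a cross-check. The function names are made up, and the search sidesteps any floating-point rounding in the logarithm near exact powers. Like the derivation, it assumes every sequence of (C)V syllables counts as a distinct lemma.

```python
import math


def lemma_count(s: int, n: int) -> int:
    """s + s^2 + ... + s^n: words of 1 to n syllables from s syllables."""
    return sum(s ** k for k in range(1, n + 1))


def n_lowest_formula(s: int, L: int) -> int:
    """ceil(log_s(L * (1 - 1/s) + 1)), as derived above; needs s >= 2."""
    return math.ceil(math.log(L * (1 - 1 / s) + 1, s))


def n_lowest_search(s: int, L: int) -> int:
    """Same answer by direct search, with no floating-point rounding."""
    n = 1
    while lemma_count(s, n) < L:
        n += 1
    return n


# Central Rotokas: c = 6 consonants, v = 10 (long vowels) or 5 (short only)
for label, c, v in (("long vowels", 6, 10), ("short vowels", 6, 5)):
    s = (c + 1) * v
    print(label, s, n_lowest_formula(s, 2000), n_lowest_search(s, 2000))
```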
You could do it with stars and bars. Shouldn't be that annoying. I think most high school combinatorics classes should cover it.
Rather than using Σs^i, you could use Σs^i-v(c+1)Σ(2j-1)s^(j+f-1) where i→n, j→(n-f+1), v = |V|, c = |C|, and f is the shortest vowel repeat forbidden by the filter, I believe.
The first term is as you wrote, the sum of possible words of up to n syllables. But if we want to filter out triples of identical vowels, then for words shorter than 3 syllables, we have no change. For a 3 syllable word, each vowel has c+1 possible first syllables which must then be followed by the syllable consisting of the bare vowel; for a 4 syllable word, the consonant can be in one of three spots (kaaaS, Skaaa, aaaCV), for a 5 syllable in one of 5, etc.
For f = 3, this simplifies to, wlog, Σs^i-sΣ(2j-1)s^(j+2)
= s(s^n-1)/(s-1) - s(s^3(1+s-(2n+1)s^n + (2n-1)s^(n+1)))/(s-1)^2
= (s (s^4 + s^3 + s - 1) (s^n - 1)/(s - 1)^2) - (2ns^(n+4)/(s-1))
While this isn't super pretty, it's still just a difference of rational functions over integers, so it wouldn't end up being too bad to work with.
There's always a good chance that I missed something or screwed up somewhere (it's 00.30 and I've been awake since 03.30 yesterday 😅), but something like that should do the trick, I believe?
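Closed forms like this are easy to get wrong, so a brute-force count is a handy cross-check. The Python sketch below is hypothetical: its name and the p t k / a i u example inventory are assumptions, not the video's generator. It enumerates every (C)V word and drops those containing a banned substring. For 3-syllable words, the only way to get a vowel triple is (C)a + a + a, so exactly v(c+1) = s three-syllable words are removed, giving 12 + 144 + (1728 - 12) = 1872 for this inventory.

```python
from itertools import product


def count_filtered_words(consonants: str, vowels: str, max_syllables: int,
                         banned: tuple[str, ...]) -> int:
    """Count distinct (C)V words of 1..max_syllables syllables that contain
    none of the banned substrings. Assumes one character per phoneme."""
    syllables = [c + v for c in ["", *consonants] for v in vowels]
    words = set()  # guards against two syllable sequences spelling the same word
    for n in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=n):
            word = "".join(combo)
            if not any(b in word for b in banned):
                words.add(word)
    return len(words)


BANNED = ("aaa", "iii", "uuu")
for n in (2, 3, 4):
    print(n, count_filtered_words("ptk", "aiu", n, BANNED))
```

Comparing a formula's output against counts like these for small n is a quick way to catch an off-by-one or a misplaced power.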
Calling it already: one bilabial, one alveolar, one velar, maybe a glottal, one open vowel, one front vowel and one back vowel. In lay terms: M, N, G, H, A, I, U.
a glottal isn't necessary
M I N G I
M U M I A M G U N N A H U G U
PTKAIU feels better
@@slyar I was between PTK and BDG, then it dawned on me that M and N are even more common worldwide than P and T respectively
It's crazy how quickly you get enough words.
I just did a very conservative calculation on toki pona and, rules applied, 3 syllables max, there are way over 200k possible words
Okay. So to bring in a little bit of math here, you've given some pretty specific numbers, so it's totally possible to optimize the hell out of this. To sum up the theoretical limit in the rules you set out:
For a little bit of background, imagine you have a language with 5 possible syllables. That means, obviously, that there are 5 different one-syllable words. But how many 2 syllable words would there be? Well, you have 5 choices for the first syllable, and 5 for the second, so it's 5*5 = 25. Similarly, for 3 syllable words, it's 5*5*5 or 5^3, which is equal to 125.
So this means that if you're allowing ANY number of syllables from 1-8, then the total number of words would be x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8, where x is the number of distinct syllables that can be produced by your language. So, by solving for the smallest value of x that gives a number over 2000, you find it has to be at least about 2.4, so you round up to 3. As long as there are 3 different possible syllables, you're golden. (This actually overshoots a ton: it turns out to be 9840 possible words. If you limit yourself to 7 syllables, you get 3279 words.)
So what's the minimum number of phonemes you need to make 3 different syllables? Well, using the (C)V structure you laid out, you pick one consonant and one vowel (plus one extra option for picking NO consonant). This means the number of syllables is (C+1)*V, where C is the number of consonants (plus one for picking NO consonant) and V is the number of vowels. Playing around with this should convince you that you'll need at least 3 phonemes, and there are three ways to do it: three vowels, getting you three possible syllables; one vowel and two consonants, also getting you three; or two vowels and one consonant, getting you 4. For the sake of example, let's say that the consonant is T and the vowels are A and O. This would make a, o, ta, and to.
This actually means we're overshooting the mark again, by a ton. Using the original 8-syllable rule, we have 87380 different possible words. We can cut this down to 6 syllables and still have 5460 words, which is all we need and plenty more (a 5-syllable maximum would only give 1364).
Now, what if we are willing to mess around with the syllable structure? Say, (C)V(C)? Well, this increases the number of possible syllables to (C+1)^2 * V, and so we can reach that same four-syllable inventory with just 2 phonemes, one consonant and one vowel. The four syllables would be a, ta, tat, and at.
Obviously, this language isn't realistic or... reasonable. But that's not a result of really pushing any one thing too far: the syllable count isn't insane, and neither is the syllable structure. It meets the number you put as a goal and nearly triples it. But all the elements put together end up making something really ridiculous. But yeah, with 2 phonemes, words of up to 6 syllables, and (C)V(C), it's totally possible to hit that 2000 number without much difficulty.
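The sums in this comment, written out as a short Python sketch (`total_words` is an invented name). It counts syllable sequences and, like the comment, assumes every sequence of allowed syllables is a valid word.

```python
def total_words(s: int, max_syllables: int) -> int:
    """Number of syllable sequences of length 1..max_syllables from s syllables."""
    return sum(s ** k for k in range(1, max_syllables + 1))


# s = 3 (e.g. three vowels) and s = 4 (a, o, ta, to  or  a, at, ta, tat)
for s, n in ((3, 7), (3, 8), (4, 5), (4, 6), (4, 8)):
    print(f"s={s}, up to {n} syllables: {total_words(s, n)}")
```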
I genuinely can't believe I read all of this, and understood it. I feel like I just read a college essay.
I also thought about this approach initially, because I'm a scientist, not at all a linguist. However, I think a language that uses 2000 possible sounds to communicate 2000 ideas exactly is an unnatural thing. English words have a sort of vibe to them, with certain phonemes used more in certain contexts, so I think densely packing semantics into the bare minimum number of syllables with no gaps makes it more of a code than an actual language that humans would invent together for general use.
Also some syllable combos need to be banned for obvious reasons, like how agma bans "aaa" in his video. A language where "a aa aa" means something different from "aaaaa" or "aaa aa" is going to be a total nightmare to actually use.
I think we can go smaller if we make it (C)V(C). With (C)V and the final inventory you have 8 options for syllables. We can drop a consonant and with (C)V(C) have 18 syllables, or also drop a vowel and still have 8 options. For another short calculation, before putting on a filter for weird repeats, the total number of possible words is x^n + x^(n-1) + ... + x^1, where x is the total number of syllables and n is the maximum number of syllables in a word. For the results in the video, the total ends up being 4680. For (C)V(C) with only two vowels and two consonants, we can get 6174 with a max of three syllables.
Edit: I recognized a small error in my calculations, I forgot to include the standalone vowels as possible syllables. This raised it from 16 options to 18.
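A rough Python check of these numbers. The helper names and the p, t, k / a, u and p, t / a, u inventories are illustrative assumptions. It also counts distinct spellings: with (C)V every string splits into syllables in only one way, but once codas are allowed, different syllable sequences can spell the same word (at + a and a + ta both give ata), so the number of distinct words drops below the raw sequence count.

```python
from itertools import product


def syllables(consonants: str, vowels: str, coda: bool) -> list[str]:
    """All (C)V or (C)V(C) syllables; '' marks an empty onset or coda."""
    edges = ["", *consonants]
    codas = edges if coda else [""]
    return [o + v + k for o, v, k in product(edges, vowels, codas)]


def sequence_and_spelling_counts(consonants: str, vowels: str, coda: bool,
                                 max_syllables: int) -> tuple[int, int]:
    """Return (#syllable sequences, #distinct spelled words) for 1..max_syllables."""
    syls = syllables(consonants, vowels, coda)
    sequences, spellings = 0, set()
    for n in range(1, max_syllables + 1):
        for combo in product(syls, repeat=n):
            sequences += 1
            spellings.add("".join(combo))
    return sequences, len(spellings)


print(len(syllables("ptk", "au", coda=False)), len(syllables("pt", "au", coda=True)))
print(sequence_and_spelling_counts("ptk", "au", False, 4))
print(sequence_and_spelling_counts("pt", "au", True, 3))
```

The (C)V case returns the same number twice. The (C)V(C) case reproduces 6174 sequences but fewer distinct spelled words.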
Just write lmao
*flips the bird*
*cartwheels into a limo*
*blasts a hole in the wall with an explosive charge*
*fingerguns*
*points at the hole in the wall and signals the chauffeur to floor it*
*blows the imaginary fingergun smoke*
*disappears into the distant horizon leaving a trail of fire*
That's a rather complex system of writing
Personally, I think this doesn't take into account redundancy, since English, for example, doesn't make use of every single possible syllable or combination of syllables for its base lemmas. And it doesn't seem realistic to have so many near homophones that could be mistaken for one another. What we need to find out is what the minimum level of redundancy required in a language is (basically the percentage of possible but unused syllables). I know Mandarin has loads of homophones, but people use clarifying adjectives and set phrases to deal with that, so would the individual root be considered a lexeme, or would the whole phrase with a lexical meaning be a lexeme?
probably the richer the phonology, the smaller the fraction of syllables that actually occur out of all
Do you think redundancy has a purpose of some sort, like possibly allowing sounds to shift without creating a lot of homonyms?
@@holdingpattern245 Redundancy has a purpose as it allows more meaning to be preserved if you mishear someone.
phonetic redundancies (constraints on syllable and/or word structure) help in noisy environments, and grammatical redundancies (multiple marking, agreement, the whole plethora of synthetic or rigid morphosyntax) help with producing or identifying a coherent utterance
"And it doesn't seem realistic to have so many near homophones that could be mistaken for one another."
Wrong. English has literally thousands of them. And many of them are used in common language. This is why you always see "than" misspelled as "then" and vice versa, or "you're" and "your" mixed up. It's not only realistic, it's extremely common. So common, entire books have been written on this.
All languages have at least 3 places of articulation for consonants and at least two heights for vowels. In addition all languages have stops, but not all languages have other kinds of sounds. So [ptk aə] is the simplest naturalistic inventory. Təkə Pəta is my favourite minimalist conlang.
proto lakes plain is reconstructed with /p t k b d a e i o u/ and 2 tones (high level, low level)
As a Brazilian, the thumbnail of this video is perfectly logical and funny/fitting, but not for the reasons one would expect lol
kkkkk Meu consagrado...
10:43 _HELLO_ everybody, this is Soundchangeiplier and welcome back to another let's play
When you say at 2K is an inflection point where the language gets “creative”, I think you mean “productive”
Lol it's so funny that I just found this vid, cause I started a conlang that has only 6 phonemes (which is now the prolang for a bunch of dialects) on the back of my biology test about 2 weeks ago and now it's like a full conlang (not finished but yeah)
we once started making a conlang with 9 phonemes (7 consonants and 2 vowels), with a frankly silly amount of allophony and a "(C | FS | SF) V (C) (C) (C) (C)" syllable structure. it was called "iu", and the phrase "i love my friends" is translated as //ip nis ipts psimnk//
And then there was Ancient Barohsawkies, my conlang which was the predecessor to Old, Middle, New and Modern Barohsawkies. Ancient Barohsawkies probably has only 2 phonemes: 1 consonant, which is m, and 1 vowel, a. That’s it!
i would do, for consonants: /ŋ/, /x/, /ʄ/, /ɦ/; vowels: /ə/, /ɤ/. you could also get rid of some, it might be too much, but i think you could make some really beautiful words with them. it could still be made shorter, though. just really long words to say exactly what you mean, with like 5 or more syllables. could be a contender for cursed conlang circus as well, but i don't like using common phonemes, as i really think this language should have really unique sounds and really few of them so it just sounds amazing. p.s. i will probably use these phonemes in a conlang for a cursed conlang circus in the future.
the title says "while still being sorta realistic", those consonants aren't very naturalistic
@@flyingduck91 they're pretty fucking funny to have in a conlang though, like i think it would be hilarious for the whole conlang to just sound like a train wreck, again it would be better for like the cursed conlang circus
This is a fabulous video Robbie, I love it. Good experiments, we should do more.
I tried fiddling with the syllable structure (C)(C)V(C)(C) with consonants and and vowels and as well as limitations on "aa, ii, tt, ss" and maximum of 3 syllables I was able to make GenGo generate 2009 words!
Four phonemes with no long sounds, three syllables and (C)(C)V(C)(C), not bad :D It did generate some atrocities such as "saststatsats" but eh :P
As a Finn, i can apparently -count- word to infinity. Neat.
Of course that many vowels in a row happen in reality. Just look at Dutch words like koeieuier, iaë, ooi, kraaieeieren, and zaaiui. Granted, most of those "i"s are semivowels and not true vowels, but it's still very cursed both phonologically and orthographically. Especially so when you consider that they break up into syllables like koei-e-ui-er, i-a-ë, kraai-e-ei-e-ren, and zaai-ui respectively.
Glorious.
These /i/s look suspiciously like [j]
@@swagmundfreud666 they're still the same sound, just non-syllabic. Though not all of them. There are some syllabic/non-syllabic /i/ pairs there that are just written as a single i.
Ok like from what I understand, a word like Zaaiui would be pronounced something like /tsa:juj/ (sorry if my IPA characters are off, I don't know much about Dutch vowels). @@Gamesaucer
Ah yes, iaë. Of course. We say it every day. The well known (in modern Dutch for almost all verbs definitely not long extinct) subjunctive of a verbalized form of the onomatopoeia "ia", the sound of a donkey. Which, as onomatopoeia surely do, adheres to the language's phonology. Nice find 🧠🧠🧠🧠🧠
This guy is so natural on camera for having such a small channel, he really should have more subscribers.
string theory in toki pona could be called "sona linja"
sina wile ala wile toki e sona linja? jan sama mi li jan sona la ona li ken pana e sona tawa mi, taso ona li sona ala e toki pona. ni la, sina o pana e sona sina pi sona linja tawa mi kepeken toki pona taso a!!
@@thomas4841 sona linja li sona nanpa. sona linja e ni: ijo ale li linja lili mute. linja ni li kalama, li kama wawa. jan sona pilin e ni: sona linja li pana sona sin pi mun mute. tenpo pini la sona linja li pana sona pi lupa pimeja! sona linja pona mute tawa jan sona a!
..
..
..
What I think you said:
Will you teach me about string theory? People like me are scientists, they can teach me knowledge, but they don't know toki pona. Therefore, teach me your understanding of string theory using only toki pona!
What I think I said:
String theory is mathematics. String theory is this: everything is lots of little strings. These strings vibrate, and become strong. Scientists feel that string theory will give new information about the stars. Already, string theory has given information about black holes! String theory is very good for scientists!
And I got that info from the intro on the wikipedia page for string theory lol. I don't understand much of it myself but wanted to give it a shot.
@@Queer_Nerd_For_Human_Justice i hate that i comprehended this so well
"linja" comes from Filipino "linya", meaning "line"
@@onetpottwelve in many slavic languages it also means line.
*linija* - line
I'm currently making (or at least thinking about completing) a conlang which only has these phonemes, /p/ /k/ /s/ /l/ /m/ /n/ /a/ /i/ /u/, and a (C)V(N) syllable structure, with a maximum of two non-identical vowels in a row and no consonant clusters. It's intended to be some sort of basic pidgin that can be easily learnt by anyone, sort of like Russenorsk. I think that I will easily accomplish this goal if I try, since the number of possible disyllabic words in this language is 42×42 - 7×2×3 - 21×6 = 1596, and I plan to have around 2000 words.
Minimum number of phonetically distinct syllables needed to get at least 2000 phonetically distinct words assuming a maximum syllable count per word of:
1 -> 2000 (considerably fewer than the number allowed by English phonology)
2 -> 45 (though you can get 1980 with 44)
3 -> 13 (you can get up to 1884 with 12)
4 -> 7
5 -> 5
6 -> 4
7/8/9 -> 3
10 - 1999 -> 2
2000+ -> 1
This is assuming any string of valid syllables shorter than or equal to the maximum length is allowed, which shouldn't be all that difficult to pull off with such a tiny syllable inventory.
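The table can be regenerated with a short Python search (function names are made up for illustration). As the comment does, it assumes that any string of 1 to max_len valid syllables is a word. The running sum stops early once it passes the target, so very long maximum lengths stay cheap.

```python
def words_up_to(s: int, n: int, cap: int) -> int:
    """s + s^2 + ... + s^n, stopping early once the running total reaches cap."""
    total, power = 0, 1
    for _ in range(n):
        power *= s
        total += power
        if total >= cap:
            break
    return total


def min_syllables(max_len: int, target: int = 2000) -> int:
    """Smallest syllable inventory giving >= target words of 1..max_len syllables."""
    s = 1
    while words_up_to(s, max_len, target) < target:
        s += 1
    return s


for n in (1, 2, 3, 4, 5, 6, 7, 9, 10, 1999, 2000):
    print(n, min_syllables(n))
```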
Bird Morse Code;
Two letters
One that sounds like Ck
One that sounds like Ri
Dots are Ck
Lines are Ri
Have fun feeling true pain
glad I'm not the only one to think about Morse.
If you want to know the absolute maximum number of syllables in a language, you can just multiply the number of possible onsets by the number of possible nuclei and by the number of possible codas. Then, to figure out the number of words for a given syllable length, take that number and raise it to the power of the length. Add up all of those to get the total number of possible words.
For example, using [ p, t, k ] and [ a, i, u ] (C)V you get: 4*3 = 12 possible syllables. 12 * 12 = 144 for two syllables. 12^3 = 1728, for a total of 1884 words with a max of three syllables.
The only problem is that a lot of those syllables will be very similar, making distinctions harder to realize. You can correct for certain rules by figuring out the number of combinations that violate those rules and subtracting them from the total (again just using multiplication, though how to do that isn't always intuitive or obvious).
What a great video; so many things struck me as completely strange, such as: why would a source ever claim that a language had infinite words??? Why does the calculator not give a consistent word count???
In a language like Inuktitut or Nahuatl you can incorporate whole nouns into the verb forms, and construct a fairly complicated sentence as a single very long word. In theory there's no limit to how many different ways you could do that, so that's one way infinite "words" are possible. But then you get into the sticky question of how to define "word" vs "morpheme"
i think it generates all 9999 words, then throws away the duplicates and that's the final count
and because the word generation is random the number is gonna be random too
apologising in toki pona is harder than saying "you wanna fight then?" when you fuck up
Let's make it tiny but also *interesting*: b d t k, a i u with Warlpiri-style vowel harmony and strict CV syllables.
The weird thing is this language sounds awesome, could totally see using something like that on a game or movie 🤯 (maybe with a little more work on grammar)
I once tried to make a conlang with the phoneme inventory of k, b, s, n, l, o. (one voiced plosive, one unvoiced plosive, one fricative, one nasal, one liquid, one vowel). But I ran into the same problem, I quickly ran out of combinations for syllables.
I was trying to get close to that 2000 and came up with this phoneme inventory that with obligatory CV structure has only 13 unique syllables:
C: p t k b d g m n l r s w ʔ, V: ə
Which has 13*13*13 = 2197 3-syllable words. A bit too many, considering there are also 13² disyllabic words and 13 monosyllabic ones. To make it more pronounceable, use word-initial stress and realize word-initial schwa as [eɪ] except in monosyllabic words. Other allophony to make the words more memorable could be things like lowering the vowel to [a] after b, d, g and raising it to [u] after w.
Conveniently it could be written in an abjad without losing any information.
Does make me wonder what a realistic minimal number of syllables would be, since the words do not sound all that weird and I don't get a feeling that syllables are missing. Wouldn't be surprised if there are languages out there with much more restrictive syllable structure than the three you mentioned.
String theory in toki pona:
ale li tan linja lili. linja ni li lili mute, li lon insa kiwen pi kipisi ala.
and no I would not want to sit through a lecture that's like this.
What does that directly translate to
@@MURDERPILLOW.
Literal translation:
all [verb specifier] because line small. line this [verb specifier] small much. [verb specifier] located inside non-spiky solid material
16:54 with such a simple syllable rule, you can math out exactly the number of possible syllables: (C)V has (consonants + 1) × vowels 1-syllable words. then just take that to various powers for the number of syllables. the (ptkeo) language (we stan horizontal vowels here) with (C)V has 8 possible syllables (pe te ke e po to ko o) and therefore 64 bisyllables, 512 trisyllables, and 4096 quadrisyllables. taking half of all quadrisyllables, all trisyllables, and all bisyllables to be the language (with monosyllables reserved for articles and decimal modifiers like -ty / -hundred), that's 2048 + 512 + 64 = 2624 words.
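The arithmetic in this comment as a few lines of Python, using the split described there (monosyllables set aside, half of the four-syllable words kept). Variable names are just for illustration.

```python
consonants, vowels = "ptk", "eo"
syllable_count = (len(consonants) + 1) * len(vowels)   # pe te ke e po to ko o

by_length = {n: syllable_count ** n for n in range(1, 5)}
# Monosyllables reserved for function words; all 2- and 3-syllable words,
# plus half of the 4-syllable words, make up the lexicon.
lexicon = by_length[2] + by_length[3] + by_length[4] // 2
print(syllable_count, by_length, lexicon)
```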
Once you get this low in phoneme count, the pronunciation of the phonemes should start to vary a lot. In this case, a would probably include o and u, and i would include e and y. So to the ear, the words would sound more varied, but technically a speaker could use any substitute vowel and be understood, although it probably wouldn't sound right to a native speaker. A similar thing would and does happen to consonants.
While (C)V *seems* like the simplest syllable structure, it actually represents a reasonable complexity. A more restrictive phonotactic structure would result in a simpler (easier to pronounce, more naturalistic) language.
The bare minimum to convey meaning is two, obviously, as in binary, but yeah at this point it's extremely impractical.
it wouldn't be that tough to write out the R script to manually calculate some permutations on a set of vowels and consonants, right? Intro stats kind of escapes me in the summer's dank haze, but those kind of combinatorics are definitely baby shit.
wicked video. I miss iddubbbbbbzzzz too D:
this is really interesting, but i think the experiment would be a bit better if u limited it to no diphthongs or any vowels next to each other whatsoever; I'd be curious to see how that would turn out
I agree, I was wondering through the whole video what would've happened if he changed the phonotactic pattern to CV.
Has there been any successful attempts to formalize pokemon vocalizations into actual languages?
That would be sick as hell. I believe there are actually pokemon sounds which are slightly modified versions of other pokemon sounds... interesting to look into. and VERY grating to try to use.
while it's nowhere near as minimal as 5 phonemes, I've been working on a vowel-only conlang that I'd say is pretty minimal. if you don't include tone, diphthongs, or triphthongs, it comes out to just 8 vowels: i, I (but short), 3 (but backwards and short), æ, a, e (but schwa), u, and o. with tone, they can either be high or low, so it's really 16. personally, I like to think of the diphthongs and triphthongs as being like the syllable structure, but with a few specific restrictions (no same sound-same tone immediately next to each other, and no "high-high-low" or "low-low-high" tonal patterns for triphthongs (h-l-l and l-h-h tones in triphthongs are falls and rises, respectively, and the only way I could conceive h-h-l and l-l-h were just sharp falls and rises, but they sounded too much like separate syllables. thinking more, I think I should remove two of the same sound being next to each other in rise and fall within triphthongs, regardless of tone, to cut out confusion with diphthongs, because there wouldn't really be a good way of distinguishing them)), because then I can write the syllable structure as (V)V(V), which is both funny and vaguely resembles a sleeping owl, which is what I've codenamed the language (sleeping owl).
I also basically "shadow defined" a bunch of words by providing a "root" and "descriptor" definition for each phoneme (tone included).
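One way to see how many (V)V(V) syllables those rules leave is a brute-force count. This Python sketch assumes each of the 8 vowels carries H or L tone, that no vowel+tone unit may sit directly next to an identical one, and that triphthongs can't use H-H-L or L-L-H. The IPA symbols are only my guess at the vowels described, and the later "no same vowel in rises and falls" idea is left out:

```python
from itertools import product

# A guess at the 8 vowels above in IPA; the symbols don't affect the counts.
VOWELS = ["i", "ɪ", "ɜ", "æ", "a", "ə", "u", "o"]
TONES = ["H", "L"]
UNITS = [(v, t) for v in VOWELS for t in TONES]  # 16 vowel+tone units

BANNED_TRIPHTHONG_TONES = {("H", "H", "L"), ("L", "L", "H")}

def is_valid(seq):
    """Firm rules only: no unit next to an identical unit, no H-H-L / L-L-H triphthongs."""
    if any(a == b for a, b in zip(seq, seq[1:])):
        return False
    if len(seq) == 3 and tuple(t for _, t in seq) in BANNED_TRIPHTHONG_TONES:
        return False
    return True

counts = {n: sum(is_valid(s) for s in product(UNITS, repeat=n)) for n in (1, 2, 3)}
print(counts, sum(counts.values()))  # {1: 16, 2: 240, 3: 2704} 2960
```

Under these assumptions that's 2,960 possible syllables before any words are built from them.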
In the final creation of words, you forgot to filter out four of the same letter in a row; this would mean the maximum is decreased by 2, if I counted correctly.
It would still not matter, but it is there, in the second row, there is the word paaaa.
no, "paaaa" would be filtered because it contains the string "aaa"
@@lietajucemaciatko383 Oh, right my mistake. I hadn't noticed the t apparently.
I'm just blind. ( :
Here's my take on the math behind this. Let's say we have CV(C) and 3 consonants (which can be in initial or closing position) and 3 vowels. There are 3 possibilities for C, 3 for V and 4 for (C) (as the fourth choice is none). The number of basic syllables is just the product, in this case 36. But we will call it s.
How many two syllable words are possible allowing repeats? It's just s times s, or s^2. So without repeats, it's s less than that, or s^2 - s. Adding the one-syllable possibilities to that, the number of one and two syllable words with no repeated syllables is s^2. So in our case we actually have 36^2 = 1296 unique one and two syllable words already.
Three syllables with no repeats: s(s-1)(s-2) = s^3 - 3s^2 + 2s. Adding in the s^2 possible one and two syllable words, you'll get s^3 - 2s^2 + 2s. With s = 36 we now have 44136 unique words.
Let's do CV and use 5 consonants and 3 vowels. So s = 15. We get 2955 unique words.
So the moral of the story is: if you have anything more than a truly minimal phoneme set with truly restrictive phonotactics, you'll get your 2000 words easily.
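The closed form can be checked against a brute-force count. In this Python sketch the s syllables are stood in for by the numbers 0..s-1, and "no repeats" means no syllable appears twice anywhere in a word, matching the s(s-1)(s-2) term; `brute_force` and `closed_form` are made-up names:

```python
from itertools import permutations

def brute_force(s, max_len=3):
    """Count words of 1..max_len syllables in which no syllable is used twice."""
    return sum(1 for n in range(1, max_len + 1) for _ in permutations(range(s), n))

def closed_form(s):
    # s + s(s-1) + s(s-1)(s-2), simplified
    return s**3 - 2 * s**2 + 2 * s

cvc_syllables = 3 * 3 * 4  # CV(C): 3 onsets, 3 vowels, 3 codas or none
cv_syllables = 5 * 3       # CV: 5 consonants, 3 vowels
for s in (cvc_syllables, cv_syllables):
    print(s, brute_force(s), closed_form(s))  # 36 44136 44136, then 15 2955 2955
```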
Technically, whistled language only uses one sound and a crapton of tones
Depends on the language and how it's whistled.
Some whistle languages' tonogenesis is the first layer of it, and therefore the "tones" are more like "consonant contours".
In a strictly technical sense. I'm not saying you're wrong in any way, just trying to add a shade of context. Even if you think of it as a contouring consonant, it's acoustically and articulatorily a tone.
Okay, this comment I made a while back has inspired an idea that I've been ruminating on: for a cursed language (for the cursed conlang circus, of course) you could make a language whose only sound is "O" but with various tones. No, not whistled... *spoken.* So it would go something like
Ooôo ǒo᷄oo᷈ooo᷇ oo᷉o᷉oô o᷅oooo᷆o ooǒ
Sure, you'd sound like a chimpanzee having a panic attack... but it would be a form of communication at least. Unfortunately I am nowhere near competent enough to even approach making a conlang but man it would be funny if someone made this work.
The differing pitch contours of whistle registers *are* phonemes. Treating tone/pitch contours as non-phonemic is cheating.
I smell a cursed conlang
I think the people closest to knowing 100% of English words would be, like, champion Scrabble players; even then there's a big gap between Nigel Richards and everyone else, so he probably knows the most words.
With a living language, is knowing all the words even technically possible? Like, at what point do you start considering sesquipedalian a word, and skrunkly not?
Moreover, I wonder if loan words count toward these languages' word counts. Technically sushi is a Japanese word and an English word simultaneously, with the same meaning, so which language does it count towards? And then how do you judge Tycoon, which is technically both a Japanese and an English word, and a loan word, but the loan word has a different meaning from the word it derives from?
The thing is, English is funny because it has a habit of stealing root words from Latin and Greek and then mashing them together in its own grammatical structure to create new English words
Morse code and binary have two, and you can say any English word with them.
Turns out when your language has like 3 sounds in it, it just sounds polynesian
C: ʔ,
V: a i u,
(C)V
Filter: aa, ii, uu,
1-5 syllables
Over 2000 words possible
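Under those rules the count is easy to confirm by brute force (Python sketch; `count_words` is a made-up helper). Since every syllable ends in a vowel, each syllable after the first has 5 options (the 3 with ʔ, plus the 2 bare vowels that differ from the previous vowel), giving 6*5^(n-1) words of n syllables:

```python
from itertools import product

CONSONANTS = ["ʔ"]
VOWELS = ["a", "i", "u"]
SYLLABLES = VOWELS + [c + v for c in CONSONANTS for v in VOWELS]  # (C)V: 6
FORBIDDEN = ["aa", "ii", "uu"]

def count_words(max_syllables):
    """Count words of 1..max_syllables syllables containing none of FORBIDDEN."""
    total = 0
    for n in range(1, max_syllables + 1):
        for combo in product(SYLLABLES, repeat=n):
            word = "".join(combo)
            if not any(f in word for f in FORBIDDEN):
                total += 1
    return total

print(count_words(4), count_words(5))  # 936 4686
```

So it's the fifth syllable that pushes this inventory past 2000; capped at 4 syllables it gives 936.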
you didn't account for two stops in a row
I'd argue the language with the fewest sounds might be the honking of cars, morse code if it was used as a conlang, or a language that is not spoken (0 sounds).
Good video.
This reminds me of this time when I was little (and before I was into linguistics), when my sister and I tried to make a language with "a" and "b". These were its """"""""""sounds"""""""""":
A /æ̆/
AA /æ/
AAA /æ:::/
B /b/
BB /bʊ:/
BBB /bʊ:::/
In case you noticed, I used IPA notation; I'm just putting my current linguistics knowledge into describing this.
Here's the syllable structure: (C)(V)
Words can be made like: BA, ABB, AAABB, ABBBABAAB, etc.
Word order is English.
This language that I just randomly remembered I made 2 years ago makes me cringe-
a while ago i was lurking the (new?) cbb forum and stumbled upon a scratchpad that some guy made where the protolang had the phonemes /t k d g i u e a/ with some of the most deranged seeming diachronic sound changes i had ever seen at that point and it stuck with me for years
If you had a language with a syllable structure of CV (so that any combination of syllables is a valid word) with c consonants and v vowels, then you'd have c^n*v^n = (v*c)^n words with n syllables and [(v*c)^(n+1) - 1]/(v*c - 1) words with up to n syllables (this total includes the empty, zero-syllable word, so the number of real words is one less). For that language with the inventory of {p, t, k, a, i}, you'd get 1555 words with up to 4 syllables and 9331 words with up to 5 syllables.
Anyway, I did a little spreadsheet, and here are a few nice limits (note that swapping the consonant and vowel amounts gives the same result):
2c,1v -> 10 syllables = 2047 words
8c,6v -> 2 syllables = 2353 words
5c,3v -> 3 syllables = 3616 words
7c,2v -> 3 syllables = 2955 words
4c,2v -> 4 syllables = 4681 words
7c,1v -> 4 syllables = 2801 words
5c,1v -> 5 syllables = 3906 words
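A quick way to reproduce that spreadsheet, as a Python sketch (`words_up_to` is a made-up helper). It sums (c*v)^k starting at k = 0, so, like the closed-form formula, it counts the empty zero-syllable word; pass include_empty=False for real words only. Note that the 2353 and 3616 rows only come out with 8c,6v and 5c,3v; 8c,7v and 5c,4v would give 3193 and 8421:

```python
def words_up_to(c, v, n, include_empty=True):
    """Count CV words of up to n syllables as a sum of (c*v)**k.

    Starting k at 0 matches [(v*c)^(n+1) - 1]/(v*c - 1), which also counts
    the empty word; the plain sum avoids dividing by zero when c*v == 1.
    """
    s = c * v
    start = 0 if include_empty else 1
    return sum(s**k for k in range(start, n + 1))

# (consonants, vowels, max syllables)
limits = [(2, 1, 10), (8, 6, 2), (5, 3, 3), (7, 2, 3), (4, 2, 4), (7, 1, 4), (5, 1, 5)]
for c, v, n in limits:
    print(f"{c}c,{v}v -> {n} syllables = {words_up_to(c, v, n)} words")

# {p, t, k, a, i}: c = 3, v = 2
print(words_up_to(3, 2, 4), words_up_to(3, 2, 5))  # 1555 9331
```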
that's literally tuki tiki's phonology.
C: m n p t k
V: a i u
structure: CV
syllables: 1-3
These phonotactics give 3615 possible words. Excluding "ti", a sequence absent from multiple Polynesian languages as well as a certain minimalist conlang mentioned in the video, this still results in a maximum lexicon of 2954 words. I admit this isn't perfect and has some ambiguity; for instance, how would one differentiate between "kama pinaku" and "kamapi naku" phonemically? Still, it's a decent starting point.
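Those figures check out with a plain count (Python sketch; `lexicon_size` is a made-up name). The last line illustrates the segmentation problem: once the spaces are gone, both phrases are the same string.

```python
CONSONANTS = "mnptk"
VOWELS = "aiu"
syllables = [c + v for c in CONSONANTS for v in VOWELS]  # 15 CV syllables

def lexicon_size(syllables, max_syllables=3):
    """Words of 1..max_syllables syllables, any syllable in any position."""
    return sum(len(syllables) ** n for n in range(1, max_syllables + 1))

print(lexicon_size(syllables))                            # 3615
print(lexicon_size([s for s in syllables if s != "ti"]))  # 2954
print("kama" + "pinaku" == "kamapi" + "naku")             # True
```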
Solresol gets by with only 7 syllables. A mere 3 possible syllables already gets you well over 6,000 phonologically distinct words of 8 syllables or fewer, so by those criteria you could get by with only 4 phonemes (3 consonants and a vowel, or vice versa). Or you could go with 4 syllables, 2 consonants and 2 vowels, and get over 65,000 words of 8 syllables or fewer.
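For anyone checking the arithmetic, these are just sums of powers (Python sketch; `total_words` is a made-up helper). Four syllables only clear 65,000 once words may run to 8 syllables; capped at 4 syllables they give 340:

```python
def total_words(s, max_len):
    """Words of 1..max_len syllables built from s possible syllables."""
    return sum(s**k for k in range(1, max_len + 1))

print(total_words(3, 8))  # 9840
print(total_words(4, 4))  # 340
print(total_words(4, 8))  # 87380
print(4**8)               # 65536 words of exactly 8 syllables
```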
wait till you see my cursed conlang...
you say you can't know 100% of the words of a language, but I know way more than 100 words in Russian
I've decided to try making a realistic conlang with the consonants [m p k j w] and the vowels [i u a]; there are exactly 16 valid syllables, but I honestly think it'll work!
Sign language…
92 allowed syllables in toki pona, so 8464 possible 2-syllable words, and I thought that was not many phonemes. 3+3*4 is pretty tight.
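Where the 92 comes from, as a Python sketch. It assumes the usual toki pona inventory (p t k s m n l j w; a e i o u), the four banned combinations ji, ti, wo, wu, and an optional final n. As I understand the phonotactics, it ignores the rules that vowel-initial syllables only start words and that n can't come before n or m, so 92^2 is an upper bound on real two-syllable words:

```python
CONSONANTS = "ptksmnljw"
VOWELS = "aeiou"
BANNED = {"ji", "ti", "wo", "wu"}  # CV combinations toki pona disallows

cv = [c + v for c in CONSONANTS for v in VOWELS if c + v not in BANNED]  # 41
onsets_and_bare = cv + list(VOWELS)                                       # 46
syllables = [s + coda for s in onsets_and_bare for coda in ("", "n")]     # 92
print(len(syllables), len(syllables) ** 2)  # 92 8464
```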
For Toki Pona, you could almost consider it an agglutinative language.
jan is the word for person. pona is the word for good.
jan pona means literally good person, but is usually translated as friend. So is jan pona a different word?
Oh hey. At 1:28, that image on the computer monitor is a screenshot of a word entry in Lexicanter. *I have invaded your videos, Agma.*
It's more efficient to reduce consonants than vowels with a (C)V syllable structure, but it's not *much* more efficient. With 3 consonants and 2 vowels, you're sometimes above and sometimes below 2,000 words; with 2 consonants and 3 vowels, you're sometimes above and sometimes below 2,500 words. With 1 consonant and 3 vowels, you're in the 1,3XX range; with 3 consonants and 1 vowel, you're between 320 and 340 words.
In one of my conlangs, there is an estimate of 63840 possible syllables (127680 if you take into account stress), and currently no upper bound on how many syllables a word can have, which makes the number of potential words infinite.
no natural language has a limit on how many syllables a word is allowed to have; it's only a matter of whether a word is too short to be distinct or too long to be practical
p,t,m,s,w,l,a,u
Within 4 hours!
I know, it’s crazy how quick this one got views, not complaining haha
1:58
To add to a hypothetical low-phoneme lang, what's the largest amount of sounds you could have in a language while still having a low phoneme count? How allophonic can you get a language with like 9 phonemes and make it sound way more complex than it is?
I don't quite understand how the generator comes up with those numbers. With the phonology you presented, there are six consonants and five vowels, so 30 CV syllables, 35 (C)V syllables, and therefore 35^2 = 1,225 two-syllable and 35^3 = 42,875 three-syllable words. While the two-syllable count kinda makes sense, I don't get why the three-syllable count isn't maxed out at 9999, unless it's using some kind of secret phonotactic rules not explained here.
Btw, if you just go combinatorially, 13 syllables is the minimum to get over 2000 three-syllable words, and 45 for two-syllable words. So for two-syllable words it's actually quite reasonable even for a small phonology (and you could argue Mandarin does that, for example). For the three-syllable case, a (C)V structure seems quite nice, and you can simplify quite a bit: 3 vowels and 4 consonants would be enough, for example a, i, o and t, k, s, n (which would actually be quite nice for an auxlang, since those sounds appear in most phonologies). And with four-syllable words considered, 7 syllables are enough. So p, t, k with a, u would actually be okay, as that's 8 syllables; you could even prohibit one, so your selection actually wasn't minimal, but very close.
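Those minimums fall out of a simple search (Python sketch; `min_syllables` is a made-up helper). Like the comment, it counts only words of exactly the given length:

```python
def min_syllables(target, length):
    """Smallest syllable inventory s with s**length >= target."""
    s = 1
    while s**length < target:
        s += 1
    return s

for length in (2, 3, 4):
    print(length, min_syllables(2000, length))  # prints 2 45, then 3 13, then 4 7

# (C)V syllable counts: (consonants + 1) * vowels
print((4 + 1) * 3)  # t, k, s, n with a, i, o: 15 >= 13
print((3 + 1) * 2)  # p, t, k with a, u: 8 >= 7
```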
Five phonemes is the maximum. That makes me wonder what those five common phonemes are. They are very commonly found in a lot of languages for some reason, and for some reason I can't remember them, hmmmmm. Anyway, great vid
4:19
Never smoke that tira 292b kush bro
Now I’m stuck in the refugium 💀
this is pretty gamer
I still have to watch the video, but 9x5 with CV (not even "(C)V") with 2 syllables is 9*5 + (9*5)*(9*5) = 2070 words. If you add stress differentiation, doubling only 5% of the 2-syllable words, you get around 100 more words. If you add an optional N ending at the end, Japanese-style, and allow optional consonants even if you don't allow vowel clusters, you end up with (V+VN+CV+CVN) * (CV+CVN); even without stress you can go down to 5 consonants and 3 vowels, giving you almost 2.5k words. With "diphthongs" you can get to 4 consonants and 2 vowels, also without stress, and you get nearly 3k words (because you are effectively using 4 vowels). If you only use 1 diphthong per word, and only on the second syllable, and as a further constraint don't allow vowels or diphthongs to touch each other, then you still get around 1.5k (if you allow the diphthong on the first syllable instead, you get exactly 1.5k, which is a bit more, but I suspect you'd need more phonemes to make the difference worth it).
One of my small-inventory conlangs had: P T K V S H R N A E I O U (8*5), with a "word" (longest syllable vs. the maximum ending syllables) structure of (C)(R)V(V)(N) * C(R)V(N) = a * b, where:
[R is R, (V) is strictly I or U, and N is R, N, or S; repeated letters are not allowed]
a = V + VV + VN + VVN + CV + CVV + CVN + CVVN + CRV + CRVN + CRVV + CRVVN = 5 + 12 + 15 + 36 +40 + 96 + 120 + 288 + 35 + 105 + 84 + 252 = 1088
b = CV + CVN + CRV + CRVN = 40 + 120 + 35 + 105 = 300
Meaning that, if my math is correct, there are 1088 + (1088*300) = 327,488 short, somewhat aesthetically pleasing possible words, which is more than enough for me (though I generally try to fall in the 12-15 range for consonants)
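The totals can be rechecked with the per-template counts taken exactly as listed (this sketch doesn't re-derive each one, e.g. VV = 12). Only four numbers are given for b, and C(R)V(N) has no VV slot, so only those four are summed. The total is one-syllable words (a) plus two-syllable words (a*b):

```python
# Per-template counts copied from the comment above, not re-derived here.
first = {"V": 5, "VV": 12, "VN": 15, "VVN": 36, "CV": 40, "CVV": 96,
         "CVN": 120, "CVVN": 288, "CRV": 35, "CRVN": 105, "CRVV": 84, "CRVVN": 252}
second = {"CV": 40, "CVN": 120, "CRV": 35, "CRVN": 105}

a = sum(first.values())
b = sum(second.values())
total = a + a * b  # one-syllable words plus two-syllable words
print(a, b, total)  # 1088 300 327488
```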
12:45 Ah, but with puupuua as a word, we need to know whether the grammar has reduplication. And to keep the language interesting, you could make one of the consonants a click.
With only 45 possible syllables, you could make 2025 two-syllable words; with only 13 possible syllables, you could make 2197 3-syllable words; and with only 7 possible syllables, you could make 2401 4-syllable words. So apparently you could have a serviceable language where the syllables are all just saying "Steve" in 7 different tones of voice (i.e., intrigued, delighted, perplexed, playful, scared, disappointed, apologetic), even if those words are limited to 4 syllables each.
Sorta like I am Groot?
further idea: if no possible syllable requires movement of the lips or jaw (leave out the w and m, for example), then it would be difficult to tell whether people were talking without hearing their words; it would look like they were just staring at each other with their mouths slightly open. This would limit the vowels (I think only schwa would be allowed), but there would still be ample consonants. It would sound like "the nun hugged a hung young stud, an unsung hun's son shunned a stun gun".
the bonus part at the end was funny
I've got a conlang with 5 consonants and 2 vowels. I wonder how low you can go
is linguist audio the new leftist audio? is there overlap?
i find it funny that linguists do not care about mic quality. I have been struggling to find one. Klein has the best I've heard, and even he needs a pop filter
(with much love, as I no longer have a good setup myself, and my humble mic can't be what i had before)
lmao, certainly overlap. I actually had my good mic recording during this one to try and sync it up but I had it positioned stupidly and it sounded even worse, hahaha
8:10 "And no person knows 100% of a language"
Nigel Richards:
Not entirely sure what that 6-ish-minute GenGo segment was supposed to demonstrate. That it only managed to generate ~3300 words out of a possible >22K for C=ptk V=aiu (C)V*{1,4} words is hardly impressive or relevant.
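The >22K figure checks out for the unfiltered count. This Python sketch assumes (C)V syllables from p, t, k and a, i, u, words of 1 to 4 syllables, and no filters at all:

```python
from itertools import product

ONSETS = ["", "p", "t", "k"]  # (C): no consonant, or p, t, k
VOWELS = "aiu"
syllables = [c + v for c in ONSETS for v in VOWELS]

# Join every sequence of 1 to 4 syllables; the set drops any duplicate spellings.
words = {"".join(w) for n in range(1, 5) for w in product(syllables, repeat=n)}
print(len(syllables), len(words))  # 12 22620
```

No two syllable sequences spell the same word here, because every syllable ends in its vowel, so the set is exactly 12 + 12^2 + 12^3 + 12^4.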
nguh: "you can't explain string theory in toki pona"
mi: "ale li linja lili"
It seems interesting that Rotokas, Hawaiian, and Piraha are languages with no consonant clusters. I am surprised that a language with such a small phonetic inventory has all its words ending in vowels; no syllable ends in a consonant. ts and dz are considered to be separate sounds in their own right, but could they actually be considered akin to vowel diphthongs?
I was looking at ur laptop
I noticed that when you dropped the long vowels from the Rotokas phonology, your word generator still created things that looked like they had them (oo uu etc.). By allowing words with V syllables succeeding V or CV syllables, you've accidentally re-introduced the vowel length distinction. Oops.
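A sketch of that effect in Python, assuming six consonants p t k v r g and five short vowels a e i o u as a stand-in for the short-vowel analysis, with (C)V syllables. Every time a bare-vowel syllable follows a syllable ending in the same vowel, the result looks like a long vowel:

```python
from itertools import product

CONSONANTS = "ptkvrg"  # assumed Rotokas-style consonants
VOWELS = "aeiou"       # short vowels only
SYLLABLES = list(VOWELS) + [c + v for c in CONSONANTS for v in VOWELS]  # (C)V: 35

def looks_long(word):
    """True if the same vowel letter appears twice in a row, e.g. 'oo'."""
    return any(a == b and a in VOWELS for a, b in zip(word, word[1:]))

two_syllable = ["".join(p) for p in product(SYLLABLES, repeat=2)]
lookalikes = [w for w in two_syllable if looks_long(w)]
print(len(two_syllable), len(lookalikes), lookalikes[:3])  # 1225 35 ['aa', 'ee', 'ii']
```

Filtering out those lookalikes (one per first syllable) would leave 1190 two-syllable words that really have only short vowels.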
7:23 ken la linja lili li ijo lili. ijo lili mute li soweli li kasi li waso li ale pi ma ale.
Can so line small is thing small. Thing small many is animal is plant is bird is all of land all.
With only 3 vowels and 3 glides, you get more than 2000 two-syllable (C)V(C) words.
I was learning Toki Pona and asked on the Toki Pona official Discord how they would translate "The bathroom is the third door on the left."
It resulted in a three day discussion that came up with a sentence of about 24 "words". Really dampened my enthusiasm on the language.
the language isnt rlly meant to be a whole practical everyday language tho, it does make sense that its like that. im currently learning it and loving it
@@rainbowrotcod What the whole discussion really highlighted for me was that, as much as the community is loath to introduce new words, it would be well served by some concrete directional words and a whole rework of the number system.
Two consonants, two vowels, and let's make CV the only allowed syllable structure to keep things interesting. This gives us four possible syllables and 65536 words of 8 syllables. Done.
just 6 is enough: 4^6 = 4096
8 Consonants, 5 Vowels, 2 Syllables gives you 2,070 possible words
What about ka's inventory of m k a u?
"bigger than toki pona but smaller than tamil"
I don't know why you had to go with only CV syllables, though. I know you were sort of basing it on Rotokas, but plenty of languages would allow doubled consonants and even consonant clusters in the middle of words. If you do that, allowing only CVC(C)V(C) with a maximum of 3-syllable morphemes, and you have that sound inventory of [p, t, k, a, i, u], you get a pretty naturalistic-sounding language, even without adding things like tone and vowel length. Patka appa itkupa, etc. You could even have some phonotactic rules (like allowing the consonant cluster /kt/ but not /tk/) and lengthen or fuse two vowels when they touch each other, and it would still sound naturalistic. For example, instead of "appa atati ukaku", the vowels fuse or lengthen to become "appaa 'tati kaku". A language like this could easily be a real language, but it would struggle to have enough DISTINCT-sounding words. Languages with really small phonemic inventories also struggle to form compound words and words with "sound symbolism", which is why we tend to see these kinds of languages develop things like tone to compensate. But the question of whether a language is "naturalistic" or not is only about whether it sounds like it could be a real language, not whether it is a feasible language for a community of speakers who have to interact with the outside world and invent words for abstract philosophical ideas. A language like Toki Pona, which you mentioned, works extremely well at communicating a limited number of concepts, which works great for a creole or auxiliary language, but if it ever had a group of native speakers, you can bet they wouldn't keep their vocabulary to only 2,000 words.
true lol, in my native language people prefer to borrow words from English instead, because compounding sounds cringe and awkward
There is an issue with this method. This sort of experiment does not have any parameters for redundancy.
Sure single syllable words like "stat", and "spat" are distinguished by a single consonant, but imagine actually trying to converse with words like "kitatakapatika" and "kitatapakatika", and "kitataapakatika".
I would suggest putting in some kind of filter to account for near-duplicates over 3-4 syllables
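One possible version of that filter, as a Python sketch. It swaps in Levenshtein edit distance as the measure of "near-duplicate", which is my choice rather than anything from the video, and the thresholds (only words of 6+ letters are checked, and they must be at least 4 edits apart) are arbitrary. `drop_near_duplicates` is a hypothetical helper:

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def drop_near_duplicates(words, min_distance=4, min_length=6):
    """Keep words in order, skipping any long word too close to a long word already kept."""
    kept = []
    for w in words:
        too_close = len(w) >= min_length and any(
            len(k) >= min_length and edit_distance(w, k) < min_distance for k in kept
        )
        if not too_close:
            kept.append(w)
    return kept

words = ["kitatakapatika", "kitatapakatika", "kitataapakatika", "stat", "spat"]
print(drop_near_duplicates(words))  # ['kitatakapatika', 'stat', 'spat']
```

Short words like "stat" and "spat" are exempt, since a one-letter difference is usually enough to tell them apart.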
I remember needing to learn Mandarin for a year and I learned about tonal language. Why not just use tonal variations of one vowel?
Tones are still phonemes ("tonemes") if they're distinguished, so that doesn't change the count. It's just a different sound for this tiny language.
what about a script with 2 letters?
Do you have a list of necessary words?
So it’s a little subjective, but I typically pick from a different Swadesh list when I speedrun a language just to get the first hundred or so out of the way. though I usually end up deleting the pronouns and prepositions from the Swadesh list because I often have them created already or need to make sure they’re simple or irregular.
@@AgmaSchwa but i need 2000 at least, i cannot just stack these