That's a good idea. Illiterate or pre-literate (a student?). The issue there is that Voynichese has a very clear "system" to it. It really looks like the makers put a lot of thought into the composition of the glyph set and the way it behaves. We just don't know what that behavior means. Still, I wouldn't entirely discount something along those lines.
The obvious objection is that you could have a substitution cipher that substitutes individual sounds for groups of Voynichese letters that occur together.
@@metachirality yes, I think that's a valid line of investigation, and in fact I looked into that a couple of years ago: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/ This could be called a "verbose cipher", though I certainly agree that this is still a form of substitution. It's a far cry from what theorists tend to do though. More commonly, they like to do the opposite, mapping syllables to voynichese glyphs. I'm currently making a video about how this is an awful solution:) Note that there are all kinds of issues with the verbose cipher approach though. One of them is that "words" would need to get really short. So probably you'll be forced to abandon spaces. So the first thing you'd need to do after identifying glyph clusters that map to letters is explaining how spaces were inserted.
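To make the "words would need to get really short" point concrete, here is a minimal Python sketch of a verbose cipher. The cluster table is entirely made up for illustration (it is not a proposed reading of any Voynichese glyphs); it only shows how mapping one plaintext letter to a glyph cluster inflates word length.

```python
# Hypothetical table: each plaintext letter becomes a cluster of EVA-like glyphs.
# These pairings are invented for illustration only.
TABLE = {"a": "ai", "e": "ch", "n": "dy", "t": "qok"}


def verbose_encode(word, table=TABLE):
    """Replace every plaintext letter with its glyph cluster."""
    return "".join(table[ch] for ch in word)


def verbose_decode(cipher, table=TABLE):
    """Greedily match the longest known cluster at each position."""
    reverse = {cluster: letter for letter, cluster in table.items()}
    clusters = sorted(reverse, key=len, reverse=True)
    out, i = [], 0
    while i < len(cipher):
        for cluster in clusters:
            if cipher.startswith(cluster, i):
                out.append(reverse[cluster])
                i += len(cluster)
                break
        else:
            raise ValueError(f"no cluster matches at position {i}")
    return "".join(out)


encoded = verbose_encode("neat")
print(encoded, len(encoded))    # nine glyphs for a four-letter word
print(verbose_decode(encoded))  # back to "neat"
```

Read in the other direction, a "word" of five or six glyphs would decode to only two or three letters, which is why this approach tends to force you to rethink what the spaces are doing.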
@@koengheuens My personal theory is that many different Voynichese words can map to a single letter in another language, and the difference between Voynich A and Voynich B reflects a shift in which "words" are favored over others during conversion. Who knows though, it could be completely false.
@@VirtueInEternity thanks! I'm scripting a new video right now.
Maybe this is influenced by my being a physician researching mental illness, but I've always had the feeling that the manuscript might be done by someone with some mental disorder (which at the time were not treatable at all), from something as comparatively mild as autism (where repetitive behaviours are extremely common) to some form of psychosis. So it would basically be a form of written echolalia or echopraxia... It would be quite a boring solution, as that would probably mean there is absolutely no meaning in it at all. Some forms of schizophrenia are interpreted by the patient as being visions or revelations from deities... I can totally imagine a mildly schizophrenic priest or monk thinking they were being instructed on what to write by some god or some spirit, as with many (even mentally sound) people believing they can "speak in tongues".
Yeah, like any solution that proposes absence of meaning, we'd never be able to prove this if it were the case. Mental illness can express itself in so many ways... One big issue with this is that we have long known that there are at least two scribes, and Lisa Fagin Davis' more recent research indicates that there are five. This at least suggests some kind of shared experience. Even if the text turns out to be something like what you describe, we may not even need mental illnesses as an explanation. Medieval people were religious fanatics compared to the average person in modern developed nations. And religious zeal can cause people to do weird things.
@@koengheuens very interesting! I'll keep watching your other videos on the subject!
Just discovered this channel. I know all your videos are on a very niche subject, but I was shocked to find you had less than 1k subs! I don't know enough about languages to have my own theory on the voynich manuscript, but merely absorbing this information (which I otherwise have no interest in) has been delightful. Thank you for your videos, they're presented well and simply enough for someone like me (with no experience in this) to grasp. I've known about the Voynich Script for a while, but videos never go into the details of the writing itself. Although, the Histocrat had a video about its history, which was in depth and interesting. Thank you for your hard work on these, I know how long it takes to edit even seemingly simple things, and your work is commendable. Thank you!
Thanks! To be honest, many of my videos are really niche (like those from Voynich Day, which is like a conference for Voynich nerds). But the most recent one suddenly got picked up by the algorithm. I'm planning to make more videos like these, explaining things Voynich researchers struggle with to a broader audience. I'm glad people enjoy this and find it informative, that means a lot to me.
Voynichese reminds me of a court reporter's steno system where the "characters" of a word always appear in the same location regardless of its location in the word. Vowels and vowel combinations always on the left in that example, consonant combinations always on the right, and the next word on the following line.
Also did any of these "solutions" actually bother translating the whole thing? I feel like if you solved the VM and realized it was Welsh all along... you'd just go ahead and translate all the characters to Welsh, and then people could easily verify it. Are these people deliberately lying, not expecting anyone to actually *apply* the substitution beyond a few cherry-picked words/sentences?
Oh man, don't get me started... We always see the same thing over and over. "I found the language but actually I don't speak it, so I need experts to solve the rest". Stuff like that. To be honest though, it rarely happens that I feel some malicious intent by the solvers. They are genuinely convinced that they solved it. It's so easy to cherry pick words, but each time one of those "works", they do get a confirmation that they are on the right track. So I like to think of most solvers as people who have fallen victim to confirmation bias rather than conscious liars.
Videos like this about the Voynich Manuscript are the perfect size to get invested into the subject- please make more like this and thank you for breaking down the study of this famous codex 🙏
@@cadesummers5866 thanks! I hope to have the next one out in a week or two. I'm really enjoying making these and want to keep improving while doing it. It means a lot that people find them informative.
Some languages have weird limitations of entropy: - Turkish vowels past the first one are limited to 2 options only (a/ı > a/ı, o/u > a/u, e/i > e/i, ö/ü > e/ü). - In Inuktitut, words can only start in a, i, u, p, t, k, q, s, m, n, and only end in a, i, u, t, k, q. Medial consonant clusters undergo assimilation such that there's mostly only mm nn ŋŋ vv ll jj ɡg ʁʁ jv jl jg jʁ ʁv ʁl ʁj pp tt kk qq ts ss ɬɬ sp st sk sq sɬ qp qt qɬ ɬp ɬt ɬk ɬq ɬs as far as I can tell (plus some others depending on dialect). - Middle Chinese can be seen as a series of binary choices per syllable: [initial consonant] > [normal or "back" t,tʰ,d,n?] > [labiodentalization of p,pʰ,b,m?] > [sibilant type is s, ʂ or ɕ?] > [aspiration?] > [voicing?] > [j?] > [j or j̈ ?] > [w?] > [ɥ?] > [low or high vowel?] > [normal vowel or faster more centralized one?] > [extra weird split between "ʌ" and "e"] > [extra weird split specific to ij/in] > [pharyngealized?] > [rhoticized?] > [final consonant is j/w/m/n/ŋ/p/t/k or nothing?] > [is final ŋ/k "front" or "back"?] > [level/rising/departing tone?]. Each of these binary choices is heavily dependent on other choices and the number of potential neutralizations is exponential. For instance, labialization can only happen once - either as a labial consonant, or as w or ɥ, or as a rounded vowel, or as final w or m or p, or as "back" ŋ/k (except for a rare "jom" rhyme). The pattern of palatalization and rhoticizing involves like 8 of these choices and is completely bonkers and most of it is in almost complete complementary distribution. There's all in all about 5000 possible syllables (which merge into about 3000 in Cantonese, and 1200 in Mandarin). That being said, none of the languages you'd expect in the Voynich manuscript are like that. 
Romance languages in particular are quite orthogonal in combining consonants with vowels, and the entropy never really falls under about 1-in-3 (and is more like 1-in-5), and likewise for other Indo-European languages (aside from "ə" shenanigans).
These are some very good points - the observed phenomena in Voynichese aren't unheard of in natural languages. But whichever example you produce, Voynichese usually turns out to be much worse. For example, the glyphs that can regularly occur both word-initially and word-finally are very limited (in contrast to, for example, Inuktitut, where you have a few that can do both). And usually, when you find a language with some of those properties, you'll see that another stat is way off in compensation, like for instance the practically required size of your "alphabet". (I want to get into that a bit in the next video.)
Your roman numeral example made me think of a reverse gematria type cypher of some sort. Where specific numbers correspond to certain letters or words. So these numbers would have a very rigid structure in how they can be written. Or like how in modern linguistics, when describing phonemes you list the features of each given aspect of the phoneme in a very particular order
It's my understanding that Voynich has a lot of unique "words" (collections of symbols), despite the rigid word structure. Does this mean that many of the "words" are nearly identical to each other, differentiated only by a single symbol? If so, does this mean that most "words" can be placed into a relatively small number of categories? Again, if so, how many categories would you say there are? From what I've seen, I would guess probably somewhere around ten main categories, but maybe as high as twenty? (I imagine there are a small number of "irregular" words that don't fall into any pattern that would be ignored for categorization purposes.)
I have personally not studied word categories enough to adequately answer that question. What I do know is that if you look at the rate at which new vocabulary is introduced (TTR), Voynichese behaves a lot like other languages. Those are just the broad statistics though. If you look at what those "words" actually are, it is as you say: small differences from one word to the next, lots of repetition and near-repetition like "du du dum". This would be a good topic for another video, but I'd need to get in some people who've studied this more.
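For readers unfamiliar with TTR (type-token ratio), here is a minimal Python sketch of how it is computed. The sample "words" are just a handful of EVA-style strings picked for illustration, not an actual line from the manuscript, and the function name is my own.

```python
def type_token_ratio(tokens):
    """Number of distinct words (types) divided by total words (tokens)."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


# Illustrative sample only: near-repetitions still count as separate types.
sample = "daiin daiin dain chol chor chol".split()
print(type_token_ratio(sample))  # 4 types over 6 tokens
```

Note that TTR treats "daiin" and "dain" as completely different words, which is why the broad statistic can look language-like even when the vocabulary consists of many near-identical forms.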
I'm torn on the mystery. Half of me wants to see a _true_ solution, but half of me wants it to just be a very effective act of trolling from some troll several centuries in the past. The latter really makes my heart soar as a current era troll.
The best of both worlds would be if it is solvable and an immense troll at the same time. I really hope the text is mocking the solver and is not related to the illustrations.
While it is true that Voynichese character sequences are more predictable than natural language ones, linguists Claire Bowern and Luke Lindemann actually dismiss the idea that it is all just gibberish based on the properties similar to those of natural languages that it displays at higher levels of organisation. So, it is likely to be an encoded natural language or a constructed language. It is unclear whether it is more likely to be an encoded natural language or a constructed language because the supposed early 15th-century date of the Voynich manuscript predates the split between linguistic experimentation and cryptography and we don’t have the intended key, which would explain why Voynichese character sequences are more predictable than natural language ones. And if it is intended to be an encoded natural language, maybe it is a verbose substitution whereby the real character sequences are replaced with false longer ones. Or it is really just plain text and it is so predictable because it was only used for these manuscripts about specific topics, though it is still mysterious which language it is in.
Great video. I am personally inclined to believe it's systematic gibberish, but still, your video made me wonder about the following: Has anyone worked on anything resembling a tokenization of the text, one that could try to match *sequences* of EVA characters with a tokenization of the candidate languages that might be behind Voynichese? Token as in: discrete unit of meaning or structure (e.g., words or parts of words) that serves as the input building block for language models in machine learning. Maybe in that case there are more relevant elemental structures than the EVA characters... And also: Would making a single character equate to more than one EVA character still count as substitution? For example, a = qwe, b = qwt, c = qwy, or any reasonable variation on that.
Thanks! Regarding your last question, I honestly don't know where substitution ends - that sounds like a matter of definitions where people are likely to disagree. That some parts of EVA, like ch and sh, are better written as single glyphs seems clear. But this could certainly be taken further. Back in 2020 I spent some time looking into these kinds of issues. You can read about it here: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
5:45 Wouldn't this suggest some sort of n-gram based system? There has to be some trade-off between the character length of a text block and the encoded characters per length. This sort of has to assume that spaces are to be ignored though. Also, this only works if the ability to predict the next character from the last is not 'continuous' across a section of text. If it reliably fails to predict at a certain interval, then you have your n-gram length. If it never fails to predict the next character, then it's too deterministic to express any meaning whatsoever (unless the meaning itself is the repeated pattern).
I looked a bit into n-grams a couple of years ago, though only with entropy in mind and without trying to find a consistent length. See here: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/ (note that I had no training in statistics or anything of the kind, so everything you see is at the limits of my understanding at the time). So far, all research I'm aware of has kept spaces intact. Removing them will probably have some impact, but I don't think it will move the needle much on the entropy meter. Your idea is worth testing though.
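For anyone who wants to test the "predict the next character from the last" idea, here is a minimal Python sketch of conditional character entropy, i.e. the average uncertainty about a character given the one before it. This is a generic textbook calculation, not the exact method used in the linked post, and the function name is an assumption of mine.

```python
from collections import Counter
from math import log2


def conditional_entropy(text):
    """Entropy (in bits) of the next character given the previous one."""
    pairs = Counter(zip(text, text[1:]))  # counts of adjacent character pairs
    firsts = Counter(text[:-1])           # how often each character starts a pair
    total = sum(pairs.values())
    h = 0.0
    for (prev, _nxt), n in pairs.items():
        # p(pair) * log2 p(next | prev)
        h -= (n / total) * log2(n / firsts[prev])
    return h


print(conditional_entropy("abababab"))  # fully predictable text scores 0.0
print(conditional_entropy("aabb"))
```

To try the no-spaces variant, pass in `text.replace(" ", "")` and compare the result with the value for the spaced text.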
I feel like this implies Voynichese is some intermediate between a logography and syllabary, or maybe just a highly derived logography. The trends in character entropy seem to fit the concept of purely orthographic agglutination, possibly with purely non-phonetic adornments (e.g., the divine marker in cuneiform). People have definitely looked into the idea of agglutination in Voynichese before, but afaik it's just that guy who claims it's a phonetic version of Old Turkish which is so obviously wrong, and I think the opposite perspective--of a text which is barely connected to or entirely disconnected from phonetics--works a lot better. Then again, such a consistent structure within each word could also easily point to a shorthand with words referencing entire sentences or phrases with order defined by some strict grammatical system to ensure similar order in all sentences containing like grammatical aspects, as would fit shorthand. Or it could just imply the algorithmic creation of an overly-effortful hoax we've all been fooled by for centuries.
Almost makes me think of some of the writing practice I did as a kid. That often had a nonsensical and repetitive nature to the final product with sometimes no actual letters, because it was intended to teach muscle movements not spelling. That would also be pretty low entropy. Not saying I think that is the answer, just noting a similarity in my experience.
I've thought about this as well - scribes also had to learn how to write at some point. This would still leave a whole lot of questions open, of course. Why exactly this system, which 'teaches' some useless connections and leaves out a whole range of useful ones? And why 'practice' alongside weird drawings of plants and nude figures?
@@koengheuens Perhaps a more specific sort of practice to learn how to illuminate manuscripts for a scribe who already knew how to write. Monks had all sorts of strange hangups, it would absolutely not surprise me if some sect refused to practice on something real so made up things for their practice sheets. Though you know like a million times as much as I do about this whole thing, so if you don't think it fits then I would bet you are right.
How much entropy would be needed to characterize a language not by words, but by numbers and units? As in, if these were recorded results of an experiment, then it would make more sense if certain symbols only appeared in front of others, and recurring suffixes would make more sense, too.
@@JenJenswriting I don't think there's enough entropy anywhere within voynich words to accommodate decimal numerals. But something like Roman numerals plus extra stuff might certainly be possible. I don't think there's a way to prove this though. You could show that it's possible, but that's probably where this line of investigation would end. Also, don't forget that we're looking at tens of thousands of "words". Whatever was done to obtain these was sustained for a long time.
@@koengheuens Yeah, that makes sense. I'll try to come up with another avenue. It just doesn't make any sense for it to be gibberish, like some vehemently believe. After all, writing an entire text of gibberish would just land someone in deep water in the 1400's.
@@JenJenswriting I also still think there must be something more to it. But currently I'm clueless about what this could be. There's also something philosophical about it. We can never prove that something has no meaning, so if I were to say "this is gibberish and the drawings are a child's doodles", then I might as well look for a different hobby. We will keep looking for meaning, hoping there is one. Who knows, it might turn out to be fairly simple and obvious once we find it.
@@JenJenswriting People wrote entire books full of gibberish in the 1400s, mostly to con other people. Someone creating this as a con - to sell to someone else as some sort of valuable, mysterious manuscript - is plausible. One of the theories is that it was directed at either Georg Baresch or Athanasius Kircher, two of the early people associated with the book, or possibly Rudolf II, who purportedly owned the book previously. It is also possible it was a prank on Kircher; someone wrote up fake Egyptian text and sent it to Kircher and Kircher "deciphered" it, even though it actually was gibberish. The biggest point against it being gibberish is that it has some statistical properties that are unlikely in a gibberish text.
While it likely isn't substitution, I think the letters that only appear at the beginning or end of a word may be for removing excess ink from the writing implement, and the ones at the end may make the space between words more distinct. The ones at the end of a full line would be there to get rid of most of the ink on the quill, so it doesn't leave a drip along what you just wrote. Those symbols wouldn't represent a sound. Or they could mark where to inhale, like a breathing marker, so you can read it aloud fluently. I am curious about what this thing is and want to know more about it. What is it called? I need pictures and the location of its discovery. Using this method, the translation may be within our grasp, and we could even determine whether it is in fact gibberish, but with the little info I have, I can't tell.
Could it be possible that this is some sort of Semitic-like language? From my experience, Arabic contains a lot of predictable word patterns, as in non-concatenative morphology. Maybe whatever this language is (assuming it's speech) could be attempting to replicate that? Maybe the "o" character is some sort of placeholder for any vowel sound, in addition to the "ɪ" character. I'm definitely not claiming it's Semitic, because even Arabic is very flexible and has high entropy when it comes to character prediction. My interpretation is that the author probably copied that idea and made a conlang out of it: terribly artificial and inflexible but functional. I haven't looked much into it, but since you're the expert, I have a question: are there words in this manuscript? More specifically, do they even obey Zipf's law? If not, then we can automatically discard natural languages, and assign it to a potential "kitchen-sink conlang".
I haven't studied Zipf's law myself because I am hesitant to determine glyph boundaries (and thus word length). However, the Voynichese system does not appear to follow Zipf's law: voynichattacks.wordpress.com/2021/08/12/word-length-distributions/ . I also recall discussion of Zipf's law of Brevity that states that the most common words in languages tend to be short. Even if we can't be certain how many glyphs the Voynichese strokes denote, it also appears clear that the MS does not follow this law, with many frequent words of average length.
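For anyone who wants to check the rank-frequency side of this themselves, here is a minimal Python sketch. Under Zipf's law, a word's frequency is roughly proportional to 1 divided by its rank, so rank times count should stay roughly constant down the list. The token list is a toy sample, and the result depends entirely on how you split the text into "words", which is exactly the uncertainty mentioned above.

```python
from collections import Counter


def zipf_table(tokens):
    """Return (rank, word, count) rows, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(counts, start=1)]


# Toy sample for illustration, not a real transcription.
sample = "daiin daiin daiin chol chol dar".split()
for rank, word, n in zipf_table(sample):
    print(rank, word, n, rank * n)  # last column: roughly flat under Zipf's law
```

The same table can be used for a rough look at the law of brevity by comparing `len(word)` across ranks.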
Sometimes I wonder if it could be a music sheet? Maybe a choral compendium? Medieval music was pretty rigid, so repetitive sequences and symbols occurring only in specific positions wouldn't be surprising. I was part of a local church with a very old tradition in rural Russia, and they had a music notation system that was so local that only a couple of dozen people across a handful of local churches could still read it, and it had no digital trail or, afaik, mention in outside sources. My main hunch is that it's meaningless though. Not that we could verify that if it were the case.
Maybe the finals aren't pronounced. Maybe they are particles that get hooked on to the word instead of following it. And maybe letters look different depending on where they are in the word like Greek sigma and lots of Arabic. Any use?
@@WDCallahan my most recent video explores exactly that, positional variation. In the video I focus on the fact that this shifts your problem: the alphabet becomes very small.
This inspired me to consider something. I don't think it's at all applicable to the Voynich manuscript, due to the fixed patterns and how limiting this would be, but the entropy and low alphabet size made me consider: what if the letters were like a key to a dictionary? Use the 13 letters as pages, sections on a page. Given the word size that's 13⁹ combinations; the leftover combinations could be the next word, thus allowing one word to represent an entire part of a sentence. Again, to reiterate, I don't think this is the case here, it just got me thinking.
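Just to put a number on 13⁹, here is the arithmetic in Python. The 13 letters and the word length of 9 are the figures from the comment above; treating every position as a free choice is an upper bound, since the fixed positional patterns would rule out most of these combinations.

```python
ALPHABET_SIZE = 13  # figure taken from the comment above
WORD_LENGTH = 9     # figure taken from the comment above

# Upper bound: every position may hold any of the 13 letters.
combinations = ALPHABET_SIZE ** WORD_LENGTH
print(f"{combinations:,}")  # 10,604,499,373
```

That is about 10.6 billion possible "addresses", far more than any dictionary would need, so there would indeed be plenty of room left over.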
If it is something like this, we would essentially be looking at a code (if I understand correctly), which may be impossible to crack without the code book...
I've thought about that as well, but I'm not sure if it would work as an explanation. The main objection is that Voynichese is a system sustained over many pages. There's thought behind the glyph set (generalizing a bit, it's medieval numerals + ligatures and abbreviation characters + the set of "gallows"). There's a certain coherence between how curve-based glyphs and line-based glyphs interact. It's thought through and elegant rather than the haphazard mess you'd expect with someone badly interpreting a real text.
@@ChrisWAnim In its current state not. It can't even apply a simple substitution cipher to a text. I'll probably want to make a video about this later.
@@koengheuens something AI can't do right now is ask creative/unique questions. But it could be used to crowdsource ideas quickly. Maybe just sheer computational power alone could provide 1 great idea out of a sea of nonsense.
@@ChrisWAnim I definitely don't want to dismiss AI altogether. As a general remark, I am still not sure if we are at the peak of the current AI boom or just at the beginning. From experience though, I know that right now AI is mostly misused and misunderstood when it comes to Voynich research. It is very eager to please and prefers to come up with an answer rather than saying "I don't know". So I really think we need more reasoning and less "producing language that looks right". That said, it is still a handy tool. I used it to write some Python code that came in handy while gathering statistics for my last video.
Pardon my very very casual uninformed comment here, but couldn't the repetitive glyphs at the start of words be some kind of diacritical markings? It would be a very extreme example, but I guess it's possible for an invented language. Even if the same glyphs appear in other places, you could have a rule that says "when this appears at the start of a word it means this, but anywhere else it functions completely differently". Wouldn't be too crazy, I think. But again, not a linguist. Not a native English speaker either, just to be safe.
@@germansassarini1372 normally, yes. The problem is though that the whole system is predictable. The common starting and ending glyphs are just easy examples because of their obvious position.
If it's not language, what else can it be? I'm not looking for a full solution here. But like...why else would someone write all these symbols if not writing a language?
There are many options still open, some more likely than others. Maybe it _is_ language, but just written in a system we haven't comprehended yet. Maybe it's "filler text". Maybe a combination of both. Maybe it was done as a form of meditative exercise. Maybe someone tried to fool his contemporaries. We just don't know yet.
Something intended to *resemble* a text in an unknown language: a hoax, joke or proto-surrealist artwork. Many such things have been done in modern times, but sometimes, they were likely inspired by the Voynich manuscript.
In German manuscripts it was not unusual to write q for the phoneme k. Even if it was followed by u it was only pronounced "k". See Heinrich von St. Gallen: "da quame eine priester.." Though written with qu, it was pronounced just k. Also, as linguists probably know, in South German manuscripts (Upper German language) the phoneme a was often written o and vice versa. qo = ka = kann (Standard German), can (English).
@@thomaseriksen6885 good question. It was an established term among researchers when I encountered it. Now, in linguistics there are often inconsistencies in names of countries, languages and peoples. So it's not that important - the custom is to go with whatever name happens to be the most used one.
Superb. Congratulations Koen. Does this mean that if Voynichese can't map to any known language that there's no point in hoping it enciphers a natural language? Should all the cryptologists go home? :) Or do we start looking at ways to represent spoken language in other-than a phonetic alphabet? (By the way, the term 'abjad' is no longer in vogue, since all it means is an alphabet which can, but need not, represent vowels by diacritics. They are still alphabets. Not that it matters.) Great graphics.
What it especially means is that we have to abandon the hope of a relatively straightforward mapping of Voynichese to a natural language. People have long been aware of this - the stats clearly don't allow for such a solution. This doesn't necessarily mean that the text has no meaning though. Binary code is a bunch of 1s and 0s but it can say anything you'd like it to say. I don't know if there is any meaning to the text, but if there is, it will be discovered by understanding the system (how does it represent meaning?) rather than any language the glyphs map to. The term abjad is fine, it is simply a type of alphabet. That is exactly why I used it: whether the system is an alphabet with vowels, an abjad or an alphabet with vowels omitted really doesn't make any difference for the relevant stats. But I could have been more explicit about this in the video.
@@koengheuens I'm not quibbling. This is a genuine question from someone who is not interested much in cryptography or historical linguistics and is an ignoramus as a result. But aren't you really just saying that we have to abandon the hope of a relatively straightforward mapping of Voynichese to a natural language *that is recorded using an alphabet*? Or have you considered rules that apply to the recording of natural languages in other forms of script, such as pictograms (like Chinese) or syllabaries? Perhaps there are other ways in which natural languages were recorded which aren't alphabets. Also, a trivial point - the 'final position only' isn't a problem. Some well-known scripts have two forms for a letter, one of which is used only in the final position. Also, why do you think so many glyphs are involved? It couldn't be needed to record numbers, could it?
@@dntodo6749 People, including myself, have been exploring all kinds of options all the time. Thing is, anything we can think of that's ever been used to represent language is even worse (as a match for Voynichese) than an alphabet. This is different from for example the Rohonc Codex, which has a much larger variety of glyphs, so one could think of logographs or something similar to hieroglyphic script. It's a natural reflex to think of positional variation, and in fact I think this must play a role somehow. But the number of different glyphs is very limited, and most of them (if not all) have strong positional restraints. So if you want to be able to express a real language in Voynichese, you've got like six or seven sounds left to do so, because you will need an initial, medial and final form for each sound. It just doesn't work. In fact, even without reducing the de facto glyph inventory through positional variation, it is already a struggle to find enough frequent glyphs to match to sounds.
@@koengheuens Just to be clear, so I don't misrepresent you. You are saying Voynichese cannot be a substitution cipher for the expression of a natural language IF (a) the natural language is conveyed using an alphabet AND (or should it be AND/OR) (b) one glyph is assumed to represent one sound. Would you express it differently?
@@dntodo6749 Alphabets like the Greek or Latin alphabet are, as far as I'm aware, the closest match to Voynichese. That is to say, moving to any different non-artificial writing system will make matters much worse. For example, abjadic alphabets actually have entropy stats that are further removed from Voynichese. And hopefully I don't have to explain why a logogram system doesn't work. So you're kind of stuck with alphabets as the closest thing, which is probably why they are so appealing to the average solver. Something like the Rohonc codex would invite more logographic solutions, since it has way more glyphs and some of those appear to have some pictographic inspiration. So let me be clear about it: IF you want a natural language solution, then an alphabet is the closest thing you can find "in the wild", but as I demonstrated, that won't work. There are modifications you could make, like for example assuming that certain glyph combinations represent one sound, but that would mean that all your words are now incredibly short, so you would have to abandon spaces. Each problem you solve leads to a bunch of other problems elsewhere. Anyway, some of this will be clarified in the video I'm currently working on.
Not necessarily. The word 'hoax' implies that the intent of the maker was to deceive others. Now, the vast majority of scholars believe the document to be authentic (i.e. made in the early 15th century), so it's not any kind of modern hoax where someone in the early 20th century made something that looks like a medieval manuscript. The other option is a medieval hoax, but this would imply that we can deduce that the medieval maker of the MS had an intention to deceive his contemporaries. And we don't have any reason to assume that this must have been the case. Imagine for example that someone made this text as a form of religious contemplation, or some spiritual undertaking. Or that it was a writing exercise for young students. Or that someone just thought it looked better this way. All that to say, even if it is meaningless, it needn't be a hoax. And of course it could still have some kind of meaning within the text. I'm just explaining that the dominant method among solvers (some variation of a simple substitution cipher) is impossible as a solution.
It could be a left over prop from a medieval escape room! 😅 Or maybe it functioned as how Lorem Ipsum does today - nonsense to redirect focus on the graphical elements, fonts, layout, etc
Just to be clear, I don't know that much about the Voynich manuscript, but: Why would the capital letter consistently be followed by one of two small letters? Also, 4o is followed by either the tall letter with two loops (ºHº) or the tall letter with one loop (Hº). Are you arguing that both of the tall glyphs are the same? Also, as per the newer video from this person, "The Voynich Manuscript's alphabet is smaller than you think (and that's why your theory is wrong).", if a space is part of the preceding letter, that adds at most 8 letters. If it's part of the letter after the space, that adds at most 13 letters. I guess you could argue that things don't occur "word-initially" / "word-finally" anymore, but the space thing just kicks the can down the road because it doesn't fundamentally account for the fact that the letters occur in such a predictable way. Maybe you could argue that strings of graphemes correspond to "actual letters", like "aiin 4o" is a single "actual letter" or something, but at that point, there's probably some other problem I don't understand. That problem might have to do with there suddenly being a weird amount of "actual letters" which doesn't correspond to any known language. Letter entropy isn't meaningless even if we don't know what does or doesn't count as a letter, it just means that the bounds of what counts as a letter have shifted and we have to deal with the entropy of those.
@@Stonecutter334 this will be the subject of my next video, to be released in a week or two hopefully (it takes quite a bit of research before I can even start editing).
As someone who has only a passing interest in ciphers and encryption, it seems to me like it's an art piece made in boredom. Or something to trick/convince someone of something useful to the creator.
Maybe the manuscript came from another reality where nothing makes sense when translated to our reality. Like it came from the fairy realm or something.
@@gregoryallen0001 the current generation of AI is ridiculously bad with this kind of stuff. Ask ChatGPT to come up with a substitution cipher and encode a paragraph. It can't do it.
@@koengheuens It can't even manage to count the number of R's in words. The "Sparks of AGI" paper claimed GPT4 was good at Caesar ciphers of different offsets. Researchers later found it was particularly good when the offset was 13. The training data had scraped a bunch of rot13 text and the AI learned it as if it were a foreign language.
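For readers unfamiliar with the terms in the comment above, here is a minimal Python sketch of a Caesar cipher, where rot13 is simply the case of offset 13. The function name `caesar` and the sample word are assumptions chosen for illustration; nothing here comes from the paper or from any Voynich research.

```python
import string

def caesar(text: str, offset: int) -> str:
    """Shift each ASCII letter by `offset` positions; leave everything else alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = offset % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)

message = "Voynich"
encoded = caesar(message, 13)          # offset 13 is rot13
print(encoded)
print(caesar(encoded, 13))             # rot13 applied twice restores the text
print(caesar(caesar(message, 3), -3))  # other offsets need the opposite shift
```

Offset 13 is special only because 13 + 13 = 26, so the same operation both encodes and decodes. That is one reason rot13 text is so common online.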
@@koengheuens Yeah text generative ML definitely isn't useful in this situation. However, I would love to see what a character/word recognition model, trained on texts in different languages to pick out all the variations of characters, spat out. Like, for one, I wouldn't be surprised if that "4o" character combination would be treated like one character, not a combination of 2. But getting an ML model to run on the whole manuscript might give us some insight that would take a lot of work to get to manually.
"It's solved!"
"What's the translation then?"
"No, not like that!" 😂
I've sent the translation to your youtube inbox.
@@mrosskne send it to me too! 😅
i wonder if it was written by someone who was illiterate,
imitating the forms of words with their repetitive shapes but without any meaning
That's a good idea. Illiterate or pre-literate (a student?). The issue there is that Voynichese has a very clear "system" to it. It really looks like the makers put a lot of thought into the composition of the glyph set and the way it behaves. We just don't know what that behavior means. Still, I wouldn't entirely discount something along those lines.
The obvious objection is that you could have a substitutional cipher that substitutes individual sounds for groups of Voynichese letters that occur together
@@metachirality yes, I think that's a valid line of investigation, and in fact I looked into that a couple of years ago: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
This could be called a "verbose cipher", though I certainly agree that this is still a form of substitution. It's a far cry from what theorists tend to do though. More commonly, they like to do the opposite, mapping syllables to voynichese glyphs. I'm currently making a video about how this is an awful solution:)
Note that there are all kinds of issues with the verbose cipher approach though. One of them is that "words" would need to get really short. So probably you'll be forced to abandon spaces. So the first thing you'd need to do after identifying glyph clusters that map to letters is explaining how spaces were inserted.
Interesting thought. Basically turning an alphabet into a syllabary
In other words many letters just aren’t used in the document although they exist in the alphabet
@@koengheuens why couldn't you treat spaces as just another character?
@@koengheuens My personal theory is that many different Voynichese words can map to a single letter in another language, and the difference between Voynich A and Voynich B reflects a shift in which "words" are favored over others during conversion. Who knows though, it could be completely false.
I would love to see way more in Voynich from you - really glad the channel was recommended to me
@@VirtueInEternity thanks! I'm scripting a new video right now.
Maybe this is influenced by my being a physician researching mental illness, but I've always had the feeling that the manuscript might be done by someone with some mental disorder (which at the time were not treatable at all), from something as comparatively mild as autism (where repetitive behaviours are extremely common) to some form of psychosis. So it would basically be a form of written echolalia or echopraxia... It would be quite a boring solution, as that would probably mean there is absolutely no meaning in it at all. Some forms of schizophrenia are interpreted by the patient as being visions or revelations from deities... I can totally imagine a mildly schizophrenic priest or monk thinking they were being instructed on what to write by some god or some spirit, as with many (even mentally sound) people believing they can "speak in tongues".
Yeah, like any solution that proposes absence of meaning, we'd never be able to prove this if it were the case. Mental illness can express itself in so many ways...
One big issue with this is that we have long known that there are at least two scribes, and Lisa Fagin Davis' more recent research indicates that there are five. This at least suggests some kind of shared experience.
Even if the text turns out to be something like what you describe, we may not even need mental illnesses as an explanation. Medieval people were religious fanatics compared to the average person in modern developed nations. And religious zeal can cause people to do weird things.
@@koengheuens very interesting! I'll keep watching your other videos on the subject!
Reminds me of Terry Davis and his TempleOS
Just discovered this channel. I know all your videos are on a very niche subject, but I was shocked to find you had less than 1k subs! I don't know enough about languages to have my own theory on the voynich manuscript, but merely absorbing this information (which I otherwise have no interest in) has been delightful. Thank you for your videos, they're presented well and simply enough for someone like me (with no experience in this) to grasp.
I've known about the Voynich Script for a while, but videos never go into the details of the writing itself. Although, the Histocrat had a video about its history, which was in depth and interesting. Thank you for your hard work on these, I know how long it takes to edit even seemingly simple things, and your work is commendable. Thank you!
Thanks! To be honest, many of my videos are really niche (like those from Voynich Day, which is like a conference for Voynich nerds). But the most recent one suddenly got picked up by the algorithm. I'm planning to make more videos like these, explaining things Voynich researchers struggle with to a broader audience. I'm glad people enjoy this and find it informative, that means a lot to me.
Voynichese reminds me of a court reporter's steno system where the "characters" of a word always appear in the same location regardless of its location in the word. Vowels and vowel combinations always on the left in that example, consonant combinations always on the right, and the next word on the following line.
Also did any of these "solutions" actually bother translating the whole thing? I feel like if you solved the VM, and you realized it was Welsh all along...you'd just go ahead and translate all the characters to Welsh and then people can easily verify it. Are these people deliberately lying and didn't expect anyone to actually *apply* the substitution beyond a few cherry-picked words/sentences?
Oh man, don't get me started... We always see the same thing over and over. "I found the language but actually I don't speak it, so I need experts to solve the rest". Stuff like that.
To be honest though, it rarely happens that I feel some malicious intent by the solvers. They are genuinely convinced that they solved it. It's so easy to cherry pick words, but each time one of those "works", they do get a confirmation that they are on the right track. So I like to think of most solvers as people who have fallen victim to confirmation bias rather than conscious liars.
@@koengheuens that just sounds like cranks who say "I disproved general relativity but I haven't done the math" lmao
Videos like this about the Voynich Manuscript are the perfect size to get invested into the subject- please make more like this and thank you for breaking down the study of this famous codex 🙏
@@cadesummers5866 thanks! I hope to have the next one out in a week or two. I'm really enjoying making these and want to keep improving while doing it. It means a lot that people find them informative.
Some languages have weird limitations of entropy:
- Turkish vowels past the first one are limited to 2 options only (a/ı > a/ı, o/u > a/u, e/i > e/i, ö/ü > e/ü).
- In Inuktitut, words can only start in a, i, u, p, t, k, q, s, m, n, and only end in a, i, u, t, k, q. Medial consonant clusters undergo assimilation such that there's mostly only mm nn ŋŋ vv ll jj ɡg ʁʁ jv jl jg jʁ ʁv ʁl ʁj pp tt kk qq ts ss ɬɬ sp st sk sq sɬ qp qt qɬ ɬp ɬt ɬk ɬq ɬs as far as I can tell (plus some others depending on dialect).
- Middle Chinese can be seen as a series of binary choices per syllable: [initial consonant] > [normal or "back" t,tʰ,d,n?] > [labiodentalization of p,pʰ,b,m?] > [sibilant type is s, ʂ or ɕ?] > [aspiration?] > [voicing?] > [j?] > [j or j̈ ?] > [w?] > [ɥ?] > [low or high vowel?] > [normal vowel or faster more centralized one?] > [extra weird split between "ʌ" and "e"] > [extra weird split specific to ij/in] > [pharyngealized?] > [rhoticized?] > [final consonant is j/w/m/n/ŋ/p/t/k or nothing?] > [is final ŋ/k "front" or "back"?] > [level/rising/departing tone?]. Each of these binary choices is heavily dependent on other choices and the number of potential neutralizations is exponential. For instance, labialization can only happen once - either as a labial consonant, or as w or ɥ, or as a rounded vowel, or as final w or m or p, or as "back" ŋ/k (except for a rare "jom" rhyme). The pattern of palatalization and rhoticizing involves like 8 of these choices and is completely bonkers and most of it is in almost complete complementary distribution. There's all in all about 5000 possible syllables (which merge into about 3000 in Cantonese, and 1200 in Mandarin).
That being said, none of the languages you'd expect in the Voynich manuscript are like that. Romance languages in particular are quite orthogonal in combining consonants with vowels and your entropy never really falls under about 1-in-3 (and is more like 1-in-5), and likewise for other Indo-European languages (aside from "ə" shenanigans).
These are some very good points - the observed phenomena in Voynichese aren't unheard of in natural languages. But whichever example you produce, Voynichese usually turns out to be much worse. For example, the glyphs that can regularly occur both word-initially and word-finally are very limited (in contrast to for example the Inuktitut example, where you have a few that can do both). And usually, when you find a language with some of those properties, you'll see that another stat is way off in compensation, like for instance the practically required size of your "alphabet". (I want to get into that a bit in the next video).
Your roman numeral example made me think of a reverse gematria type cypher of some sort. Where specific numbers correspond to certain letters or words. So these numbers would have a very rigid structure in how they can be written. Or like how in modern linguistics, when describing phonemes you list the features of each given aspect of the phoneme in a very particular order
It's my understanding that Voynich has a lot of unique "words" (collections of symbols), despite the rigid word structure. Does this mean that many of the "words" are nearly identical to each other, differentiated only by a single symbol? If so, does this mean that most "words" can be placed into a relatively small number of categories? Again, if so, how many categories would you say there are? From what I've seen, I would guess probably somewhere around ten main categories, but maybe as high as twenty? (I imagine there are a small number of "irregular" words that don't fall into any pattern that would be ignored for categorization purposes.)
I have personally not studied word categories enough to adequately answer that question. What I do know is that if you look at the rate at which new vocabulary is introduced (TTR), Voynichese behaves a lot like other languages. Those are just the broad statistics though. If you look at what those "words" actually are, it is as you say: small differences from one word to the next, lots of repetition and near-repetition like "du du dum". This would be a good topic for another video, but I'd need to get in some people who've studied this more.
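The TTR (type-token ratio) mentioned in the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up here, and the sample is a handful of EVA-style word forms picked by hand, not a real line from the manuscript.

```python
def type_token_ratio(words):
    """Distinct word forms (types) divided by total words (tokens)."""
    return len(set(words)) / len(words) if words else 0.0

def ttr_curve(words, step):
    """TTR on growing prefixes of the text, to see how fast new vocabulary appears."""
    return [(n, type_token_ratio(words[:n]))
            for n in range(step, len(words) + 1, step)]

# Hand-picked EVA-style forms, for illustration only (not a manuscript line).
sample = "daiin chedy qokeedy daiin shedy qokedy chedy daiin".split()

print(type_token_ratio(sample))
for n, ttr in ttr_curve(sample, 4):
    print(n, ttr)
```

Note that TTR treats "qokeedy" and "qokedy" as two unrelated types. That is why a text full of near-repetitions can still show an ordinary-looking vocabulary growth rate, as the reply describes.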
I'm torn on the mystery. Half of me wants to see a _true_ solution, but half of me wants it to just be a very effective act of trolling from some troll several centuries in the past. The latter really makes my heart soar as a current era troll.
The best of both worlds if it is solvable and is an immense troll at the same time. I really hope the text is mocking the solver and is not related to the illustrations.
the translation is "ohio sigma skibidi" repeated many times
While it is true that Voynichese character sequences are more predictable than natural language ones, linguists Claire Bowern and Luke Lindemann actually dismiss the idea that it is all just gibberish based on the properties similar to those of natural languages that it displays at higher levels of organisation. So, it is likely to be an encoded natural language or a constructed language. It is unclear whether it is more likely to be an encoded natural language or a constructed language because the supposed early 15th-century date of the Voynich manuscript predates the split between linguistic experimentation and cryptography and we don’t have the intended key, which would explain why Voynichese character sequences are more predictable than natural language ones. And if it is intended to be an encoded natural language, maybe it is a verbose substitution whereby the real character sequences are replaced with false longer ones. Or it is really just plain text and it is so predictable because it was only used for these manuscripts about specific topics, though it is still mysterious which language it is in.
Has anyone tried to analyze it as a language written in some weird order, like, as you said, every word in alphabetical order?
Great video.
I am personally inclined to believe it's systematic gibberish, but still, your video made me wonder about the following:
Has anyone worked on anything resembling a tokenization of the text, that could try to match *sequences* of EVA characters with tokenization of other candidate languages to be behind Voynichese? Token as in: discrete unit of meaning or structure (e.g., words or parts of words) that serves as the input building block for language models in machine learning.
Maybe in that case there are more relevant elemental structures than the EVA characters...
And also:
Would making a single character equate to more than one EVA character still count as substitution? if a = qwe b = qwt c = qwy or any reasonable iteration from that.
Thanks! Regarding your last question, I honestly don't know where substitution ends - that sounds like a matter of definitions where people are likely to disagree.
That some parts of EVA, like ch and sh, are better written as single glyphs seems clear. But this could certainly be taken further. Back in 2020 I spent some time looking into these kinds of issues. You can read about it here: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/
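The mapping proposed in the question (a = qwe, b = qwt, c = qwy) is what this thread calls a verbose cipher. A toy Python sketch follows; the three-entry table comes straight from the question, while the function names and the fixed group length are assumptions made for this example.

```python
# Table taken from the question above; purely illustrative.
ENCODE = {"a": "qwe", "b": "qwt", "c": "qwy"}
DECODE = {group: letter for letter, group in ENCODE.items()}
GROUP_LEN = 3  # every group in this toy table has the same length

def encode(plain: str) -> str:
    """Replace each plaintext letter with its multi-character group."""
    return "".join(ENCODE[ch] for ch in plain)

def decode(cipher: str) -> str:
    """Cut the ciphertext into fixed-size groups and map each back to a letter."""
    groups = [cipher[i:i + GROUP_LEN] for i in range(0, len(cipher), GROUP_LEN)]
    return "".join(DECODE[g] for g in groups)

word = "cab"
enc = encode(word)
print(enc, len(enc))  # the encoded word is three times as long
print(decode(enc))
```

Running this in reverse shows the difficulty raised elsewhere in the thread: if groups of glyphs stand for single letters, a typical Voynichese "word" would decode to only two or three letters, so the spaces would need a separate explanation.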
5:45 wouldn't this suggest some sort of n-gram based system? there has to be some trade off between character length of a text block and encoded characters per length. this sort of has to assume that spaces are to be ignored though..
also, this only works if the ability to predict the next character from the last is not 'continuous' across a section of text. if it reliably fails to predict at a certain interval, then you have your n-gram length. if it never fails to predict the next character, then it's too deterministic to express any meaning whatsoever (unless the meaning itself is the repeated pattern)
I looked a bit into n-grams a couple of years ago, though only with entropy in mind and without trying to find a consistent length. See here: herculeaf.wordpress.com/2020/10/19/entropy-hunting-bigger-and-better/ (note that I had no training in statistics or anything the like, so everything you see is at the limits of my understanding at the time).
So far, all research I'm aware of has kept spaces intact. Removing them will probably have some impact, but I don't think it will move the needle much on the entropy meter. Your idea is worth testing though.
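The "predict the next character from the last" idea in this exchange is usually measured as conditional character entropy. Below is a minimal Python sketch, assuming the standard definition H(next | current) = H(pairs) - H(current); the function names and the short EVA-style sample string are made up for illustration and are far too small to say anything about the manuscript.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def conditional_entropy(text):
    """H(next char | current char) = H(adjacent pairs) - H(first chars of pairs)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)

# Hand-picked EVA-style forms, for illustration only.
sample = "daiin chedy qokeedy daiin shedy qokedy"

print(conditional_entropy(sample))                   # spaces kept as a character
print(conditional_entropy(sample.replace(" ", "")))  # spaces removed
```

A value of 0 bits means the next character is fully determined by the current one; higher values mean less predictability. Running both variants on a full transcription would be one way to test how much removing spaces moves the number.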
I feel like this implies Voynichese is some intermediate between a logography and syllabary, or maybe just a highly derived logography. The trends in character entropy seem to fit the concept of purely orthographic agglutination, possibly with purely non-phonetic adornments (e.g., the divine marker in cuneiform). People have definitely looked into the idea of agglutination in Voynichese before, but afaik it's just that guy who claims it's a phonetic version of Old Turkish which is so obviously wrong, and I think the opposite perspective--of a text which is barely connected to or entirely disconnected from phonetics--works a lot better.
Then again, such a consistent structure within each word could also easily point to a shorthand with words referencing entire sentences or phrases with order defined by some strict grammatical system to ensure similar order in all sentences containing like grammatical aspects, as would fit shorthand. Or it could just imply the algorithmic creation of an overly-effortful hoax we've all been fooled by for centuries.
Almost makes me think of some of the writing practice I did as a kid. That often had a nonsensical and repetitive nature to the final product with sometimes no actual letters, because it was intended to teach muscle movements not spelling. That would also be pretty low entropy.
Not saying I think that is the answer, just noting a similarity in my experience.
I've thought about this as well - scribes also had to learn how to write at some point. This would still leave a whole lot of questions open, of course. Why exactly this system, which 'teaches' some useless connections and leaves out a whole range of useful ones? And why 'practice' alongside weird drawings of plants and nude figures?
@@koengheuens
Perhaps a more specific sort of practice to learn how to illuminate manuscripts for a scribe who already knew how to write. Monks had all sorts of strange hangups, it would absolutely not surprise me if some sect refused to practice on something real so made up things for their practice sheets.
Though you know like a million times as much as I do about this whole thing, so if you don't think it fits then I would bet you are right.
source: it came to me in a dream!
imagine an alternate universe where everyone speaks voynichese, and mfs find "The English Manuscript"?
How much entropy would be needed to characterize a language not by words, but by numbers and units? As in, if these were recorded results of an experiment, then it would make more sense if certain symbols only appeared in front of others, and recurring suffixes would make more sense, too.
@@JenJenswriting I don't think there's enough entropy anywhere within voynich words to accommodate decimal numerals. But something like Roman numerals plus extra stuff might certainly be possible. I don't think there's a way to prove this though. You could show that it's possible, but that's probably where this line of investigation would end.
Also, don't forget that we're looking at tens of thousands of "words". Whatever was done to obtain these was sustained for a long time.
@@koengheuens Yeah, that makes sense. I'll try to come up with another avenue. It just doesn't make any sense for it to be gibberish, like some vehemently believe. After all, writing an entire text of gibberish would just land someone in deep water in the 1400's.
@@JenJenswriting I also still think there must be something more to it. But currently I'm clueless about what this could be.
There's also something philosophical about it. We can never prove that something has no meaning, so if I were to say "this is gibberish and the drawings are a child's doodles", then I might as well look for a different hobby. We will keep looking for meaning, hoping there is one.
Who knows, it might turn out to be fairly simple and obvious once we find it.
@@JenJenswriting People wrote entire books full of gibberish in the 1400s, mostly to con other people. Someone creating this as a con - to sell to someone else as some sort of valuable, mysterious manuscript - is plausible. One of the theories is that it was directed at either Georg Baresch or Athanasius Kircher, two of the early people associated with the book, or possibly Rudolf II, who purportedly owned the book previously. It is also possible it was a prank on Kircher; someone wrote up fake Egyptian text and sent it to Kircher and Kircher "deciphered" it, even though it actually was gibberish.
The biggest point against it being gibberish is that it has some statistical properties that are unlikely in a gibberish text.
While it likely isn't substitution,
I think the letters that only appear at the beginning or end of a word may be for removing excess ink from the writing implement, and the ones at the end to make the space between words more distinct.
The ones at the end of a full line are there so you get rid of a lot of the ink on the quill, so it doesn't leave a drip all along what you just wrote.
Those symbols don't make a sound, or they could also mark an inhale, as a breathing marker, so you can fluently read it out loud.
I am curious on what this thing is and want to know more about it
What is it called
I need pictures
And location of discovery
Using this method the translation may be within our grasp, and we could even determine whether it is in fact gibberish, but with the little info I have I can't tell.
Voyni Cheese 🧀 😋
Could it be possible that this is some sort of Semitic-like language? From my experience, Arabic contains a lot of predictable word patterns, as in non-concatenative morphology. Maybe whatever this language is (assuming it's speech) could be attempting to replicate that? Maybe the "o" character is some sort of placeholder for any vowel sound in addition to the "ɪ" character. I'm definitely not claiming it's Semitic, because even Arabic is very flexible and has high entropy when it comes to character prediction.
My interpretation is that the author probably copied that idea and made a conlang out of it: terribly artificial and inflexible but functional.
I haven't looked into it much, but since you're the expert, I have a question: are there words in this manuscript? More specifically, do they even obey Zipf's law? If not, then we can automatically discard natural languages, and assign it to a potential "kitchen-sink conlang".
I haven't studied Zipf's law myself because I am hesitant to determine glyph boundaries (and thus word length). However, the Voynichese system does not appear to follow Zipf's law: voynichattacks.wordpress.com/2021/08/12/word-length-distributions/ .
I also recall discussion of Zipf's law of Brevity that states that the most common words in languages tend to be short. Even if we can't be certain how many glyphs the Voynichese strokes denote, it also appears clear that the MS does not follow this law, with many frequent words of average length.
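Both laws mentioned in this reply can be checked from a simple rank-frequency table. The Python sketch below is only an illustration: the function name is an assumption, and the toy sample of EVA-style forms was written by hand, so its output says nothing about the manuscript itself.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, count, length) rows, most frequent word first."""
    counts = Counter(words).most_common()
    return [(rank, w, c, len(w)) for rank, (w, c) in enumerate(counts, start=1)]

# Hand-written toy sample, for illustration only.
sample = "daiin ol daiin chedy ol daiin qokeedy ol chedy daiin".split()

for rank, word, count, length in rank_frequency(sample):
    # Zipf's law: count * rank stays roughly constant down the table.
    # Law of Brevity: length tends to grow as rank increases.
    print(rank, word, count, count * rank, length)
```

On a real transcription the interesting columns are `count * rank` (roughly flat if Zipf's law holds) and `length` (short words near the top if the Law of Brevity holds). Any result still depends on how glyph boundaries are drawn, which is the hesitation voiced in the reply.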
Its glyphs still look like some form of Continental Celtic script, like the double ιι etc. Although probably unrelated I find it interesting.
Sometimes I wonder if it could be a music sheet? Maybe a choral compendium? Medieval music was pretty rigid, so repetitive sequences and symbols occurring only in specific positions wouldn't be surprising.
I was a part of a local church with very old tradition in rural Russia, and they had a music notation system that was local enough only couple of dozen people across a handful of local churches could still read it, and it had no digital trail or afaik mention in outside sources.
My main hunch is that it's meaningless, though. Not that we could verify that if it were the case.
Or, has anybody thought of a code like quoAaoi quoPaoi quoPaoi quoLaoi quoEaoi for example?🍎
Maybe the finals aren't pronounced. Maybe they are particles that get hooked on to the word instead of following it.
And maybe letters look different depending on where they are in the word like Greek sigma and lots of Arabic.
Any use?
@@WDCallahan my most recent video explores exactly that, positional variation. In the video I focus on the fact that this shifts your problem: the alphabet becomes very small.
This inspired me to consider something. I don't think it's at all applicable to the Voynich manuscript, due to the fixed patterns and how limiting this would be, but the entropy and low alphabet size made me consider: what if the letters were like a key to a dictionary? Use the 13 letters as pages, sections on a page. Given the word size that's 13⁹ combinations; the left over combinations could be the next word, thus allowing one word to represent an entire part of a sentence.
Again, to reiterate, I don't think this is the case here, it just got me thinking.
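To put a number on the figure in the comment above, here is a quick Python check. The values 13 and 9 are taken from the comment; the second calculation, which also allows shorter words, is an added assumption.

```python
alphabet_size = 13  # figure used in the comment above
word_length = 9     # likewise taken from the comment

combinations = alphabet_size ** word_length
print(f"{combinations:,}")

# Assumption: if shorter "words" are allowed too, count every length from 1 to 9.
up_to = sum(alphabet_size ** n for n in range(1, word_length + 1))
print(f"{up_to:,}")
```

13⁹ comes to 10,604,499,373, far more than any dictionary has entries. That is a raw upper bound, though: the rigid word structure discussed throughout this thread means only a small fraction of those strings could actually occur.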
If it is something like this, we would essentially be looking at a code (if I understand correctly), which may be impossible to crack without the code book...
@@koengheuens maybe with computers
Could it be a copy of a real text by someone who couldn't read the writing system and misinterpreted the characters?
I've thought about that as well, but I'm not sure if it would work as an explanation. The main objection is that Voynichese is a system sustained over many pages. There's thought behind the glyph set (generalizing a bit, it's medieval numerals + ligatures and abbreviation characters + the set of "gallows"). There's a certain coherence between how curve-based glyphs and line-based glyphs interact. It's thought through and elegant rather than the haphazard mess you'd expect with someone badly interpreting a real text.
To be fair, D can only be an initial consonant in Chinese, just like Voynich
do you think the rapidly improving AI like chatgpt will crack this?
@@ChrisWAnim Not in its current state. It can't even apply a simple substitution cipher to a text. I'll probably want to make a video about this later.
@@koengheuens something AI can't do right now is ask creative/unique questions. But it could be used to crowdsource ideas quickly. Maybe just sheer computational power alone could provide 1 great idea out of a sea of nonsense.
@@ChrisWAnim I definitely don't want to dismiss AI altogether. As a general remark, I am still not sure if we are at the peak of the current AI boom or just at the beginning. From experience though, I know that right now AI is mostly misused and misunderstood when it comes to Voynich research. It is very eager to please and prefers to come up with an answer rather than saying "I don't know". So I really think we need more reasoning and less "producing language that looks right". That said, it is still a handy tool. I used it to write some Python code that came in handy while gathering statistics for my last video.
May it be numerals? And then some code. Maybe not binary, haha, but some-ry
Pardon my very very casual uninformed comment here, but couldn't the repetitive glyphs at the start of words be some kind of diacritical markings? Would be a very extreme example, but I guess possible for an invented language, even if the same glyphs appear in other places, you could have a rule that says "when this appears at the start of a word it means this, but anywhere else it functions completely different". Wouldn't be too crazy, I think. But again, not a linguist. Not a native English speaker either, just to be safe.
@@germansassarini1372 normally, yes. The problem is though that the whole system is predictable. The common starting and ending glyphs are just easy examples because of their obvious position.
If it's not language, what else can it be? I'm not looking for a full solution here. But like...why else would someone write all these symbols if not writing a language?
There are many options still open, some more likely than others. Maybe it _is_ language, but just written in a system we haven't comprehended yet. Maybe it's "filler text". Maybe a combination of both. Maybe it was done as a form of meditative exercise. Maybe someone tried to fool his contemporaries. We just don't know yet.
Something intended to *resemble* a text in an unknown language: a hoax, joke or proto-surrealist artwork.
Many such things have been done in modern times, but sometimes, they were likely inspired by the Voynich manuscript.
What if it’s not a language but equations ? Or formulas? Like dna… gattaca?
Fairies
it's simple, they are not single letters, what we see as two are just one. that or it's just made up gibberish
In German manuscripts it was not unusual to write q for the phoneme k. Even if it was followed by u it was only pronounced "k".
See Heinrich von St. Gallen "da quame eine priester.." Though written with qu it was pronounced just k. Also, as linguists probably know, in South German manuscripts (Upper German language) the phoneme a was often written o and vice versa. qo = ka = kann (standard German), can (English).
I don't get why everyone is saying it's voynichese when it's far likelier to be voynichian?
@@thomaseriksen6885 good question. It was an established term among researchers when I encountered it. Now, in linguistics there are often inconsistencies in names of countries, languages and peoples. So it's not that important - the custom is to go with whatever name happens to be the most used one.
Superb. Congratulations Koen. Does this mean that if Voynichese can't map to any known language that there's no point in hoping it enciphers a natural language? Should all the cryptologists go home? :) Or do we start looking at ways to represent spoken language in something other than a phonetic alphabet? (By the way, the term 'abjad' is no longer in vogue, since all it means is an alphabet which can, but need not represent vowels by diacritics. They are still alphabets. Not that it matters.) Great graphics.
What it especially means is that we have to abandon the hope of a relatively straightforward mapping of Voynichese to a natural language. People have long been aware of this - the stats clearly don't allow for such a solution. This doesn't necessarily mean that the text has no meaning though. Binary code is a bunch of 1s and 0s but it can say anything you'd like it to say. I don't know if there is any meaning to the text, but if there is, it will be discovered by understanding the system (how does it represent meaning?) rather than any language the glyphs map to.
The term abjad is fine, it is simply a type of alphabet. That is exactly why I used it: whether the system is an alphabet with vowels, an abjad or an alphabet with vowels omitted really doesn't make any difference for the relevant stats. But I could have been more explicit about this in the video.
@@koengheuens I'm not quibbling. This is a genuine question from someone who is not interested much in cryptography or historical linguistics and is an ignoramus as a result.
But aren't you really just saying that we have to abandon the hope of a relatively straightforward mapping of Voynichese to a natural language *that is recorded using an alphabet*? Or have you considered rules that apply to the recording of natural languages in other forms of script, such as pictograms (like Chinese) or syllabaries? Perhaps there are other ways in which natural languages were recorded which aren't alphabets. Also, a trivial point - the 'final position only' isn't a problem. Some well-known scripts have two forms for a letter, one of which is used only in the final position. Also, why do you think so many glyphs are involved? It couldn't be needed to record numbers, could it?
@@dntodo6749 People, including myself, have been exploring all kinds of options all the time. Thing is, anything we can think of that's ever been used to represent language is even worse (as a match for Voynichese) than an alphabet. This is different from for example the Rohonc Codex, which has a much larger variety of glyphs, so one could think of logographs or something similar to hieroglyphic script.
It's a natural reflex to think of positional variation, and in fact I think this must play a role somehow. But the number of different glyphs is very limited, and most of them (if not all) have strong positional constraints. So if you want to be able to express a real language in Voynichese, you've got like six or seven sounds left to do so, because you will need an initial, medial and final form for each sound. It just doesn't work. In fact, even without reducing the de facto glyph inventory through positional variation, it is already a struggle to find enough frequent glyphs to match to sounds.
@@koengheuens Just to be clear, so I don't misrepresent you. You are saying Voynichese cannot be a substitution cipher for the expression of a natural language IF (a) the natural language is conveyed using an alphabet AND (or should it be AND/OR) (b) one glyph is assumed to represent one sound. Would you express it differently?
@@dntodo6749 Alphabets like the Greek or Latin alphabet are, as far as I'm aware, the closest match to Voynichese. That is to say, moving to any different non-artificial writing system will make matters much worse. For example, abjadic alphabets actually have entropy stats that are further removed from Voynichese. And hopefully I don't have to explain why a logogram system doesn't work. So you're kind of stuck with alphabets as the closest thing, which is probably why they are so appealing to the average solver. Something like the Rohonc codex would invite more logographic solutions, since it has way more glyphs and some of those appear to have some pictographic inspiration.
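For anyone who wants to see what "entropy stats" means in practice, here is a minimal Python sketch. It is not the exact method used in the video; it only assumes the usual definitions of first-order character entropy and conditional entropy (how predictable a character is given the one before it). The function names and the sample strings are made up for illustration.

```python
from collections import Counter
from math import log2


def char_entropy(text):
    """First-order entropy in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * log2(n / total) for n in counts.values())


def conditional_entropy(text):
    """Entropy of a character given the previous one, in bits."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent pairs
    firsts = Counter(text[:-1])            # counts of the first item of each pair
    total = len(text) - 1
    h = 0.0
    for (prev, _nxt), n in pairs.items():
        # weight by pair probability, measure surprise given the previous char
        h -= n / total * log2(n / firsts[prev])
    return h


# Illustrative inputs only: a short English sentence and a rigid, repetitive string.
english = "the quick brown fox jumps over the lazy dog"
rigid = "ab" * 20

for label, sample in (("english", english), ("rigid", rigid)):
    print(label, round(char_entropy(sample), 3), round(conditional_entropy(sample), 3))
```

A text whose characters follow each other very predictably gets a low conditional entropy, which is the kind of number being compared between Voynichese transliterations and real writing systems. Samples this short are far too small for meaningful values; the point is only the calculation.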
So let me be clear about it: IF you want a natural language solution, then an alphabet is the closest thing you can find "in the wild", but as I demonstrated, that won't work.
There are modifications you could make, like for example assuming that certain glyph combinations represent one sound, but that would mean that all your words are now incredibly short, so you would have to abandon spaces. Each problem you solve leads to a bunch of other problems elsewhere.
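To make the "incredibly short words" problem concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the cluster list is hypothetical (not a proposed solution), and the words are written in an EVA-style transliteration. It simply treats each cluster as one "sound" and counts how many units are left per word.

```python
# Hypothetical inventory: each glyph group is assumed to stand for ONE sound.
CLUSTERS = ["aiin", "qo", "ch", "sh", "ee", "dy", "ol", "or", "ar"]


def split_into_units(word, clusters=CLUSTERS):
    """Greedy longest-match split; glyphs outside any cluster count as one unit."""
    ordered = sorted(clusters, key=len, reverse=True)
    units = []
    i = 0
    while i < len(word):
        for cluster in ordered:
            if word.startswith(cluster, i):
                units.append(cluster)
                i += len(cluster)
                break
        else:
            # no cluster matched at this position: take a single glyph
            units.append(word[i])
            i += 1
    return units


# Illustrative EVA-style words (assumed transliteration, chosen for the example).
words = ["qokeedy", "daiin", "chedy", "shol", "qokaiin", "ol"]

glyph_total = sum(len(w) for w in words)
unit_total = sum(len(split_into_units(w)) for w in words)

for w in words:
    print(w, split_into_units(w))
print("average glyphs per word:", glyph_total / len(words))
print("average sounds per word:", unit_total / len(words))
```

With this made-up inventory the average word goes from 5 glyphs to 2.5 "sounds", and some words collapse to a single sound. That is the point where the spaces stop looking like word boundaries.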
Anyway, some of this will be clarified in the video I'm currently working on.
So it's a hoax?
Not necessarily. The word 'hoax' implies that the intent of the maker was to deceive others. Now, the vast majority of scholars believe the document to be authentic (i.e. made in the early 15th century), so it's not any kind of modern hoax where someone in the early 20th century made something that looks like a medieval manuscript.
The other option is a medieval hoax, but this would imply that we can deduce that the medieval maker of the MS had an intention to deceive his contemporaries. And we don't have any reason to assume that this must have been the case. Imagine for example that someone made this text as a form of religious contemplation, or some spiritual undertaking. Or that it was a writing exercise for young students. Or that someone just thought it looked better this way. All that to say, even if it is meaningless, it needn't be a hoax.
And of course it could still have some kind of meaning within the text. I'm just explaining that the dominant method among solvers (some variation of a simple substitution cipher) is impossible as a solution.
It could be a left over prop from a medieval escape room! 😅
Or maybe it functioned as how Lorem Ipsum does today - nonsense to redirect focus on the graphical elements, fonts, layout, etc
Nope, my theory is correct. I knew the author.
4oH is one capital letter; e.g. r and R look completely different. Letter entropy is meaningless if you don't even know whether a space is a letter or not ;)
Just to be clear, I don't know that much about the Voynich manuscript, but:
Why would the capital letter consistently be followed by one of two small letters? Also, 4o is followed by either the tall letter with two loops (ºHº) or the tall letter with one loop (Hº). Are you arguing that both of the tall glyphs are the same?
Also, as per the newer video from this person, "The Voynich Manuscript's alphabet is smaller than you think (and that's why your theory is wrong).", if a space is part of the preceding letter, that adds at most 8 letters. If it's part of the letter after the space, that adds at most 13 letters. I guess you could argue that things don't occur "word-initially" / "word-finally" anymore, but the space thing just kicks the can down the road because it doesn't fundamentally account for the fact that the letters occur in such a predictable way.
Maybe you could argue that strings of graphemes correspond to "actual letters", like "aiin 4o" is a single "actual letter" or something, but at that point, there's probably some other problem I don't understand. That problem might have to do with there suddenly being a weird amount of "actual letters" which doesn't correspond to any known language. Letter entropy isn't meaningless even if we don't know what does or doesn't count as a letter, it just means that the bounds of what counts as a letter have shifted and we have to deal with the entropy of those.
question everything.. Give me 1mo ;)
@@RamanujSarkar-i3l tested on all languages: it's not a real language. There is a simple Caesar cipher behind this manuscript.
There is a fair amount of evidence it's a total vintage fake.
@@Stonecutter334 this will be the subject of my next video, to be released in a week or two hopefully (it takes quite a bit of research before I can even start editing).
Voayeurnich solved and solved and solved Spannxxx!!!
As someone who has only a passing interest in ciphers and encryption, it seems to me like it's an art piece made in boredom. Or something to trick/convince someone of something useful to the creator.
Using Chi to represent ch as in voynich is deeply wrong and I will be reporting you to authorities
Maybe the manuscript came from another reality where nothing makes sense when translated to our reality. Like it came from the fairy realm or something.
You will tell us why all solutions are wrong, but first you mistype Caesar.
Classy reply.
*classicist reply
@@capnmnemoconical ripple
AI is gonna solve this in 4sec
@@gregoryallen0001 the current generation of AI is ridiculously bad with this kind of stuff. Ask ChatGPT to come up with a substitution cipher and encode a paragraph. It can't do it.
@@koengheuens It can't even manage to count the number of R's in words. The "Sparks of AGI" paper claimed GPT4 was good at Caesar ciphers of different offsets. Researchers later found it was particularly good when the offset was 13. The training data had scraped a bunch of rot13 text and the AI learned it as if it were a foreign language.
@@koengheuens Yeah, text-generative ML definitely isn't useful in this situation. However, I would love to see what a character/word recognition model, trained on texts in different languages to pick out all the variations of characters, spat out. Like, for one, I wouldn't be surprised if that "4o" character combination would be treated like one character, not a combination of 2. But getting an ML model to run on the whole manuscript might give us some insight that would take a lot of work to get to manually.
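For context, a Caesar cipher with a chosen offset is only a few lines of Python, and rot13 is just the offset-13 case. This is a minimal sketch; the function name `caesar` is made up for the example and it only shifts unaccented Latin letters.

```python
import string


def caesar(text, offset):
    """Shift ASCII letters by `offset` places, leaving everything else unchanged."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = offset % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)


message = "Voynich manuscript"
rot13 = caesar(message, 13)
print(rot13)               # offset 13 is the rot13 special case
print(caesar(rot13, 13))   # applying rot13 twice restores the original
print(caesar(caesar(message, 3), -3))  # other offsets need the negative shift
```

Offset 13 is the only shift that undoes itself on a 26-letter alphabet, which is why rot13 text is so common online compared with other offsets.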