THE SEARCH FOR NUMERICAL CODES IN THE VM.

Jan. B. Hurych

Even before the search for the natural language of the VM (that is, a text written without any code) was in full swing, some researchers suspected that the manuscript could have been encoded (or enciphered). Several points led to that conclusion:

The truly unknown script. In spite of minor similarities here and there, the VM script has never been found in any other document or record. The possibility that we are dealing with a "once-living-but-now-dead" language and/or script seems very low, considering that even for unknown and dead languages we usually have quite a number of documents or artifacts left behind. More likely, we have here an artificial script and/or a coded text, where the script itself is simply one kind of encoding (as, say, Morse code is).
It was a frequent habit already in the Middle Ages to encode important documents. A secret script was considered an encoding tool in itself (say, like the Freemasons' script), and sometimes even the one and only tool of encryption. Pretty soon, however, it became obvious that such a script is only a special case of the monoalphabetic substitution cipher, for which the so-called "letter frequency" statistics had already been developed. One just had to transcribe the script (any unambiguous assignment would do) and then, letter by letter, guess and check for the proper plaintext until the text had one unambiguous meaning. So, to make a text hard to crack, additional enciphering was used, such as reversing the order of letters, polyalphabetic ciphers, grilles, transposition ciphers or other methods.
No description of the script. Unlike other secret scripts, which were artificially created and mostly described in minute detail by their authors in their books, the VM script is described nowhere. What's more, its character frequencies do not fit any of the languages investigated so far. (Note: there is actually a resemblance to the statistics of Latin, but using the corresponding conversion table we do not get a Latin plaintext. This would also suggest that some additional encoding was probably used, say a monoalphabetic substitution or a transposition cipher, neither of which would change the shape of the frequency curve.)
The natural language search did not provide any solution. When quite a number of known languages had been tried, the attempts switched to lesser-known languages, of course with the same negative results. Here I beg to differ from those who claim they did find such a language: what they really found are methods of interpretation, mostly trimmed to the expected results, something like reading tea leaves. And so we now have several leaf-readers who accuse each other of reading meaning into the random sediment, without realizing that they do the same themselves.
The shortage of longer words. There were, of course, further indications that the VM is an encrypted text: longer words are missing, the script is totally disconnected, and there are strange plants, strange pictures and astrological/astronomical charts.
The call for the cavalry. It was quite logical that great hopes were put on expert decoding, and top military experts tried their hands (and brains as well) at cracking the VM, unfortunately with no success.
When decoding failed, research turned for a while to the search for a natural language; after that failed too, decryption was undertaken again. That created a vicious circle which we have not been able to break. So instead of working in cycles, we should count our losses and split into two groups, each devoting its full time either to the non-encoded natural language or to decoding. After all, the VM could hardly be both at the same time . . .
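The point made in the Note on Latin-like statistics can be illustrated with a minimal sketch, assuming the "frequency curve" means the relative letter frequencies sorted from most to least common. The sample phrase, the random key and the helper names (frequency_profile, substitute, transpose) are illustrative assumptions, and the two ciphers are the textbook versions, not a claim about how the VM was made. A monoalphabetic substitution only changes which letter carries each frequency, and a transposition only reorders the letters, so both leave the sorted curve unchanged.

```python
import random
from collections import Counter


def frequency_profile(text):
    """Relative letter frequencies, sorted from most to least common (the "frequency curve")."""
    letters = [c for c in text if c.isalpha()]
    counts = Counter(letters)
    return sorted((n / len(letters) for n in counts.values()), reverse=True)


def substitute(text, key):
    """Monoalphabetic substitution: map every letter through the one-to-one table `key`."""
    return "".join(key.get(c, c) for c in text)


def transpose(text, width):
    """Columnar transposition: write the text in rows of `width`, read it off column by column."""
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    return "".join(row[col] for col in range(width) for row in rows if col < len(row))


plain = "inprincipioeratverbumetverbumeratapuddeum"
alphabet = sorted(set(plain))
shuffled = alphabet[:]
random.Random(1).shuffle(shuffled)            # arbitrary but reproducible substitution key
key = dict(zip(alphabet, shuffled))

curve = frequency_profile(plain)
print(curve == frequency_profile(substitute(plain, key)))   # True
print(curve == frequency_profile(transpose(plain, 5)))      # True
```

Both comparisons print True, which is why a Latin-like curve on its own cannot distinguish a plain Latin text from a Latin text passed through either cipher.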

In order to undertake the cracking of the VM, we first have to see the problems that prevented the military experts from reaching that goal. As I mentioned in my last article ("The Research of the Voynich Manuscript: The Strategies and the Results"), military cryptography has certain advantages that we cannot exploit in decoding the VM: the scripts (or the signals used, say Morse) are very well known, and as for the language, only a few need to be tried. Besides that, there are already known methods, partly cracked messages, mistakes by enemy operators, sets of usual (and suspected, mostly military) expressions, dates, identical messages translated into different languages and/or coded by an already known cipher, captured enemy operators, captured coding documents or coding machines and, last but not least, the actions of friendly spies. Actually there are so many possibilities that one wonders how much of the cracking success is thanks to them alone :-). Unfortunately, none of that is available for the VM.

That is not to say that military experts have it easier, just different; what's more, some of their cracking methods are simply not applicable to the VM either. But the most difficult obstacle is still the language, and next to it the VM script. Why the script as well? The two problems are actually interrelated. We know that frequency characteristics do not help, since they are good for monoalphabetic encipherment only. But they might not fit for another reason as well: the script of the VM is not clear at all. For instance, is the script more of a syllabic type (say, the characters represent whole syllables), or does each character stand for a single sound, with perhaps several composites (say, doublets) for some sounds? We simply need a match between the script and an existing language: just imagine the difference between English "sh" and German "sch". Certainly, they both sound the same, but what a difference it would make for decoding, where every letter counts! Besides, some questions about the VM script are not yet solved: are the "gallows" really characters of the script, or are they something else, say only indicators of a new key? Are Currier's characters "M", "N" and "3" really single characters, or only composites (in the EVA alphabet the same characters are written as "in", "iin" and "iiin")? Especially since EVA also has the characters "i" and "n", the confusion is complete, and the resulting frequency curves are therefore useless.
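How much the reading of "in", "iin" and "iiin" matters can be shown with a minimal sketch that counts glyphs in two ways: every EVA letter as its own glyph, or each i-run closed by "n" as one glyph. The word sample is a handful of EVA-style words chosen only for illustration, the names tokenize, glyph_counts and I_RUNS are hypothetical, and other EVA ligatures such as "ch" are left as separate letters here for simplicity.

```python
from collections import Counter

# Runs of EVA "i" closed by "n", which the text relates to Currier's single characters.
I_RUNS = ("in", "iin", "iiin")


def tokenize(word, composites=()):
    """Split an EVA word into glyphs, matching the longest listed composite first."""
    composites = sorted(composites, key=len, reverse=True)
    glyphs, i = [], 0
    while i < len(word):
        for comp in composites:
            if word.startswith(comp, i):
                glyphs.append(comp)
                i += len(comp)
                break
        else:                       # no composite starts here: take one EVA letter
            glyphs.append(word[i])
            i += 1
    return glyphs


def glyph_counts(words, composites=()):
    """Count glyphs over a word list under the chosen reading of the script."""
    return Counter(g for w in words for g in tokenize(w, composites))


words = ["daiin", "okaiin", "chol", "daiiin", "qokain"]   # short illustrative EVA sample
as_letters = glyph_counts(words)
as_composites = glyph_counts(words, I_RUNS)
print(as_letters["i"], as_letters["n"], sum(as_letters.values()))               # 8 4 27
print(as_composites["i"], as_composites["iin"], sum(as_composites.values()))    # 0 2 19
```

Under the first reading "i" is one of the most frequent glyphs; under the second it disappears entirely and the total glyph count shrinks by almost a third, so any frequency curve depends on a choice that has not yet been settled.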

Can we avoid those problems? If the text is not encoded, apparently we can: once we establish the language, we can easily decide which alphabet is right and which is not, mostly from the meanings of the words alone. For decryption, however, we have to try only those words whose characters are unambiguous. So the problem is this: if we find a "solution", how do we know it is the right one? Apparently we would have to try to read the "solution" in many languages before discarding it . . .
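Restricting the work to unambiguous words can be sketched as a simple filter. Which glyphs count as doubtful is an assumption here, taken from the questions raised above: the EVA letters "i" and "n" (the in/iin/iiin problem) and the four gallows letters k, t, p and f. The names AMBIGUOUS and unambiguous_words and the word sample are hypothetical.

```python
# Hypothetical choice of doubtful EVA glyphs: "i" and "n" (the in/iin/iiin question)
# and the four gallows letters k, t, p, f (are they letters or key markers?).
AMBIGUOUS = set("in") | set("ktpf")


def unambiguous_words(words, ambiguous=AMBIGUOUS):
    """Keep only the words in which every EVA letter is outside the doubtful set."""
    return [w for w in words if not set(w) & ambiguous]


words = ["daiin", "chol", "chor", "qokeedy", "shey", "dar", "otedy"]
kept = unambiguous_words(words)
print(kept)                                      # ['chol', 'chor', 'shey', 'dar']
print(f"{len(kept)}/{len(words)} words usable")  # 4/7 words usable
```

Even on this tiny sample almost half of the words drop out, which shows how much text such a restriction costs.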

I think the solution is already hidden in the VM: we have to assume that the author really wanted the text to be
a) read, and
b) understood, provided assumption a) is correct and the proper method of encryption, suggested in the VM, is found.

True, the VM may be meant only for the initiated, but even they could not rely on memory alone, and the next generation of readers might lose the knowledge completely. So the codes (or keys to the ciphers) must be hidden somewhere and somehow in the manuscript itself. Assumption a) is quite probable, since nobody in his right mind would go through such work just for himself, just for his own pleasure. He would not spend so much time making it so mysterious and challenging just to please his ego: that would require a public. And if so, he must have been confident enough in his coding to place some hints there as well. Point b) is even stronger: such a book would certainly arouse curiosity, and many would want to read and understand it, the same way we do :-). All that in spite of the fact (or maybe because of it?) that we do not know why the author wanted others to read and understand it, i.e. we do not know the true secret of the VM, which can be known only after decoding . . .

What is left, then, is to find the right method, and that is the most difficult part of all. However, since we assume the key must be there, we should start by looking for it. If we find something peculiar and applying it gives us some interesting results, we may be on the right track. So we should look for hidden numbers. Why search mainly for numbers? The linguists have done all kinds of statistics, but their explanations did not confirm the basic claim that the text is in a natural language. For instance, tests of the second-order entropy suggest that the VM has meaningful, organized content. Some read this as support for the theory that the VM is written in a natural language, in spite of other facts showing that the VM is rather "un-natural". But if the VM were only the written record of some melodies, expressed in a strange notation of musical notes, what would the second-order entropy results tell us about the melody itself? No: to support the claim of a natural language, we have to find the language itself.
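One common definition of the second-order entropy is the conditional entropy of a glyph given the glyph before it. The sketch below assumes the text has already been reduced to a sequence of symbols (any transcription scheme would do); the function names are illustrative. The example illustrates the argument above: a strictly repeating, non-linguistic pattern, such as a repeated melodic motif, also scores a low value, so a low h2 shows order, not language.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(symbols):
    """Second-order (conditional) entropy h2 = H(pair) - H(first symbol of the pair).

    It measures how uncertain the next symbol is once the current one is known;
    it says nothing about what kind of source produced the sequence.
    """
    pairs = Counter(zip(symbols, symbols[1:]))
    firsts = Counter(symbols[:-1])
    return entropy(pairs) - entropy(firsts)


# A strictly repeating note pattern is perfectly predictable: h2 is zero.
print(conditional_entropy("cegcegcegceg"))   # 0.0
```

Natural language lies somewhere between such a fully predictable pattern and random text, but so do many other kinds of ordered notation.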

As for the key, it could of course be alphabetical, numerical, mixed or expressed in some other way (say, inside the text or steganographically). An alphabetical key would be hard to recognize in an unknown script, and what's more, we probably would not know how to use it anyway. Numbers, however, have one and only one meaning (i.e. 3 is 3 and not 4). Of course, if they are not written as visible digits, they must be expressed in some special way. One way is by inconspicuous digits (see section (1) below, where the numbers are masked), another is by counting objects (2), and there may be other ways as well (3). The first two techniques can both be found in the VM, and they are peculiar enough to assume they have some purpose other than just numbering lines, which was the case I mentioned in my article for the Journal of Voynich Studies, "THE NUMBERS IN THE VM".
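To show how a purely numerical key can drive a polyalphabetic cipher, here is a minimal sketch of the Gronsfeld cipher, a Vigenère variant whose key is a string of digits. It is only one classical example, not a claim about the VM; the modern 26-letter alphabet, the key [3, 1, 4] and the function name are assumptions for illustration.

```python
import string

ALPHABET = string.ascii_lowercase   # assumption: modern 26-letter alphabet


def gronsfeld(text, digits, decrypt=False):
    """Shift each letter by the next digit of a repeating numerical key (Gronsfeld cipher).

    `text` is assumed to contain only lowercase letters from ALPHABET.
    """
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = digits[i % len(digits)]
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % len(ALPHABET)])
    return "".join(out)


key = [3, 1, 4]
cipher = gronsfeld("herbarium", key)
print(cipher)                                  # kfvebvlvq
print(gronsfeld(cipher, key, decrypt=True))    # herbarium
```

The shift applied to a letter depends on its position, so different plaintext letters can end up as the same ciphertext letter (here "r" and "u" both become "v"), which flattens the frequency curve; a reader holding the digits can undo the shifts exactly.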

1) Numbers are written in (but masked).
In the above article I already mentioned folio f49v, where five lines start with the numbers 1 to 5. Some suggested they were written later, by somebody other than the author, but if we study the writing style and its details, as well as the color of the ink, we see they are in the same hand, most likely written at the same time. Of course, this proves the author knew Arabic numerals, but that is really no surprise, considering how long those numerals had been known in Europe. Since they are in numerical sequence, they apparently served, for some obscure reason, to number the lines in question, and so they are not our concern here.

Original sample, part of folio f102v2e

I also mentioned a quite different case in the article "SEARCH FOR HIDDEN NUMBERS IN THE VM" (see the picture above, folio f102v2). It required some speculation about what is hidden underneath, and I went through extensive testing to see what could be seen on one of those seemingly decorative cylinders (whose purpose is not obvious, although several have already been suggested). The find was first described by Jorge Stolfi, who also suggested there might be numbers hidden under the blue coloring, which partially covers the brown ink scribblings.

First attempt at deconvolution

I went through several methods of color filtering (I use this rather inaccurate name here for intensity, color and gamma manipulation, as well as other graphical tools then available to me). Several methods undoubtedly show some spots similar to Arabic digits. Again, this raised criticism, with the shallow statement that "those are only decorative doodles", or the even more ludicrous "there were not supposed to be any numbers there", as if that were not what we were trying to establish in the first place :-).

The second attempt at deconvolution

So I searched for a deconvolution filter and got one for forensic deconvolution (courtesy of Dr. Charles E. H. Berger), with very good results (see pictures below), which confirm the hypothesis that the scribblings underneath are indeed numbers (or maybe a mixture of numbers and letters). It also verified that this particular deconvolution method works best for separating layered colors, one applied on top of the other, as in this case. Of course, the outlines are still jagged, as could be expected since the writing is very small, but the blue coverage was removed completely.

It would not work so well, however, if the top color had soaked into the lower layer, creating composite colors of many shades. These would not be easy to remove completely, and if they were, some of the red-brown mix would be removed as well, so we would lose some shape information. Apparently the top coloring was applied some time later, long enough afterwards to prevent the two residues from soaking and mixing. It was discovered in the past that the coloring was done after the pictures were drawn, and some have therefore proposed that it was done by a different person. How much time passed between the text and the coloring is of course difficult to tell, but that is not so important now.
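The forensic filter itself is not described here, so as a stand-in, here is a minimal sketch of the textbook color deconvolution of Ruifrok and Johnston, which rests on the same idea of separating layered colors. It assumes each layer absorbs light independently with a fixed color signature (the Beer-Lambert law), so that optical densities add up and can be unmixed per pixel by solving a 3x3 linear system. The two signature vectors are made-up placeholders, not measurements from folio f102v2, the third axis is simply taken orthogonal to the other two, and all function names are illustrative.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m x = b for a 3x3 system by Cramer's rule."""
    d = det3(m)
    x = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        x.append(det3(mc) / d)
    return x


# Hypothetical optical-density signatures (R, G, B) of the two layers; real values
# would have to be measured from pixels of pure paint and pure ink.
BLUE_PAINT = normalize([0.80, 0.45, 0.10])
BROWN_INK = normalize([0.30, 0.55, 0.75])
RESIDUAL = normalize(cross(BLUE_PAINT, BROWN_INK))   # assumption: orthogonal third axis
# Columns of M are the layer signatures: OD = M @ amounts (absorbances add up).
M = [[BLUE_PAINT[ch], BROWN_INK[ch], RESIDUAL[ch]] for ch in range(3)]


def to_od(rgb, white=255.0):
    return [-math.log10(max(c, 1.0) / white) for c in rgb]


def from_od(od, white=255.0):
    return [white * 10 ** (-x) for x in od]


def unmix(rgb):
    """Amounts of (blue paint, brown ink, residual) in one pixel."""
    return solve3(M, to_od(rgb))


def ink_only(rgb):
    """Re-render the pixel as if only the ink layer were present."""
    ink = unmix(rgb)[1]
    return from_od([ink * v for v in BROWN_INK])


# Synthetic pixel: 0.6 units of paint over 0.9 units of ink.
pixel = from_od([sum(M[ch][k] * a for k, a in enumerate([0.6, 0.9, 0.0])) for ch in range(3)])
print(round(unmix(pixel)[1], 6))   # 0.9
```

Removing the paint then amounts to keeping only the ink term, which is what ink_only does. The sketch works only as well as the assumption that each layer keeps its own fixed signature; paint that has soaked into the ink and formed new composite colors breaks that assumption.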

More important is the question of what purpose the coloring served. The original idea that it served to embellish the book, or even to make the plants easier to recognize, is clearly not satisfactory, given the apparent negligence of the illustrator and the limited number of colors used. It simply cannot serve such a purpose, which any illustrator should of course know beforehand. Also, in the case of the plants, the colors are not natural, have no shades, and in some cases are clearly not correct.

The results are convincing: the purpose of the coloring was mainly masking. Not everywhere in the VM, but surely in this particular case, and maybe elsewhere as well. We can see that the blue color is not spread over the whole area but just barely covers the numbers. When we follow the colored stripes from left to right, we can see that their height diminishes until, at the end, they cover only the numbers. The purpose is then obvious, although we still do not know what it all means. It is rather clumsily done anyway: we can see there is something underneath, and the illustrator should have expected it to be noticed. It could even have been done intentionally, to stress that there is something important there. On the other hand, the inscription would attract the reader's attention even without the coloring, except that at actual (1:1) size it is so small that it could easily be overlooked.

The negative of the second attempt at deconvolution

The symbols there look more like numbers, but to cover all cases I will just call them "symbols". There are visibly (that is, distinctly and most likely intentionally) only three gaps, "a", "b" and "c": two unquestionable ("b" and "c"), while gap "a" shows some spots which, due to their indefinite shape, might be just what they appear to be: only spots. However, by eliminating those and accepting "a" as a gap as well, we get the following groups of symbols, with the number in each group in brackets: I(2) - II(4) - III(5) - IV(3). Of course, if we do not accept "a" as a gap, groups I and II form a single joint group, making for 13 numbers in that group, which is another possibility.

We have to realize that the original is rather small, so it would be difficult to write anything readable there without a magnifying glass and a very sharp pen. Even so, the inscription was apparently still readable with a glass, so covering it with an extra layer of paint was chosen. While the pictures shown are probably of the best resolution and reproduction we can get, the uncertainty is still high, so even if we knew how to use such a key, some of the numbers would remain uncertain. I will not go here into guessing what numbers or letters there may be, only suggest that they may be part of some key which, thanks to the uncertainty of some signs, will not help us much anyway. They might, however, serve well as part of a learning process, namely if we try to find some other connection, maybe even on the same folio.

2) Numbers expressed by physical objects.
It is certainly noticeable that the pictures in the VM are very strange: herbs that do not exist on this earth, "horoscopes" of unusual construction, circles full of stars with a crude suggestion of some heavenly constellations, bathing beauties and strange cylinders like the one discussed above. It was of course suspected from the very beginning that the author might have used steganography. Here I use the modern meaning of the word, i.e. information hidden indirectly and inconspicuously, not the "steganography" described, say, by Trithemius, where the "hiding" was done simply by encoding. Pictures can, of course, hide almost anything, but one has to know what to look for and where. Simple steganography may just use counts of objects, while a more sophisticated one would deal with symbols, allegories and other indirect means.

To start, we may simply assume that there are numbers hidden in the pictures, expressed as counts of objects. There are such places in the VM, namely on folio f100r (Beinecke 1006246). The two rows of plant miniatures there could hardly serve any other purpose, such as categorizing or displaying plant variations, since each plant differs so much from its neighbors, as well as from the rest, that it is impossible to tell what they could possibly have in common. Not only are the shapes weird, but the variations sometimes do not follow biology and look like products of fantasy. One could even guess that the author deliberately made them that way, so that they are distinguished only by something else: each has an easily counted number of leaves and roots. The number of roots is especially unnatural: if a plant has more than one root, their number normally follows no standard, i.e. it does not matter whether they are shown as, say, 4 or 5, since it does not help recognition. What's more, the numbers of leaves and roots vary a lot and span the whole scale from 1 to 10. The only exception is 15 leaves in one case, which may actually be buds rather than leaves. I am displaying the numbers in the tables under the pictures, counting plants from left to right (L = leaves, R = roots).

Top picture

Plant	1	2	3	4	5	6	7	8	9	10	11
L	5	9	7	7	3	3	15	9	5	6	4
R	3	2	5	5	2	?	3	1	1	5	2

The same applies to the bottom picture below. Please also notice that each row here, as in the case above, starts with a mysterious picture of a cylinder, reminiscent of a case for a rolled scroll. I think that is a hint that the arrangement of the plants really is the key to something.

Bottom picture

Plant	1	2	3	4	5
L	3	10	1	2	5
R	1	1	4	3	6

There are other folios with similar "codes", such as f99r, f99v and f88r, the last one with the cylinder replaced by a vase or something similar, but apparently with the same purpose. Again, we are not going to dwell here on how to apply the code; for the time being, we can hardly guess how to use it anyway.
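Transcribing the leaf (L) and root (R) counts of the two folio f100r tables above into a data structure makes the claim about their range checkable: apart from the 15 leaves, every readable count lies between 1 and 10. This is only a bookkeeping sketch; the dictionary layout and the function names are illustrative, and it makes no attempt to apply the numbers as a key.

```python
# Leaf (L) and root (R) counts from the two tables of folio f100r, left to right.
# None stands for the unreadable "?" entry.
TOP = {"L": [5, 9, 7, 7, 3, 3, 15, 9, 5, 6, 4],
       "R": [3, 2, 5, 5, 2, None, 3, 1, 1, 5, 2]}
BOTTOM = {"L": [3, 10, 1, 2, 5],
          "R": [1, 1, 4, 3, 6]}


def out_of_range(table, low=1, high=10):
    """List (row, plant position, count) for readable counts outside low..high."""
    return [(row, pos, n)
            for row, counts in table.items()
            for pos, n in enumerate(counts, start=1)
            if n is not None and not low <= n <= high]


def pairs(table):
    """Pair each plant's leaf count with its root count."""
    return list(zip(table["L"], table["R"]))


print(out_of_range(TOP))      # [('L', 7, 15)]
print(out_of_range(BOTTOM))   # []
print(pairs(BOTTOM))          # [(3, 1), (10, 1), (1, 4), (2, 3), (5, 6)]
```

Counts from the other folios mentioned above (f99r, f99v, f88r) could be added as further tables in the same form once they have been read off.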

As an extension of the idea, we might find such keys in the folios that picture an individual herb only, but there the key might be "encoded" differently (there are also blossoms), and each folio may be encoded with a different key.

3) Numbers identified by colors.

Mattioli's woodcut

It is also possible that the colors in the pictures have some meaning from this point of view, at least to distinguish the different parts of the code. In that case they would not directly represent the numbers themselves. Unlike the masking mentioned in point (1), in the case of individual herbs the colors might also serve some other purpose, but hardly that of making the plant more recognizable (as some researchers think). On the contrary: some colors (tints) are missing, and those that are present have no "shades", which would make the herbs even less recognizable.

Simply said, the "herbal" part of the VM is not herbal at all: the crude pictures, the lack of detail, sometimes even disregard for natural shapes, and apparently no solid knowledge of botany make it anything but a herbal. One cannot believe the author drew those plants from picked samples, as is usually the case; maybe from memory, but one could hardly remember so many strange shapes by heart. Neither, of course, is this the way herbals are produced. Herbals are made to be used for recognizing and identifying plants, and that is "the main and only" reason the pictures in herbals are drawn. Compare the beautiful woodcuts in Mattioli's herbal (published in 1554, and in a Czech translation by Hayek in 1562; the woodcut shown here is courtesy of the site http://penelope.uchicago.edu/ ).

Conclusion: There is no doubt that there are more occurrences of the second type of number hiding in the VM; we just do not know what to count and where. Nor can one get much wiser from the tables shown: presented to a cryptologist, they would most likely raise more questions than answers. Still, the probability that the VM is encoded and that the keys are right there is very high, and while trying to trace the keys may be an uncertain and thankless undertaking, there is still the possibility of applying known cryptographic methods to the text alone. That may be where we could hope for interesting news.
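One classical method that works on the text alone is the index of coincidence: the probability that two symbols drawn at random from the text are identical. A monoalphabetic substitution or a transposition leaves it unchanged, while a polyalphabetic cipher pushes it toward the value of random text, 1/N for N equally likely symbols. The minimal sketch below assumes the text is given as a sequence of glyph symbols; the sample string and the relabeling are arbitrary illustrations.

```python
from collections import Counter


def index_of_coincidence(symbols):
    """Probability that two symbols drawn without replacement are identical (needs >= 2 symbols)."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


text = "inprincipioeratverbum"
# An arbitrary one-to-one relabeling of the letters, i.e. a monoalphabetic substitution.
relabeled = text.translate(str.maketrans("abcdefghijklmnopqrstuvwxyz",
                                         "qwertyuiopasdfghjklzxcvbnm"))
print(index_of_coincidence(text) == index_of_coincidence(relabeled))   # True
print(index_of_coincidence("aabb"))                                    # 0.3333333333333333
```

Applied to a transcription of the VM, the value only means something once the glyph inventory is fixed, so it inherits the transcription questions discussed above.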

6th of March, 2008.