Word Mutagenation! Zachriel's Word Mutation and Evolution Experiment

___________________________________________________
STATISTICALLY IMPOSSIBLE . . . ZILLIONS OF YEARS

"This is a great story [Sea of Beneficence], but sadly, it is statistically impossible. . . . The problem here is that there simply is not enough time this side of zillions of years to get the limited number of phrases to "bump together" enough times to make anything beyond the lowest levels of functional complexity without the input of a higher intelligence or pre-established information system. It just won't happen. Try it and see."

We will take Dr. Pitman's advice and will try it and see. Meanwhile, here are some other famous Sean Pitman posts using the technical term "zillions".
http://tinyurl.com/2aokz

___________________________
The LIMIT of CATS and DOGS

The basic thrust in this calculation will be to set an upper-limit to the number of possible mutations per the rules of our game. We will use some simplifying assumptions, but suffice it to say that our estimate will be many orders of magnitude larger than the actual number of possible mutations.

* Note that the Original Rules are a subset of the Extended Rules.

___________
JUST a DOG

Let C = number of character-symbols in the language, called "letters". For a basic English alphabet, it's 26, but let's make it 100 to make the arithmetic a little easier. That also allows us to include numerals, spaces and other "fancy" symbols. In fact, for large populations, this will not even be a significant factor. We could just as easily make it a thousand.

Consider a simple case, "dog". Let L = length("dog") = 3. (We will slightly modify our definition of length later.)

POINT-MUTATIONS (P): We can do a point-mutation on any one of the three letters, or we can add a letter to either end. As there are 100 possible letters, this would be a total of (L+2)*C = 500 possible point-mutations. (We can also delete any single letter, or insert a single letter, but we will count these along with the snippets.)

* We count P = (L+2)*C possible point-mutations.
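The count above can be checked by brute force. What follows is only a sketch under an assumption: Python's string.printable happens to contain exactly 100 characters, so it stands in here for the 100-symbol alphabet, though nothing above says which symbols that alphabet holds. The function name point_mutations is made up for the illustration.

```python
import string

# Assumption: string.printable (exactly 100 characters) stands in for the
# post's 100-symbol alphabet; the post does not say which symbols it holds.
ALPHABET = string.printable


def point_mutations(word, alphabet=ALPHABET):
    """Strings reachable by one substitution, or by adding one letter at either end."""
    results = set()
    # Substitute every symbol at every position (includes the unchanged word).
    for i in range(len(word)):
        for ch in alphabet:
            results.add(word[:i] + ch + word[i + 1:])
    # Add one symbol at the front or at the back.
    for ch in alphabet:
        results.add(ch + word)
        results.add(word + ch)
    return results


word = "dog"
bound = (len(word) + 2) * len(ALPHABET)  # P = (L+2)*C
distinct = point_mutations(word)
print(len(distinct), "distinct strings; upper limit P =", bound)
```

The distinct count cannot exceed P, because substituting a letter for itself just gives back "dog" and so several of the counted mutations coincide. The formula is an upper limit, which is all we need.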

SNIPPETS (S): From the word "dog", we can snip three different one-letter sections "d" "o" "g", two different two-letter sections "do" "og", and just one three-letter section "dog". This forms a triangular series 1 + 2 + ... + L = L*(L+1)/2 < half of (L+1)^2. This is the number of possible snippets from a string that might create a new string, a free-snippet. Also, when we snip out the "o", we leave "dg", which if it were valid, might also enter the general population. The number of such remainder strings is also < half of (L+1)^2. So there is an upper limit of S = (L+1)^2 new strings created by snipping. When we calculate the number of possible insertions, we'll overcount somewhat and also let S = (L+1)^2. Why quibble over details?

* We count S = (L+1)^2 possible snippets and remainders.
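For anyone who wants to see the snippets laid out, here is a minimal sketch that lists every snippet of "dog" together with the remainder left behind when it is cut out. The helper name snippets_and_remainders is hypothetical, and the code only illustrates the counting argument above.

```python
def snippets_and_remainders(word):
    """List every contiguous snippet of word and the remainder left after cutting it out."""
    snippets, remainders = [], []
    n = len(word)
    for start in range(n):
        for end in range(start + 1, n + 1):
            snippets.append(word[start:end])               # the piece snipped out
            remainders.append(word[:start] + word[end:])   # what is left behind
    return snippets, remainders


word = "dog"
L = len(word)
snips, rems = snippets_and_remainders(word)

print("snippets:  ", snips)
print("remainders:", rems)
print("total:", len(snips) + len(rems), " upper limit S =", (L + 1) ** 2)
```

There are L*(L+1)/2 snippets and just as many remainders, so the total stays under (L+1)^2, with room to spare.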

INSERTIONS (I): We can insert each of these snippets (S), in four different places in the word "dog"; before the "d", before the "o", before the "g", or at the end of the word. So I = S*(L+1) = (L+1)^2 * (L+1) = (L+1)^3.

* Note that insertion at the beginning or end of a word is the same as a concatenation.

* Let's make another simplifying assumption. From now on, we will treat the length of a word such that L = length("dog")+2 = length(" dog ") = 5. This will increase our count somewhat, but we won't have to use L, L+1, or L+2 in different parts of the calculation. Let's give our calculation a little room to breathe.

* This makes I = S * L = L^3, a nice round figure. Gee whiz. Maybe Sean Pitman is right, after all. That number does increase geometrically!

MUTATIONS (M): Consider a pond filled with a large multitude of the word "dog" with mutations occurring randomly among the population. To consider a single change to a single string in our population, we will consider every possible mutation. Most such mutations will be non-viable, i.e. not valid in the English language, e.g. "dxg". However, a few will be valid and can be selected for beneficence, or meaningfulness. If the number of possible mutations (M) is orders of magnitude larger than the population of "dog", that is, if M is in the "zillions", then such a beneficial mutation will probably never occur. For " dog ", M = 5*100 + 5^2 + 5^3 = 650 possible mutations. This is clearly less than "zillions". Certainly, a reasonably large population of "dog" could evolve by these rules into "dogs" or "dig" or "cog" or "do".

* By our reckoning, M < P+S+I = C*L + L^2 + L^3

* Note that for large L, the point-mutations (P) and free-snippets (S) are negligible and can be disregarded. More on this later.
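The reckoning M < C*L + L^2 + L^3 is easy to put into a few lines of code. This is a sketch, not a simulation of the pond: the function name mutation_bound and its parameters are invented for the illustration, and L is the padded length, as in length(" dog ").

```python
def mutation_bound(padded_length, alphabet_size=100):
    """Upper limits on possible mutations for a string of the given padded length."""
    P = alphabet_size * padded_length   # point-mutations, C*L
    S = padded_length ** 2              # snippets and remainders, L^2
    I = padded_length ** 3              # insertions, L^3
    return {"P": P, "S": S, "I": I, "M": P + S + I}


dog = mutation_bound(len(" dog "))
print(dog)  # M is 500 + 25 + 125 = 650 for " dog "
```

Calling mutation_bound with any other padded length gives the same upper limit for a bigger pond of words.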

_______________
CATS and DOGS

Now consider two words, "cat" and "dog". Create a new string for consideration (not meant to be an actual mutation, just an aid in computation), with a space at the beginning and end of each word, " cat dog " (consistent with our new definition of L).

We can count each of the point-mutations on the combined string and use this to calculate the sum of the number for each of the strings separately. We can also count the number of snippets. Of course, we will get a few snippets which include parts of both individual words, so our count will be high. Depending on the number of words and the length of the words, our count might be way high. But that's OK. Why quibble over a few orders of magnitude here or there?

L = 10
P = C*L = 100*10 = 10^3
S = L^2 = 10^2
I = L^3 = 10^3
M = P+S+I = 10^3 + 10^2 + 10^3 = 2100.

Now, our Creationist claims that anything over seven letters has millions of possible permutations. This is clearly incorrect. Our upper limit M for 20 letters (which might be a bunch of small words, a few larger ones, or a mixture of phrases and words) is 100*20 + 20^2 + 20^3 = 10,400, of which the insertions alone account for 8,000. Most of these, as our Creationist correctly points out, are not valid words or phrases, and can be automatically excluded from the next generation.

________________
The MENAGERIE

Now, consider a collection of many words, with a total length of 1000 letters, including the extra spaces as separators.

L=1000
P = L*C = 1000*100 = 10^5
S = L^2 = 10^6
I = L^3 = 10^9

M = P + S + I = 10^5 + 10^6 + 10^9 = ~10^9

* Note that the point-mutations and free-snippets are negligible for large numbers. Considering all our simplifying assumptions, for a large population we can treat M = L^3. For a thousand letters, the total number of possible mutations is about 10^9, which is many orders of magnitude less than "zillions".
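To see how quickly the L^3 term takes over, here is a small sketch that prints the share of M contributed by insertions at the lengths used above. The function name insertion_share is hypothetical, and C = 100 is the same alphabet size assumed throughout.

```python
def insertion_share(L, C=100):
    """Fraction of the upper limit M = C*L + L^2 + L^3 that comes from the L^3 term."""
    return L ** 3 / (C * L + L ** 2 + L ** 3)


for L in (5, 10, 1000):
    print(f"L = {L:5d}   share of M from insertions = {insertion_share(L):.4f}")
```

For short strings the point-mutations still matter, but by a thousand letters the insertions make up nearly all of M, which is why M = L^3 is a fair shorthand for a large pond.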

The line of verse "Beware a war of words, Sean Pitman, ere you err." was evolved in a space of about 1000 letters, including many words that are not needed for the final derivation. You can see this in "A Pond of Doggerel". We could even optimize our process to fit in a smaller pond. More on this in "Malthus' Catastrophe".

Now, for the entire poem, "Beware a War of Words", the total length of every phrase, word, space and comma in the entire project is less than 5000. 5000^3 is about 1.25 x 10^11, still much less than our pond-size of 10^14. You can find the complete evolution at the beginning of the thread.

O Sean Pitman
http://tinyurl.com/2rw58

* Quod erat demonstrandum.


