___________________________________________________

* Note that the Original Rules are a subset of the Extended Rules.

___________

Let C = number of character-symbols in the language, called "letters". For a basic English alphabet, it's 26, but let's make it 100 to make the arithmetic a little easier. That also allows us to include numerals, spaces and other "fancy" symbols. In fact, for large populations, the choice of C will not even be a significant factor. We could just as easily make it a thousand.

Consider a simple case, "dog". Let L = length("dog") = 3. (We will slightly modify our definition of length later.)

POINT-MUTATIONS (P): We can do a point-mutation on any one of the three letters, or we can add a letter to either end. As there are 100 possible letters, this would be a total of (L+2)*C = 500 possible point-mutations. (We can also delete any single letter, or insert a single letter, but we will count these along with the snippets.)

* We count P = (L+2)*C possible point-mutations.

SNIPPETS (S): From the word "dog", we can snip three different one-letter sections "d" "o" "g", two different two-letter sections "do" "og", and just one three-letter section "dog". This forms a triangular series 1 + 2 + ... + L = L*(L+1)/2, which is less than half of (L+1)^2. This is the number of possible snippets from a string that might create a new string, a free-snippet. Also, when we snip out the "o", we leave "dg", which if it were valid, might also enter the general population. The number of such remainder strings is also less than half of (L+1)^2. So there is an upper limit of S = (L+1)^2 new strings created by snipping. When we calculate the number of possible insertions, we'll overcount somewhat and also let S = (L+1)^2. Why quibble over details?

* We count S = (L+1)^2 possible snippets and remainders.

INSERTIONS (I): We can insert each of these snippets (S) in four different places in the word "dog": before the "d", before the "o", before the "g", or at the end of the word.
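The snippet, remainder and insertion-site counts described above can be checked by brute force for a short word. The following is only a minimal sketch: the function names `snippets`, `remainders` and `insertions` are hypothetical helpers introduced here, not part of the original rules, and the code counts raw strings without testing whether any of them is a valid English word.

```python
def snippets(word):
    """Every contiguous section that can be snipped out of word."""
    n = len(word)
    return [word[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def remainders(word):
    """What is left of word after each possible snip (may be empty)."""
    n = len(word)
    return [word[:i] + word[j:] for i in range(n) for j in range(i + 1, n + 1)]


def insertions(word):
    """Each snippet of word re-inserted at each of the L+1 gaps in word."""
    n = len(word)
    return [word[:k] + s + word[k:] for s in snippets(word) for k in range(n + 1)]


word = "dog"
L = len(word)

print(snippets(word))    # ['d', 'do', 'dog', 'o', 'og', 'g']
print(remainders(word))  # ['og', 'g', '', 'dg', 'd', 'do']

# Exact counts versus the deliberately generous bounds used in the text.
print(len(snippets(word)), "snippets; triangular number =", L * (L + 1) // 2)
print(len(snippets(word)) + len(remainders(word)), "snippets + remainders; bound =", (L + 1) ** 2)
print(len(insertions(word)), "insertions; bound =", (L + 1) ** 3)
```

For "dog" the exact figures (6 snippets, 12 snippets plus remainders, 24 insertions) sit comfortably under the rounded-up bounds of 16 and 64, which is the sense in which the text overcounts.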
So I = S*(L+1) = (L+1)^2 * (L+1) = (L+1)^3.

* Note that insertion at the beginning or end of a word is the same as a concatenation.

* Let's make another simplifying assumption. From now on, we will treat the length of a word such that L = length("dog")+2 = length(" dog ") = 5. This will increase our count somewhat, but we won't have to use L, L+1, or L+2 in different parts of the calculation. Let's give our calculation a little room to breathe.

* With the padded length, P = C*L and S = L^2, which makes I = S * L = L^3, a nice round figure. Gee whiz. Maybe Sean Pitman is right, after all. That number does increase geometrically!

MUTATIONS (M): Consider a pond filled with a large multitude of the word "dog" with mutations occurring randomly among the population. To consider a single change to a single string in our population, we will consider every possible mutation. Most such mutations will be non-viable, i.e. not valid in the English language, e.g. "dxg". However, a few will be valid and can be selected for beneficence, or meaningfulness. If the number of possible mutations (M) is orders of magnitude larger than the population of "dog", that is, if M is in the "zillions", then such a beneficial mutation will probably never occur.

For " dog ", M = 5*100 + 5^2 + 5^3 = 650 possible mutations. This is clearly less than "zillions". Certainly, a reasonably large population of "dog" could evolve by these rules into "dogs" or "dig" or "cog" or "do".

* By our reckoning, M < P+S+I = C*L + L^2 + L^3
* Note that for large L, the point-mutations (P) and free-snippets (S) are negligible and can be disregarded. More on this later.

_______________

Now consider two words, "cat" and "dog". Create a new string for consideration (not meant to be an actual mutation, just an aid in computation), with a space at the beginning and end of each word, " cat dog " (consistent with our new definition of L).
We can count the point-mutations on the combined string and use that count in place of the sum of the counts for each of the strings separately. We can also count the number of snippets. Of course, we will get a few snippets which include parts of both individual words, so our count will be high. Depending on the number of words and the length of the words, our count might be way high. But that's ok. Why quibble over a few orders of magnitude here or there?

L = 5 + 5 = 10

________________

Now, consider a collection of many words, with a total length of 1000 letters, including the extra spaces as separators.

L = 1000
M = P + S + I = 10^5 + 10^6 + 10^9 = ~10^9

* Note that the point-mutations and free-snippets are negligible for large numbers. Considering all our simplifying assumptions, for a large population we can treat M = L^3.

For a thousand letters, the total possible mutations is 10^9, which is many orders of magnitude less than "zillions".

The line of verse "Beware a war of words, Sean Pitman, ere you err." was evolved in a space of about 1000 letters, including many words that are not needed for the final derivation. You can see this in "A Pond of Doggerel". We could even optimize our process to fit in a smaller pond. More on this in "Malthus' Catastrophe".

Now, for the entire poem, "Beware a War of Words", the total length of every phrase, word, space and comma in the entire project is less than 5000. 5000^3 is about 10^11, still much less than our pond-size of 10^14. You can find the complete evolution at the beginning of the thread.

O Sean Pitman

* Quod erat demonstrandum.
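For readers who want to rerun the arithmetic, here is a small sketch of the simplified count M = C*L + L^2 + L^3 using the padded length L and C = 100 from the text. The function name `mutation_bound` and the constant `POND_SIZE` are labels made up for this illustration; the pond-size of 10^14 is the figure quoted above.

```python
def mutation_bound(L, C=100):
    """Return (P, S, I, M) under the simplified counting rules.

    L is the padded length (letters plus separating spaces);
    C is the assumed number of character-symbols.
    """
    P = C * L    # point-mutations
    S = L ** 2   # snippets and remainders (rounded up)
    I = L ** 3   # insertions of snippets
    return P, S, I, P + S + I


POND_SIZE = 10 ** 14  # pond-size quoted in the text

for L in (5, 10, 1000, 5000):
    P, S, I, M = mutation_bound(L)
    print(f"L={L:>5}  P={P:.1e}  S={S:.1e}  I={I:.1e}  "
          f"M={M:.3e}  M/pond={M / POND_SIZE:.1e}")
```

At L = 5 this gives the 650 computed for " dog ", and at L = 1000 and L = 5000 the L^3 term accounts for nearly all of M, which stays well below the pond-size.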
Zachriel's Word Mutagenation