Phrasenation! from the makers of Word Mutagenation
Phrase Mutation and Evolution Experiment
And it takes less than "zillions of years"!

 If you are unfamiliar with Word Mutagenation
you can find out more about it here:
Word Mutagenation--Now with Phrasenation!

The meaning of words
Words, Words, Words

As a general principle, we could define a word found in the dictionary as more meaningful than a jumble of letters; so "king" is more meaningful than "kxjz". Two words forming a valid phrase are, of course, more meaningful than a single word; so "the king" is more meaningful than simply "king". By the same reasoning, a single word with a leading or trailing space is more meaningful than just a single word, as it implies a connection to another word; so " king" is more meaningful than simply "king". The addition of special symbols can also add meaning; so "the king!" is more meaningful, and of a somewhat different meaning, than "the king".

Alas, poor Yorick!
Some valid Phrasenations:
"king" " king" "the king" "the king!"
Now cracks a noble heart. Good night, sweet prince,

Though this be madness, yet there is method in ’t.

So how do we determine, for the purposes of Phrasenation, what constitutes a valid word or phrase? Well, we will take a minuscule sample of English literature, and compare our phrase to that sample! If the phrase is found in the sample, then it will be considered a valid phrase. One proviso: All words must be complete. No half words.

 (As a technical matter, we will count certain symbols, such as "!" and "?" as separate words.) 
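The validity rule above can be sketched in a few lines of Python. This is only an illustration, not the Phrasenator's own code (which runs in VBA): the names `tokenize` and `is_valid_phrase` are made up for the example, the choice of which characters count as symbols is an assumption, and the sketch ignores the extra meaning given to a leading or trailing space.

```python
import re

def tokenize(text):
    """Split text into lowercase words; each symbol such as "!" or "?" is its own word."""
    # Assumption: letters and apostrophes form words, any other non-space character is a symbol.
    return re.findall(r"[a-z'’]+|[^\sa-z'’]", text.lower())

def is_valid_phrase(phrase, sample_tokens):
    """True if the phrase's complete words appear consecutively in the sample."""
    words = tokenize(phrase)
    n = len(words)
    if n == 0:
        return False
    return any(sample_tokens[i:i + n] == words
               for i in range(len(sample_tokens) - n + 1))

# A tiny stand-in for the Phrase Book.
sample = tokenize("Alas, poor Yorick! I knew him, Horatio.")

print(is_valid_phrase("poor Yorick!", sample))  # whole words, found in the sample
print(is_valid_phrase("poor Yor", sample))      # "Yor" is a half word, so not valid
```

Because the comparison is made word by word, a half word such as "Yor" can never match, which is the proviso stated above.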

Our Phrase Book
Within the book and volume of my brain.

For our Phrase Book, we will use The Tragedy of Hamlet, Prince of Denmark by William Shakespeare. I'm sure most everyone can agree that just about anything the Bard said is in some sense meaningful. After all, he practically invented modern English single-handedly! Keep in mind that this will exclude the vast majority of valid phrases, including even most of Shakespeare. However, you can add phrases to the Phrase Book, if you choose. 

Phrases not Included
O happy dagger
a tale told by an idiot
the quality of mercy

Finally, all valid phrasenations are ranked by number of characters. Longer is "better".

Welcome to Elsinore

Foul deeds will rise, Though all the earth o’erwhelm them, to men’s eyes.

If circumstances lead me, I will find
Where truth is hid.

When generating a phrasagen (a mutant phrase), we will use random mutation and recombination. Starting from just two words, "the" and "question", some not-so-valid phrasagens might look like these:

Some Invalid Phrasenations
Point       Snip         Remainder
quextion    ues          qtion

Exchange    Insertion    Complex
thestion    quethstion   quthon
Find out the cause of this effect,
Or rather say, the cause of this defect,
For this effect defective comes by cause.

Phrasenation allows one to adjust the relative frequency of each type of mutation. Once having generated a phrasagen, we must compare it to our Phrase Book in order to determine its meaningfulness. If it is not found, we will ruthlessly eliminate it.

To be honest, as this world goes, is to be one man pick'd out of ten thousand.
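As a rough sketch of that mutate-and-eliminate cycle, here is one generation in Python. Everything in it is an assumption made for illustration: the three operators are simplified stand-ins for the mutation types named above, the Phrase Book is reduced to a plain set of strings, and the names `OPERATORS`, `next_generation`, and `weights` are hypothetical.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def point(a, b, rng):
    """Point mutation: replace one character of a with a random character."""
    i = rng.randrange(len(a))
    return a[:i] + rng.choice(ALPHABET) + a[i + 1:]

def snip(a, b, rng):
    """Snip: keep a random non-empty slice of a."""
    i = rng.randrange(len(a))
    j = rng.randrange(i + 1, len(a) + 1)
    return a[i:j]

def concatenate(a, b, rng):
    """Concatenation: join the two parents with a space, in either order."""
    return rng.choice([a + " " + b, b + " " + a])

OPERATORS = {"point": point, "snip": snip, "concatenate": concatenate}

def next_generation(population, phrase_book, weights, trials, rng):
    """Mutate random parents; keep only phrasagens found in the phrase book."""
    names = list(weights)
    freqs = [weights[name] for name in names]
    pool = sorted(population)
    survivors = set(population)
    for _ in range(trials):
        a, b = rng.choice(pool), rng.choice(pool)
        op = OPERATORS[rng.choices(names, weights=freqs)[0]]
        child = op(a, b, rng)
        if child in phrase_book:      # anything not found is ruthlessly eliminated
            survivors.add(child)
    return survivors

book = {"the", "question", "the question"}
weights = {"point": 1, "snip": 1, "concatenate": 2}   # relative frequencies
print(next_generation({"the", "question"}, book, weights, 500, random.Random(0)))
```

Raising or lowering a value in `weights` changes how often that kind of mutation is tried, which is the adjustment described above.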

Indexing the Bard
Yea, from the table of my memory
I’ll wipe away all trivial fond records.

To make this a practical matter, we must index every valid phrase in our Phrase Book. But it isn't enough to index just the first word in a phrase. We must index every single word. 

Consider that each of these are valid phrases:
"to be, or not to be" "be, or not"
"or not to be" "not to be"

But what if the first words are the same? Well, then we will compare the word that follows, and if necessary each succeeding word until we find a word which is different. For instance, "to be" is not a unique phrase, as it could be found as "to be, or not" or as "to be- that is the question". 

In fact, the phrase "to be" shows up 34 times as first words, including "to be your valentine", but "to be," (note the comma) only once, in "to be, or not to be".  

The index includes 36,176 words, including symbols. 
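One way to picture such an index is as a list of every word position, sorted by the words that follow it. The Python below is a minimal sketch under that assumption; the Phrasenator's actual data structure is not described here, and the names `build_index`, `count_occurrences`, and `DEPTH` are invented for the example.

```python
from bisect import bisect_left

DEPTH = 12  # assumption: compare at most this many words when sorting

def build_index(tokens):
    """Every word position, sorted by the (up to DEPTH) words starting there."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:i + DEPTH])

def count_occurrences(phrase_tokens, tokens, index):
    """Count how often the phrase appears, by binary search in the sorted index."""
    n = len(phrase_tokens)
    if n > DEPTH:
        raise ValueError("phrase is longer than the indexed depth")
    # Truncating sorted entries to n words keeps them in sorted order.
    keys = [tokens[i:i + n] for i in index]
    lo = bisect_left(keys, phrase_tokens)
    hi = lo
    while hi < len(keys) and keys[hi] == phrase_tokens:
        hi += 1
    return hi - lo

# Symbols are separate words, as in the Phrase Book.
tokens = "to be , or not to be - that is the question".split()
index = build_index(tokens)

print(count_occurrences(["to", "be"], tokens, index))       # not unique in this sample
print(count_occurrences(["to", "be", ","], tokens, index))  # the comma settles it
```

Rebuilding `keys` on every lookup is wasteful and is done here only to keep the sketch short; a real index would search the positions directly.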

I knew him, Horatio: a fellow of infinite jest, of most excellent fancy: he hath borne me on his back a thousand times;

Something is rotten in the state of Denmark.

 A Most Interesting Result
O day and night, but this is wondrous strange!

In my mind’s eye, Horatio.

When testing the index, the Phrasenator output every indexed word followed by a specified number of words. Then the Phrasenator counted the number of unique phrases. For one-word phrases, there were 4,801 unique phrases. But what about other numbers of words?

For large numbers of words, the answer is surely 36,176, but what is a "large" number? Somewhat surprisingly, if you select any four words in series, the vast majority will constitute a unique phrase! And for that small percentage which are not unique, those are nearly all purposefully repeated phrases, such as "a pit of clay for to be made" from the singing Clown's refrain, 

A pickaxe and a spade, a spade,
For and a shrouding sheet;
O, a pit of clay for to be made
For such a guest is meet.

Doubt thou the stars are fire;
  Doubt that the sun doth move;
Doubt truth to be a liar;
  But never doubt I love.
Length   Number   % of 36,176   Average Letters   Letters per Word
     1    4,801        13.27%               6.4                6.4
     2   20,995        58.04%               9.2                4.6
     3   32,490        89.81%              12.6                4.2
     4   35,317        97.63%              16.7                4.2
     5   35,930        99.32%              21.0                4.2
     6   36,090        99.76%              25.4                4.2
     7   36,145        99.91%              29.7                4.2
     8   36,158        99.95%              34.0                4.3
     9   36,162        99.96%              38.5                4.3
    10   36,165        99.97%              42.8                4.3
* Symbols are counted as words.
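The counting behind the table can be sketched as follows. This is an assumed reconstruction in Python, not the Phrasenator's code: the function name `unique_phrase_counts` is hypothetical, and phrases that run off the end of the text are simply cut short, which is why the count approaches the total number of indexed words for long phrases.

```python
def unique_phrase_counts(tokens, max_len):
    """For each phrase length, count the distinct phrases starting at every indexed word."""
    total = len(tokens)
    rows = []
    for n in range(1, max_len + 1):
        # Every indexed word followed by up to n - 1 more words.
        phrases = {tuple(tokens[i:i + n]) for i in range(total)}
        rows.append((n, len(phrases), 100.0 * len(phrases) / total))
    return rows

# A toy text with one purposefully repeated phrase.
tokens = "a pit of clay a pit of sand".split()
for length, count, percent in unique_phrase_counts(tokens, 4):
    print(f"{length:>6} {count:>6} {percent:6.2f}%")
```

Run against the full Phrase Book, the percentage column would be each count divided by the 36,176 indexed words.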

All that lives must die, Passing through nature to eternity.

Definitions:
You must translate; 'tis fit we understand them.

  • L = Length of phrase or collection of phrases.

  • N = Number of valid phrases of length L. N is generally difficult to define. In Phrasenation, we explicitly define it as the contents of The Tragedy of Hamlet, Prince of Denmark, by William Shakespeare.

  • M = Total number of Mutations, the number of Phrasagens whether valid or not.

  • C = Number of allowable characters (letters or symbols).

  • C^L = Sequence space. Number of possible combinations of letters in a string of length L.

  • (C^L)/N = Pitman's Number, the number of possible combinations of characters divided by the number of valid phrases. Zillions and zillions.

  • N/(C^L) = Pitman's Ratio, the probability of evolving a valid sequence through Pitman's Random Walk across the Neutral Gap
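To see how quickly these quantities grow, here is a small Python sketch of the last two definitions. The values plugged in are assumptions chosen only for illustration (27 allowable characters, a 16-character string, and an arbitrary N); none of them is a measured result.

```python
from fractions import Fraction
import math

def pitman_number(C, L, N):
    """(C^L)/N: possible character combinations per valid phrase."""
    return Fraction(C ** L, N)

def pitman_ratio(C, L, N):
    """N/(C^L): the fraction of sequence space occupied by valid phrases."""
    return Fraction(N, C ** L)

def log10_pitman_number(C, L, N):
    """The same number on a log scale, handy once C^L gets into the zillions."""
    return L * math.log10(C) - math.log10(N)

# Illustrative values only: 26 letters plus a space, 16 characters, N picked arbitrarily.
C, L, N = 27, 16, 35000
print(f"Pitman's Number is about 10^{log10_pitman_number(C, L, N):.1f}")
print(f"Pitman's Ratio is about {float(pitman_ratio(C, L, N)):.3e}")
```

Exact fractions are used so that the Number and the Ratio remain exact reciprocals even when C^L is very large.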

Rest, rest, perturbed spirit!

With "O Sean Pitman", we introduced our modest project with simple concatenation and point mutation. Then, in accordance with Dr. Pitman's wishes, we added Insertion. Now we add Exchange and Complex Recombination. Each of these categories is approximately related to a power of L.

A dream itself is but a shadow.

That he is mad, ’t is true: ’t is true ’t is pity;
And pity ’t is ’t is true.

  • L^0 Concatenation: There are only two ways to join two strings end-to-end. The number of ways to choose is not dependent on L.

  • L^1 Point Mutation: There are L+2 possible locations for each such mutation, including adding one at either end. The number of ways to choose is proportional to L.

  • L^2 Exchange: Exchanging same-length snippets between strings. We must choose the beginning and end of snippet to be exchanged.

  • L^3 Insertion: Inserting snippets from one string into another string. We must choose the beginning and ending of each snippet, plus the insertion point.

  • L^4 Complex: Exchanging snippets of possibly different lengths between strings. We must choose the beginning and ending of the snippet, and the beginning and ending of the section to be replaced. 
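The powers of L in the list above can be checked by counting choices. The Python sketch below does so under simplifying assumptions that are not spelled out in the list: both strings have length L, snippets are non-empty, and an insertion point may fall before, between, or after any character. The names `slices` and `choice_counts` are hypothetical.

```python
def slices(L):
    """Number of ways to choose the beginning and end of a non-empty snippet."""
    return L * (L + 1) // 2

def choice_counts(L):
    """Approximate number of ways to apply each kind of mutation to strings of length L."""
    return {
        "concatenation": 2,               # L^0: two orders, whatever the length
        "point": L + 2,                   # L^1: L locations plus one at either end
        "exchange": slices(L),            # L^2: beginning and end of the snippet
        "insertion": slices(L) * (L + 1), # L^3: snippet plus an insertion point
        "complex": slices(L) ** 2,        # L^4: snippet plus the section it replaces
    }

for L in (4, 8, 16):
    print(L, choice_counts(L))
```

Doubling L leaves concatenation unchanged, roughly doubles point mutation, and multiplies the other three by roughly 4, 8, and 16, which is the sense in which each category is related to a power of L.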

Pitman's Assertions
There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.

Dr. Pitman's assertions relate to the behavior of Phrasenation as L increases. 

Pitman's Vastness Assertion
Pitman's Ratio implies the Neutral Gap
N/(C^L) --> 0 

Pitman's Principle of Zillions
Pitman's Number approaches Zillions and Zillions
M --> (C^L)/N

As L increases, Dr. Pitman claims that the ratio of valid phrases to the totality of sequence space approaches zero. Valid sequences get lost in the vastness of sequence space. He concludes that it is impossible to evolve sequences beyond the "lowest level of complexity". However, Dr. Pitman has failed to provide a method of calculating N, much less a map of how valid sequences are distributed in sequence space. Generally, any collection of valid phrases and sentences has some validity. Language can ramble somewhat and still be valid. We could start by talking about Pitman's handwaving and suddenly, for no particular reason, change the subject to ghosts and the murder of kings. Imagine that!


and now, how abhorred in my imagination it is!

Results Matter
More matter, with less art.

However, even using a tiny sliver of the English language—just one play by one playwright—it can be shown that phrases of substantial length can be easily evolved. 

Our experiments have shown that words and phrases appear to have some underlying connection related to their own evolution. As such, words and phrases make ideal subjects for evolutionary algorithms. 

Beware a War of Words, Sean Pitman, 
Ere you err.
You can join the discussion on Phrasenation. The thread can be found here


Word Mutagenation--Now with Phrasenation!

Word Mutagenation
Zip format ~2MB.


(Requires VBA6 which is included in Office 2000 and Excel 2000.  
Please extract all files to a folder before running software.)
Certified Virus-Free

©2004 Zachriel

Tired of letting Shakespeare have all the fun?
Coming Soon! Phrasomatic!!


Zachriel's Phrasenation brought to you by 
Moods in Music - MIDI Music by Lee Croteau
Moods in Music
All music written, performed and copyrighted by Lee Croteau.
All rights reserved. ©1995-2004  

