Finding Mona
[Image: Mona Lisa, 100x142 = 14200 pixels]

But Mona Lisa must have had the highway blues.
You can tell by the way she smiles.
— Bob Dylan

What if someone hid a Mona Lisa in a genome?
Would we be able to find her?
Could we tell by the way she smiles?

We start with this tiny true-color bitmap image of the Mona Lisa, La Gioconda.

This is what Tiny Lisa looks like when stretched out in a long line. 

Here she is with a little height added so you can actually see her.

Mona's True Colors

Now, this is what a genome of the E. coli bacterium looks like when represented as a true-color bitmap. It is visually indistinguishable from random pixels. Amazingly, almost every single pixel in this image has a unique color!
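Because the genome looks like random pixels, a natural baseline is how many unique colors truly random pixels would show. The sketch below assumes roughly 386,000 pixels (about 4.64 million bases, four bases per byte, three bytes per pixel) and substitutes uniformly random 24-bit colors for the genome; the function name is hypothetical.

```python
import random
from collections import Counter


def unique_pixel_fraction(n_pixels: int, seed: int = 0) -> float:
    """Fraction of pixels whose 24-bit color occurs exactly once,
    for n_pixels drawn uniformly at random (a stand-in for the genome)."""
    rng = random.Random(seed)
    # A true-color pixel is three bytes, i.e. one of 2**24 possible colors.
    colors = [rng.getrandbits(24) for _ in range(n_pixels)]
    counts = Counter(colors)
    # Count the colors that appear once; each such color is one unique pixel.
    singletons = sum(1 for count in counts.values() if count == 1)
    return singletons / n_pixels


# Assumed size: ~4.64 million bases / 4 bases per byte / 3 bytes per pixel.
frac = unique_pixel_fraction(386_000)
print(f"{frac:.3f}")
```

For n pixels drawn from N = 2^24 colors, the expected fraction appearing exactly once is about exp(-n/N), roughly 0.977 here, so near-uniqueness is what random-looking data should produce.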

But look closely. Our little girl is playing hide-and-seek. Can you find her?

E. coli as a bitmap

agcttttcattctgactgcaacgggcaatatgtctctgtgtggattaaaaaaagagtgtctgatagcagcttctgaactg
gttacctgccgtgagtaaattaaaattttattgacttaggtcactaaatactttaaccaatataggcatagcgcacagac
agataaaaattacagagtacacaacatccatgaaacgcattagcaccaccattaccaccaccatcaccattaccacaggt
aacggtgcgggctgacgcgtacaggaaacacagaaaaaagcccgcacctgacagtgcgggctttttttttcgaccaaagg

This gray image shows you where our little girl is hiding.
Now that you know where she is, take another look above.

Where's Mona?

I should like to creep
Through the long brown grasses
    That are your lashes.
— Angelina Weld Grimké

This is a close-up of the image where Mona is hiding. 
You can see a bit of Mona across the center line. She's the mostly dark green pixels.

Yon strange blue city crowns a scarped steep
No mortal foot hath bloodlessly essayed: 
Dreams and illusions beacon from its keep.
But at the gate an Angel bares his blade
— Edith Wharton

Excel Worksheet

A simple statistical algorithm quickly found our Lady Lisa. Just take the E. coli genome and divide it into segments. Then take the average (arithmetic mean) of the byte values in each segment and note the one that stands out.
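The procedure can be sketched in Python (the original tool was an Excel worksheet with VBA, which is not reproduced here). This is a minimal sketch assuming the genome is already packed into a `bytes` object and that segments do not overlap; the function names are hypothetical.

```python
from statistics import fmean


def segment_means(data: bytes, seg_len: int) -> list:
    """Arithmetic mean of each non-overlapping seg_len-byte segment.
    A trailing partial segment is dropped."""
    n_segments = len(data) // seg_len
    return [fmean(data[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]


def most_divergent_segment(data: bytes, seg_len: int):
    """Return (index, signed divergence) of the segment whose mean lies
    farthest from the global mean of all bytes."""
    global_mean = fmean(data)
    means = segment_means(data, seg_len)
    idx = max(range(len(means)), key=lambda i: abs(means[i] - global_mean))
    return idx, means[idx] - global_mean


# Toy data: a flat background with one dark 100-byte patch at offset 1000.
data = bytes([128] * 1000 + [30] * 100 + [128] * 900)
idx, divergence = most_divergent_segment(data, 100)
print(idx, round(divergence, 1))  # 10 -93.1
```

A negative divergence means the segment's byte values are below average, which in the bitmap roughly corresponds to darker pixels.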

Anomaly!!

Mona Lisa, Mona Lisa, men have named you.
You're so like the lady with the mystic smile.
— Nat King Cole

Technical Details

The Escherichia coli K-12 MG1655 genome is about 4.6 megabases. There are four bases (a, c, g, t), so each base takes two bits. It takes three bases to make a codon, but we will combine four bases into each 8-bit byte, for a little over a megabyte of data. There are three bytes in a true-color pixel. Tiny Mona is just 10 x 14, but she is stretched out in a line of 140 pixels, which is 420 bytes, or 1,680 bases.
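The packing described above can be sketched as follows. The 2-bit code (a=00, c=01, g=10, t=11) and the first-base-in-the-high-bits order are assumptions; the original software may map bases differently, which would change the byte values but not the sizes.

```python
BASE_TO_BITS = {"a": 0, "c": 1, "g": 2, "t": 3}  # assumed 2-bit code
BITS_TO_BASE = "acgt"


def pack_bases(seq: str) -> bytes:
    """Pack four bases into each byte, first base in the two highest bits."""
    seq = seq.lower()
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes) -> str:
    """Inverse of pack_bases: expand each byte back into four bases."""
    return "".join(BITS_TO_BASE[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))


def to_pixels(data: bytes) -> list:
    """Group bytes into (R, G, B) triples; leftover bytes are dropped."""
    usable = len(data) - len(data) % 3
    return [tuple(data[i:i + 3]) for i in range(0, usable, 3)]


snippet = "agcttttcattctgac"  # first 16 bases of the sequence shown above
packed = pack_bases(snippet)
print(list(packed))        # [39, 253, 61, 225]
print(to_pixels(packed))   # [(39, 253, 61)]

# Tiny Mona: 140 pixels -> 420 bytes -> 1680 bases.
print(len(unpack_bases(bytes(140 * 3))))  # 1680
```

Hiding an image works in the other direction: unpack its pixel bytes into bases and splice that string into the sequence.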

The global average byte value of the E. coli genome, or of a random sequence, is very close to 127½. By dividing the data into 2000 segments of length 500 each and averaging each segment, we can calculate each segment's divergence from the global average.
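A back-of-the-envelope estimate shows what size of divergence chance alone produces. Assume each byte is independent and uniform on 0 to 255, which holds for a random sequence and only roughly for the genome. Then the mean of an n-byte segment has standard deviation σ_byte/√n:

```latex
\mu = \frac{0 + 255}{2} = 127.5,
\qquad
\sigma_{\mathrm{byte}} = \sqrt{\frac{256^{2} - 1}{12}} \approx 73.9,
\qquad
\sigma_{\mathrm{seg}} = \frac{\sigma_{\mathrm{byte}}}{\sqrt{n}} = \frac{73.9}{\sqrt{500}} \approx 3.3
```

With n = 500, the random-sequence maxima of 11 to 13 in the list below are about 3.3 to 3.9 of these standard deviations, which is in line with the most extreme of 2000 segments, while Mona's 25 to 35 is roughly 7.6 to 10.6.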

  • With a random sequence, the typical maximum divergence is 11-13. 

  • With the E.coli genome, the maximum divergence is just over 17.

  • But Mona typically makes a strong signal at about 25-35 from the mean as can be seen in the "Segment Averages" graph above. 
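The random-sequence baseline in the first bullet can be checked with a quick simulation. This minimal sketch assumes a random sequence is well modeled by uniformly random bytes (`randbytes` needs Python 3.9+) and uses the same 2000 segments of 500 bytes; the function name is hypothetical.

```python
import random
from statistics import fmean


def max_divergence_random(n_segments: int = 2000, seg_len: int = 500, seed: int = 0) -> float:
    """Largest |segment mean - global mean| over the non-overlapping
    segments of a uniformly random byte sequence."""
    rng = random.Random(seed)
    data = rng.randbytes(n_segments * seg_len)
    global_mean = fmean(data)
    return max(
        abs(fmean(data[i * seg_len:(i + 1) * seg_len]) - global_mean)
        for i in range(n_segments)
    )


# A few independent trials; each value is one "maximum divergence".
trials = [max_divergence_random(seed=s) for s in range(5)]
print([round(t, 1) for t in trials])
```

Raising `n_segments` makes the expected maximum grow only slowly, since more segments simply give more chances for one extreme mean.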

We can also test with segments of different lengths. The Mona signal tested strong for segment lengths from 25 to 2000. This graph shows a close-up with length 25 and indicates a distinct Mona anomaly, sorta in the shape of a smile.
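The segment-length test can be sketched by planting a synthetic stand-in for Mona in random data and checking which segment stands out at several lengths. Everything here is hypothetical: the random "genome", the 420-byte dark payload (values 0 to 80, standing in for 140 mostly dark pixels), and the offset, chosen to fall on a segment boundary for each length tested. Lengths as short as 25 are left out of this automated check, since with 40,000 short, noisy segments a single maximum is a less dependable test than the run of neighboring outliers the graph shows.

```python
import random
from statistics import fmean


def most_divergent_segment(data: bytes, seg_len: int) -> int:
    """Index of the non-overlapping segment whose mean is farthest from the global mean."""
    global_mean = fmean(data)
    n_segments = len(data) // seg_len
    return max(
        range(n_segments),
        key=lambda i: abs(fmean(data[i * seg_len:(i + 1) * seg_len]) - global_mean),
    )


rng = random.Random(1)
genome = bytearray(rng.randbytes(1_000_000))  # hypothetical random stand-in for the genome

# Hypothetical "Mona": 140 pixels = 420 bytes of mostly dark values.
fake_mona = bytes(rng.randint(0, 80) for _ in range(420))
OFFSET = 500_000  # divisible by 100, 500, and 2000
genome[OFFSET:OFFSET + len(fake_mona)] = fake_mona

results = {}
for seg_len in (100, 500, 2000):
    idx = most_divergent_segment(genome, seg_len)
    results[seg_len] = idx * seg_len  # byte offset where the flagged segment starts
    print(seg_len, results[seg_len])
```

Each flagged segment should start inside the planted region; at length 2000 the payload fills only about a fifth of its segment, yet its mean still stands well clear of the random ones.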

Of course, there are many possible statistical methods that can be used, and modern genomic analysis stretches the limits of available computational techniques. However, it is apparent that even the crudest statistical test would find the teeny tiniest Mona Lisa hiding in an E. coli genome.

[Figure: Mona Lisa at decreasing resolutions: 50x71 = 3550, 20x28 = 560, 10x14 = 140, 5x7 = 35, 2x3 = 6, and 1x1 = 1 pixels]

Finding Mona
Zip format, ~8 MB.

(Requires VBA6, which is included in Office 2000 and Excel 2000.
Please save and extract all files to a folder before running the software.)

©2006 Zachriel

Finding Mona
Program Notes

Zachriel's Blog
Civilization, Mutagenation and more!
http://zachriel.blogspot.com/

Artwork by Ahmet Kurt, Marcel Duchamp, and Jeff Mihaylo