Technical details.
The Escherichia coli K-12 MG1655 genome is about 4 megabases. There are four
available bases (a, c, g, t), so each base takes two bits. It takes three
bases to make a codon, but we will combine four bases into each
8-bit byte, giving about a megabyte of data. There are three bytes in a
true-color pixel. Tiny Mona is just 10 x 14, but it is stretched
out in a line of 140 pixels, or 420 bytes.
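The packing step can be sketched in a few lines of Python. This is only an illustration: the bit assignment (a=0, c=1, g=2, t=3) and the choice to put the first base in the high bits are assumptions, and the function names are made up for this example.

```python
# Assumed mapping, not taken from the original: a=0, c=1, g=2, t=3.
BASE_BITS = {"a": 0, "c": 1, "g": 2, "t": 3}

def pack_bases(seq):
    """Pack a DNA string into bytes, four bases per byte, first base in the high bits."""
    seq = seq.lower()
    usable = len(seq) - len(seq) % 4  # ignore a trailing group of fewer than 4 bases
    out = bytearray()
    for i in range(0, usable, 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASE_BITS[base]
        out.append(value)
    return bytes(out)

def unpack_bytes(data):
    """Inverse of pack_bases: expand each byte back into four bases."""
    letters = "acgt"
    return "".join(letters[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))

packed = pack_bases("acgtttttaaaa")
print(list(packed))          # [27, 255, 0]
print(unpack_bytes(packed))  # acgtttttaaaa
```

Read the other way, the 420 bytes of the tiny image correspond to 1680 bases.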
The global average of the E. coli genome, or of a random sequence, is
very close to 127½. Dividing the data into 2000 segments of 500 bytes each
and averaging each segment, we can calculate how far each segment average
diverges from the global average.
- With a random sequence, the typical maximum divergence is 11-13.
- With the E. coli genome, the maximum divergence is just over 17.
- But Mona typically makes a strong signal at about 25-35 from the mean, as can be seen in the "Segment Averages" graph above.
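The segment-average test can be sketched as follows. This is a rough illustration under stated assumptions, not the code behind the graph: the background is seeded random bytes rather than the real genome, and the "image" is a hypothetical stand-in of 420 darker-than-average bytes, so its divergence will not match the 25-35 quoted for Mona.

```python
import random

SEG_LEN = 500   # bytes per segment, as in the text
N_SEGS = 2000   # 2000 x 500 = 1,000,000 bytes

def segment_averages(data, seg_len):
    """Average byte value of each consecutive, non-overlapping segment."""
    n_segs = len(data) // seg_len
    return [sum(data[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(n_segs)]

def max_divergence(data, seg_len):
    """Return (index, divergence) of the segment farthest from the global average."""
    averages = segment_averages(data, seg_len)
    global_avg = sum(data) / len(data)
    far = max(range(len(averages)), key=lambda i: abs(averages[i] - global_avg))
    return far, abs(averages[far] - global_avg)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_bytes = SEG_LEN * N_SEGS
background = rng.getrandbits(8 * n_bytes).to_bytes(n_bytes, "little")

# Hypothetical stand-in for the image: 420 bytes with values in 40..99.
image = bytes(rng.randrange(40, 100) for _ in range(420))
start = 1000 * SEG_LEN  # overwrite the start of segment 1000
planted = background[:start] + image + background[start + len(image):]

base_idx, base_div = max_divergence(background, SEG_LEN)
idx, div = max_divergence(planted, SEG_LEN)
print("random only:", base_idx, round(base_div, 1))
print("with image :", idx, round(div, 1))
```

For uniformly random bytes the standard deviation is about 73.9, so the average of 500 bytes has a standard deviation of about 3.3. A maximum divergence of 11-13 over 2000 segments is therefore roughly 3.3 to 3.9 standard deviations, which is about what chance alone produces.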
We can test with different segment lengths. The Mona signal tested strong
for segment lengths from 25 to 2000. This graph shows a
close-up with length 25 and indicates a distinct Mona
anomaly. Sorta in the shape of a smile.
Of course, there are many possible
statistical methods that can be used, and modern genomic analysis
stretches the limits of available computational
techniques. However, it is apparent that the crudest statistical
test would find even the teeny tiniest Mona Lisa hiding in an
E. coli genome.