What's in a word? Several nucleotides, some researchers might
say. By applying statistical methods developed by linguists, investigators have
found that "junk" parts of the genomes of many organisms may be expressing a
language. These regions have traditionally been regarded as 'useless' accumulations
of material from millions of years of evolution.
'The
feeling is,' says Boston University physicist Eugene Stanley, 'that
there's something going on in the non-coding region.'
Junk
DNA got its name because the nucleotides there (the fundamental pieces of
DNA, combined into so-called base pairs) do not encode instructions for making
proteins, the basis for life. In fact, the vast majority of genetic material in
organisms from bacteria to mammals consists of non-coding DNA segments, which
are interspersed with the coding parts. In humans, about 97 percent of the
genome is junk. Over the past 10 years, biologists have begun to suspect that this
feature is not entirely trivial.
"It's
unlikely that every base pair in non-coding DNA is critical, but it is also
foolish to say that all of it is junk," notes Robert Tjian, a biochemist
at the University of California at Berkeley.
For
instance, studies have found that mutations in certain parts of the non-coding
regions lead to cancer. Physicists backed the suspicions a few years ago, when
those studying fractals noticed certain patterns in junk DNA. They found that
non-coding sequences display what are termed long-range correlations. That is,
the position of a nucleotide depends to some extent on the placement of other
nucleotides.
Their
patterns follow a fractal-like property called 1/f noise, which is inherent in
many physical systems that evolve over time, such as electronic circuits,
periodicity of earthquakes and even traffic patterns. In the genome, however,
the long-range correlations held only for the non-coding sequences; the coding
parts exhibited an uncorrelated pattern. Those signs suggested that junk DNA
might contain some kind of organized information. To decipher the message,
Stanley and his colleagues Rosario N. Mantegna, Sergey V. Buldyrev
and Shlomo Havlin collaborated with Ary L. Goldberger, Chung-Kang
Peng and Michael Simons of Harvard Medical School.
They
borrowed from the work of linguist George K. Zipf, who, by looking at texts
from several languages, ranked the frequency with which words occur. Plotting the
rank of words against their frequency in a text produces a distinct relation. The most
common word ("the" in English) occurs 10 times more often than the 10th most common word,
100 times more often than the 100th most common, and so forth. The researchers
tested the relation on 40 DNA sequences of species ranging from viruses to
humans.
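In symbols, the ratios quoted above amount to a frequency that is inversely proportional to rank. Writing f(r) for the frequency of the word ranked r (notation introduced here for illustration, not taken from the researchers' work), the idealized relation is:

```latex
f(r) \propto \frac{1}{r}
\quad\Longrightarrow\quad
\frac{f(1)}{f(10)} = 10, \qquad \frac{f(1)}{f(100)} = 100
```

On a plot of log frequency against log rank, this idealized relation is a straight line with slope -1.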
They
then grouped pairs of nucleotides to create words between three and eight pairs
long (it takes three pairs to specify an amino acid). In every case, they found
that non-coding regions followed the Zipf relation more closely than did coding
regions, suggesting that junk DNA follows the structure of languages.
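The word-counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual procedure: the function names `count_words`, `ranked_frequencies` and `zipf_slope`, the use of overlapping windows, and the toy sequence are all hypothetical. The sketch slides a window of n nucleotides along a sequence, ranks the resulting "words" by frequency, and fits a straight line to log frequency against log rank. A slope near -1 would correspond to the ideal Zipf relation.

```python
from collections import Counter
import math


def count_words(sequence, n):
    """Count overlapping n-letter 'words' in a nucleotide string."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ranked_frequencies(counts):
    """Return word counts sorted from most to least frequent."""
    return sorted(counts.values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two ranked entries, otherwise the fit is undefined.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Made-up sequence, for illustration only (not real genomic data).
toy = "ATGATGATGCCATGATTTATGCAT"
freqs = ranked_frequencies(count_words(toy, 3))
print(freqs[:5], round(zipf_slope(freqs), 2))
```

A sequence this short says nothing about real DNA. The point is only the shape of the calculation: count words, rank them, and compare the fitted slope with -1.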
"We
didn't expect the coding DNA to obey Zipf," Stanley notes. "A code is literal: one
if by land, two if by sea."
You
can't have any mistakes in a code. Language, in contrast, is a statistical,
structured system with built-in redundancies. A few mumbled words or scattered
typos usually do not render a sentence incomprehensible.
In
fact, the workers tested this notion of repetition by applying a second
analysis, this time from information theorist Claude E. Shannon, who in the
1950s quantified redundancies in languages. They found that junk DNA contains
three to four times the redundancies of coding segments. Because of the
statistical nature of the results, the researchers admit their findings are
unlikely to help biologists identify functional aspects of junk DNA. Rather, the
work may indicate something about efficient information storage.
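One common way to put a number on redundancy is to compare the measured entropy of a sequence with the maximum entropy its alphabet allows. The sketch below uses that simplified, single-letter definition as an assumption. It is not necessarily the measure the researchers applied, and the function names `shannon_entropy` and `redundancy` are hypothetical.

```python
from collections import Counter
import math


def shannon_entropy(symbols):
    """Entropy in bits per symbol of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(sequence, alphabet_size=4):
    """Redundancy = 1 - H / H_max, with H_max = log2(alphabet_size).

    alphabet_size defaults to 4 for the nucleotides A, C, G and T.
    """
    h_max = math.log2(alphabet_size)
    return 1.0 - shannon_entropy(sequence) / h_max


print(redundancy("ACGT" * 10))   # all four letters equally common
print(redundancy("AAAAAAAAAT"))  # heavily skewed toward one letter
```

Under this definition a sequence that uses all four letters equally often has no redundancy, while one dominated by a single letter has a redundancy close to 1.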
"There
has to be some sort of hierarchical arrangement of the information to allow one
to use it in an efficient fashion and to have some adaptability and
flexibility," Goldberger observes.
Another
speculation is that junk sequences may be essential to the way DNA has to fold to fit into
the nucleus.
Some researchers question whether the group has found anything significant. One of those is Benoit Mandelbrot of Yale University. In the 1950s the mathematician pointed out that Zipf's law is a statistical numbers game that has little to do with recognizable language features, such as semantics. Moreover, he claims the group made several errors.
'Their
evidence does not establish Zipf's law even remotely,' he says.
But
such criticisms are not stopping the Boston workers from trying to decipher
junk DNA's tongue.
'It
could be a dead language,' Stanley says, 'but the search will be exciting.'