
=Information threshold=

RNA is a polymer, so mutations can occur at many positions in the sequence. The quality of replication per genotype, //Q//, can therefore be calculated as //q^L//, where //q// is the quality of replication per residue and //L// is the length of the sequence (measured in number of base pairs/residues). This assumes that every residue is copied independently and with the same fidelity. Given that //q// is fixed, the limit on //Q// set by the error threshold also constrains the sequence length //L//.

For //q// close to 1, we can use the following approximation, which follows from //ln(q) ≈ -(1-q)//:

//q^L ≈ e^(-L(1-q))//, where //e// is the base of the natural logarithm (2.71828...).
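How good this approximation is can be checked numerically. The following is a minimal Python sketch; the function names and the sample values of //q// and //L// are illustrative assumptions, not taken from the text.

```python
import math


def exact_quality(q: float, length: int) -> float:
    """Exact per-genotype replication quality, Q = q**L."""
    return q ** length


def approx_quality(q: float, length: int) -> float:
    """Approximation Q ~ exp(-L * (1 - q)), intended for q close to 1."""
    return math.exp(-length * (1.0 - q))


if __name__ == "__main__":
    # Sample values chosen only for illustration.
    for q in (0.999, 0.99, 0.95):
        for length in (10, 50, 100):
            exact = exact_quality(q, length)
            approx = approx_quality(q, length)
            print(f"q={q} L={length} exact={exact:.4f} approx={approx:.4f}")
```

Because //ln(q) ≤ -(1-q)//, the approximation never underestimates //Q//, and the gap grows as //q// moves away from 1. For //q// = 0.95 and //L// = 50, for example, the exact value is about 0.077 and the approximation about 0.082.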

Then, substituting this into the expression we found for the error threshold yields:

//Q > 1/σ// => //e^(-L(1-q)) > 1/σ//

Taking natural logarithms on both sides:

//-L(1-q) > -ln(σ)//

which finally gives:

//L < ln(σ) / (1-q)//

The key prediction of this inequality is a limit on sequence length: above it, information cannot be maintained because the master sequence is lost from the population. In other words, for a given error rate per base pair only a limited amount of information can be stored in a replicator. This is the __information threshold__. Simulations (Takeuchi and Hogeweg, 2007) suggest that RNA strings longer than approximately 50 bases no longer stay below the error threshold. Hence, for longer RNA strings another molecule (enzyme/ribozyme) would be needed to improve the copying fidelity (increase //q//). It has been calculated that the information threshold is already reached for a sequence length of 50 with a 95% replication efficiency and a very strong selection coefficient, so we should definitely expect it to occur!
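The bound //L < ln(σ) / (1-q)// can be turned into a small calculator. The sketch below is hedged: the function names are hypothetical, the value σ = 10 is an arbitrary example, and reading the "95% replication efficiency" above as //q// = 0.95 is an assumption.

```python
import math


def max_length(sigma: float, q: float) -> float:
    """Upper bound on sequence length: L < ln(sigma) / (1 - q)."""
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1 for the master to be selected")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return math.log(sigma) / (1.0 - q)


def min_superiority(length: int, q: float) -> float:
    """Smallest sigma allowed by the same bound: sigma > exp(L * (1 - q))."""
    return math.exp(length * (1.0 - q))


if __name__ == "__main__":
    # Assumed example values, not taken from the cited simulations.
    print(f"max L for sigma=10, q=0.95: {max_length(10.0, 0.95):.1f}")
    print(f"min sigma for L=50, q=0.95: {min_superiority(50, 0.95):.1f}")
```

Under these assumptions a tenfold advantage of the master supports only about 46 residues, and a sequence of 50 residues needs the master to replicate roughly 12 times faster than its mutants. Both functions rely on the approximation for //q// close to 1, so the numbers are indicative only.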

The dynamics before and after crossing the error threshold are quite different (see figure). The main sign that the system has not yet crossed the error threshold is that the common ancestor of all individuals is the master sequence. All other sequences are present in the population only because they are mutants of the master sequence, and this is reflected in the consensus sequence of the population, which is still the master sequence. Note that although the master sequence is the common ancestor of all sequences present, it is not necessarily in the majority. As we get closer to the error threshold, the mutants greatly outnumber the master sequence.
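The point that the master sequence need not be in the majority can be illustrated with a simplified calculation. The sketch below assumes a two-class model (the master versus all mutants lumped together, with no back mutation to the master), which is a common simplification and not necessarily the model behind the figure. In that model the equilibrium master frequency is //(σQ - 1) / (σ - 1)// when //Q > 1/σ// and zero otherwise. The function name and parameter values are illustrative.

```python
def master_fraction(sigma: float, q: float, length: int) -> float:
    """Equilibrium master frequency in a two-class model without back mutation.

    Uses x* = (sigma * Q - 1) / (sigma - 1) with Q = q**L, and returns 0.0
    once the error threshold Q > 1/sigma is no longer satisfied.
    """
    if sigma <= 1.0:
        raise ValueError("sigma must exceed 1")
    big_q = q ** length
    x = (sigma * big_q - 1.0) / (sigma - 1.0)
    return max(x, 0.0)


if __name__ == "__main__":
    # Assumed example: master replicates 10 times faster, q = 0.99 per residue.
    for length in (50, 100, 200, 250):
        print(f"L={length} master fraction={master_fraction(10.0, 0.99, length):.3f}")
```

With these assumed values the master already falls below half of the population at //L// = 100, drops to a few percent at //L// = 200, and is lost entirely at //L// = 250.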

Three important remarks at this stage: 1) Quasispecies theory is defined on infinite populations. Therefore, the error threshold is a deterministic effect, in contrast to Muller's ratchet (the stochastic accumulation of errors in a finite population, resulting in the death of the population). 2) The replicator equations have the nice feature of combining evolutionary and ecological time scales. 3) By writing down simple differential equations we arrive at two fundamental concepts: error threshold and quasispecies. This turns Darwinian selection from a tautology into a theory: it can now be false.

=Importance of the information threshold=

But is the information threshold really a problem for evolution? The most likely answer to this question is //yes//, since we have seen that the information threshold already becomes problematic for relatively short RNA sequences (see above). Furthermore, genome sizes and mutation rates give a good indication of its consequences: Lynch (2010) has shown that there is a negative correlation between genome size and base substitution mutation rate.

So at this stage we could conclude that an RNA world would allow some evolutionary optimization, but only up to a certain genome size. Beyond that point new innovations appear to be needed, and the data are compatible with that notion. It is also important to note that the quasispecies model is a best-case scenario for Darwinian selection, so the limitation shows up even under favourable assumptions:
 * 1) We have used an __infinite population__, meaning that:
   * Every genotype is already present, and therefore the best quasispecies is always selected
   * There are no stochastic population dynamics, meaning that even if something is present in very low concentration it doesn't go extinct (attofoxes!)
 * 2) We use __strong selection and a single-peaked fitness landscape__:
   * There is a sharp transition in fitness. If the landscape is different, delocalization of the quasispecies still occurs but the transition is less sharp (see Takeuchi and Hogeweg, 2007)
 * 3) We use a __fixed sequence length and no other constraints on length__:
   * We should expect further negative energetic selection on sequence length
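The role of the infinite-population assumption can be explored with a small stochastic sketch. It assumes a Wright-Fisher-style model with discrete generations, two classes (master and mutants) and no back mutation; none of this comes from the text, and the function name and parameters are hypothetical. In each generation the expected master fraction after selection and mutation is //σQx / (σx + 1 - x)//, and the next generation is sampled from it.

```python
import random
from typing import Optional


def generations_until_loss(n: int, sigma: float, big_q: float,
                           max_generations: int, seed: int) -> Optional[int]:
    """Return the generation in which the master is lost, or None if it survives.

    n is the (finite) population size, sigma the master's replication
    advantage and big_q the per-genotype replication quality Q.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    masters = n  # start with a population consisting only of master copies
    for generation in range(1, max_generations + 1):
        x = masters / n
        # Expected master fraction after selection and mutation.
        p = sigma * big_q * x / (sigma * x + (1.0 - x))
        # Binomial sampling of the next generation.
        masters = sum(rng.random() < p for _ in range(n))
        if masters == 0:
            return generation  # no back mutation, so the loss is permanent
    return None


if __name__ == "__main__":
    # Assumed values just below the deterministic threshold (sigma * Q = 1.1).
    print(generations_until_loss(n=100, sigma=2.0, big_q=0.55,
                                 max_generations=1000, seed=1))
```

In the deterministic limit this recursion keeps the master whenever //σQ > 1//. In a small population the master can still be lost by chance when its equilibrium frequency is low, which is why the infinite-population result should be read as a best case.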

Moreover, this analysis does not address how longer sequences arise in evolution, although it does allow us to focus on length alone (again, a deliberately best-case scenario). So this simple model unearths a major problem, also called __Eigen's paradox__: to store more information we need better replication, but to get better replication we need to store more information. Nonetheless we are here, so the problem must have been solved. So, how can the information threshold be crossed?

Next: Hypercycles


**References**

 * **Takeuchi N and Hogeweg P**, Error-threshold exists in fitness landscapes with lethal mutants. BMC Evol Biol (2007).
 * **Lynch M**, Evolution of the mutation rate. Trends Genet (2010).