networks

TODO List
 * REFS: power law and overrepresentation of motifs
 * REF: in the above study (REF) the expected number of motifs is calculated using a network randomization procedure that preserves the number of in- and out-connections per gene.
 * REF: deletion and innovation is a power law, as shown by [|Herbert Simon] (1955 REF).
 * REF: Kashtan et al (2004 REF) studied multi-node systems and were able to classify several different types of networks
 * CLARIFY: how different systems produce different networks
 * REF: In an earlier study of FFLs, Wagner and Teichmann (REF) suggested that FFLs cannot arise through duplication processes and moreover proposed that their evolutionary origin lies in their potential function in regulation.
 * REF: Alon

=Gene Regulation Networks=

Gene regulation networks are an evolved form of cellular coding. At present we can measure gene regulation networks directly, and this has been done for yeast based on transcription factor binding and [|chip-chip], where DNA binding is presumed to indicate gene regulation. For a given binding threshold we can then reconstruct the gene regulation network. The networks reconstructed in this way tend to be high-dimensional and complex. So are these networks just a high-dimensional mess, or:
 * 1) Can we make sense of networks? What kind of structure do we see?
 * 2) If there are patterns, how can we explain them: what are their consequences in terms of cellular function, and how does evolution produce them?

From several studies (REFs) that have addressed these issues it has become apparent that two major patterns, or structures, can be detected in the yeast network, namely:
 * **[|Power law] distribution of connections** in the network (macroscale)
 * An **overrepresentation** of certain **network [|motifs]**, i.e. specific small circuits (microscale)

The **power law distribution** of connections observed at the macroscale means that there are //many nodes with only a single, or a few, connections//, and just a //few nodes with many connections//, the so-called **hubs**. Formally, the probability of finding a node with k connections, P(k), follows approximately P(k) ~ k^-a, where a is called the scaling exponent. Neither the number of nodes nor the number of connections in the network appears in this expression, so P(k) is independent of the size of the system. The distribution is therefore referred to as **scale-free**: rescaling k by a factor c only multiplies P(k) by the constant c^-a, so the shape of the distribution stays the same. The biological meaning of these properties, however, is not clear.
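The scale-free property can be checked numerically: for a pure power law the ratio P(ck)/P(k) equals c^-a for every k, whereas for an exponential distribution it depends on k. In this sketch the exponent a = 2.1, the decay rate 0.05 and the cut-off k_max are arbitrary illustrative values, not estimates from yeast data, and the function names are hypothetical.

```python
import math

def normalised_pmf(weight, k_max):
    """Turn a weight function w(k) into probabilities P(k) for k = 1..k_max."""
    raw = {k: weight(k) for k in range(1, k_max + 1)}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}

def rescaling_ratios(pmf, c, ks):
    """Return P(c*k) / P(k) for each k in ks."""
    return [pmf[c * k] / pmf[k] for k in ks]

a = 2.1        # illustrative scaling exponent (not fitted to any data)
k_max = 1000   # illustrative cut-off for the largest degree
power_law = normalised_pmf(lambda k: k ** -a, k_max)
exponential = normalised_pmf(lambda k: math.exp(-0.05 * k), k_max)

ks = [1, 10, 100]
print(rescaling_ratios(power_law, 2, ks))    # the same value, 2 ** -a, for every k
print(rescaling_ratios(exponential, 2, ks))  # a different value for every k
```

The normalisation constant cancels in each ratio, which is why only the functional form of P(k) matters here.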

The type of motif that is **overrepresented** at the microscale tends to be the **[|feed-forward] loop** (FFL), a motif of the form A->B, A->C->B, which appears much more often than would be expected in certain randomized networks. The issue of //overrepresentation//, however, immediately raises two questions:
 * 1) What is meant by **expected**?
 * 2) How did such overrepresentation evolve?

Of these, the first question is crucial since we need it in order to answer the second.
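To make the FFL definition (A->B, A->C->B) concrete, FFLs can be counted by checking, for every edge A->C, which targets of C are also direct targets of A. This is a minimal sketch for a directed graph given as (regulator, target) pairs; the function name is hypothetical and self-loops are ignored.

```python
def count_ffls(edges):
    """Count feed-forward loops A->B, A->C, C->B in a directed graph.

    edges: iterable of (regulator, target) pairs; self-loops are ignored.
    """
    targets = {}
    for a, b in edges:
        if a != b:
            targets.setdefault(a, set()).add(b)
    count = 0
    for a, a_targets in targets.items():
        for c in a_targets:                   # A -> C
            for b in targets.get(c, ()):      # C -> B
                if b in a_targets:            # A -> B closes the loop
                    count += 1
    return count

print(count_ffls([("A", "B"), ("A", "C"), ("C", "B")]))  # 1
```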

In the above study (REF) the expected number of motifs is calculated using a network randomization procedure that preserves the number of in- and out-connections per gene. The number of motifs found in the real network is then compared with this expectation; however, it is difficult to know what the //appropriate// randomization procedure is. Moreover, even if motifs can be shown to be overrepresented relative to an appropriate randomization procedure, it is unclear what the evolutionary implications are. Is there an actual evolutionary driving force leading to overrepresentation, or could it be a side-effect of the evolutionary process?
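A randomization that preserves every gene's in- and out-degree is commonly implemented by repeated edge swaps: two edges (a, b) and (c, d) are rewired to (a, d) and (c, b). The sketch below is one such implementation (function names are hypothetical); swaps that would create a self-loop or a duplicate edge are rejected, so the result remains a simple directed graph with the same degree profile. How many swaps are needed to mix well is a separate question not addressed here.

```python
import random
from collections import Counter

def degree_preserving_swap(edges, n_swaps, seed=None):
    """Randomize a simple directed graph, keeping each node's in- and out-degree."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_set
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if a == d or c == b:                           # would create a self-loop
            continue
        if (a, d) in edge_set or (c, b) in edge_set:   # would duplicate an edge
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set

def degree_profile(edges):
    """Out-degree and in-degree of every node that has at least one edge."""
    out_deg = Counter(a for a, _ in edges)
    in_deg = Counter(b for _, b in edges)
    return out_deg, in_deg

original = {(i, (i + 1) % 10) for i in range(10)} | {(i, (i + 3) % 10) for i in range(10)}
randomized = degree_preserving_swap(original, n_swaps=1000, seed=1)
print(degree_profile(original) == degree_profile(randomized))  # True
```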

[| Cordero & Hogeweg (2006)] studied this issue using a simulation model incorporating reasonable assumptions about gene duplication, gene deletion and mutation. Genes were assumed to carry binding sites and to code for transcription factors, which bind to the binding sites of other genes and so regulate them. Genomes were therefore envisaged as //bags of genes//, a commonly used assumption that is known to be incorrect but suffices here. The model includes neither selection nor a population; it merely studies the baseline expectation that arises from the mutational process itself.
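A minimal sketch of a neutral mutational process on a //bag of genes//, assuming gene-level events only. This is not the Cordero & Hogeweg model, which also tracks binding sites; the function name, event probabilities and starting size are hypothetical choices for illustration, and whether the resulting degree distribution resembles a power law depends on these rates. As in the text, there is no selection and no population: a single genome accumulates duplications, deletions and new regulatory edges (innovations).

```python
import random
from collections import Counter

def neutral_network_evolution(steps, p_dup=0.35, p_del=0.25, n_start=10, seed=0):
    """Toy neutral process on a regulatory network (hypothetical rates).

    Each step performs one event:
      - duplication (p_dup): copy a random gene with all its in- and out-edges
      - deletion (p_del): remove a random gene and all its edges
      - innovation (remaining probability): add a random regulator -> target edge
    """
    rng = random.Random(seed)
    genes = set(range(n_start))
    next_id = n_start
    edges = set()  # (regulator, target) pairs
    for _ in range(steps):
        r = rng.random()
        if r < p_dup:
            g = rng.choice(sorted(genes))
            new = next_id
            next_id += 1
            genes.add(new)
            for a, b in list(edges):
                if b == g:
                    edges.add((a, new))  # duplicate inherits g's regulators
                if a == g:
                    edges.add((new, b))  # and g's targets
        elif r < p_dup + p_del:
            if len(genes) > 2:  # keep two genes so innovation stays possible
                g = rng.choice(sorted(genes))
                genes.remove(g)
                edges = {(a, b) for a, b in edges if g not in (a, b)}
        else:
            a, b = rng.sample(sorted(genes), 2)  # distinct genes: no self-loops
            edges.add((a, b))
    return genes, edges

genes, edges = neutral_network_evolution(steps=500, seed=42)
out_degree = Counter(a for a, _ in edges)
# (out-degree, number of genes with that out-degree); genes with none are omitted
print(sorted(Counter(out_degree.values()).items()))
```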



The results of this study show that the evolutionary process leads to genome structure even in the absence of any selection! At the macro-level the process produces networks whose connections follow a power-law distribution, similar to those observed in biological genomes. That the mutational process produces a power law can be understood by noting that the limit distribution of the underlying mutational protocol of duplication, deletion and innovation is a power law, as shown by [|Herbert Simon] (1955 REF). How this happens in detail is more complicated, since events occur at the level of both genes and binding sites, and the proportion of regulators in the network plays an important role.

At the micro-level the model also shows an over-representation of network motifs, namely feed-forward loops (FFLs). In fact, over time the model shows sudden increases in FFL numbers, or FFL //[|avalanches]//. This process can be understood by considering how duplication events affect motifs. A power-law distribution of connections means that the network has a few hubs. If such a hub is duplicated, the result is two hubs that share the same targets. If a connection then evolves from one hub to the other, this automatically generates as many FFLs as there are targets shared by the two hubs. In this way FFLs are easily generated as a side-effect of the duplication process, and thereby become overrepresented. //However, is this the process that is responsible for generating overrepresentation in biological networks?// By looking at yeast cell cycle genes it was found that many FFLs were formed by important duplicated hub genes, suggesting that, in yeast at least, FFL abundance is partly due to hub duplication.
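The hub-duplication mechanism can be reproduced on a toy network: a star-shaped hub contains no FFLs, duplicating it still adds none, but a single edge from the hub to its duplicate creates one FFL per shared target. Gene names and the hub size below are hypothetical, and `duplicate_gene` copies edges at the gene level only (no binding sites).

```python
def count_ffls(edges):
    """Count triples A->B, A->C, C->B (self-loops ignored)."""
    out = {}
    for a, b in edges:
        if a != b:
            out.setdefault(a, set()).add(b)
    # For each edge A->C, every target of C that is also a target of A is an FFL.
    return sum(len(out.get(c, set()) & a_targets)
               for a_targets in out.values() for c in a_targets)

def duplicate_gene(edges, g, new):
    """Return a copy of the network in which `new` inherits all edges of g."""
    copied = {(a, new) for a, b in edges if b == g}
    copied |= {(new, b) for a, b in edges if a == g}
    return set(edges) | copied

n_targets = 20                                     # hypothetical hub size
hub = {("H", f"T{i}") for i in range(n_targets)}   # one hub, 20 targets
duplicated = duplicate_gene(hub, "H", "H2")        # two hubs, same 20 targets
linked = duplicated | {("H", "H2")}                # one new hub-to-hub edge

print(count_ffls(hub), count_ffls(duplicated), count_ffls(linked))  # 0 0 20
```

If the hub itself had regulators, the duplicate would inherit them as well, and the new hub-to-hub edge would add further FFLs of the form R->H->H2 with R->H2.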

So to what extent is this a specific example of a more general phenomenon? Kashtan et al (2004 REF) studied multi-node systems and were able to classify several different types of networks, which can differ in their higher-order properties (MORE DETAIL NEEDED!):
 * multi-input networks
 * multi-intermediate networks (ecological?)
 * multi-output networks (e.g. transcription networks)

==Evolution and side-effects==
In an earlier study of FFLs, Wagner and Teichmann (REF) suggested that FFLs cannot arise through duplication processes, and moreover proposed that their evolutionary origin lies in their potential function in regulation. However, the study by Cordero and Hogeweg ([|2006]) shows that FFLs and their over-representation can arise as a side-effect of the evolutionary process. This raises two questions:
 * 1) Are FFLs really overrepresented, i.e. given the evolutionary process, is the randomization procedure appropriate?
 * 2) If FFLs arise without selection pressure, does that mean there is no selection on them?

As mentioned above (REF), FFL over-representation was calculated by considering how many FFLs one would expect in randomized versions of the network in which certain properties are held constant, such as the total number of connections and the in- and out-degree of each node. Repeatedly swapping connections while keeping each node's in- and out-degree fixed generates a distribution of FFL counts; if the real network's FFL count falls in the upper 5% tail of this distribution, the network can be said to over-represent FFLs significantly.
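The significance test described above reduces to an empirical upper-tail fraction: the share of randomized networks with at least as many FFLs as the real one. In the sketch below the null counts are a synthetic stand-in (`range(100)`), not data from any study; in practice they would come from counting FFLs in each degree-preserving randomization. Function names and the threshold `alpha = 0.05` are illustrative.

```python
def empirical_upper_tail(observed, null_counts):
    """Fraction of randomized networks with at least `observed` FFLs."""
    null_counts = list(null_counts)
    if not null_counts:
        raise ValueError("need FFL counts from at least one randomized network")
    hits = sum(1 for x in null_counts if x >= observed)
    return hits / len(null_counts)

def significantly_overrepresented(observed, null_counts, alpha=0.05):
    """True if the observed count lies in the upper `alpha` tail of the null."""
    return empirical_upper_tail(observed, null_counts) < alpha

# Synthetic stand-in for FFL counts from 100 randomized networks (not real data)
null_counts = list(range(100))
print(empirical_upper_tail(97, null_counts))           # 0.03
print(significantly_overrepresented(97, null_counts))  # True
print(significantly_overrepresented(90, null_counts))  # False
```

A common variant uses (1 + hits) / (1 + n) instead, so that a finite number of randomizations never yields a tail fraction of exactly zero.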

The evolutionary process, however, does not conserve in- and out-degree, so the over-representation of FFLs need not be constant over time. Moreover, the average in- and out-degree of regulators is lower than the average in-degree. The number of FFLs by itself therefore says little about over-representation, and over-representation may be smaller than the jumps in motif numbers would suggest. The point is that the randomization test conserves a great deal, whereas in the mutational dynamics everything eventually changes. The number of FFLs thus increases with connectivity, but the average in-degree of regulators is much lower than that of other nodes, which reduces over-representation. Therefore, although the increases in FFL numbers are huge, over-representation may only be moderate. In yeast one could therefore say that FFLs are overrepresented given the profile of in- and out-degree, but only because regulators have a lower in-degree, not because there are so many FFLs. Other network structures could contain just as many FFLs without showing any over-representation. (NOT CLEAR!)
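One way to make the role of regulator in-degree concrete is a standard mean-field approximation (an assumption on our part, not the method of the studies cited): in a sparse random graph with given degrees, an edge i->j occurs with probability of roughly out_i * in_j / E, where E is the number of edges. Summing over all triples A->B, A->C, C->B gives an expected FFL count of about (sum of out_A^2) * (sum of in_B^2) * (sum of in_C * out_C) / E^3. The last factor couples each node's in- and out-degree: when regulators (high out-degree) have low in-degree it is small, so the randomized networks contain few FFLs and the real network looks over-represented. Shuffling in-degrees independently of out-degrees replaces that factor by its mean, E^2 / N. The degree profile below is hypothetical, and the approximation ignores node distinctness and is crude for dense graphs.

```python
def _degree_sums(out_deg, in_deg):
    """Shared sums for both estimates; assumes sum(in_deg) == sum(out_deg)."""
    nodes = set(out_deg) | set(in_deg)
    e = sum(out_deg.values())
    s_out2 = sum(out_deg.get(v, 0) ** 2 for v in nodes)
    s_in2 = sum(in_deg.get(v, 0) ** 2 for v in nodes)
    return nodes, e, s_out2, s_in2

def expected_ffls_coupled(out_deg, in_deg):
    """Mean-field FFL estimate when each node keeps its own (in, out) pair."""
    nodes, e, s_out2, s_in2 = _degree_sums(out_deg, in_deg)
    s_inout = sum(in_deg.get(v, 0) * out_deg.get(v, 0) for v in nodes)
    return s_out2 * s_in2 * s_inout / e ** 3

def expected_ffls_decoupled(out_deg, in_deg):
    """Same estimate with in-degrees shuffled independently of out-degrees."""
    nodes, e, s_out2, s_in2 = _degree_sums(out_deg, in_deg)
    return s_out2 * s_in2 * (e ** 2 / len(nodes)) / e ** 3

# Hypothetical profile: 5 regulators (out 20, in 1) and 19 targets (in 5, out 0)
out_deg = {f"R{i}": 20 for i in range(5)}
in_deg = {f"R{i}": 1 for i in range(5)}
in_deg.update({f"T{j}": 5 for j in range(19)})

print(expected_ffls_coupled(out_deg, in_deg))    # 96.0
print(expected_ffls_decoupled(out_deg, in_deg))  # approximately 400
```

With low-in-degree regulators the coupled randomization expects far fewer FFLs than the decoupled one, so the same observed count can look over-represented under the first null and unremarkable under the second.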

The mutational process on its own therefore yields neutral dynamics; however, we know that selection does act, and the dynamical properties of FFLs are not arbitrary. We could therefore imagine that there is some selective pressure on them. However, the model shows that FFLs need not be overrepresented because of their properties: they can arise as a side-effect of the mutational process. One final point to ponder, however, is that the mutational process discussed here generates OR gates (OR functions). In the study by Alon (REF), however, it is AND gates that are of interest, and those are much harder to obtain!
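To illustrate why the gate at the target matters, here is a discrete-time Boolean sketch of a coherent FFL (A->C, with both A and C regulating B). This simplified update rule is an assumption for illustration, not the model used by Alon (REF). With an OR gate, B switches on as soon as A does and stays on one step after A switches off; with an AND gate, B only switches on once A has been on for two consecutive steps, so a one-step pulse of A is filtered out.

```python
def simulate_ffl(a_signal, gate):
    """Discrete-time Boolean coherent FFL: A -> C, and both A and C regulate B.

    Update rule (all nodes start off): C(t+1) = A(t), B(t+1) = gate(A(t), C(t)).
    Returns the state of B after each time step.
    """
    c, b = 0, 0
    b_trace = []
    for a in a_signal:
        # The right-hand side uses the old C, so both nodes update simultaneously.
        c, b = a, gate(a, c)
        b_trace.append(b)
    return b_trace

def and_gate(x, y):
    return x & y

def or_gate(x, y):
    return x | y

short_pulse = [1, 0, 0, 0, 0, 0]
long_pulse = [1, 1, 1, 1, 0, 0, 0]
print(simulate_ffl(short_pulse, and_gate))  # [0, 0, 0, 0, 0, 0]
print(simulate_ffl(short_pulse, or_gate))   # [1, 1, 0, 0, 0, 0]
print(simulate_ffl(long_pulse, and_gate))   # [0, 1, 1, 1, 0, 0, 0]
print(simulate_ffl(long_pulse, or_gate))    # [1, 1, 1, 1, 1, 0, 0]
```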

To conclude, both in the yeast gene network and in the simulated "neutral" networks we find power-law distributed gene connections, an over-representation of FFLs, low in-degree of regulators (hubs), and a specific structure in which FFLs occur (around duplicated hubs). Moreover, we have seen that the randomization procedure of randomly swapping connections keeps each node's in- and out-degree coupled, a constraint that may not be particularly relevant. If one instead used a randomization that preserves the spectrum of in- and out-degrees but does not couple in- and out-degree per node, one would not obtain any over-representation.