boolean

Prev: Mean field assumption Next: Multilevel CA: emergence


**TODO List**
 * REF to double knock-outs being invisible
 * REF Hogeweg showing
 * only a small fraction of nodes tends to be functional
 * proportion of XOR functions if all non-forcing gives chaotic behaviour
 * REF Tyson
 * ADD: Encode network and the importance of self-loops
 * ADD: neural networks and pattern recognition

=[|Boolean Networks]=

(**Concepts**: multiple attractors, multiple causes, "how special is a specific case?", forcing structures, domains of attraction)

Before further considering the behaviour of CAs we make a small digression and discuss Boolean networks. This digression is useful considering the type of analysis conducted by [|Kauffman] ([|1969]) on random Boolean networks as a paradigm for gene regulation. In principle such networks are like **//binary// CAs** but with very specific IO systems whereby each node has its own transition rule. In special cases one can make a direct mapping from a Boolean network to a spatial CA, if we can structure the nodes such that they are "spatially arranged" (i.e. such that they need input from their neighbours only). The different transition rules can be incorporated in different states of cells. One example is a 10-position 1D CA with a rule layer and a state layer with different rules for specific positions (see Fig below).
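The mapping from a Boolean network to a CA with a rule layer can be sketched as follows. This is a minimal illustration, not the configuration shown in the figure: the rule names, the particular rule layer, the initial state and the periodic boundaries are all assumptions made for the example. Each cell reads its left and right neighbour and applies the Boolean function named in its own position of the rule layer.

```python
# Hypothetical rule set: every rule is a Boolean function of (left, right) neighbour.
RULES = {
    "OR": lambda left, right: left | right,
    "AND": lambda left, right: left & right,
    "XOR": lambda left, right: left ^ right,
}


def step(state, rule_layer):
    """Synchronously update all cells; cell i uses the rule stored at rule_layer[i].

    Boundaries are periodic (an assumption): cell 0 sees the last cell as its
    left neighbour, and the last cell sees cell 0 as its right neighbour.
    """
    n = len(state)
    return [
        RULES[rule_layer[i]](state[i - 1], state[(i + 1) % n])
        for i in range(n)
    ]


# Illustrative 10-position rule layer and state layer (both made up).
rule_layer = ["OR", "AND", "XOR", "OR", "OR", "AND", "XOR", "AND", "OR", "XOR"]
state = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

for t in range(5):
    print(t, "".join(str(cell) for cell in state))
    state = step(state, rule_layer)
```

The rule layer never changes, so it plays the role of the node-specific transition rules of the Boolean network, while the state layer carries the on/off values.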



Boolean network models are very useful for understanding the gene-regulation networks that we are unraveling from data. For instance, we now have the "full" transcription network of yeast. This looks like a complex and messy network when visualized. Moreover, what we have is a very specific case, possibly with specific properties, and this raises several questions:
 * **How does it behave?**
 * **How special is it?**
 * **How did it evolve to be the way it is?**

In order to make some sense of this we can compare the observed specific network to a whole class of random networks whose behaviour has been characterized. Kauffman's approach to studying gene regulation networks was based on the properties of random networks: what can be learned from generating random Boolean networks, what type of behaviour is generic in such systems, and how is it affected by changes in parameters? Such systems turn out to have attractors, each with its own basin of attraction. Very generally, his results can be condensed into the following findings:
 * **multiple attractors** (e.g. cell types): There are multiple domains of attraction (all leading to a different attractor).
 * **multiple causes** (vs THE cause and THE effect) (see [|Huang and Ingber 2000]): There are multiple ways to get to a certain attractor.
 * **robustness**: A small change to the network (e.g. deletion of a node, "knock out") does not majorly change the network behaviour.

This means that random networks often harbour multiple attractors, each fed by its own domain of attraction. Moreover, the same signal can lead to different outcomes depending on the state of the network, //and// several pathways can lead to the same attractor (cell state). Huang and Ingber ([|2000]) studied how gene expression and the resulting signals relate to cell-state outcomes, and found that these outcomes differ, leading to [|quiescence], [|differentiation], [|proliferation] or [|apoptosis]. In this study, single signals are seen to interact with 4 genes, allowing for many outcomes. In another example, Huang et al. ([|2005]) show two alternative routes of neutrophil differentiation in a high-dimensional state space: 2773 dimensions (nodes), i.e. n^2773 states (2^2773 if each node is binary)! This example shows that there are two completely different transients (regarding gene expression patterns) that lead to the same attractor: a differentiated neutrophil. This goes some way towards giving us insight, or guiding our intuition, when experimental results are confounding. Since we know the generic properties of Boolean networks, we can expect different attractors and domains of attraction, i.e. different signals can lead to the same result and the same signal can lead to different results.



//An important realization is that by studying random Boolean networks we can get an idea of generic properties of such systems, which allow us to have a conceptual advantage when dealing, for instance, with gene networks.//

//Forcing structures//
Also apparent from Kauffman's work is that some types of [|Boolean functions] have the property that they are insensitive to multiple inputs in the sense that a single input is sufficient to lead to an output. Such functions can be considered to be **forcing** in the sense that a single input forces the node to produce an output. One such function is the **OR gate or function**. Namely, if one of the input nodes is 1 ("on"), the output will be 1 regardless of the value of the second input node. Because such forcing functions are insensitive to several inputs, these structures lead to **redundancy** and **robustness**, i.e. not all input information is needed for the output and mutations (e.g. losing an input node) might not change the final result.
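The definition of a forcing function can be checked mechanically. The sketch below is an illustration only; the helper name `forcing_inputs` is made up for this example. For every input and every value of that input, it tests whether fixing the input alone already fixes the output.

```python
from itertools import product


def forcing_inputs(f, k):
    """List (input index, forcing value, forced output) triples of a k-input function.

    An input is forcing if setting it to some value fixes the output,
    whatever the values of the other inputs are.
    """
    found = []
    for i in range(k):
        for value in (0, 1):
            outputs = {
                f(*x) for x in product((0, 1), repeat=k) if x[i] == value
            }
            if len(outputs) == 1:
                found.append((i, value, outputs.pop()))
    return found


def OR(a, b):
    return a | b


def AND(a, b):
    return a & b


def XOR(a, b):
    return a ^ b


print("OR :", forcing_inputs(OR, 2))   # either input at 1 forces output 1
print("AND:", forcing_inputs(AND, 2))  # either input at 0 forces output 0
print("XOR:", forcing_inputs(XOR, 2))  # no single input decides the output
```

For XOR the list is empty: the output always depends on both inputs, so losing either input changes the behaviour of the node.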

Interestingly, nearly 80% of [|yeast] genes can be knocked out without any observable effects, and even double knock-outs can be invisible (REF?). Does this mean that yeast gene networks contain many forcing functions which are insensitive to changes via knock-outs? Hogeweg ([|2000]?) showed that in random Boolean networks only a small fraction of nodes tends to be functional, i.e. the network can be highly reduced, and that this reduced subset is dominated by forcing functions. Such results are supported by gene interaction studies, which seem to find many false positives: interactions that are not actually functional. Given the role of forcing functions, we should expect that most observed links are in fact non-functional with respect to the network as a whole. Moreover, recent work on the domain of attraction of the [|cell cycle] shows that 85% of states go to one mega-attractor ([|Li et al. 2004]).

//Properties of Boolean Networks: NK networks//
Kauffman's approach to Boolean networks was powerful in the sense that it revealed properties of a whole **ensemble of cases**. He did this by looking at changes in rules (vs ODE parameters), generating networks with random connections (not [|small world]) in which all nodes have the same [|connectivity], i.e. the same number of incoming connections (N = number of nodes, K = indegree, the number of incoming connections for each node). In general, he found that random Boolean networks have many different attractors with a very long cycle length. Furthermore, these networks have a low **homeostatic stability** (i.e. small disturbances in the initial conditions can easily lead to convergence to a different attractor) and a high **reachability** (i.e. different attractors are easily reached by a small disturbance from a given attractor). However, his analysis also showed that the number of attractors depends on the connectivity, the fewest being found for K=2. Moreover, cycle length, stability and reachability between domains also vary with connectivity, with the shortest cycle lengths, lowest reachability and highest stability also being found for K=2. The properties found for K=2 most resemble what we would expect from biological networks. For this reason there has been a large focus on K=2.
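A small NK network can be explored exhaustively to see attractors, cycle lengths and domains of attraction directly. The sketch below is not Kauffman's original procedure; it assumes synchronous updating, K distinct randomly chosen inputs per node, unbiased random truth tables, and a network small enough (N=10) to enumerate all 2^N states. All function names and the seed are chosen for this illustration.

```python
import random
from itertools import product


def random_nk_network(n, k, rng):
    """Give every node k distinct random inputs and a random truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update: each node looks up its next value in its own table."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for j in node_inputs:
            index = (index << 1) | state[j]
        new_state.append(table[index])
    return tuple(new_state)


def find_attractors(n, inputs, tables):
    """Map every state to its attractor; return {attractor states: basin size}."""
    attractor_of = {}
    for start in product((0, 1), repeat=n):
        path, seen_at, s = [], {}, start
        while s not in attractor_of and s not in seen_at:
            seen_at[s] = len(path)
            path.append(s)
            s = step(s, inputs, tables)
        if s in attractor_of:
            cycle = attractor_of[s]  # joined a trajectory analysed earlier
        else:
            cycle = frozenset(path[seen_at[s]:])  # closed a new cycle
        for p in path:
            attractor_of[p] = cycle
    basins = {}
    for cycle in attractor_of.values():
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


N, K = 10, 2
rng = random.Random(1)  # fixed seed so that the run is repeatable
inputs, tables = random_nk_network(N, K, rng)
basins = find_attractors(N, inputs, tables)
for cycle, size in sorted(basins.items(), key=lambda item: -item[1]):
    print(f"cycle length {len(cycle):3d}   basin size {size:4d}")
```

Repeating this for many random networks and for different K gives a rough feel for the ensemble properties discussed above, although the exact numbers depend on the sampling assumptions.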

So, is the connectivity of the network the main determinant of these properties? In fact, Kauffman made special choices with respect to network types, and finding this "optimum" at K=2 is an artifact of sampling. This is because, for Boolean functions, K=2 leads to the highest proportion of forcing functions among all possible functions (14 out of 16), which in turn leads to the results on robustness. With more connections a higher proportion of functions becomes non-forcing (e.g. XOR-like), and therefore we find more domains of attraction and less stability. In fact, Hogeweg (REF) studied the effect of the proportion of non-forcing (XOR) functions for K=2 and showed that if all functions are non-forcing this results in chaotic behaviour or long state cycles, while with only forcing functions one obtains single strong attractors. Stability is therefore not a result of a connectivity of 2, but of the proportion of forcing structures, and such reasoning can be extended to biological systems.
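The count of 14 forcing functions out of 16 can be verified by brute force. The sketch below enumerates every truth table on K inputs and counts those that have at least one forcing input; constant functions are counted as forcing here, which is the convention under which the figure of 14 holds. The function names are made up for this illustration.

```python
from itertools import product


def is_forcing(table, k):
    """True if some input, fixed at some value, fixes the output of the table."""
    rows = list(product((0, 1), repeat=k))
    for i in range(k):
        for value in (0, 1):
            outputs = {out for x, out in zip(rows, table) if x[i] == value}
            if len(outputs) == 1:
                return True
    return False


def count_forcing(k):
    """Return (number of forcing functions, number of all functions) on k inputs."""
    n_rows = 2 ** k
    forcing = sum(is_forcing(t, k) for t in product((0, 1), repeat=n_rows))
    return forcing, 2 ** n_rows


for k in (2, 3):
    forcing, total = count_forcing(k)
    print(f"K={k}: {forcing} of {total} functions are forcing "
          f"({forcing / total:.0%})")
```

For K=2 the only two non-forcing functions are XOR and its negation; for K=3 the forcing fraction is already clearly lower.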

//What do we expect from gene regulation networks and what is important?//
Important:
 * even very simple networks have **multiple attractors**
 * we can identify **cell states** with a given gene regulatory attractor
 * **alternative trajectories** from A' and A'' to attractor B: this gives us **multiple causes** for a given end point
 * **domains of attraction** as an important measure of robustness (e.g. how likely to get to a certain attractor / cell state)
 * **forcing functions** are important for robustness (e.g. to knockouts)

Not important:
 * connectivity of 2 as an ideal setting for biological networks.

//How to go from gene expression data to a boolean network?//
Nowadays, it is relatively easy to measure gene expression on a large scale. However, translating these gene expression data into a Boolean regulation network is non-trivial. An example of this is given by van Wageningen et al. ([|2010]). For 141 kinases and 38 phosphatases in yeast, they measured the effects on gene expression of single and double knock-outs. 60% of the single knock-outs did not lead to a different phenotype (i.e. fewer than 8 genes changed expression because of the mutation), and even in the double knock-outs buffering effects were found. These results show that there is a high level of redundancy (as we would expect) and that there are many **epistatic interactions.** Based on the knock-out data, van Wageningen et al. tried to reconstruct the underlying regulatory network. However, even for small subsets of the data they already found that there are several networks that can explain the same expression patterns (see picture). Hence, //backward engineering of networks is non-unique//! It is very unlikely that you could reconstruct the "true" underlying regulatory network from a single set of gene expression data.
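The non-uniqueness can be illustrated with a toy case. The data below are invented for this example and are not taken from van Wageningen et al.: a hypothetical target gene has two candidate regulators A and B, and we observe its expression in the wild type and in the two single knock-outs only. The sketch lists every two-input Boolean function that reproduces these observations.

```python
from itertools import product

# All input combinations for the two hypothetical regulators (A, B).
INPUTS = list(product((0, 1), repeat=2))


def consistent_functions(observations):
    """Return all 2-input truth tables that reproduce every observation.

    observations maps an (A, B) input combination to the observed output.
    """
    matches = []
    for outputs in product((0, 1), repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outputs))
        if all(table[x] == y for x, y in observations.items()):
            matches.append(table)
    return matches


# Made-up data: the target stays on in the wild type and in both single knock-outs.
observed = {(1, 1): 1, (0, 1): 1, (1, 0): 1}
candidates = consistent_functions(observed)
print(len(candidates), "functions fit the single knock-out data")

# Adding a (again made-up) double knock-out measurement removes the ambiguity.
observed_with_double = dict(observed)
observed_with_double[(0, 0)] = 0
print(len(consistent_functions(observed_with_double)),
      "function fits once the double knock-out is included")
```

With only the single knock-outs, both "A OR B" and "always on" explain the data, so the gene could be buffered by redundant regulators or not regulated by A and B at all. In a real network the number of candidate structures grows very quickly with the number of genes.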



//Modeling the cell cycle: data, boolean networks and ODEs//
Based on expression data, we now have a lot of data on the details of regulation networks in cells, especially with respect to the cell cycle (G1, S, G2, M). Li et al. (2004) used these data to reconstruct a simplified regulation network of the cell cycle using a Boolean network. Using this model, they could further reduce the model to a minimal network needed to generate the cell cycle. They then conducted a robustness analysis of the behaviour of this network. This revealed that the network is highly robust, in that a large proportion of initial states converge on a central pathway which ultimately leads to a single attractor. This central pathway can be interpreted as the cell cycle, which can be initiated again by a deviation out of the final attractor, thus generating a "limit cycle".

It is interesting to compare this to ODE approaches to modeling the cell cycle. Tyson et al. (1991 PNAS REF, and later papers) have modeled the cell cycle in terms of 6 kinetic equations governing the concentrations of the 6 molecules thought to be most important in the cell cycle. To analyze the model it was simplified to 2 variables, which enables 2D stability analysis. In this way they obtain a model with a stable attractor which, when disturbed, generates a long cyclical trajectory that eventually returns to the original attractor. This trajectory can be interpreted as the cell cycle. Comparing the two models leads to some interesting insights.

In the Boolean network:
 * We have simple interactions between genes (on, off etc.)
 * We observe a "bottom-up" convergence on a limit cycle, generated by the structure of the network
 * Bottom-up: one implements measured interactions (simple binary activation, repression), then sees how the system behaves
 * Interpretation: Network structure and percolation through the network generate convergence on the central pathway and attractor. On the pathway genes are either off or on, and this can drive downstream processes in the cell cycle.

In the ODE:
 * We see the minimization of the number of variables
 * We see that "special" interactions are implemented (assumed) between some genes (top-down)
 * We get a "standard" result: excitable medium (limit cycle)
 * Top-down: One takes the idea of an excitable medium and applies it to the cell cycle (assuming how genes affect each other so that it fits).
 * Interpretation: Positions in the vector field represent molecule concentrations; relative (subtle) concentration differences must drive the cell cycle phases.
