(Concepts: multiple attractors, multiple causes, "how special is a specific case?", forcing structures, domains of attraction)

Before further considering the behaviour of CAs we make a small digression and discuss Boolean networks. This digression is useful considering the type of analysis conducted by Kauffman (1969) on random Boolean networks as a paradigm for gene regulation. In principle such networks are like binary CAs but with very specific IO systems whereby each node has its own transition rule. In special cases one can make a direct mapping from a Boolean network to a spatial CA, if we can structure the nodes such that they are "spatially arranged" (i.e. such that they need input from their neighbours only). The different transition rules can be incorporated in different states of cells. One example is a 10-position 1D CA with a rule layer and a state layer with different rules for specific positions (see Fig below).

Simple 1D Boolean network with multiple attractors
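The mapping from a Boolean network to a CA with a rule layer can be sketched in a few lines. This is only an illustration under assumptions: the particular rule assignment in `rule_layer`, the periodic boundary, and the helper names `step` and `run` are hypothetical and are not taken from the figure.

```python
N = 10  # number of positions, as in the 10-position example

# Hypothetical rule layer: the "rule state" of each cell selects its Boolean function.
RULES = {
    "OR": lambda left, right: left | right,
    "AND": lambda left, right: left & right,
    "XOR": lambda left, right: left ^ right,
}
rule_layer = ["OR", "AND", "XOR", "OR", "AND", "OR", "XOR", "AND", "OR", "AND"]


def step(state):
    """Synchronous update: each cell reads only its left and right neighbour (periodic boundary)."""
    return tuple(
        RULES[rule_layer[i]](state[i - 1], state[(i + 1) % N]) for i in range(N)
    )


def run(state):
    """Iterate until a state repeats; return (transient length, list of states on the cycle)."""
    seen = {}
    path = []
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = step(state)
    start = seen[state]
    return start, path[start:]


transient, cycle = run((1, 0, 0, 1, 0, 1, 1, 0, 0, 0))
print("transient length:", transient, "cycle length:", len(cycle))
```

Starting `run` from different initial states shows which of them end up on the same cycle, i.e. in the same basin of attraction.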

Boolean network models are very useful with respect to understanding the gene-regulation networks that we are unraveling from data. For instance, we now have the "full" transcription network of yeast. This looks like a complex and messy network when visualized. Moreover, what we have is a very specific case, possibly with specific properties, and this raises several questions:

How does it behave?

How special is it?

How did it evolve to be the way it is?

In order to make some sense of this we can compare the observed specific network to a whole class of random networks whose behaviour has been characterized. Kauffman's approach to studying gene regulation networks was based on the properties of random networks: what can be learned from generating random Boolean networks, what type of behaviour is generic in such systems, and how is it affected by changes in parameters?
The behaviour of such systems shows that one can find attractors and different basins of attraction.
Very generally, his results can be condensed into the following findings:

multiple attractors (e.g. cell types): There are multiple domains of attraction (each leading to a different attractor).

multiple causes (vs THE cause and THE effect) (see Huang and Ingber 2000): There are multiple ways to get to a certain attractor.

robustness: A small change to the network (e.g. deletion of a node, "knock out") does not substantially change the network behaviour.

This means that random networks often harbour multiple attractors, each fed by its own domain of attraction. Moreover, the same signal can lead to different outcomes depending on the state of the network, and several pathways can lead to the same attractor (cell state). Huang and Ingber (2000) studied how gene expression and the resulting signals relate to cell-state outcomes, and found that these outcomes differ, leading to quiescence, differentiation, proliferation or apoptosis. In this study, single signals are seen to interact with 4 genes, allowing for many outcomes. In another example, Huang et al. (2005) show two alternative routes of neutrophil differentiation in a high-dimensional state space: 2773 dimensions (nodes), and hence n^2773 possible states if each node can take n values (2^2773 in the Boolean case)! This example shows that two completely different transients (in terms of gene expression patterns) lead to the same attractor: a differentiated neutrophil. This helps to guide our intuition when experimental results appear confounding. Since we know the generic properties of Boolean networks, we can expect different attractors and domains of attraction, i.e. different signals can lead to the same result and vice versa.

Huang et al. 2005: Two alternative routes of neutrophil differentiation

An important realization is that by studying random Boolean networks we can get an idea of generic properties of such systems, which allow us to have a conceptual advantage when dealing, for instance, with gene networks.

Forcing structures

Also apparent from Kauffman's work is that some Boolean functions are insensitive to most of their inputs, in the sense that a particular value of a single input is sufficient to determine the output. Such functions are called forcing: a single input forces the node to produce a given output. One such function is the OR gate: if one of the input nodes is 1 ("on"), the output will be 1 regardless of the value of the second input node. Because forcing functions are insensitive to several of their inputs, these structures lead to redundancy and robustness, i.e. not all input information is needed for the output, and mutations (e.g. losing an input node) might not change the final result.
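The notion of a forcing function can be made concrete with a small check. The sketch below assumes the definition used above (some value of some single input fixes the output); the helper name `is_forcing` is hypothetical.

```python
from itertools import product


def is_forcing(f, k):
    """True if there is an input i and a value v such that x[i] == v fixes the output of f."""
    for i in range(k):
        for v in (0, 1):
            outputs = {f(*x) for x in product((0, 1), repeat=k) if x[i] == v}
            if len(outputs) == 1:
                return True
    return False


def OR(a, b):
    return a | b


def AND(a, b):
    return a & b


def XOR(a, b):
    return a ^ b


for name, f in (("OR", OR), ("AND", AND), ("XOR", XOR)):
    print(name, "forcing" if is_forcing(f, 2) else "non-forcing")
```

OR is forced by a single 1 and AND by a single 0, whereas for XOR every input always matters, so it is non-forcing.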

Interestingly, nearly 80% of yeast genes can be knocked out without any observable effects, and even double knock-outs can be invisible (REF?). Does this mean that yeast gene networks contain many forcing functions, which are insensitive to changes via knock-outs? Hogeweg (2000?) showed that in random Boolean networks only a small fraction of nodes tends to be functional, i.e. the network can be highly reduced, and that this reduced subset is dominated by forcing functions. Such results are supported by gene interaction studies, which seem to find many false positives: interactions (e.g. binding) that are detected but are not actually functional. Given the role of forcing functions, we should in fact expect that many existing links are non-functional with respect to the network as a whole. Moreover, work on the domain of attraction of the cell cycle shows that 85% of states go to one mega-attractor (Li et al. 2004).

Properties of Boolean Networks: NK networks

Kauffman's approach to Boolean networks was powerful in the sense that it revealed properties of a whole ensemble of cases. He did this by looking at changes in rules (vs ODE parameters) by generating networks with random connections (not small world) where all nodes have the same connectivity or number of incoming connections (N = no. nodes, K=indegree, i.e. no. of incoming connections for each node).
In general, he found that random Boolean networks have many different attractors with very long cycle lengths. Furthermore, these networks have a low homeostatic stability (i.e. small disturbances in the initial conditions can easily lead to convergence to a different attractor) and a high reachability (i.e. different attractors are easily reached by a small disturbance from a given attractor). However, his analysis also showed that the number of attractors depends on the connectivity, with the fewest being found for K=2. Cycle length, stability and reachability between domains also vary with connectivity, with the shortest cycle lengths, lowest reachability and highest stability also being found for K=2. The properties found for K=2 most resemble what we would expect from biological networks, and for this reason there has been a large focus on K=2.
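A random NK network is small enough to explore exhaustively when N is modest. The following is a minimal sketch, assuming synchronous updating, distinct inputs per node and uniformly random truth tables; the function names (`random_nk_network`, `step`, `basins`) and the choice N=10 are illustrative assumptions, not Kauffman's original setup.

```python
import random
from itertools import product


def random_nk_network(n, k, rng):
    """Each node gets k distinct random input nodes and a random truth table of size 2**k."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update of all nodes."""
    new_state = []
    for node in range(len(state)):
        index = 0
        for source in inputs[node]:
            index = (index << 1) | state[source]  # encode input values as a table index
        new_state.append(tables[node][index])
    return tuple(new_state)


def basins(n, inputs, tables):
    """Map each attractor (frozenset of the states on its cycle) to the size of its basin."""
    result = {}
    for state in product((0, 1), repeat=n):
        seen = {}
        path = []
        while state not in seen:
            seen[state] = len(path)
            path.append(state)
            state = step(state, inputs, tables)
        cycle = frozenset(path[seen[state]:])
        result[cycle] = result.get(cycle, 0) + 1
    return result


rng = random.Random(1)  # fixed seed so that a run can be repeated
for k in (1, 2, 3, 4):
    inputs, tables = random_nk_network(10, k, rng)
    b = basins(10, inputs, tables)
    print("K =", k, "attractors:", len(b),
          "cycle lengths:", sorted(len(c) for c in b),
          "basin sizes:", sorted(b.values()))
```

A single network per K says little; averaging the number of attractors, cycle lengths and basin sizes over many random networks is what gives the ensemble properties discussed here.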

So, is the connectivity of the network the main determinant in these properties?

In fact, Kauffman made special choices with respect to network types, and finding K=2 as this "optimum" is an artifact of sampling. This is because, for binary Boolean functions, K=2 gives a very high proportion of forcing functions among all possible functions (14 out of 16), which in turn leads to the results on robustness. With more connections a higher proportion of functions becomes non-forcing (e.g. XOR-like), and therefore we find more domains of attraction and less stability. In fact Hogeweg (REF) studied the effect of the proportion of non-forcing (XOR) functions for K=2 and showed that if all functions are non-forcing this results in chaotic behaviour or long state cycles, while with only forcing functions one obtains single strong attractors. Stability is therefore not a result of a connectivity of 2 but of the proportion of forcing structures, and such reasoning can be extended to biological systems.
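The proportion of forcing functions for a given K can be checked by brute-force enumeration of all truth tables. This is a sketch under the assumption that constant functions count as forcing (which is how one arrives at 14 out of 16 for K=2); the function name `count_forcing` is hypothetical.

```python
from itertools import product


def count_forcing(k):
    """Return (number of forcing functions, total number of functions) for k binary inputs."""
    rows = list(product((0, 1), repeat=k))  # all input combinations
    forcing = 0
    for table in product((0, 1), repeat=len(rows)):  # all possible output columns
        # Forcing: fixing some input i to some value v leaves only one possible output.
        if any(
            len({out for x, out in zip(rows, table) if x[i] == v}) == 1
            for i in range(k)
            for v in (0, 1)
        ):
            forcing += 1
    return forcing, 2 ** len(rows)


for k in (1, 2, 3):
    n_forcing, n_total = count_forcing(k)
    print("K =", k, ":", n_forcing, "of", n_total, "functions are forcing")
```

For K=2 this reproduces the 14 out of 16 mentioned above (only XOR and its negation are non-forcing), while for K=3 fewer than half of the 256 functions are forcing, so drawing functions at random quickly shifts the balance towards non-forcing rules as K increases.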

What do we expect from gene regulation networks and what is important?

even very simple networks have multiple attractors

we can identify cell states with a given gene regulatory attractor

alternative trajectories from A' and A'' to attractor B: this gives us multiple causes for a given end point

domains of attraction as an important measure of robustness (e.g. how likely to get to a certain attractor / cell state)

forcing functions are important for robustness (e.g. to knockouts)

Not important:

connectivity of 2 as an ideal setting for biological networks.

How to go from gene expression data to a boolean network?

Nowadays, it is relatively easy to measure gene expression on a large scale. However, translating these gene expression data into a Boolean regulation network is non-trivial. An example of this is given by van Wageningen et al. (2010). For 141 kinases and 38 phosphatases in yeast, they measured the effects of single and double knock-outs on gene expression. 60% of the single knock-outs did not lead to a different phenotype (i.e. fewer than 8 genes changed expression because of the mutation), and even in the double knock-outs buffering effects were found. These results show that there is a high level of redundancy (as we would expect) and that there are many epistatic interactions. Based on the knock-out data, van Wageningen et al. tried to reconstruct the underlying regulatory network. However, even for small subsets of the data they already found that several networks can explain the same expression patterns (see picture). Hence, backward engineering of networks is non-unique! It is very unlikely that you could reconstruct the "true" underlying regulatory network from a single set of gene expression data.
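The non-uniqueness can already be seen for a single node. The toy sketch below is not the method of van Wageningen et al.; it assumes one target gene with two candidate regulators and a hypothetical, incomplete set of `observations`, and simply lists every Boolean rule that is consistent with them.

```python
from itertools import product

# Hypothetical data: expression of the target gene for the regulator states (a, b)
# that happened to be measured. Two of the four input combinations were never observed.
observations = {(1, 1): 1, (0, 1): 0}

rows = list(product((0, 1), repeat=2))  # all four (a, b) combinations

consistent = []
for outputs in product((0, 1), repeat=len(rows)):  # all 16 candidate rules
    rule = dict(zip(rows, outputs))
    if all(rule[x] == y for x, y in observations.items()):
        consistent.append(rule)

print(len(consistent), "rules explain the same observations:")
for rule in consistent:
    print(rule)
```

Among the surviving rules are "a AND b" as well as "a alone", which correspond to different wiring diagrams (one regulator versus two), so the data alone cannot decide between them.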

Modeling the cell cycle: data, boolean networks and ODEs

Based on expression data, we now have a lot of data on the details of regulation networks in cells, especially with respect to the cell cycle (G1, S, G2, M). Li et al. (2004) used these data to reconstruct a simplified regulation network of the cell cycle in the form of a Boolean network, which they could further reduce to a minimal network needed to generate the cell cycle. They then conducted a robustness analysis of the behaviour of this network. This revealed that the network is highly robust, in that a large proportion of initial states converge on a central pathway which ultimately leads to a single attractor. This central pathway can be interpreted as the cell cycle, which can be initiated again by a deviation out of the final attractor, thus generating a "limit cycle".
It is interesting to compare this to ODE approaches to modeling the cell cycle. Tyson et al (1991 PNAS REF, and later papers) have modeled the cell cycle in terms of 6 kinetic equations governing the concentrations of the 6 molecules thought to be most important in the cell cycle. To analyze the model it was simplified to 2 variables, which enables 2D stability analysis. In this way they obtain a model with a stable attractor, which when disturbed generates a long cyclical trajectory which eventually returns to the original attractor. This trajectory can be interpreted as the cell cycle.
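The qualitative behaviour of such a reduced 2-variable model (a stable attractor that responds to a sufficiently large disturbance with one long excursion) can be illustrated with a generic excitable system. Note that the sketch below uses the FitzHugh-Nagumo equations with standard textbook parameters as a stand-in; it is not the Tyson model, and the variables `v` and `w` have no cell-cycle interpretation here.

```python
def simulate(v0, w0, t_end=200.0, dt=0.01, a=0.7, b=0.8, eps=0.08):
    """Forward-Euler integration of the FitzHugh-Nagumo equations; returns the list of v values."""
    v, w = v0, w0
    vs = [v]
    for _ in range(int(t_end / dt)):
        dv = v - v ** 3 / 3.0 - w      # fast variable
        dw = eps * (v + a - b * w)     # slow recovery variable
        v, w = v + dt * dv, w + dt * dw
        vs.append(v)
    return vs


# Approximate stable equilibrium for the parameter values assumed above.
V_REST, W_REST = -1.1994, -0.6243

small = simulate(V_REST + 0.2, W_REST)  # small disturbance: decays straight back
large = simulate(V_REST + 1.2, W_REST)  # large disturbance: one long excursion, then return

print("max v after small disturbance:", round(max(small), 2))
print("max v after large disturbance:", round(max(large), 2))
print("final v after large disturbance:", round(large[-1], 2))
```

Plotting `v` against `w` for the large disturbance gives the long cyclical trajectory through the phase plane that ends in the original attractor.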

Central pathway attractor in boolean model of cell cycle (Li et al 2004)

Comparing the two models we can come to some interesting insights.
In the Boolean network:

We have simple interactions between genes (on, off etc.)

We observe a "bottom-up" convergence on a limit cycle, generated by the structure of the network

Bottom-up: one implements measured interactions (simple binary activation, repression), then sees how the system behaves

Interpretation: Network structure and percolation through the network generates convergence on the central pathway and attractor. On the pathway genes are either off or on, and this can drive downstream processes in the cell cycle.

In the ODE:

We see the minimization of the number of variables

We see "special" interactions are implemented (assumed) between some genes (top-down)

We get a "standard" result: excitable medium (limit cycle)

Top-down: One takes the idea of excitable medium and applies it to the cell cycle (assume how genes affect each other so that it fits).

Interpretation: Positions in the vector field represent molecule concentrations; (subtle) relative concentration differences must drive the cell cycle phases.

References

Hogeweg P (2000) Shapes in the shadow: Evolutionary dynamics of morphogenesis. Artif. Life 6:85-101.
Huang S & Ingber DE (2000) Shape-dependent control of cell growth, differentiation, and apoptosis: switching between attractors in cell regulatory networks. Exp Cell Res. 261:91-103.
Kauffman SA (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 22:437-67.
Li et al. (2004) The yeast cell-cycle network is robustly designed. PNAS 101:4781-4786.

(CHANGELOG 2014-2015)
- Extended explanation of bool net -> CA
- Extended explanation of Kauffman results
- Added: expression data -> bool nets

(NOTES FROM COURSE 2006-2007)

Boolean Networks

As originally studied by Kauffman (1969) as a paradigm for gene regulation.
In principle they are like binary CAs but with a very specific IO system and each node has its own transition rule.

In special cases one can make a direct mapping to CA: example.
10 position CA with rule layer and state layer with different rules for specific positions.
Already in this case one can find different basins of attraction.

Kauffman's main points about what to expect from gene regulation networks (as studied in random networks):
1) multiple attractors (e.g. cell types)
2) multiple causes (vs THE cause and THE effect) (see Huang and Ingber (2000)).

Expression > signal > more possible effects
i) quiescence
ii) differentiation
iii) proliferation
iv) apoptosis

one signal > four genes > 4 outcomes: explains uncomfortable results from experiments.
Because Boolean networks => we now have expectation of different attractors and domains of attraction: i.e. different signals lead to same result and vice versa.
This gives a conceptual advantage.

Forcing Structures

OR gates are forcing and therefore insensitive to other inputs: they lead to structural stability.

In yeast 80% of genes can be knocked out without observable effect, even for double knock-outs: forcing functions?

Hogeweg: in random Boolean networks only a small fraction of nodes is functional (networks can be reduced to a subset dominated by forcing functions).

In experimental work there are a lot of false positives in gene interaction studies (binding): the links exist but are not functional. In fact we should expect a lot of non-functionality even where a link is present.

Example: Li et al. > cell cycle > 85% of states go to one attractor.

Properties of Boolean Networks

How to get an idea about a whole ensemble of cases?
Here: look at changes in rules (vs ODE parameters) by generating random connections (not small world) where all nodes have the same number of connections (N=nodes, K=connections, NK networks).

The number of attractors depends on K, with the fewest at K=2. K=2 also gives the shortest cycle lengths, high stability and low reachability amongst domains.
Why? Often focus on K=2, but for the wrong reason.
In Kauffman's analysis he makes special choices of network type, and the K=2 finding is an artefact of sampling: for binary Boolean functions K=2 leads to a high proportion of forcing functions (14 out of 16), which in turn leads to the results on robustness. With more connections a lower proportion is forcing (more XOR-like functions), which leads to more domains of attraction and less stability.

Hogeweg: studied the effect of the proportion of XOR functions for K=2. Results show that if all functions are non-forcing, chaotic behaviour or long state cycles are obtained, whereas complete forcing leads to a strong attractor.

It is therefore plausible that in biological systems it is not necessarily the connectivity, but the proportion of forcing functions, that determines stability / robustness.

