Simulations and Modeling Group Project #1

Timothy Schoenharl
Ryan Hemme
Dongyoung Shin
Brandon Rich

Source code for this project


created with NetLogo

view/download model file: HW_Gen_Sim.nlogo


Biological Scenario

Population genetics, in its simplest form, is the study of how frequently alleles occur between and within populations and of the causes of the observed variation. Computer modeling is a useful tool for studying population genetics: a priori knowledge of a population can be built into a simple model that helps explain observed variation and potentially predicts future changes in allele frequencies.

In our model, we set the following parameters. We built a world comprising a peak with a high elevation and a valley with a low elevation; between these two points lies a range of intermediate elevations. We then chose to investigate two genes that contribute equally to our trait of interest, fitness, which we loosely define as the organism’s ability to survive at a given elevation. Specifically, an individual that is homozygous recessive for both genes (aabb) is more fit, and therefore more successful, at a lower elevation, while an individual that is homozygous dominant for both (AABB) has greater fitness at a higher elevation. Our model employs only two alleles for each gene, present in equal frequency at the beginning of each simulation because every organism starts as AaBb. Our organisms are hermaphroditic and stationary. We kept them stationary because we wanted to model changes in gene frequency; had the organisms moved randomly throughout the world, we might inadvertently have been examining their behavior and movement instead.

Our model neglected the effects of migration (gene flow) and mutation, but did attempt to examine the impact of genetic drift and selection on our population. However, our fitness, and therefore selection, was ‘forced’ by an equation, which might be an inaccurate way to observe fitness and natural selection in our population. Mutation in the sense of a change to a gene does not exist in the model, but crossing over, which also produces new allele combinations, is considered. In the randomly mating population, the A (a) and B (b) loci cross over with a frequency that depends on the distance between the two loci, so by controlling that distance we can observe the dynamics of the genotype frequencies. Earlier versions of our simulation resulted in organisms that were homozygous for both genes and therefore more fit in one of the extreme elevation locales, which would be analogous to observing the effects of genetic drift. This is plausible given the small starting population size.
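The dependence of crossing over on locus distance can be illustrated with a short sketch. This is Python rather than the NetLogo source, and it is not taken from the model file: the function name `make_gamete` and the parameter `recomb_rate` are assumptions introduced here. `recomb_rate` stands in for the distance between the loci, ranging from 0 (completely linked) to 0.5 (independent assortment).

```python
import random

def make_gamete(hap1, hap2, recomb_rate, rng=random):
    """Return one gamete (a_allele, b_allele) from a parent's two chromosomes.

    hap1 and hap2 are (a_allele, b_allele) pairs; recomb_rate is the assumed
    probability of a crossover between the two loci.
    """
    # Choose which parental chromosome supplies the allele at the first locus.
    first, second = (hap1, hap2) if rng.random() < 0.5 else (hap2, hap1)
    # With probability recomb_rate the second locus comes from the other chromosome.
    if rng.random() < recomb_rate:
        return (first[0], second[1])
    return (first[0], first[1])

# A double heterozygote carrying chromosomes AB and ab.
rng = random.Random(1)
linked = [make_gamete(("A", "B"), ("a", "b"), 0.0, rng) for _ in range(1000)]
free = [make_gamete(("A", "B"), ("a", "b"), 0.5, rng) for _ in range(1000)]
```

With `recomb_rate` at 0 only the parental combinations AB and ab appear among the gametes; at 0.5 the recombinant combinations Ab and aB appear as well, each at roughly one quarter of the total.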


Model Characteristics

The NetLogo implementation of the model went through two distinct phases. Our progression clearly demonstrates the principles of development described in "Individual-Based Modeling and Ecology," which state that a complex model will not necessarily yield better results than a simple, purposeful representation of the real-life system. The first version of our simulation was built on the idea of a small time scale wherein each entity lived for a certain number of steps, during which it matured, sought mates, and eventually died with some probability. By manipulating maturity and gestation times, we could often sustain a stable population. However, such details did not necessarily serve our purpose of investigating genetic patterns based on individual fitness. Moreover, the idea of fitness (which was to be determined by genetic affinity to terrain elevation) was not depicted in the first version of the model.

Widening the timestep and implementing survivability made our model much more attuned to our specific problem statement. The second-version time scale operates on a generational basis: each iteration of the model's time counter sees the creation of a number of offspring based on a simply implemented proximity mating algorithm. After mating, the parents hatch five offspring, which disperse a short distance from their birthplace. The parents then die, and a fitness algorithm determines which offspring will live on until the next timestep, in which they themselves will mate.
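The generational cycle described above can be sketched as follows. This is a Python illustration under stated assumptions, not the NetLogo implementation: the five offspring are taken to be per mated pair, and the names `Organism`, `next_generation`, `MATE_RADIUS`, and `DISPERSAL`, the nearest-neighbor pairing rule, and both numeric values are hypothetical. The survival test is passed in as a function so that the sketch does not commit to a particular fitness equation.

```python
import random
from dataclasses import dataclass

OFFSPRING_PER_PAIR = 5  # from the text, read here as five per mated pair
MATE_RADIUS = 2.0       # assumed: maximum distance at which two organisms can mate
DISPERSAL = 1.0         # assumed: maximum offset of a child from its birthplace

@dataclass(eq=False)
class Organism:
    x: float
    y: float
    genotype: tuple  # two alleles of gene A, then two alleles of gene B

def dist2(a, b):
    """Squared distance between two organisms."""
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2

def child_genotype(p1, p2, rng):
    """One allele per gene from each parent; the loci assort independently."""
    g1, g2 = p1.genotype, p2.genotype
    return (rng.choice(g1[:2]), rng.choice(g2[:2]),
            rng.choice(g1[2:]), rng.choice(g2[2:]))

def next_generation(population, survives, rng):
    """Pair neighbors, hatch offspring, drop the parents, apply selection."""
    unpaired = list(population)
    rng.shuffle(unpaired)
    offspring = []
    while unpaired:
        parent = unpaired.pop()
        nearby = [o for o in unpaired if dist2(parent, o) <= MATE_RADIUS ** 2]
        if not nearby:
            continue  # no mate within range; this organism leaves no offspring
        mate = min(nearby, key=lambda o: dist2(parent, o))
        unpaired.remove(mate)
        for _ in range(OFFSPRING_PER_PAIR):
            offspring.append(Organism(
                parent.x + rng.uniform(-DISPERSAL, DISPERSAL),
                parent.y + rng.uniform(-DISPERSAL, DISPERSAL),
                child_genotype(parent, mate, rng)))
    # The parents are not carried over; only surviving offspring remain.
    return [child for child in offspring if survives(child)]
```

Calling `next_generation(population, survives, random.Random(0))` once corresponds to one tick of the model's time counter, with `survives` being any function that maps an organism to True or False.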

That fitness algorithm gives entities with a greater number of dominant (capital A or B) alleles a higher survival rate in the elevated (white) region. Entities with a greater number of recessive (lowercase) alleles thrive in the darker lowlands. Because each entity's location is now fixed, genetic makeup is the only remaining influence on its survival.
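The report does not give the fitness equation itself, so the following Python sketch shows only one possible form that has the stated properties. It assumes elevation is scaled to the range 0 (valley) to 1 (peak) and that survival probability falls off linearly with the mismatch between elevation and the fraction of dominant alleles. The function names and the linear form are assumptions, not the model's actual equation.

```python
import random

def dominant_fraction(genotype):
    """Fraction of alleles in a genotype string such as "AaBb" that are dominant."""
    return sum(1 for allele in genotype if allele.isupper()) / len(genotype)

def survival_probability(genotype, elevation):
    """Assumed linear fitness: best where elevation matches the dominant fraction.

    elevation is assumed to be scaled so that 0.0 is the valley and 1.0 the peak.
    """
    return 1.0 - abs(elevation - dominant_fraction(genotype))

def survives(genotype, elevation, rng=random):
    """Draw a single survival outcome for one organism."""
    return rng.random() < survival_probability(genotype, elevation)

print(survival_probability("AABB", 1.0))  # 1.0 at the peak
print(survival_probability("aabb", 1.0))  # 0.0 at the peak
print(survival_probability("AaBb", 0.5))  # 1.0 at middle elevation
```

Because both genes contribute equally, only the count of dominant alleles matters under this form, so genotypes with the same count (for example AaBb, AAbb, and aaBB) receive the same survival probability at any elevation.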

By narrowing the focus of the model and limiting the number of details, we attempt to reach the "Medawar zone" described in our text -- that optimal range of complexity that allows us to reproduce expected patterns based on a small but relevant number of variables. Removing the complexity of individual motion and focusing on a proper time scale allows us to view population trends developing based solely on the distribution of genotypes and their associated terrain fitness -- precisely those variables we wish to investigate.


Analysis of Model

We initially developed the model as an exploratory simulation. We create a population of heterozygous organisms with genotype AaBb and place them randomly in the environment. We then run the simulation until the population becomes homogeneous, with all organisms either AABB or aabb. The simulation consistently converged to a point where either all organisms had genotype AABB and were clustered around the peak, or all had genotype aabb and were clustered around the valley. This behavior is consistent with models of genetic drift. As stated, this initial version of the simulation does not consider crossover.
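The stopping condition and the quantity being tracked can be made concrete with a small Python sketch. It assumes genotypes are stored as canonical four-letter strings such as "AaBb" and that the population is not empty; the function names `allele_frequencies` and `is_fixed` are hypothetical and do not come from the model file.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the frequencies of the dominant alleles A and B in the population."""
    counts = Counter(allele for g in genotypes for allele in g)
    freq_a_dom = counts["A"] / (counts["A"] + counts["a"])
    freq_b_dom = counts["B"] / (counts["B"] + counts["b"])
    return freq_a_dom, freq_b_dom

def is_fixed(genotypes):
    """True when every organism is AABB or every organism is aabb."""
    return len(set(genotypes)) == 1 and genotypes[0] in ("AABB", "aabb")

start = ["AaBb"] * 20
print(allele_frequencies(start))  # (0.5, 0.5)
print(is_fixed(start))            # False
print(is_fixed(["aabb"] * 20))    # True
```

A run would call `is_fixed` after each generation and stop once it returns True; recording `allele_frequencies` at each step gives the trajectory of the drift toward one extreme.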


Additions to Model

The current model presents a simulation of genetic drift in a population that spans a diverse landscape. Individual fitness is related to the genetic makeup of the organism, as well as its location on the landscape. We propose several extensions to the model in order to investigate additional phenomena. First, we will incorporate a realistic implementation of genetic recombination, which will allow us to examine the genetic drift of the population in a more detailed way. Next, we are interested in extending the fitness value to apply to mate selection as well as survival. We feel that this addition may allow us to observe different behavior in the simulation.

Following the tenets of pattern-based modeling, we seek to implement the simplest patterns that incorporate these behaviors. To explore the essential patterns that contribute to sympatric speciation, we will implement recombination first, then fitness-based mating. Should adding these patterns not yield the expected results, we will look into modifying fitness to prevent the colony from clumping together, or adding a delay before sexual maturity.

We have done some investigation into incorporating genetic recombination. Initial results from implementing recombination have been interesting. Unlike the simplistic simulation, the population does not converge to a pure dominant or recessive genotype, but seems to be driven towards either aaBB or AAbb, with the population gathering in areas of median elevation. These results are a bit puzzling and we are re-evaluating the simulation.