Ryan C. Kennedy1, Kelly E. Lane2, S. M. Niaz Arifin1, Hope Hollocher2, Agustín Fuentes3, Gregory R. Madey1
1 Department of Computer Science and Engineering; 2 Department of Biological Sciences; 3 Department of Anthropology
University of Notre Dame
Agent-based modeling (ABM) is very adept at modeling complex systems and is particularly well-suited for modeling natural phenomena. Coupling an ABM with geographical information system (GIS) data increases the usefulness and realism of a model. Such integration is not trivial; here, we present our methods and recommendations for effectively integrating complex GIS data into a large-scale ABM, with careful attention paid to performance. We demonstrate our techniques on an advanced epidemiological model named LiNK.
In an ABM, agents are treated as entities with their own properties and behaviors. They have the ability to interact with other agents and their environment, with simple behavioral rules often leading to emergent system dynamics. Agents typically move about a grid-like structure; GIS-aware agents utilize their surroundings, both GIS data and other agents, to make more informed behavioral decisions. Performing spatial queries on GIS data at simulation runtime can be expensive, particularly with complex GIS data or complex behavioral rules. We next describe the basics of GIS data and then explore several techniques for agents to access GIS data within a simulation, including our hybrid approach.
A GIS is a system where real-world data, such as rivers, lakes, or even population data, is represented. There are two main types of GIS data: vector and raster data. Raster data is analogous to a collection of cells that make up a larger grid-like structure, with each cell possessing its own attributes. Vector data is coordinate-based and is therefore made up of points, lines, and polygons, each with associated attributes. The grid-like composition of raster data easily lends itself to the grid-like space of an ABM; however, vector data is more realistic and more efficiently stored.
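As a rough illustration of the two data types, the sketch below represents the same small landscape both ways. The land-cover codes, feature names, and coordinates are hypothetical and are not taken from LiNK.

```python
# Raster: a grid of cells, each cell holding its own attribute value.
# The land-cover codes below are hypothetical.
WATER, FOREST = 0, 1
raster = [
    [WATER, WATER, FOREST],
    [WATER, FOREST, FOREST],
]


def raster_value(grid, row, col):
    """Look up the attribute stored in one raster cell."""
    return grid[row][col]


# Vector: coordinate-based points, lines, and polygons with attributes.
# Names and coordinates are hypothetical.
vector_features = [
    {"type": "point", "coords": [(1.5, 0.5)], "attrs": {"name": "temple"}},
    {"type": "line", "coords": [(0.0, 0.0), (3.0, 2.0)], "attrs": {"name": "river"}},
    {
        "type": "polygon",
        "coords": [(1.0, 0.0), (3.0, 0.0), (3.0, 2.0), (1.0, 2.0)],
        "attrs": {"name": "forest"},
    },
]

# Raster storage grows with the number of cells; vector storage grows
# with the number of vertices needed to describe each feature.
raster_cell_count = sum(len(row) for row in raster)
polygon_vertex_count = len(vector_features[2]["coords"])
```

The raster maps directly onto an agent grid, while the polygon describes its outline with only a handful of coordinates regardless of the resolution at which it is later queried.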
The difficulty in integrating GIS data into an ABM lies in how the agents will interact with the GIS data at runtime. There are several techniques by which an agent can access GIS data from within a simulation. Raster-based queries work well and are straightforward to implement; however, their usefulness is limited by the mechanisms through which agents access the data and by the relatively inefficient way in which raster data is stored. Performing spatial queries on vector data is very accurate, but such queries are often computationally expensive. For example, querying a complex polygon with 500 vertices once for each agent’s location at each timestep of a simulation is not advisable. To partially remedy this, the spatial data can be simplified: the polygon mentioned above might be represented with 50 vertices while still maintaining sufficient data integrity, in which case queries would be much faster. The main remaining problem with querying simplified vector data is that many queries will be repeated over the course of a simulation run. To remedy this, we created a hybrid approach, the precalculated query matrix (PQM), which combines the benefits of raster and vector data. At a specified granularity, every possible spatial query is performed on the vector data prior to simulation runtime. The results are serialized, loaded at runtime, and then accessed by agents in constant time. PQMs allow us to create many agents with the complex GIS-based behavioral rules our simulation calls for, while also keeping our model fast and scalable. We present limited performance data for the various spatial queries in Tables I and II.
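The PQM idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LiNK implementation: the point-in-polygon test is a standard ray-casting routine, the function names (`point_in_polygon`, `build_pqm`, `query_pqm`) and the square "forest" polygon are hypothetical, and Python's `pickle` module stands in for whatever serialization the model actually uses.

```python
import pickle


def point_in_polygon(x, y, vertices):
    """Ray-casting test; cost grows linearly with the number of vertices."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_pqm(vertices, width, height, cell_size):
    """Run the vector query once per cell centre, before the simulation."""
    cols = int(width / cell_size)
    rows = int(height / cell_size)
    return [
        [
            point_in_polygon((c + 0.5) * cell_size, (r + 0.5) * cell_size, vertices)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def query_pqm(pqm, x, y, cell_size):
    """Constant-time lookup an agent would perform at runtime."""
    return pqm[int(y / cell_size)][int(x / cell_size)]


# Hypothetical feature: a square forest inside a 10 x 10 world.
forest = [(2.0, 2.0), (8.0, 2.0), (8.0, 8.0), (2.0, 8.0)]
pqm = build_pqm(forest, width=10.0, height=10.0, cell_size=1.0)

# Serialize ahead of time; deserialize when the simulation starts.
blob = pickle.dumps(pqm)
loaded = pickle.loads(blob)

agent_in_forest = query_pqm(loaded, 5.3, 4.7, cell_size=1.0)
agent_outside = query_pqm(loaded, 0.5, 9.5, cell_size=1.0)
```

In this sketch the cost of the vector query is paid once per cell when the matrix is built, so the runtime lookup no longer depends on the number of polygon vertices or on how often agents repeat a query. The trade-off is that answers are only as precise as the chosen cell size.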
Our work is applied to LiNK, a simulation model of pathogen transmission among macaques on the Indonesian island of Bali. The goal of LiNK is to study the effects of GIS data on host and parasite dynamics. LiNK is complex and creates vast amounts of data; therefore, we have created an additional tool, LiNKStat, to analyze and display statistics about the model in an interactive and graphical manner. Figure 1 shows a screen capture of the LiNK model. The efficient integration of GIS data into an ABM has allowed us to create a more informative and realistic simulation model, helping us gain insight into the impact of landscape on transmission patterns and thus supporting solutions to global health concerns.
Figure 1: Screen capture of the LiNK model
This material is based upon work supported by the National Science Foundation under Grant No. 0639787. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Development of this model was also made possible through partial support from the University of Notre Dame Center for Research Computing and the Leakey Foundation.