A new approach to synthesize heterogeneous agents and their associations for urban microsimulations

Microsimulation of urban transportation and land use evolution requires, for the base year, the individual characteristics and disaggregate locations of the households and persons living in the study area. However, mainly for privacy reasons, the census and travel surveys, which are the primary sources of such data, provide at best only cross tabulations at various levels of spatial aggregation (sector, commune, region, and country) and a small sample of individual-level information (microdata) that usually does not have spatial information attached to it. This necessitates generating the baseline population by synthetic means. Currently, variants of Iterative Proportional Fitting (IPF) are predominantly used to generate the base year synthetic population. IPF essentially creates clones of the individual household and person records in the microdata in such a way that the marginals at one or more levels of spatial aggregation are satisfied. In doing so, IPF ensures that the correlation structure of the sample is preserved in the synthesized population. The key shortcomings of IPF include: a) loss of the heterogeneity that may not have been captured in the microdata, due to cloning rather than true synthesis of the population; b) over-reliance on the accuracy of the data to determine the cloning weights; c) very poor scalability as the number of population characteristics that need to be synthesized increases. In order to overcome these shortcomings and move the research in population synthesis for microsimulations significantly forward, we propose a Markov Chain Monte Carlo simulation based approach that at its core uses Gibbs and Metropolis-Hastings sampling methods. This approach, instead of cloning the microdata, generates the joint distribution of the characteristics of the households, the persons, and the associations between them, using any available data on these three dimensions.
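As a point of reference for the IPF procedure described above, the following is a minimal sketch of two-dimensional IPF in Python. It is not the code used in this work; the function name `ipf`, the seed table, and the target marginals are hypothetical values chosen only for illustration. The seed table plays the role of a cross tabulation of the microdata, and the targets play the role of census marginals.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale a seed table until its row and column sums match the targets.

    Assumes every row and column of the seed has a positive sum and that
    the row targets and column targets add up to the same total.
    """
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: rescale each row so that it sums to its target.
        for i, row in enumerate(table):
            row_sum = sum(row)
            table[i] = [x * row_targets[i] / row_sum for x in row]
        # Step 2: rescale each column so that it sums to its target.
        col_sums = [sum(col) for col in zip(*table)]
        table = [
            [x * col_targets[j] / col_sums[j] for j, x in enumerate(row)]
            for row in table
        ]
        # Columns are exact after step 2, so only the rows need checking.
        if all(abs(sum(row) - t) < tol for row, t in zip(table, row_targets)):
            break
    return table


# Hypothetical 2x2 sample cross tabulation (e.g. age group by employment).
seed = [[10, 20], [30, 40]]
# Hypothetical zone-level marginals; both sets sum to 1000.
row_targets = [400.0, 600.0]
col_targets = [450.0, 550.0]

fitted = ipf(seed, row_targets, col_targets)
```

The fitted table matches both sets of marginals while keeping the cross-product ratio of the seed table, which is the sense in which IPF preserves the correlation structure of the sample. Any cell that is zero in the seed stays zero in the result, which illustrates shortcoming a) above.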
The associations are defined in terms of the position (head, adult, child, etc.) that a person can take in a household. The problem is divided into generating two types of independent, un-normalized distributions: agent distributions, and association distributions for each realization of agents. The joint distribution resulting from these independent distributions is thus the best possible representation of the real population, given all the available information. The synthetic population required for an urban microsimulation can then be generated by simply taking a realization from this un-normalized joint distribution. In this way, population synthesis can become a seamless part of urban microsimulations and can therefore also be included in their sensitivity analysis. In terms of implementation, we have developed a C++ based code and are testing the methodology by generating the synthetic population for Brussels, Belgium. We are also in the process of synthesizing the population for Switzerland, where we have access to the entire census. This will allow us to compare the performance of our proposed methodology with IPF and combinatorial optimization based methods in terms of reproducing the actual population.
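To illustrate the sampling idea behind the proposed approach, the following is a toy Gibbs sampler in Python for two binary agent attributes. It is a sketch under stated assumptions, not the C++ implementation mentioned above: the function name `gibbs_sample`, the attributes, and all probabilities are hypothetical, and the Metropolis-Hastings step and the household-person associations are left out. The sampler only ever uses the conditional distributions of one attribute given the other, which is the kind of partial information that cross tabulations can supply.

```python
import random


def gibbs_sample(p_a_given_b, p_b_given_a, n_draws, burn_in=1000, seed=0):
    """Draw (a, b) pairs by alternately sampling each attribute given the other.

    p_a_given_b[b] is a list of probabilities over the values of a,
    p_b_given_a[a] is a list of probabilities over the values of b.
    """
    rng = random.Random(seed)
    a, b = 0, 0  # arbitrary starting state
    draws = []
    for step in range(burn_in + n_draws):
        # Update attribute a conditional on the current value of b.
        a = rng.choices(range(len(p_a_given_b[b])), weights=p_a_given_b[b])[0]
        # Update attribute b conditional on the new value of a.
        b = rng.choices(range(len(p_b_given_a[a])), weights=p_b_given_a[a])[0]
        # Discard the burn-in draws, keep the rest.
        if step >= burn_in:
            draws.append((a, b))
    return draws


# Hypothetical joint distribution joint[a][b], used here only to derive
# consistent conditionals and to check the result; the sampler never sees it.
joint = [[0.1, 0.3], [0.4, 0.2]]

# Conditionals derived from the hypothetical joint.
p_a_given_b = [
    [joint[a][b] / sum(joint[k][b] for k in range(2)) for a in range(2)]
    for b in range(2)
]
p_b_given_a = [
    [joint[a][b] / sum(joint[a]) for b in range(2)]
    for a in range(2)
]

draws = gibbs_sample(p_a_given_b, p_b_given_a, n_draws=20000)
```

Each element of `draws` can be read as one synthetic agent. With enough draws, the empirical frequencies of the four attribute combinations approach the joint distribution implied by the conditionals, without cloning any microdata record.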

Presented at:
2nd Workshop of Urban Dynamics, Termas de Chillán, Chile, March 28, 2012

 Record created 2014-01-20, last modified 2019-03-16
