Infoscience

EPFL, École polytechnique fédérale de Lausanne

doctoral thesis

Generalizing Bulk-Synchronous Parallel Processing for Data Science: From Data to Threads and Agent-Based Simulations

Tian, Zilu  
2023

Agent-based simulations have been widely applied in many disciplines, by scientists and engineers alike. Scientists use agent-based simulations to tackle global problems, including alleviating poverty, reducing violence, and predicting the impact of pandemics. In industry, engineers use agent-based simulations to reduce cost and improve efficiency, by creating virtual worlds to model different scenarios and explore various designs with fast feedback at low cost. Agent-based simulations play an increasingly prominent role in modern society.

Despite their significance, agent-based simulations have benefited little from recent progress in computer science, especially in parallel computing and data management. While there is a growing need to simulate at ever-larger scales and in finer detail, systems that support fast execution of large-scale simulations and efficient integration of simulations with existing data science pipeline operators are lagging behind. This creates new challenges and opportunities for computer scientists.

In this work, we make the first foray into defining a clean semantics that serves as the foundation of agent-based simulations, designing an abstraction that helps users integrate simulations into data science pipelines, building a scalable system architecture with efficient optimizations, and providing a high-level, user-friendly programming model.

In particular, we generalize the bulk-synchronous parallel (BSP) processing model so that it better supports agent-based simulations. Such simulations frequently exhibit a hierarchical structure in their communication patterns, which can be exploited to improve performance. We allow the creation of temporary, artificial network partitions during which agents synchronize only locally within their group, in a way that does not compromise the correctness of the simulation. We also propose to encapsulate simulations in a Simulate operator, which lets users compose and nest simulations just like other data science pipeline operators.

In addition, we have designed and developed CloudCity, an open-source distributed system for large-scale agent-based simulations, which implements our semantics to improve the locality of computation, communication, and synchronization in simulations. CloudCity includes optimizations for fast execution and efficient querying of simulation results. To accommodate users from different backgrounds, we have also developed a user-friendly domain-specific language (DSL) embedded in the programming language Scala, which allows users to write parallel agent programs easily, even with little or no background in distributed computing. We experimentally evaluate the performance of our system on a benchmark suite of agent-based simulations and compare it against existing state-of-the-art BSP-like distributed systems, including Spark, GraphX, Giraph, and Flink Gelly, gaining insights into how various system design choices and optimizations affect simulation engine performance.
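To make the generalized BSP idea more concrete, here is a minimal Python sketch; it is not CloudCity's API (CloudCity's DSL is embedded in Scala). It assumes that agents exchange messages, that each agent belongs to a hypothetical `group`, and that messages between groups are simply held back until the next global barrier, which occurs every `local_rounds` supersteps. This deferral rule is an illustrative assumption: the thesis defines its own conditions under which local synchronization preserves correctness, and the sketch does not reproduce them. The function `simulate` is a hypothetical stand-in for the Simulate operator. It consumes a collection of agents and returns a list of `(id, state)` rows that a later pipeline stage could filter or join. All names (`Agent`, `simulate`, `local_rounds`, `exchange_with`, `make_agents`) are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    id: int
    group: int    # hypothetical: the partition whose members may sync locally
    state: float


Message = Tuple[int, float]                        # (receiver id, payload)
StepFn = Callable[[Agent, List[float]], List[Message]]


def simulate(agents: List[Agent], step: StepFn, rounds: int,
             local_rounds: int = 1) -> Tuple[List[Tuple[int, float]], int]:
    """Run `rounds` BSP supersteps; return ((id, state) rows, #global barriers).

    Intra-group messages are delivered at every barrier. Cross-group messages
    are delivered only at a global barrier, which happens every `local_rounds`
    rounds (local_rounds >= 1; local_rounds=1 is plain BSP).
    """
    group_of = {a.id: a.group for a in agents}
    inbox: Dict[int, List[float]] = defaultdict(list)
    # ASSUMPTION: cross-group messages wait for the next global barrier
    # (an illustrative rule, not the thesis's formal semantics).
    deferred: Dict[int, List[float]] = defaultdict(list)
    global_barriers = 0
    for r in range(rounds):
        outbox: Dict[int, List[float]] = defaultdict(list)
        # Compute phase: each agent sees only messages from earlier supersteps.
        for a in agents:
            for dst, payload in step(a, inbox.get(a.id, [])):
                if group_of[dst] == a.group:
                    outbox[dst].append(payload)     # local: visible next round
                else:
                    deferred[dst].append(payload)   # cross-group: held back
        # Barrier: global every `local_rounds` rounds, otherwise group-local.
        if (r + 1) % local_rounds == 0:
            global_barriers += 1
            for dst, msgs in deferred.items():
                outbox[dst].extend(msgs)
            deferred.clear()
        inbox = outbox
    return [(a.id, a.state) for a in agents], global_barriers


def exchange_with(partner: Callable[[int], int]) -> StepFn:
    """Toy agent behavior: add received values to own state, send state to partner."""
    def step(agent: Agent, msgs: List[float]) -> List[Message]:
        agent.state += sum(msgs)
        return [(partner(agent.id), agent.state)]
    return step


def make_agents() -> List[Agent]:
    # Agents 0 and 1 form group 0; agents 2 and 3 form group 1.
    return [Agent(i, i // 2, float(i + 1)) for i in range(4)]


if __name__ == "__main__":
    same_group = exchange_with(lambda i: i ^ 1)    # pairs 0<->1 and 2<->3
    print(simulate(make_agents(), same_group, rounds=3, local_rounds=1))
    # ([(0, 6.0), (1, 6.0), (2, 14.0), (3, 14.0)], 3)
    print(simulate(make_agents(), same_group, rounds=3, local_rounds=3))
    # ([(0, 6.0), (1, 6.0), (2, 14.0), (3, 14.0)], 1)
```

When agents only talk within their group (partner `i ^ 1`), both runs produce the same final states, but the second needs one global barrier instead of three. With cross-group partners (`i ^ 2`), plain BSP yields states 8, 12, 8, 12 after three rounds, whereas `local_rounds=3` leaves them at 1, 2, 3, 4 because no cross-group message has been consumed yet. Under this sketch's assumptions, local synchronization is only worthwhile when communication is predominantly intra-group, which is the kind of hierarchical pattern the abstract describes.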

Type
doctoral thesis
DOI
10.5075/epfl-thesis-8865
Author(s)
Tian, Zilu  
Advisors
Koch, Christoph  
Jury
Prof. Bryan Alexander Ford (president); Prof. Christoph Koch (thesis director); Prof. Anne-Marie Kermarrec, Dr Milos Nikolic, Prof. Dan Olteanu (examiners)
Date Issued
2023
Publisher
EPFL
Publisher place
Lausanne
Public defense date
2023-09-04
Thesis number
8865
Number of pages
182
Subjects
agent-based simulations • distributed systems • bulk-synchronous parallel processing • compilation • query languages
EPFL units
DATA  
Faculty
IC  
School
IINFCOM  
Doctoral School
EDIC  
Available on Infoscience
August 24, 2023
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/200044

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.