Population Sensing Using Mobile Devices: a Statistical Opportunity or a Privacy Nightmare?

In our daily lives, our mobile phones sense our movements and interactions through a rich set of embedded sensors, such as GPS receivers, Bluetooth radios, accelerometers, and microphones. This makes mobile phones natural agents for collecting spatio-temporal data. Mining such data is currently being explored for many applications, including environmental-pollution monitoring, health care, and social networking. Unlike static sensors, mobile phones used as sensing devices are themselves mobile. However, despite these useful applications, collecting data from mobile phones raises privacy concerns: the collected data might reveal sensitive information about the users, especially if the collector has access to auxiliary information.
In the first part of this thesis, we use spatio-temporal data collected by mobile phones to evaluate different features of a population related to its mobility patterns. In particular, we consider the problems of population-size and population-density estimation, which have applications in crowd monitoring, activity-hotspot detection, and urban analysis, among others. We first conduct an experiment in which ten attendees of an open-air music festival act as Bluetooth probes. Next, we construct parametric statistical models to estimate the total number of visible Bluetooth devices and their density in the festival area. We further test our proposed models against Wi-Fi traces obtained on the EPFL campus. We conclude that mobile phones can be effectively used as sensing devices to evaluate mobility-related parameters of a population. For the specific problem of population-density estimation, we investigate the mobility aspect of sensing: we quantitatively analyze the performance of mobile sensors compared to static sensors.
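For readers unfamiliar with population-size estimation from detection data, the classic two-sample capture-recapture (Lincoln-Petersen) estimator gives a simple point of reference. It is a swapped-in textbook technique, not the parametric model developed in the thesis. It assumes a closed population and that each device is detected independently and with equal probability in both samples. The device identifiers below are hypothetical.

```python
from __future__ import annotations


def lincoln_petersen(first: set[str], second: set[str]) -> float:
    """Estimate population size as N_hat = n1 * n2 / m.

    n1, n2: number of distinct devices detected in each sample.
    m: number of devices detected in both samples.
    """
    m = len(first & second)
    if m == 0:
        raise ValueError("no device seen in both samples; estimate is undefined")
    return len(first) * len(second) / m


def chapman(first: set[str], second: set[str]) -> float:
    """Chapman's bias-corrected variant, which stays finite when m == 0."""
    n1, n2, m = len(first), len(second), len(first & second)
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical Bluetooth IDs seen by two probes (or two scan sessions).
probe_a = {"dev01", "dev02", "dev03", "dev04"}
probe_b = {"dev03", "dev04", "dev05", "dev06", "dev07"}
print(lincoln_petersen(probe_a, probe_b))  # 10.0
print(chapman(probe_a, probe_b))           # 9.0
```

Dividing such a size estimate by the area of the monitored region would give a correspondingly naive density estimate. At a real festival, people enter and leave and detection probabilities differ between phones, so the closed-population and equal-catchability assumptions are strong. This is one reason to prefer dedicated models.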
Under an independent and identically distributed mobility model for the population, we derive the optimal random-movement strategy for mobile sensors, that is, the one yielding the best estimate of population density in the mean-squared-error sense. This enables us to plan an adaptive trajectory for the mobile sensors. In particular, we demonstrate that mobility adds value: mobile sensors outperform static sensors for long observation intervals.
In the second part of this thesis, we analyze the vulnerability of anonymized mobility statistics stored in the form of histograms. We consider an attacker who has access to an anonymized set of histograms of a set of users' mobility traces and to an independent set of non-anonymized histograms of traces belonging to the same users. We study the hypothesis-testing problem of identifying the correct matching between the anonymized and the non-anonymized histograms. We show that the solution can be obtained by using a minimum-weight matching algorithm on a complete weighted bipartite graph. By applying the algorithm to Wi-Fi traces obtained on the EPFL campus, we demonstrate that anonymized histograms contain a significant amount of information, which an attacker with access to auxiliary information about the users could use to uniquely identify them.
Finally, we demonstrate how trust relationships between users can be exploited to enhance their privacy. We consider the specific problem of privacy-preserving computation of functions of data that belong to users in a social network; an example application is a poll or a survey on a private issue. Most of the known non-cryptographic solutions to this problem can be viewed as belonging to one of two extreme regimes. In the first regime, every user trusts only herself and is responsible for protecting her own privacy; in other words, the circle of trust of a user has a single member: herself.
In the second regime, every user trusts herself and the server, but none of the other users; in other words, the circle of trust of a user comprises herself and the server. We investigate this problem under the assumption that users are willing to share their private data with trusted friends in the network. Hence, we consider a regime in which the circle of trust of a user consists of herself and her friends, so our approach falls between the two regimes above. Our algorithm first partitions users into circles of trust and then computes the global function from the results of local computations within each circle. We demonstrate that such trust relationships can be exploited to significantly improve the tradeoff between the privacy of users' data and the accuracy of the computation.
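The circle-of-trust idea can be illustrated on the simplest global function: the sum of private yes/no answers in a poll. The following is a minimal sketch under stated assumptions, not the thesis's algorithm. The greedy partition rule, the Laplace perturbation, the ring-shaped friendship graph, and all function names are hypothetical. In the first regime, every user adds her own noise. With circles, each circle head (a friend whom the other members trust) releases her circle's sum with a single noise draw. As a result, the total noise variance grows with the number of circles rather than with the number of users.

```python
from __future__ import annotations

import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def partition_into_circles(friends: dict[int, set[int]]) -> list[list[int]]:
    """Hypothetical greedy rule: each still-unassigned user forms a circle with her
    still-unassigned friends and acts as that circle's head (first element)."""
    assigned: set[int] = set()
    circles: list[list[int]] = []
    for head in sorted(friends):
        if head in assigned:
            continue
        circle = [head] + sorted(v for v in friends[head] if v not in assigned)
        assigned.update(circle)
        circles.append(circle)
    return circles


def noisy_sum_local(bits: list[int], epsilon: float, rng: random.Random) -> float:
    # Regime 1: every user perturbs her own 0/1 answer (sensitivity 1) before release.
    return sum(b + laplace(1.0 / epsilon, rng) for b in bits)


def noisy_sum_circles(bits: list[int], circles: list[list[int]],
                      epsilon: float, rng: random.Random) -> float:
    # Circles of trust: each head sums her circle's answers and adds one Laplace draw.
    # Changing one answer moves a circle sum by at most 1, so the scale is again 1/epsilon.
    return sum(sum(bits[u] for u in c) + laplace(1.0 / epsilon, rng) for c in circles)


def empirical_mse(estimate, true_value: float, trials: int) -> float:
    """Average squared error of a randomized estimator over repeated runs."""
    return sum((estimate() - true_value) ** 2 for _ in range(trials)) / trials


def demo(n: int = 60, epsilon: float = 1.0, trials: int = 2000, seed: int = 0):
    rng = random.Random(seed)
    bits = [int(rng.random() < 0.3) for _ in range(n)]  # hypothetical poll answers
    # Hypothetical social network: users on a ring, friends with neighbours up to 2 hops away.
    friends = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
    circles = partition_into_circles(friends)
    true_sum = sum(bits)
    mse_local = empirical_mse(lambda: noisy_sum_local(bits, epsilon, rng), true_sum, trials)
    mse_circles = empirical_mse(
        lambda: noisy_sum_circles(bits, circles, epsilon, rng), true_sum, trials)
    return circles, mse_local, mse_circles
```

With these defaults, the greedy rule yields 20 circles for 60 users. A Laplace(0, 1) draw has variance 2, so the theoretical mean-squared errors are 120 for per-user noise and 40 for per-circle noise. The empirical values returned by `demo()` should be close to these. The trusted-server regime would need only one draw (variance 2), at the cost of trusting the server with raw answers.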

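The de-anonymization step from the second part can be sketched under a hypothetical model: each user's anonymized histogram is treated as a multinomial sample from a distribution estimated, with additive smoothing, from her non-anonymized histogram. Maximizing the joint likelihood over all one-to-one assignments then amounts to a minimum-weight perfect matching on the complete bipartite graph whose edge weights are negative log-likelihoods. For brevity, the sketch finds that matching by exhaustive search over permutations, which is feasible only for a handful of users. The Hungarian algorithm (for example, SciPy's `scipy.optimize.linear_sum_assignment`) solves the same problem in polynomial time. The function names and the smoothing parameter `alpha` are assumptions.

```python
from __future__ import annotations

import itertools
import math


def neg_log_likelihood(anon: list[int], reference: list[int], alpha: float = 1.0) -> float:
    """-log P(anon | multinomial with probabilities from the smoothed reference histogram).

    The multinomial coefficient depends only on `anon`, so it is identical for every
    candidate matching and is dropped.
    """
    total = sum(reference) + alpha * len(reference)
    return -sum(a * math.log((r + alpha) / total) for a, r in zip(anon, reference))


def match_histograms(anon_hists: list[list[int]], known_hists: list[list[int]]) -> list[int]:
    """Return perm with perm[j] = index of the known user matched to anonymized histogram j."""
    n = len(anon_hists)
    if n != len(known_hists):
        raise ValueError("both sides of the bipartite graph must have the same size")
    # Edge weights of the complete bipartite graph.
    cost = [[neg_log_likelihood(anon_hists[j], known_hists[i]) for i in range(n)]
            for j in range(n)]
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(n)):  # exhaustive minimum-weight matching
        c = sum(cost[j][perm[j]] for j in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return list(best_perm)


# Hypothetical visit counts over three locations.
known = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
anon = [[0, 0, 8], [9, 1, 0], [1, 7, 0]]
print(match_histograms(anon, known))  # [2, 0, 1]
```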
