In this paper, we present an acoustic localization system for multiple devices. In contrast to systems that localize a device relative to one or several anchor points, we focus on the joint localization of several devices relative to each other. We present a prototype of our system on off-the-shelf smartphones. No user interaction is required: the phones emit acoustic pulses according to a precomputed schedule. Distances between the devices are estimated using the elapsed time between the two times of arrival (ETOA) method with sample counting. These distances, which may be incomplete, are the input to an efficient and robust multi-dimensional scaling algorithm that returns a position for each phone. We evaluated our system in real-world scenarios, achieving error margins of 15 cm in an office environment.
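
To make the two stages concrete, the following is a minimal sketch and not the implementation evaluated here. It assumes the usual ETOA formulation, in which each of two phones counts the samples between the arrival of the first and the second pulse on its own microphone, and it substitutes plain classical multi-dimensional scaling, which requires a complete distance matrix, for the robust variant that tolerates missing distances. The names `etoa_distance`, `classical_mds`, and `SPEED_OF_SOUND`, as well as the value of 343 m/s, are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def etoa_distance(samples_a, samples_b, sample_rate,
                  self_dist_a=0.0, self_dist_b=0.0):
    """Estimate the distance between phones A and B from sample counts.

    samples_a: samples counted on A between the arrival of A's own pulse
               and the arrival of B's pulse.
    samples_b: samples counted on B between the arrival of A's pulse
               and the arrival of B's own pulse.
    self_dist_*: speaker-to-microphone distance on each phone (metres).
    """
    # The difference of the two intervals equals twice the propagation
    # time, so the unknown clock offset between the phones cancels out.
    dt = (samples_a - samples_b) / sample_rate
    return 0.5 * SPEED_OF_SOUND * dt + 0.5 * (self_dist_a + self_dist_b)


def classical_mds(dist, dim=2):
    """Classical MDS: positions (up to rotation, reflection, translation)
    from a complete, symmetric matrix of pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering       # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]          # indices of the largest ones
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
```

Because only distances are given, the recovered positions are determined up to a rigid transformation; comparing them with ground truth requires an alignment step first.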