Localization is a key challenge that must be addressed in order to design truly autonomous MAV teams. In this paper, we present a cooperative method for localizing a team of MAVs, in which individuals obtain their positions by perceiving a sound-emitting beacon MAV that flies relative to a reference point in the environment. For this purpose, we propose an on-board audio-based localization system that allows individuals to measure the relative bearing to the beacon robot and, furthermore, to localize themselves and the beacon robot simultaneously, without the need for a communication network. Our method combines coherence testing among the signals of a small on-board microphone array, which yields the relative bearing measurements, with an estimator that fuses these measurements over time with sensory information about the robot's motion to robustly estimate the MAV positions. The proposed method is evaluated both in simulation and in real-world experiments.
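
To give a rough intuition for how a microphone array can yield a bearing to a sound source, the following minimal sketch estimates the direction of arrival for a single pair of microphones. It is an illustration only and not the coherence-testing procedure proposed in this paper: it swaps in a plain time-domain cross-correlation to find the time difference of arrival, and it assumes a far-field source, a known microphone spacing, and a speed of sound of 343 m/s. The function name `estimate_bearing` and all parameters are hypothetical.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees Celsius


def estimate_bearing(sig_a, sig_b, fs, mic_spacing):
    """Far-field bearing in radians, measured from the broadside of a
    two-microphone pair. Positive values mean the sound reaches
    microphone B later than microphone A.

    Illustrative only: uses plain cross-correlation, not the coherence
    testing described in the paper.
    """
    # Largest physically possible delay between the two microphones, in samples.
    max_lag = int(mic_spacing / SPEED_OF_SOUND * fs)
    n = min(len(sig_a), len(sig_b))

    def correlation(lag):
        # Sum of sig_a[i] * sig_b[i + lag] over the overlapping samples.
        lo, hi = max(0, -lag), min(n, n - lag)
        return sum(sig_a[i] * sig_b[i + lag] for i in range(lo, hi))

    # The lag with the highest correlation is the estimated delay of B behind A.
    best_lag = max(range(-max_lag, max_lag + 1), key=correlation)

    # Far-field geometry: delay * c = spacing * sin(bearing).
    sin_theta = best_lag / fs * SPEED_OF_SOUND / mic_spacing
    return math.asin(max(-1.0, min(1.0, sin_theta)))


if __name__ == "__main__":
    # Synthetic test signal: white noise that reaches microphone B 7 samples late.
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(1031)]
    sig_a, sig_b = noise[7:], noise[:-7]
    bearing = estimate_bearing(sig_a, sig_b, fs=48000, mic_spacing=0.1)
    print(math.degrees(bearing))
```

With the assumed 48 kHz sampling rate and 0.1 m spacing, a 7-sample delay corresponds to a bearing of roughly 30 degrees. A single pair only resolves the bearing up to a front-back ambiguity, which is one reason an array of more than two microphones, combined with motion information over time, is useful in practice.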