This report presents a methodology for person clustering in meeting databases. This goal is common to a number of problems in computer vision, and more specifically in content-based video indexing and retrieval. First, the audio stream is considered alone, which leads to the speaker clustering problem; an existing algorithm is used for this purpose. Then, the video stream is analysed, leading to a face clustering algorithm built from the probability density distribution of face similarity distances. Finally, both modalities are considered together, combining voice clustering and face clustering to achieve person clustering. The method has been tested on the IDIAP meeting database, and numerous results are reported to demonstrate its effectiveness and to show that it can be applied to other databases.