In this paper we present an empirical study of a workload gathered by crawling the eDonkey network, a dominant peer-to-peer file-sharing system, for more than 50 days. We first confirm the presence of some known features, in particular the prevalence of free-riding and the Zipf-like distribution of file popularity. We also analyze how document popularity evolves over time. We then provide an in-depth analysis of several clustering properties of such workloads. We measure the geographical clustering of peers offering a given file and find that most files are offered predominantly by peers in a single country, although popular files do not have such a clear home country. We then analyze the overlap between the contents offered by different peers and find that peer contents are highly clustered according to several metrics of interest. We propose to exploit this property by letting peers search for content without server support, querying suitably identified semantic neighbours. Trace-driven simulations show that this approach is generally effective, and even more so for rare files. If peers additionally query their neighbours’ neighbours, that is, both their semantic neighbours and, in turn, the neighbours of those neighbours, hit rates exceed 55% for neighbour lists of size 20.
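The serverless search scheme summarized above can be illustrated with a small sketch. It is not the method evaluated in this study: it assumes, for illustration only, that a peer's semantic neighbours are the k other peers sharing the most files with it, and it measures the hit rate of one-hop and two-hop neighbour queries on a toy trace. All function names (semantic_neighbours, lookup, hit_rate) and the data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def semantic_neighbours(peer: str, contents: Dict[str, Set[str]], k: int) -> List[str]:
    """Return up to k other peers ranked by number of files shared with `peer`.

    Assumed selection criterion (raw overlap count), for illustration only.
    """
    mine = contents[peer]
    scored = [(len(mine & files), other)
              for other, files in contents.items() if other != peer]
    # Keep only peers with some overlap; sort by overlap (desc), then id for determinism.
    scored = [(s, o) for s, o in scored if s > 0]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [o for _, o in scored[:k]]


def lookup(peer: str, wanted: str, contents: Dict[str, Set[str]],
           neighbours: Dict[str, List[str]], two_hop: bool = False) -> bool:
    """Query the peer's semantic neighbours, optionally also their neighbours."""
    first = neighbours[peer]
    if any(wanted in contents[n] for n in first):
        return True
    if two_hop:
        for n in first:
            # Ask each neighbour's own neighbours, skipping the requesting peer.
            if any(wanted in contents[m] for m in neighbours[n] if m != peer):
                return True
    return False


def hit_rate(requests: List[Tuple[str, str]], contents: Dict[str, Set[str]],
             k: int, two_hop: bool = False) -> float:
    """Fraction of (peer, file) requests answered by neighbour queries alone."""
    neighbours = {p: semantic_neighbours(p, contents, k) for p in contents}
    hits = sum(lookup(p, f, contents, neighbours, two_hop) for p, f in requests)
    return hits / len(requests) if requests else 0.0


# Toy trace: hypothetical peer caches and requests for files the peer lacks.
contents = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f4", "f5"},
    "D": {"f9"},
}
requests = [("A", "f4"), ("A", "f5"), ("C", "f3"), ("D", "f1")]

print(hit_rate(requests, contents, k=2))                # 0.25
print(hit_rate(requests, contents, k=2, two_hop=True))  # 0.75
```

In this toy trace, peer A finds f4 directly at its neighbour B, while f5 (held by C) and f3 (held by A, requested by C) are only reachable through B's own neighbour list, which is why the two-hop variant answers more requests. Peer D shares nothing with anyone, so it gets no neighbours and its request fails.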