Digital libraries are libraries whose collections, or at least their metadata, are stored in digital form. Such libraries are now being made publicly available. However, building good user interfaces for querying heterogeneous libraries requires a good knowledge of the kind of information they expose (e.g. which attributes are useful for filtering). In this project, we harvest (using the Z39.50 and OAI-PMH protocols) and analyze (in terms of attributes useful for querying) four important digital libraries: Nebis (five million items), Infoscience (sixty thousand items), CiteSeer (seven hundred thousand items) and The European Library (one and a half million items).
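As a rough illustration of what analyzing attributes can mean in practice, the sketch below measures, for one OAI-PMH ListRecords response, the fraction of records that fill in each Dublin Core attribute. It is not the project's actual tooling. It assumes the records use unqualified Dublin Core (oai_dc), the metadata format every OAI-PMH repository is required to support. The function name attribute_coverage and the two sample records are hypothetical and made up for the example, and Z39.50 harvesting is not covered.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaces defined by OAI-PMH 2.0 and by the Dublin Core element set.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample response (NOT real data from any of the four libraries).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First item</dc:title>
          <dc:creator>A. Author</dc:creator>
          <dc:date>2005</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second item</dc:title>
          <dc:creator>B. Author</dc:creator>
          <dc:creator>C. Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def attribute_coverage(xml_text):
    """Return {attribute name: fraction of records with a non-empty value}."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI}record")
    counts = Counter()
    for record in records:
        # Use a set so a repeated attribute (e.g. two creators)
        # counts only once per record.
        present = {
            element.tag[len(DC):]
            for element in record.iter()
            if element.tag.startswith(DC) and (element.text or "").strip()
        }
        counts.update(present)
    total = len(records)
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


coverage = attribute_coverage(SAMPLE)
for name in sorted(coverage):
    print(f"{name}: {coverage[name]:.0%}")
```

On the invented sample, title and creator are present in every record while date is present in only half of them. Under this simple coverage criterion, an attribute that is rarely filled in would be a weak candidate for a filter in a query interface.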