MapPool - Bubbling up an extremely large corpus of maps for AI
MapPool is a dataset of 75 million potential map images and their textual captions. It is derived from CommonPool, a dataset of 12 billion image-text pairs collected from the Internet. The images were encoded with a vision transformer and classified as maps or non-maps by a support vector machine. This approach outperforms previous models and achieves a validation accuracy of 98.5%. The MapPool dataset may help train data-intensive architectures and thereby establish vision and language foundation models specialized in maps. Analyzing the dataset and exploring its embedding space offer considerable potential for future work. The dataset is accessible at https://geoai.icaci.org/mappool/
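The abstract describes a two-stage filter: a vision transformer turns each image into an embedding vector, and a support vector machine labels that vector as map or non-map. The sketch below illustrates only the second stage, under stated assumptions. Synthetic 4-dimensional vectors stand in for real ViT embeddings, the SVM is assumed to be linear and is trained with the Pegasos stochastic subgradient method, and every function name and hyperparameter (`lam`, `epochs`, `filter_maps`, etc.) is hypothetical rather than taken from the MapPool pipeline.

```python
import random


def synthetic_embeddings(n, center, rng):
    """Hypothetical stand-in for ViT image embeddings: n noisy vectors around center."""
    return [[c + rng.gauss(0.0, 1.0) for c in center] for _ in range(n)]


def _augment(x):
    # Append a constant feature so the last weight acts as the bias term.
    return list(x) + [1.0]


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss). Assumed labels: +1 = map, -1 = non-map."""
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            violated = y * _dot(w, x) < 1.0  # hinge loss is active at current w
            # Regularization step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                # Hinge-loss subgradient step toward classifying x correctly.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (map) or -1 (non-map) for one embedding."""
    return 1 if _dot(w, _augment(x)) >= 0.0 else -1


def accuracy(w, xs, ys):
    """Fraction of embeddings whose predicted label matches the true label."""
    return sum(predict(w, x) == y for x, y in zip(xs, ys)) / len(xs)


def filter_maps(w, records):
    """Keep the captions of (caption, embedding) records classified as maps."""
    return [caption for caption, emb in records if predict(w, emb) == 1]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic training data: maps cluster around +2, non-maps around -2.
    xs = synthetic_embeddings(100, [2.0] * 4, rng) + synthetic_embeddings(100, [-2.0] * 4, rng)
    ys = [1] * 100 + [-1] * 100
    w = train_linear_svm(xs, ys)
    # Held-out validation set drawn from the same synthetic distributions.
    vx = synthetic_embeddings(50, [2.0] * 4, rng) + synthetic_embeddings(50, [-2.0] * 4, rng)
    vy = [1] * 50 + [-1] * 50
    print(f"validation accuracy: {accuracy(w, vx, vy):.3f}")
```

On real data the same pattern would apply at scale: compute embeddings once, fit the classifier on a labeled subset, report accuracy on a held-out validation split, and then keep only the image-caption pairs predicted as maps. The accuracy printed here reflects the toy data only and says nothing about the 98.5% reported for MapPool.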
Full text: MapPool.pdf (main document, open access, CC BY, Adobe PDF, 234.88 KB)