MapPool - Bubbling up an extremely large corpus of maps for AI
MapPool is a dataset of 75 million potential maps and textual captions. It has been derived from CommonPool, a dataset consisting of 12 billion text-image pairs from the Internet. The images have been encoded by a vision transformer and classified into maps and non-maps by a support vector machine. This approach outperforms previous models and yields a validation accuracy of 98.5%. The MapPool dataset may help to train data-intensive architectures in order to establish vision and language foundation models specialized in maps. The analysis of the dataset and the exploration of the embedding space offer large potential for future work. The dataset is accessible via https://geoai.icaci.org/mappool/
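The classification step described above (embeddings in, map or non-map label out) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not state the encoder, the SVM kernel, or any hyperparameters. The sketch assumes synthetic Gaussian vectors as stand-ins for vision-transformer embeddings and a plain linear SVM trained by stochastic sub-gradient descent on the hinge loss. All function names, dimensions, and parameter values are hypothetical.

```python
import random

def make_embeddings(n, dim, center, rng):
    """Synthetic stand-in for image embeddings (assumption, not real ViT output)."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]

def decision(w, b, x):
    """Signed score of the linear classifier: positive means 'map'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=20, seed=0):
    """Linear SVM via stochastic sub-gradient descent on the
    L2-regularized hinge loss. Labels are +1 (map) or -1 (non-map)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            # Regularization shrinks w on every step.
            w = [wj - eta * lam * wj for wj in w]
            if violated:
                # Hinge-loss sub-gradient pushes the sample past the margin.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

rng = random.Random(42)
DIM = 8  # hypothetical; real embeddings have hundreds of dimensions

# Two synthetic clusters: "maps" around +1.5, "non-maps" around -1.5.
X_train = make_embeddings(200, DIM, 1.5, rng) + make_embeddings(200, DIM, -1.5, rng)
y_train = [1] * 200 + [-1] * 200
X_val = make_embeddings(100, DIM, 1.5, rng) + make_embeddings(100, DIM, -1.5, rng)
y_val = [1] * 100 + [-1] * 100

w, b = train_linear_svm(X_train, y_train)

# Validation accuracy: share of held-out samples with the correct sign.
correct = sum(1 for x, t in zip(X_val, y_val) if (decision(w, b, x) >= 0) == (t == 1))
accuracy = correct / len(X_val)
print(f"validation accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic clusters and says nothing about the 98.5% reported for MapPool. In practice one would replace `make_embeddings` with embeddings computed from real images and would likely use an established SVM library rather than a hand-written trainer.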
EPFL
2024
REVIEWED
| Event name | Event acronym | Event place | Event date |
| CartoVis24 | | Warsaw, Poland | 2024-09-07 |