Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union

Wang, Zifu; Berman, Maxim; Rannen-Triki, Amal; Torr, Philip H. S.; Tuia, Devis; Tuytelaars, Tinne; Van Gool, Luc; Yu, Jiaqian; Blaschko, Matthew B.

doi:10.48550/arxiv.2310.19252

conference paper not in proceedings

2023

37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

Semantic segmentation datasets often exhibit two types of imbalance: class imbalance, where some classes appear more frequently than others, and size imbalance, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards majority classes (e.g. overall pixel-wise accuracy) and large objects (e.g. mean pixel-wise accuracy and per-dataset mean intersection over union). To address these shortcomings, we propose the use of fine-grained mIoUs along with corresponding worst-case metrics, thereby offering a more holistic evaluation of segmentation techniques. These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing. Furthermore, we undertake an extensive benchmark study, where we train and evaluate 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets. Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects. Moreover, we identify the crucial role played by architecture designs and loss functions, which lead to best practices in optimizing fine-grained metrics. The code is available at https://github.com/zifuwanggg/JDTLosses.
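
The bias towards large objects mentioned in the abstract can be illustrated with a small toy example. The sketch below is not taken from the paper or from the JDTLosses repository; the function names, the two-image dataset, and the choice to skip images in which the class is absent from both prediction and ground truth are all assumptions made for illustration. It contrasts an IoU accumulated over the whole dataset with IoUs computed per image and then averaged, and takes the minimum per-image score as a simple stand-in for a worst-case view.

```python
# Illustrative sketch only: helper names and the toy data are hypothetical,
# not the paper's implementation.

def iou_counts(pred, target, cls):
    """Return (intersection, union) pixel counts for one class in one image."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter, union


def per_dataset_iou(preds, targets, cls):
    """Accumulate counts over all images, then take a single ratio."""
    inter_total = 0
    union_total = 0
    for pred, target in zip(preds, targets):
        inter, union = iou_counts(pred, target, cls)
        inter_total += inter
        union_total += union
    return inter_total / union_total if union_total else float("nan")


def per_image_ious(preds, targets, cls):
    """Compute one IoU per image; images without the class are skipped (assumption)."""
    scores = []
    for pred, target in zip(preds, targets):
        inter, union = iou_counts(pred, target, cls)
        if union:
            scores.append(inter / union)
    return scores


# Image 1: a large object (100 pixels of class 1), segmented perfectly.
target_1 = [1] * 100 + [0] * 100
pred_1 = list(target_1)

# Image 2: a small object (4 pixels of class 1), missed entirely.
target_2 = [1] * 4 + [0] * 196
pred_2 = [0] * 200

preds = [pred_1, pred_2]
targets = [target_1, target_2]

dataset_iou = per_dataset_iou(preds, targets, cls=1)   # 100 / 104, about 0.96
image_ious = per_image_ious(preds, targets, cls=1)     # [1.0, 0.0]
image_mean_iou = sum(image_ious) / len(image_ious)     # 0.5
worst_image_iou = min(image_ious)                      # 0.0
```

In this toy setting the per-dataset score stays near 0.96 even though the small object is missed completely, whereas the per-image average drops to 0.5 and the worst per-image score is 0.0, which is the kind of discrepancy the abstract attributes to size imbalance.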

Name: 162_revisiting_evaluation_metrics_.pdf
Type: Publisher
Version: Published version
Access type: openaccess
License Condition: CC BY
Size: 34.06 MB
Format: Adobe PDF
Checksum (MD5): 495514369cc3688de52c30b83ab6de25