How to Benchmark Objective Quality Metrics from Paired Comparison Data?
The procedures commonly used to evaluate the performance of objective quality metrics rely on ground truth mean opinion scores and their associated confidence intervals, which are usually obtained via direct scaling methods. However, indirect scaling methods, such as the paired comparison method, can also be used to collect ground truth preference scores. Indirect scaling methods have higher discriminatory power and are gaining popularity, for example in crowdsourcing evaluations. In this paper, we show how classification errors, an existing analysis tool, can also be used with subjective preference scores. Additionally, we propose a new analysis tool based on receiver operating characteristic (ROC) analysis, which can be used to further assess the performance of objective metrics against ground truth preference scores. We provide a MATLAB script implementing the proposed tools and show an example of their application.
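To illustrate the general idea of applying ROC analysis to paired comparison data, the following minimal Python sketch treats each pair (A, B) as a binary classification case: the ground truth label says whether A was preferred by a majority of observers, and the predictor is the difference between the objective metric scores of A and B. This is only an illustration under stated assumptions and is not the MATLAB implementation provided with the paper; the function name `roc_auc`, the variable names, the majority threshold of 0.5, and all data values are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data, one entry per stimulus pair (A, B):
# fraction of observers who preferred A over B ...
pref_a_over_b = [0.90, 0.80, 0.20, 0.35, 0.70, 0.10]
# ... and the objective metric scores of A and B (higher = better quality).
metric_a = [4.1, 3.8, 2.0, 3.0, 3.5, 1.5]
metric_b = [3.0, 3.1, 3.2, 2.8, 3.6, 3.9]

# Assumption: a pair is labelled 1 when a majority preferred A, else 0.
labels = [1 if p > 0.5 else 0 for p in pref_a_over_b]

# Predictor: signed difference of the objective scores.
score_diff = [a - b for a, b in zip(metric_a, metric_b)]

auc = roc_auc(labels, score_diff)
print(f"AUC = {auc:.3f}")
```

An AUC close to 1 indicates that the metric difference ranks pairs consistently with the observers' preferences, while 0.5 corresponds to chance. The sketch ignores whether a preference is statistically significant, so pairs with near-equal preference are labelled like any other; a fuller analysis would handle such pairs separately.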
Record created on 2016-04-30, modified on 2016-08-09