Authors: Cornacchia, Elisabetta; Mignacco, Francesca; Veiga, Rodrigo; Gerbelot, Cedric; Loureiro, Bruno; Zdeborova, Lenka
Title: Learning curves for the multi-class teacher-student perceptron
Date: 2023-03-01 (issued); 2023-03-13 (record)
DOI: 10.1088/2632-2153/acb428
Handle: https://infoscience.epfl.ch/handle/20.500.14299/195881
Web of Science ID: WOS:000931296200001
Document type: text::journal::journal article::research article

Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with a single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal (BO) estimation and empirical risk minimisation (ERM) have been extensively analysed in this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification, yet an analogous analysis for the multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for the BO and ERM generalisation errors in the high-dimensional regime. For a Gaussian teacher, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality; in particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. For a Rademacher teacher, by contrast, we show that a first-order phase transition arises in the BO performance.

Keywords: Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Multidisciplinary Sciences; Computer Science; Science & Technology - Other Topics; multi-class classification; empirical risk minimization; high-dimensional statistics; message-passing algorithms; statistical mechanics
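
To make the setting in the abstract concrete, here is a minimal simulation sketch of the multi-class teacher-student setup: i.i.d. Gaussian inputs, labels given by the argmax of a Gaussian teacher's scores, and ERM with ridge-regularised cross-entropy. The dimension, sample ratio, number of classes, and regularisation strength are illustrative assumptions, and scikit-learn's LogisticRegression stands in for the paper's asymptotic analysis; this is not the authors' code.

```python
# Illustrative multi-class teacher-student experiment (assumed parameters).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, K = 200, 3        # input dimension, number of classes (illustrative)
alpha = 4.0          # sample ratio n/d (illustrative)
n = int(alpha * d)

# Gaussian teacher: one weight vector per class; the label is the
# argmax of the teacher's scores on a Gaussian input.
W_star = rng.standard_normal((K, d)) / np.sqrt(d)

def sample(m):
    X = rng.standard_normal((m, d))          # i.i.d. Gaussian inputs
    y = np.argmax(X @ W_star.T, axis=1)      # teacher labels
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(10 * n)

# ERM with cross-entropy loss and ridge (L2) regularisation;
# C is the inverse regularisation strength.
clf = LogisticRegression(C=1.0, max_iter=5000)
clf.fit(X_train, y_train)

# Generalisation error estimated as the test classification error.
err = np.mean(clf.predict(X_test) != y_test)
print(f"test classification error: {err:.3f}")
```

Sweeping alpha and C in such a simulation is one way to trace empirical learning curves against the asymptotic predictions derived in the paper.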