Repository logo

Infoscience

  • English
  • French
Log In
Logo EPFL, École polytechnique fédérale de Lausanne

Infoscience

  • English
  • French
Log In
  1. Home
  2. Academic and Research Output
  3. Journal articles
  4. Unlabeled Principal Component Analysis and Matrix Completion
 
Loading...
Thumbnail Image
research article

Unlabeled Principal Component Analysis and Matrix Completion

Yao, Yunzhen  
•
Peng, Liangzu
•
Tsakiris, Manolis C.
January 1, 2024
Journal Of Machine Learning Research

We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem since we prove that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system of equations. Further, we propose an efficient two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data have been permuted. Stage-I employs outlier-robust PCA methods to estimate the ground-truth column-space. Equipped with the column-space, Stage-II applies recent methods for unlabeled sensing to restore the permuted data. Allowing for missing entries on top of permutations in UPCA leads to the problem of unlabeled matrix completion, for which we derive theory and algorithms of similar flavor. Experiments on synthetic data, face images, educational and medical records reveal the potential of our algorithms for applications such as data privatization and record linkage.

  • Details
  • Metrics
Type
research article
Web of Science ID

WOS:001204593000001

Author(s)
Yao, Yunzhen  
•
Peng, Liangzu
•
Tsakiris, Manolis C.
Date Issued

2024-01-01

Publisher

Microtome Publ

Published in
Journal Of Machine Learning Research
Volume

25

Start page

1

End page

38

Subjects

Technology

•

Robust Principal Component Analysis

•

Matrix Completion

•

Record Linkage

•

Data Re-Identification

•

Algebraic Geometry

Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
LINX  
FunderGrant Number

National Key R&D Program of China

2023YFA1009402

CAS Project for Young Scientists in Basic Research

YSBR- 034

Available on Infoscience
May 1, 2024
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/207711
Logo EPFL, École polytechnique fédérale de Lausanne
  • Contact
  • infoscience@epfl.ch

  • Follow us on Facebook
  • Follow us on Instagram
  • Follow us on LinkedIn
  • Follow us on X
  • Follow us on Youtube
AccessibilityLegal noticePrivacy policyCookie settingsEnd User AgreementGet helpFeedback

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, tous droits réservés