Infoscience

conference paper

TAROT: Targeted Data Selection via Optimal Transport

Feng, Lan; Nie, Fan; Liu, Yuejiang; et al.
November 30, 2024
The Forty-Second International Conference on Machine Learning

We propose TAROT, a Targeted data selection framework grounded in Optimal Transport theory. Previous targeted data selection methods primarily rely on influence-based greedy heuristics to enhance domain-specific performance. These methods perform well on limited, unimodal data (i.e., data following a single pattern) but become less effective as the target data grows more complex. Specifically, on multimodal distributions, these heuristics fail to account for the multiple inherent patterns, leading to suboptimal data selection. This work identifies two primary factors contributing to this limitation: (i) the disproportionate impact of dominant feature components in high-dimensional influence estimation, and (ii) the restrictive linear additive assumptions inherent in greedy selection strategies. To address these challenges, TAROT incorporates a whitened feature distance to mitigate dominant feature bias, offering a more reliable measure of data influence. Building on this, TAROT uses the whitened feature distance to quantify and minimize the optimal transport distance between the selected data and target domains. Notably, this minimization also facilitates the estimation of optimal selection ratios. We evaluate TAROT on multiple tasks, including semantic segmentation, motion prediction, and instruction tuning. Results consistently show that TAROT outperforms state-of-the-art methods, highlighting its versatility across deep learning tasks. Code is available at https://github.com/vita-epfl/TAROT.
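The two ideas named in the abstract, a whitened feature distance and an optimal transport distance between selected and target data, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the toy 2-D features, and the use of Sinkhorn iterations as the transport solver are all assumptions made for illustration; the abstract does not say which solver TAROT uses.

```python
import numpy as np


def whiten(features, reference, eps=1e-6):
    """Project features onto the principal axes of `reference`, scaled to unit variance."""
    mean = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Dividing column j by sqrt(eigenvalue j) removes the dominance of high-variance directions.
    transform = eigvecs / np.sqrt(np.clip(eigvals, 0.0, None) + eps)
    return (features - mean) @ transform


def pairwise_distance(a, b):
    """Euclidean distance between every row of `a` and every row of `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sinkhorn_cost(cost, reg=0.1, n_iters=200):
    """Approximate OT cost between two uniformly weighted point sets (entropic regularization)."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    scale = cost.max() or 1.0  # normalize so the kernel below does not underflow
    kernel = np.exp(-cost / (reg * scale))
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


rng = np.random.default_rng(0)
# Hypothetical 2-D "influence features": axis 0 has 100x the spread of axis 1.
pool = rng.normal(size=(5000, 2)) * np.array([100.0, 1.0])

target = np.array([[0.0, 0.0]])
cand_a = np.array([[0.0, 3.0]])   # 3 standard deviations away along the low-variance axis
cand_b = np.array([[50.0, 0.0]])  # 0.5 standard deviations away along the dominant axis

raw_a = pairwise_distance(cand_a, target)[0, 0]
raw_b = pairwise_distance(cand_b, target)[0, 0]
white_a = pairwise_distance(whiten(cand_a, pool), whiten(target, pool))[0, 0]
white_b = pairwise_distance(whiten(cand_b, pool), whiten(target, pool))[0, 0]
print("raw:", raw_a, raw_b)
print("whitened:", white_a, white_b)

# Compare two hypothetical candidate subsets by their OT cost to a target set.
target_set = pool[:40]
near_subset = pool[:40] + rng.normal(size=(40, 2)) * np.array([1.0, 0.01])
far_subset = pool[:40] + np.array([0.0, 5.0])
tw = whiten(target_set, pool)
ot_near = sinkhorn_cost(pairwise_distance(whiten(near_subset, pool), tw))
ot_far = sinkhorn_cost(pairwise_distance(whiten(far_subset, pool), tw))
print("OT cost near vs far:", ot_near, ot_far)
```

In the raw feature space the dominant axis decides which candidate looks closer to the target, while after whitening each axis counts in units of its own spread, so the ranking of the two candidates flips. The second part shows how a transport cost computed on whitened distances could be used to compare whole candidate subsets rather than scoring samples one at a time.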

Name: TAROT.pdf
Type: Main Document
Version: http://purl.org/coar/version/c_ab4af688f83e57aa
Access type: openaccess
License Condition: N/A
Size: 9.03 MB
Format: Adobe PDF
Checksum (MD5): 41a63aa36aac48c22bdbebf45dd11f48

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved