Infoscience


doctoral thesis

Visual Correspondence : From Geometry, Appearance, to Reasoning Across Vision Tasks

Ren, Yufan  
2025

Modern computer vision systems often underperform in complex real-world scenarios characterized by sparse data, noise, or multi-step reasoning because they cannot maintain robust visual correspondence, defined as the consistent mapping of visual signals across varying levels of abstraction. This thesis hypothesizes that explicitly modeling and improving such correspondences can improve model accuracy and reliability. We validate this hypothesis through four contributions that address distinct aspects of visual correspondence. First, for geometric correspondence in sparse-view 3D reconstruction, our method VolRecon employs a ray transformer with multi-view projection feature fusion, reducing the Chamfer Distance on the DTU dataset by about 30% compared with current state-of-the-art methods. Second, to improve overlap correspondence in low-overlap point-cloud registration, we propose a degradation-aware multi-step refinement framework with generalized one-way attention, achieving an 80.4% registration recall on the 3DLoMatch benchmark, 1.2 percentage points higher than the previous best. Third, we address appearance and semantic correspondence in text-guided image editing by applying wavelet decomposition to diffusion-model latents for targeted frequency sub-band optimization. This approach preserves non-target regions and maintains semantic alignment; user studies report over 80% preference for our edits for detail and color fidelity. Finally, for high-level reasoning correspondence, we introduce VGRP-Bench, a set of 20 parameterizable visual-grid puzzles with reference reasoning chains. VGRP-Bench identifies spatial-reasoning gaps in leading vision-language models such as GPT-4o and provides a reproducible metric for future work. Overall, enforcing visual correspondence, from geometry and sensor alignment through appearance to reasoning, improves accuracy and robustness and supports progress toward reliable open-world visual intelligence.

Name: EPFL_TH10896.pdf
Type: Main Document
Version: Not Applicable (or Unknown)
Access type: openaccess
License Condition: N/A
Size: 57.78 MB
Format: Adobe PDF
Checksum (MD5): bba84ce433fe98f30482a7291e046511



Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved