Gieruc, Théo; Kästingschäfers, Marius; Bernhard, Sebastian; Salzmann, Mathieu
2025-08-20; 2025-08-20; 2025-08-19; 2025-06-22
10.1109/iv64158.2025.11097387
https://infoscience.epfl.ch/handle/20.500.14299/253265

Current 3D reconstruction techniques struggle to faithfully infer unbounded scenes from a few images. Most existing methods have high computational demands, require detailed pose information, and cannot reconstruct occluded regions reliably. We introduce 6Img-to-3D, a novel transformer-based encoder-renderer method for single-shot image-to-3D reconstruction. Our method outputs a 3D-consistent parameterized triplane from only six outward-facing input images for large-scale, unbounded outdoor driving scenarios. We take a step towards resolving existing shortcomings by combining contracted custom cross- and self-attention mechanisms for triplane parameterization, differentiable volume rendering, scene contraction, and image feature projection. We showcase on synthetic data that six surround-view vehicle images from a single timestamp are enough to reconstruct 360° scenes at inference time, taking 395 ms. Our method allows, for example, rendering third-person images and bird's-eye views. Code and more results are available at https://6Img-to-3D.GitHub.io/.

en
6Img-to-3D: Few-Image Large-Scale Outdoor Novel View Synthesis
text::conference output::conference proceedings::conference paper
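
The abstract mentions scene contraction and a parameterized triplane queried during differentiable volume rendering. The sketch below is a minimal, generic illustration of those two building blocks in PyTorch, not the authors' implementation: a Mip-NeRF 360-style contraction that squashes unbounded outdoor coordinates into a bounded ball, and bilinear sampling of a triplane at the contracted query points. All names, tensor shapes, and rescaling choices (`contract`, `sample_triplane`, 32-channel 128×128 planes) are assumptions for illustration only.

```python
# Illustrative sketch (not the 6Img-to-3D code): scene contraction and
# triplane feature sampling as generic PyTorch operations.
import torch
import torch.nn.functional as F

def contract(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Map unbounded 3D points into a bounded ball (Mip-NeRF 360-style contraction)."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    # Points with |x| <= 1 stay put; farther points are squashed into radius < 2.
    contracted = torch.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))
    return contracted / 2.0  # rescale to [-1, 1] for grid_sample

def sample_triplane(planes: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
    """Query a triplane (3, C, H, W) at 3D points (N, 3): contract the points,
    bilinearly sample the XY, XZ, and YZ planes, and sum the per-plane features."""
    x = contract(pts)                              # (N, 3) in [-1, 1]
    coords = torch.stack([x[:, [0, 1]],            # XY plane
                          x[:, [0, 2]],            # XZ plane
                          x[:, [1, 2]]], dim=0)    # YZ plane -> (3, N, 2)
    grid = coords.unsqueeze(2)                     # (3, N, 1, 2) for grid_sample
    feats = F.grid_sample(planes, grid, mode="bilinear", align_corners=False)
    return feats.squeeze(-1).sum(dim=0).t()        # (N, C)

# Toy usage: a random 32-channel triplane at 128x128 resolution,
# queried at far-away points typical of an unbounded driving scene.
planes = torch.randn(3, 32, 128, 128)
pts = torch.randn(1024, 3) * 50.0
features = sample_triplane(planes, pts)            # (1024, 32)
```

In a full pipeline along the lines the abstract describes, such per-point triplane features would be decoded into density and color and composited along camera rays by differentiable volume rendering; the details of that decoder and of the cross-/self-attention triplane encoder are specific to the paper and are not reproduced here.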