Recreating real-world objects and scenes in a visually plausible manner and making them editable with intuitive instructions is central to Virtual and Augmented Reality (VR/AR) applications. This research lies at the intersection of neural rendering, 3D content generation, and scene reconstruction. Radiance fields, built on volumetric primitives, effectively model real-world scenes for photorealistic novel view synthesis and are emerging as key tools for 3D representation. However, current methods face limitations in realism, editability, and dynamic modeling.
First, most radiance fields use emissive volumes rendered via rasterization, which restricts accurate light-transport modeling and degrades plausibility for reflective and refractive materials. Second, volumetric primitives lack embedded semantic understanding, limiting user editability via text or image prompts. Third, extending radiance fields to 4D for dynamic scenes presents challenges in maintaining spatiotemporal consistency. This thesis addresses these gaps by proposing methods that improve the expressiveness and controllability of 3D representations, focusing on transparency modeling, object removal and inpainting, and volumetric stylization.
To better model refractive materials, we introduce an end-to-end pipeline combining implicit Signed Distance Functions (SDFs) with a refraction-aware Ray Bending Network. This allows reconstruction and relighting of transparent objects with complex geometry and unknown indices of refraction. Our method separates geometry from appearance and compensates for the lack of sharp refractive details in volumetric fields, substantially improving novel-view synthesis and relighting quality.
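While the thesis learns ray bending for objects with unknown indices of refraction, the physics such a network must capture is classical: Snell's law applied at an interface whose normal is the gradient of the SDF. The sketch below is only an illustrative analytic baseline in NumPy, not the thesis pipeline; function names, the example sphere SDF, and the chosen index of refraction are assumptions.

```python
import numpy as np

def refract(d, n, eta):
    """Bend unit ray direction d at a surface with unit normal n (facing the ray).
    eta = n_incident / n_transmitted. Returns None on total internal reflection."""
    cos_i = -np.dot(n, d)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)      # Snell's law: sin(theta_t) = eta * sin(theta_i)
    if sin2_t > 1.0:
        return None                             # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

def sdf_normal(sdf, p, eps=1e-4):
    """Surface normal as the normalized finite-difference gradient of the SDF."""
    offsets = eps * np.eye(3)
    grad = np.array([sdf(p + o) - sdf(p - o) for o in offsets])
    return grad / np.linalg.norm(grad)

# Illustrative example: a ray entering a unit glass-like sphere (SDF = |p| - 1, IOR ~ 1.5).
sphere = lambda p: np.linalg.norm(p) - 1.0
hit = np.array([0.0, 0.0, 1.0])                 # intersection point on the surface
d_in = np.array([0.0, np.sin(0.3), -np.cos(0.3)])
d_out = refract(d_in, sdf_normal(sphere, hit), 1.0 / 1.5)
```

A learned ray bender can absorb this refraction step when the index of refraction is unknown, which is the setting the thesis targets.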
Beyond object reconstruction, the ability to modify and manipulate existing representations is crucial for practical applications in content creation. To this end, we present the first text-guided object-inpainting pipeline for 360-degree scenes represented as Neural Radiance Fields (NeRFs). The proposed pipeline achieves accurate semantic selection through depth-space warping, ensuring multiview-consistent segmentations, and refines inpainted regions using perceptual priors and 3D diffusion-based geometric constraints. This enables seamless object removal while preserving scene coherence, significantly enhancing NeRF's adaptability for real-world scene editing.
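For intuition on the depth-space warping idea, the sketch below transfers a binary object mask from a source view to a target view by unprojecting target pixels with their rendered depth and reprojecting them into the source camera. The pinhole-camera conventions, nearest-neighbor sampling, and all names are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def warp_mask_to_target(mask_src, depth_tgt, K, c2w_src, c2w_tgt):
    """mask_src: (H, W) binary mask in the source view.
    depth_tgt: (H, W) per-pixel depth in the target view (e.g. rendered from the NeRF).
    K: (3, 3) shared intrinsics; c2w_*: (4, 4) camera-to-world poses, z-forward assumed."""
    H, W = depth_tgt.shape
    v, u = np.mgrid[0:H, 0:W]                               # pixel grid of the target view
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # homogeneous pixel coordinates

    # Unproject target pixels into world space using the target depth map.
    cam_tgt = (np.linalg.inv(K) @ pix.reshape(-1, 3).T) * depth_tgt.reshape(1, -1)
    world = c2w_tgt @ np.vstack([cam_tgt, np.ones((1, cam_tgt.shape[1]))])

    # Reproject into the source camera and sample the source mask (nearest neighbor).
    cam_src = np.linalg.inv(c2w_src) @ world
    proj = K @ cam_src[:3]
    us = np.round(proj[0] / proj[2]).astype(int)
    vs = np.round(proj[1] / proj[2]).astype(int)
    valid = (proj[2] > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)

    mask_tgt = np.zeros(H * W, dtype=bool)
    mask_tgt[valid] = mask_src[vs[valid], us[valid]]
    return mask_tgt.reshape(H, W)
```

Repeating this reprojection across the captured views is one way to obtain segmentations that agree in 3D rather than being selected independently per image.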
We further explore the controllability of dynamic radiance fields by introducing a volumetric variant based on neural cellular automata, termed VNCA. This architecture generates spatiotemporally consistent appearances with naturally emerging motion, driven by input images. Unlike prior approaches, our VNCA model maintains both temporal and multiview consistency by integrating the emergence property of NCAs within an Eulerian framework and supervising motion with optical flow. Beyond smoke simulation, VNCA supports stylization of solid textures on meshes, demonstrating its versatility in dynamic texture synthesis.
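As a concrete illustration of the volumetric cellular-automaton idea, and emphatically not the actual VNCA architecture, losses, or training setup, the sketch below implements one stochastic NCA update on a voxel grid in PyTorch: a local 3x3x3 perception convolution followed by a per-cell update MLP. Channel counts, the fire rate, and class names are assumptions.

```python
import torch
import torch.nn as nn

class VoxelNCA(nn.Module):
    def __init__(self, channels=16, hidden=64):
        super().__init__()
        # Perception: each cell gathers information from its 3x3x3 neighborhood.
        self.perceive = nn.Conv3d(channels, 3 * channels, kernel_size=3,
                                  padding=1, groups=channels, bias=False)
        # Update rule: a small per-cell MLP implemented with 1x1x1 convolutions.
        self.update = nn.Sequential(
            nn.Conv3d(3 * channels, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv3d(hidden, channels, kernel_size=1, bias=False),
        )

    def forward(self, state, fire_rate=0.5):
        """state: (B, C, D, H, W) voxel grid of cell states; returns the next state."""
        delta = self.update(self.perceive(state))
        # Stochastic per-cell updates keep the dynamics asynchronous and stable.
        mask = (torch.rand_like(state[:, :1]) < fire_rate).float()
        return state + delta * mask

# Iterating the step evolves the volumetric state; in a radiance-field setting,
# some channels can be decoded as density and color for rendering.
nca = VoxelNCA()
state = torch.zeros(1, 16, 32, 32, 32)
state[:, :, 16, 16, 16] = 1.0          # seed a single voxel
for _ in range(8):
    state = nca(state)
```

The Eulerian view corresponds to keeping the voxel grid fixed while the cell states evolve, which is what allows motion to emerge without advecting the representation itself.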
By advancing techniques for reconstruction, editing, and stylization, this thesis contributes to the development of more controllable and visually plausible 3D representations. Our work paves the way for enhanced neural-rendering applications, from photorealistic content creation to interactive scene manipulation, making 3D content modeling both higher in quality and more artist-friendly.