Abstract

Diffusion models generating images conditionally on text, such as DALL-E 2 [51] and Stable Diffusion [53], have recently made a splash far beyond the computer vision community. Here, we tackle the related problem of generating point clouds, both unconditionally, and conditionally with images. For the latter, we introduce a novel geometrically-motivated conditioning scheme based on projecting sparse image features into the point cloud and attaching them to each individual point, at every step in the denoising process. This approach improves geometric consistency and yields greater fidelity than current methods relying on unstructured, global latent codes. Additionally, we show how to apply recent continuous-time diffusion schemes [59, 21]. Our method performs on par with or above the state of the art in conditional and unconditional experiments on synthetic data, while being faster, lighter, and delivering tractable likelihoods. We show it can also scale to diverse indoor scenes.
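The projection-based conditioning described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a pinhole camera model, points already expressed in camera coordinates, and simple nearest-neighbor sampling of the feature map (the function name `project_features` and all shapes are illustrative):

```python
import numpy as np

def project_features(points, feats, K):
    """Project 3D points through pinhole intrinsics K into the image plane
    and attach the sampled per-pixel feature to each point.

    points: (N, 3) array in camera coordinates (z > 0)
    feats:  (H, W, C) image feature map
    K:      (3, 3) camera intrinsics
    returns (N, 3 + C): each point concatenated with its sampled feature
    """
    H, W, C = feats.shape
    uvw = points @ K.T                      # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]           # perspective divide
    # Nearest-neighbor lookup, clamped to the image bounds
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    sampled = feats[v, u]                   # (N, C) per-point features
    return np.concatenate([points, sampled], axis=1)
```

In the scheme the abstract describes, such per-point features would be recomputed and attached at every denoising step, so the conditioning follows the points as they move.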
