This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts when pixel-to-pixel correspondences between views are available (e.g., perspective crops of a panorama, or multi-view images with known depth maps and camera poses).
We demonstrate the capability of MVDiffusion on two challenging multi-view image generation tasks: 1) panorama generation and 2) multi-view depth-to-image generation.
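To make the notion of pixel-to-pixel correspondences concrete, below is a minimal sketch (not the authors' code) of how corresponding pixels between two overlapping perspective crops of a panorama can be computed. It assumes pinhole crops sharing the panorama's optical centre, so the mapping between crops is a rotation-only homography; the function and parameter names are illustrative.

```python
# A minimal sketch (not the authors' code) of pixel-to-pixel correspondences
# between two overlapping perspective crops of a panorama.
# Assumption: both crops share the panorama centre, so a pixel in one crop
# maps to the other by a pure rotation (homography K_b R_b R_a^T K_a^{-1}).
import numpy as np

def intrinsics(fov_deg, size):
    """Pinhole intrinsics for a square crop with the given horizontal FOV."""
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)
    return np.array([[f, 0, size / 2],
                     [0, f, size / 2],
                     [0, 0, 1.0]])

def crop_to_crop(uv_a, K_a, R_a, K_b, R_b):
    """Map pixels uv_a (N, 2) in crop A to pixel coordinates in crop B.

    R_a, R_b are world-to-camera rotations of the two crops. Returns
    (uv_b, valid), where valid marks rays that land in front of camera B.
    """
    ones = np.ones((uv_a.shape[0], 1))
    rays_a = np.linalg.inv(K_a) @ np.hstack([uv_a, ones]).T  # rays in camera A
    rays_w = R_a.T @ rays_a                                  # rays in world frame
    rays_b = K_b @ (R_b @ rays_w)                            # project into crop B
    valid = rays_b[2] > 0
    uv_b = (rays_b[:2] / rays_b[2]).T
    return uv_b, valid

# Example: two 90-degree crops rotated 45 degrees apart around the vertical axis.
K = intrinsics(90, 512)
yaw = np.radians(45)
R_a = np.eye(3)
R_b = np.array([[np.cos(yaw), 0, -np.sin(yaw)],
                [0, 1, 0],
                [np.sin(yaw), 0, np.cos(yaw)]])
uv_b, valid = crop_to_crop(np.array([[400.0, 256.0]]), K, R_a, K, R_b)
```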
Example results of panorama generation from text prompts using MVDiffusion. Check out our online demo to generate panoramas from your own descriptions.
Given a sequence of depth maps from a raw mesh, MVDiffusion can generate a sequence of RGB images that preserve the underlying geometry and maintain multi-view consistency. The generated images can further be exported as a textured mesh. Check out more results on the gallery page.
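For the depth-to-image setting, correspondences between views follow from standard unproject-and-reproject geometry. The sketch below (not the authors' code) illustrates this under assumed conventions (camera-to-world 4x4 poses and pinhole intrinsics); all names are illustrative.

```python
# A minimal sketch (not the authors' code) of depth- and pose-induced
# correspondences between two views: unproject a pixel with its depth in
# view A, then reproject the 3D point into view B.
import numpy as np

def warp_with_depth(uv_a, depth_a, K_a, T_a, K_b, T_b):
    """uv_a: (N, 2) pixels in view A; depth_a: (N,) depths along the z-axis.
    T_a, T_b: 4x4 camera-to-world poses. Returns pixel coordinates in view B
    and a validity mask for points in front of camera B."""
    ones = np.ones((uv_a.shape[0], 1))
    cam_a = (np.linalg.inv(K_a) @ np.hstack([uv_a, ones]).T) * depth_a  # 3D points in A
    world = T_a[:3, :3] @ cam_a + T_a[:3, 3:4]                          # to world frame
    T_b_inv = np.linalg.inv(T_b)
    cam_b = T_b_inv[:3, :3] @ world + T_b_inv[:3, 3:4]                  # to camera B
    proj = K_b @ cam_b
    valid = proj[2] > 1e-6
    uv_b = (proj[:2] / proj[2]).T
    return uv_b, valid
```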
@article{Tang2023mvdiffusion,
author = {Tang, Shitao and Zhang, Fuyang and Chen, Jiacheng and Wang, Peng and Furukawa, Yasutaka},
title = {MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion},
journal = {arXiv},
year = {2023},
}
This research is partially supported by NSERC Discovery Grants with Accelerator Supplements and DND/NSERC Discovery Grant Supplement, NSERC Alliance Grants, and John R. Evans Leaders Fund (JELF). We thank the Digital Research Alliance of Canada and BC DRI Group for providing computational resources.