Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

ICLR 2024

Guocheng Qian1,2, Jinjie Mai1, Abdullah Hamdi3, Jian Ren2, Aliaksandr Siarohin2, Bing Li1, Hsin-Ying Lee2, Ivan Skorokhodov1,2, Peter Wonka1, Sergey Tulyakov2, Bernard Ghanem1

1King Abdullah University of Science and Technology (KAUST), 2Snap Inc., 3Visual Geometry Group, University of Oxford

Abstract

We present "Magic123", a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by both 2D and 3D diffusion priors. We introduce a single tradeoff parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images.
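As a rough illustration of how a single tradeoff parameter could balance two guidance terms, the sketch below blends a 2D-prior loss and a 3D-prior loss with one scalar. The function name `blend_prior_losses`, the convex weighting, and the [0, 1] range are assumptions made for this example; the abstract does not state the exact form of the weighting, so this should not be read as the authors' formulation.

```python
def blend_prior_losses(loss_2d, loss_3d, tradeoff):
    """Blend two guidance losses with one scalar (hypothetical helper).

    Assumption: a convex combination, where tradeoff = 1.0 uses only the
    2D-prior loss and tradeoff = 0.0 uses only the 3D-prior loss.
    """
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must lie in [0, 1]")
    return tradeoff * loss_2d + (1.0 - tradeoff) * loss_3d


if __name__ == "__main__":
    # Sweep the tradeoff to see how the blended loss moves between the two terms.
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, blend_prior_losses(2.0, 4.0, t))
```

Keeping the balance in one scalar means a user tunes a single knob rather than two independent loss weights.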
PixelShift200 Dataset

About

We employ advanced pixel shift technology to perform full color sampling of the image. Pixel shift technology takes four samples of the same image at nearly the same time, physically shifting the camera sensor by one pixel horizontally or vertically at each sampling so that all color information is captured at every pixel. This ensures that the sampled images follow the distribution of natural images captured by the camera, and that the full color information (R, Gr, Gb, and B channels) is obtained without any interpolation. As a result, the collected RGB images are artifact-free, which leads to better training results for demosaicing-related tasks.
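The four-sample idea above can be simulated in a few lines. The sketch below is a toy model, not the dataset's capture pipeline: it assumes an RGGB Bayer layout, a fixed order of four one-pixel sensor offsets, and averaging of Gr and Gb into a single green value. All names (`BAYER`, `SHIFTS`, `capture`, `merge`, `to_rgb`) are hypothetical.

```python
# Assumed Bayer colour filter layout (RGGB), indexed by (row % 2, col % 2).
BAYER = (("R", "Gr"), ("Gb", "B"))
# Assumed one-pixel sensor offsets (rows, cols) for the four exposures.
SHIFTS = ((0, 0), (0, 1), (1, 1), (1, 0))
# Which RGB component each filter passes.
CHANNEL_OF = {"R": 0, "Gr": 1, "Gb": 1, "B": 2}


def capture(scene, shift):
    """Simulate one raw exposure: every pixel records only the channel
    passed by the filter that sits over it at this sensor offset."""
    dy, dx = shift
    raw = []
    for y, row in enumerate(scene):
        raw_row = []
        for x, rgb in enumerate(row):
            name = BAYER[(y + dy) % 2][(x + dx) % 2]
            raw_row.append((name, rgb[CHANNEL_OF[name]]))
        raw.append(raw_row)
    return raw


def merge(exposures):
    """Collect the R, Gr, Gb and B samples of each pixel from all exposures."""
    height, width = len(exposures[0]), len(exposures[0][0])
    merged = [[{} for _ in range(width)] for _ in range(height)]
    for raw in exposures:
        for y in range(height):
            for x in range(width):
                name, value = raw[y][x]
                merged[y][x][name] = value
    return merged


def to_rgb(sample):
    """Form an RGB triple; averaging Gr and Gb is an assumption of this sketch."""
    return (sample["R"], (sample["Gr"] + sample["Gb"]) / 2, sample["B"])


if __name__ == "__main__":
    # Tiny synthetic scene with a distinct (R, G, B) value at every pixel.
    scene = [[(10 * y + x, 100 + 10 * y + x, 200 + 10 * y + x)
              for x in range(4)] for y in range(4)]
    merged = merge([capture(scene, s) for s in SHIFTS])
    print(to_rgb(merged[1][2]), scene[1][2])
```

Because the four offsets cover every position of the 2x2 filter tile, each pixel ends up with one measured sample per filter, so no neighbouring values have to be interpolated.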