Generative Models

LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas

Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present LayerComposer, an interactive …

ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation

Generating high-fidelity images of humans with fine-grained control over attributes such as hairstyle and clothing remains a core challenge in personalized text-to-image synthesis. While prior methods emphasize identity preservation from a reference …

Wonderland: Navigating 3D Scenes from a Single Image

This paper addresses a challenging question: How can we efficiently create high-quality, wide-scope 3D scenes from a single arbitrary image? Existing methods face several constraints, such as requiring multi-view data, time-consuming per-scene …

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It …

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

In this work, we analyze camera motion from a first principles perspective, uncovering insights that enable precise 3D camera manipulation without compromising synthesis quality. First, we determine that motion induced by camera movements in videos …

AToM: Amortized Text-to-Mesh using 2D Diffusion

We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly …

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

We present “Magic123”, a two-stage coarse-to-fine solution for high-quality, textured 3D mesh generation from a single unposed image in the wild, using both 2D and 3D priors. In the first stage, we optimize a coarse neural radiance field and focus …

PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks

Upsampling sparse, noisy, and non-uniform point clouds is a challenging task. In this paper, we propose three novel point upsampling modules: Multi-branch GCN, Clone GCN, and NodeShuffle. Our modules use Graph Convolutional Networks (GCNs) to better …
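
As a rough illustration of the NodeShuffle idea named above: it expands each point's feature vector and reshuffles the extra channels into r-times as many points, the point-cloud analogue of PixelShuffle. The sketch below is a minimal simplification, not the authors' implementation; the class name, the PyTorch framing, and the linear layer standing in for the paper's GCN layer are all assumptions.

    # Minimal sketch, assuming PyTorch; a plain linear layer stands in
    # for the GCN layer that PU-GCN actually uses.
    import torch
    import torch.nn as nn

    class NodeShuffle(nn.Module):
        def __init__(self, channels: int, r: int):
            super().__init__()
            self.r = r
            # Expand each point's C features to r*C channels.
            self.expand = nn.Linear(channels, channels * r)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (N, C) per-point features -> (N, r*C) -> (r*N, C),
            # i.e. each input point yields r upsampled points.
            n, c = x.shape
            return self.expand(x).reshape(n * self.r, c)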