Gordon Qian is a Senior Research Scientist on the Snap Research Creative Vision team, where he focuses on end-to-end research and development for video, image, and 3D media generation. He earned his Ph.D. in Computer Science from KAUST, where he was advised by Prof. Bernard Ghanem.
He has authored 20+ papers in top-tier conferences and journals, with over 3,500 citations and an h-index of 19. His work has shipped in three Snapchat products serving 900 million monthly active users, and he has filed six patents.
His representative work includes PointNeXt (NeurIPS, 1.2K+ citations, 1K+ GitHub stars), Magic123 (ICLR, 450+ citations, 1.6K+ GitHub stars), and Omni-ID (CVPR, integrated into products, one patent). He also serves as an Area Chair for ICLR 2026.
If you are interested in working on image and video generative models with me, please reach out at guocheng.qian [at] outlook.com.

Ph.D. in CS
KAUST, 2019–2023

B.Eng in ME
XJTU, 2014–2018
Selected projects below; * / † denote equal contribution / corresponding author. See the full publication list.
Diffusion-DRF uses a frozen VLM critic to provide free, rich, and differentiable feedback for stable video diffusion fine-tuning.
EasyV2V: A high-quality instruction-based video editing framework that enables intuitive video manipulation through natural language instructions.
Omni-Attribute can isolate a specific attribute, whether concrete or abstract, from any image and merge selected attributes from multiple images into a coherent generation.
Canvas-to-Image introduces a unified framework that consolidates heterogeneous controls (subject references, bounding boxes, pose skeletons) into a single canvas interface for high-fidelity compositional image generation.
We prevent shortcut learning in adapter training by explicitly supplying the shortcut signals during training, forcing the model to learn more robust representations.
ComposeMe is a human-centric generative model that enables disentangled control over multiple visual attributes — such as identity, hair, and garment — across multiple subjects, while also supporting text-based control.
ThinkDiff enables multimodal in-context reasoning in diffusion models by aligning vision-language models to LLM decoders, transferring reasoning capabilities without requiring complex reasoning-based datasets.
Omni-ID is a novel facial representation tailored for generative tasks, encoding identity features from unstructured images into a fixed-size representation that captures diverse expressions and poses.
WonderLand is a video-latent-based approach for single-image 3D reconstruction of large-scale scenes.
AC3D studies when and how camera signals should be conditioned into a video diffusion model for better camera control and higher video quality.
Magic123 proposes a hybrid score distillation algorithm and a coarse-to-fine image-to-3D pipeline that produces high-quality, high-resolution 3D content from a single unposed image.
Pix4Point shows that image pretraining significantly improves point cloud understanding.
PointNeXt boosts the performance of PointNet++ to the state-of-the-art level with improved training and scaling strategies.
ASSANet makes PointNet++ faster and more accurate.

