Gordon Qian is a Senior Research Scientist on the Snap Research Creative Vision team, where he focuses on end-to-end research and development for video, image, and 3D media generation. He earned his Ph.D. in Computer Science from KAUST, advised by Prof. Bernard Ghanem.
He has authored 20+ papers in top-tier conferences and journals, with over 3,500 citations and an h-index of 19. His work has shipped in three Snapchat products serving 900 million monthly active users, and he has filed six patents.
His representative work includes PointNeXt (NeurIPS, 1.2K+ citations, 1K+ GitHub stars), Magic123 (ICLR, 450+ citations, 1.6K+ GitHub stars), and Omni-ID (CVPR, integrated into products, 1 patent). He also serves as an Area Chair for ICLR 2026.
If you are interested in working on image and video generative models with me, please reach out at gordonqian2017 [at] gmail.com.

Ph.D. in CS
KAUST, 2019–2023

B.Eng. in ME
XJTU, 2014–2018
Selected projects below; * / † denote equal contribution / corresponding author. See the full publication list.
Canvas-to-Image introduces a unified framework that consolidates heterogeneous controls (subject references, bounding boxes, pose skeletons) into a single canvas interface for high-fidelity compositional image generation.
Diffusion-DRF uses a frozen VLM critic to provide free, rich, and differentiable feedback for stable video diffusion fine-tuning.
EasyV2V is a high-quality instruction-based video editing framework that enables intuitive video manipulation through natural-language instructions.
Omni-Attribute isolates specific attributes, whether concrete or abstract, from any image and merges the selected attributes from multiple images into a coherent generation.
We prevent shortcut learning in adapter training by explicitly providing the shortcuts during training, forcing the model to learn more robust representations.
ComposeMe is a human-centric generative model that enables disentangled control over multiple visual attributes (such as identity, hair, and garment) across multiple subjects, while also supporting text-based control.
ThinkDiff enables multimodal in-context reasoning in diffusion models by aligning vision-language models to LLM decoders, transferring reasoning capabilities without requiring complex reasoning-based datasets.
Omni-ID is a novel facial representation tailored for generative tasks, encoding identity features from unstructured images into a fixed-size representation that captures diverse expressions and poses.
WonderLand is a video-latent-based approach to single-image 3D reconstruction of large-scale scenes.
AC3D studies when and how camera signals should be conditioned into a video diffusion model for better camera control and higher video quality.
Magic123 proposes a hybrid score distillation algorithm and a coarse-to-fine image-to-3D pipeline that produces high-quality, high-resolution 3D content from a single unposed image.
Pix4Point shows that image pretraining significantly improves point cloud understanding.
PointNeXt boosts PointNet++ to state-of-the-art performance with improved training and scaling strategies.
ASSANet makes PointNet++ faster and more accurate.

