Guocheng Gordon Qian
Home
News
Publications
Experience
Multimodal
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
This paper presents ThinkDiff, a novel alignment paradigm that enables multimodal in-context understanding and reasoning capabilities in text-to-image diffusion models by integrating the capabilities of vision-language models (VLMs). Directly …
Cite
×