I am a senior research scientist at NVIDIA Research. Before that, I spent two wonderful years at Adobe. I obtained my Ph.D. in Computer Science at the University of Rochester, under the supervision of Prof. Jiebo Luo.
My research interests include visual generative models and multimodal language models, with a special focus on identity preservation of visual concepts. I am also interested in image composition, relighting, shadow synthesis, and representation learning.
I am looking for research interns to work on generative modeling, especially in the generative image/video enhancement domains. 1-2 slots are open. If you are interested, feel free to reach out with your resume.
I am also open to long-term university research collaborations, where I can provide mentorship for research projects, including high-level and detailed research ideas, co-debugging when needed, and connections to top researchers in related fields. If you are interested, feel free to reach out with your resume.
Jun. 2022 -- I obtained my Ph.D. from the University of Rochester. See my PhD thesis. Many thanks to my advisor Jiebo Luo and to my mentors and collaborators for giving me this unforgettable experience.
I am currently focusing on large-scale generative modeling and personalized content creation.
Representative works are highlighted. See the full list on my Google Scholar.
We introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task.
We introduce SwapAnything, a novel framework that can swap any object in an image with a personalized concept given by the reference, while keeping the context unchanged.
Our work achieves advanced image composition with decent identity preservation, automatic object viewpoint/pose adjustment, color and lighting harmonization, and shadow synthesis. All of these effects are achieved in a single framework!
We introduce Relightful Harmonization, a lighting-aware diffusion model designed to seamlessly harmonize sophisticated lighting effects for the foreground portrait using any background image.
We propose InstantBooth, a novel approach built upon pre-trained text-to-image models that enables fast personalized text-to-image generation without test-time finetuning.
My PhD thesis summarizes my main research on Guided Visual Content Creation during my PhD program. Part I introduces guidance-driven visually pleasing data synthesis. Part II presents guidance-driven synthesis for downstream visual recognition tasks.
Our work is among the few pioneering efforts on unsupervised real-world low-light image enhancement. Specifically, we tackle the problem of enhancing real-world low-light images with significant noise in an unsupervised fashion. To this end, we explicitly decouple the task into two sub-problems: illumination enhancement and noise suppression.
We adopt a temporal variational auto-encoder (T-VAE) model for bone osteolysis prediction on computed tomography (CT) images of murine breast cancer bone metastases.
We introduce an important but previously unexplored research task, Image Sentiment Transfer, and propose an effective and flexible framework that performs sentiment transfer at both the image level and the object level.
We tackle a challenging exemplar-guided image synthesis task, where the exemplar providing the style guidance is an arbitrary scene image that is semantically different from the given pixel-wise label map.
We aim to transform an image of a fine-grained category into new images that preserve the identity of the input, thereby benefiting subsequent fine-grained image recognition and few-shot learning tasks.
We frame caricature generation as a weakly paired image-to-image translation task and propose the CariGAN model to generate high-fidelity caricature images from human faces with proper exaggerations.
We propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Our model first learns to predict the foreground contour, and then inpaints the missing region using the predicted contour as guidance.
We propose a two-stage GAN model to generate vivid yet content-preserving time-lapse videos from only a single starting frame. To this end, we disentangle the task into content generation and motion enhancement.
We propose an unsupervised feature learning model, the Stacked Convolutional Denoising Auto-Encoders, that maps an image to hierarchical representations without any label information.
We propose a group regularization method, Structured Decorrelation Constraint (SDC), that regularizes the activations of the hidden layers in groups to achieve better generalization.
Work Experience
Research Scientist (Full-Time) at Adobe (2022 - 2024)
Location: San Jose, CA.
Project: Image synthesis and editing.
Research Intern at Google Research (2021)
Location: Mountain View, CA.
Project: Example-guided image inpainting.
Research Intern at Google Cloud (2020)
Location: Sunnyvale, CA.
Mentors: Mingyang Ling, Han Zhang, Zizhao Zhang.
Project: Augmenting data with generative models for downstream recognition tasks.