Wei Xiong

I am a senior research scientist at NVIDIA. I obtained my Ph.D. in Computer Science at the University of Rochester, under the supervision of Prof. Jiebo Luo.

My research interests include visual generative models and multimodal language models, with a special focus on identity preservation of visual concepts. I am also interested in image composition, relighting, shadow synthesis, and representation learning.

I am looking for research interns to work on generative modeling, especially in the generative image/video enhancement domains. 1-2 slots are open. If you are interested, feel free to reach out with your resume.

I am also open to long-term university research collaborations, where I can provide mentorship for research projects, including: high-level and detailed research ideas, co-debugging if necessary, and connections to top researchers in the related fields. If you are interested, feel free to reach out with your resume.

Email  /  Google Scholar  /  LinkedIn  /  GitHub





Selected Research Work

I have recently been focusing on large-scale generative modeling, especially content creation with identity preservation, including generative image enhancement and personalized visual concept generation. Representative works are highlighted. See the full list on my Google Scholar.


GroundingBooth: Grounding Text-to-Image Customization
Zhexiao Xiong, Wei Xiong*, Jing Shi, He Zhang, Yizhi Song, Nathan Jacobs
(* Main Advising)
arXiv 2024
[PDF] [Project Page]

We introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects in the text-to-image customization task.


WAS: Dataset and Methods for Artistic Text Segmentation
Xudong Xie, Yuzhe Li, Yang Liu, Zhifei Zhang, Zhaowen Wang, Wei Xiong, Xiang Bai
ECCV 2024
[PDF]

We tackle the task of artistic text segmentation and construct a real artistic text segmentation dataset.


SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang
ECCV 2024
[PDF] [Project Page]

We introduce SwapAnything, a novel framework that can swap any object in an image with a personalized concept given by a reference, while keeping the context unchanged.


IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian L. Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga
CVPR 2024
[PDF] [Project Page]

Our work achieves advanced image composition with strong identity preservation, automatic object viewpoint/pose adjustment, color and lighting harmonization, and shadow synthesis. All of these effects are achieved in a single framework.


Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren*, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang
(* Work done while Mengwei was an intern at Adobe)
CVPR 2024
[PDF] [Project Page]

We introduce Relightful Harmonization, a lighting-aware diffusion model designed to seamlessly harmonize sophisticated lighting effects for the foreground portrait using any background image.


PHOTOSWAP: Personalized Subject Swapping in Images
Jing Gu, Yilin Wang, Nanxuan Zhao, Tsu-Jui Fu, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang
NeurIPS 2023
[PDF] [Project Page] [Code]

We present Photoswap, a novel approach that enables image editing through personalized subject swapping in existing images.


InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi*, Wei Xiong*, Zhe Lin, HyunJoon Jung (* Equal Contribution)
CVPR 2024
[PDF] [Project Page]

We propose InstantBooth, a novel approach built upon pre-trained text-to-image models that enables fast personalized text-to-image generation without test-time finetuning.

Guidance-driven Visual Synthesis with Generative Models
Wei Xiong
PhD Thesis 2022
[PDF]

My PhD thesis summarizes my main research on guided visual content creation during my PhD program. Part I introduces guidance-driven synthesis of visually pleasing data. Part II presents guidance-driven synthesis for downstream visual recognition tasks.


Unsupervised Low-light Image Enhancement with Decoupled Networks
Wei Xiong, Ding Liu, Xiaohui Shen, Chen Fang, Jiebo Luo
ICPR 2022
[PDF]

This is among the first works on unsupervised real-world low-light image enhancement. Specifically, we tackle the problem of enhancing real-world low-light images with significant noise in an unsupervised fashion. To this end, we explicitly decouple the task into two sub-problems: illumination enhancement and noise suppression.


Breast Cancer Induced Bone Osteolysis Prediction Using Temporal Variational Auto-Encoders
Wei Xiong*, Neil Yeung*, Shubo Wang, Haofu Liao, Jiebo Luo, Liyun Wang (* Equal Contribution)
BME Frontiers 2022
[PDF]

We adopt a temporal variational auto-encoder (T-VAE) model for bone osteolysis prediction on computed tomography (CT) images of murine breast cancer bone metastases.

Image Sentiment Transfer
Tianlang Chen, Wei Xiong, Haitian Zheng, Jiebo Luo
ACM MM 2020
[PDF]

We introduce an important but still unexplored research task, image sentiment transfer, and propose an effective and flexible framework that performs it at both the image level and the object level.

Example-Guided Image Synthesis across Arbitrary Scenes using Masked Spatial-Channel Attention and Self-Supervision
Haitian Zheng, Haofu Liao, Lele Chen, Wei Xiong, Tianlang Chen, Jiebo Luo
ECCV 2020
[PDF]

We tackle a challenging exemplar-guided image synthesis task, where the exemplar providing the style guidance is an arbitrary scene image which is semantically different from the given pixel-wise label map.

Fine-grained Image-to-Image Transformation towards Visual Recognition
Wei Xiong, Yutong He, Yixuan Zhang, Wenhan Luo, Lin Ma, Jiebo Luo
CVPR 2020
[PDF] [Supplementary]

We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image, which can thereby benefit the subsequent fine-grained image recognition and few-shot learning tasks.

CariGAN: Caricature Generation through Weakly Paired Adversarial Learning
Wei Xiong*, Wenbin Li*, Haofu Liao, Jing Huo, Yang Gao, Jiebo Luo (* Equal Contribution)
Neural Networks 2020
[PDF]

We frame caricature generation as a weakly paired image-to-image translation task, and propose the CariGAN model to generate high-fidelity caricature images from human faces with proper exaggerations.

Foreground-aware Image Inpainting
Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo
CVPR 2019
[PDF] [Hole Mask Dataset]

We propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Our model first learns to predict the foreground contour, and then inpaints the missing region using the predicted contour as guidance.

Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks
Wei Xiong, Wenhan Luo, Lin Ma, Wei Liu, Jiebo Luo
CVPR 2018
[Project Page] [PDF] [Code] [Timelapse Video Dataset]

We propose a two-stage GAN model to generate vivid yet content-preserving time-lapse videos from only a single starting frame. To this end, we disentangle the task into content generation and motion enhancement.

Stacked Convolutional Denoising Auto-Encoders for Feature Representation
Bo Du*, Wei Xiong, Jia Wu, Lefei Zhang, Liangpei Zhang, Dacheng Tao
(* First author was my advisor)
IEEE Trans. Cybernetics 2017
[PDF]

We propose an unsupervised feature learning model, named Stacked Convolutional Denoising Auto-Encoders, that can map an image to hierarchical representations without any label information.

Regularizing Deep Convolutional Neural Networks with a Structured Decorrelation Constraint
Wei Xiong, Bo Du, Lefei Zhang, Ruimin Hu, Dacheng Tao
ICDM 2016
[PDF]

We propose a group regularization method, Structured Decorrelation Constraint (SDC), that regularizes the activations of the hidden layers in groups to achieve better generalization.


Work Experience

Research Scientist (Full-Time) at Adobe (2022 - now)

  • Location: San Jose, CA.
  • Project: Image synthesis and editing.

Research Intern at Google Research (2021)

  • Location: Mountain View, CA.
  • Project: Example guided image inpainting.

Research Intern at Google Cloud (2020)

  • Location: Sunnyvale, CA.
  • Mentors: Mingyang Ling, Han Zhang, Zizhao Zhang.
  • Project: Augmenting data with generative models for downstream recognition tasks.

Research Intern at ByteDance (2019)

  • Location: Palo Alto, CA.
  • Mentors: Mingyang Ling, Han Zhang, Zizhao Zhang.
  • Project: Real-world low-light image enhancement.

Research Intern at Adobe Research (2018)

  • Location: San Jose, CA.
  • Mentors: Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes.
  • Project: Knowledge-guided Image Inpainting.

Research Intern at Tencent AI Lab (2017)

  • Location: Shenzhen, China.
  • Mentors: Wenhan Luo, Lin Ma, Wei Liu.
  • Project: Time-lapse video generation.

Mentoring

I am fortunate to have mentored and collaborated with a few talented researchers:

  • Zhexiao Xiong (2024): Grounded text-to-image customization.
  • Mengwei Ren (2023): Portrait relighting using diffusion models.
  • Yizhi Song (2023): Customized image composition.
  • Tianyu Wang (2023): Shadow detection and synthesis.
  • Jing Gu (2023): Subject-driven image editing.
  • Neil Yeung (2022): Tumor growth prediction.
  • Yutong He (2020): Image transformation for visual recognition.

Teaching

Teaching Assistant

Spring 2019 - CSC 240/440 Data Mining

  • Instructor: Ted Pawlicki
  • Office Hours: Monday and Wednesday, 2:30 - 4:00 PM.

Fall 2018 - CSC 240/440 Data Mining

  • Instructor: Ted Pawlicki
  • Office Hours: Monday and Wednesday, 2:00 - 3:00 PM.

Spring 2018 - CSC 240/440 Data Mining

  • Instructor: Anand Ajay

Academic Service

Conference Reviewer

  • CVPR: IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • ICCV: IEEE/CVF International Conference on Computer Vision
  • ECCV: European Conference on Computer Vision
  • NeurIPS: Conference on Neural Information Processing Systems
  • ICLR: International Conference on Learning Representations
  • AAAI: AAAI Conference on Artificial Intelligence
  • ICML: International Conference on Machine Learning
  • MICCAI: International Conference on Medical Image Computing and Computer Assisted Intervention
  • BMVC: British Machine Vision Conference
  • WACV: Winter Conference on Applications of Computer Vision

Journal Reviewer

  • TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
  • TMLR: The Transactions on Machine Learning Research
  • TIP: IEEE Transactions on Image Processing
  • TNNLS: IEEE Transactions on Neural Networks and Learning Systems
  • TMM: IEEE Transactions on Multimedia
  • TCSVT: IEEE Transactions on Circuits and Systems for Video Technology
  • CVIU: Computer Vision and Image Understanding
  • SPIC: Signal Processing: Image Communication

This webpage is based on the source code from Jon Barron. Many thanks.