🌐">
[go: up one dir, main page]

Xiaoyu Shi 石晓宇

I am a 4th-year Ph.D. student at MMLab of the Chinese University of Hong Kong, advised by Prof. Hongsheng Li. Prior to this, I obtained a Bachelor of Engineering degree from Zhejiang University and a Bachelor of Science degree from Simon Fraser University. I am interested in computer vision and machine learning, with a special focus on video generation and correspondence learning.

Email  /  Google Scholar  /  Github

profile photo
News
  • Three papers accepted to ECCV 2024.
  • One paper accepted to SIGGRAPH 2024.
  • Our VideoFlow, FlowFormer++, and FlowFormer occupy the top 3 places on the Sintel Optical Flow benchmark among published papers.
  • Two papers accepted to NeurIPS 2023.
  • One paper accepted to ICCV 2023.
  • One paper accepted to IROS 2023.
  • Two papers accepted to CVPR 2023.
  • One paper accepted to ECCV 2022.
Publications

Representative papers are highlighted.

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Kachun Cheung, Simon See,
Hongwei Qin, Jifeng Dai, Hongsheng Li
SIGGRAPH, 2024
Paper / Project page

We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling.

Context-TAP: Tracking Any Point Demands Spatial Context Features
Weikang Bian*, Zhaoyang Huang*, Xiaoyu Shi, Yitong Dong,
Yijin Li, Hongsheng Li
NeurIPS, 2023
Project page / Paper

We set a new state of the art on the task of Tracking Any Point (TAP) by introducing rich spatial context features.

A Unified Conditional Framework for Diffusion-based Image Restoration
Yi Zhang, Xiaoyu Shi, Dasong Li, Xiaogang Wang, Hongsheng Li
NeurIPS, 2023
Project page / Paper / Code

A unified conditional framework based on diffusion models for image restoration.

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li,
Manyuan Zhang, Kachun Cheung, Simon See, Hongwei Qin,
Jifeng Dai, Hongsheng Li
ICCV, 2023
Paper / Code

The first method to achieve sub-pixel accuracy on the Sintel benchmark, with a 19.2% error reduction from the best published result on the KITTI-2015 benchmark.

BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation
Yijin Li, Zhaoyang Huang, Shuo Chen, Xiaoyu Shi, Hongsheng Li,
Hujun Bao, Zhaopeng Cui, Guofeng Zhang
IROS, 2023
Paper

We build BlinkFlow, a benchmark for training and evaluating event-based optical flow estimation methods.

KBNet: Kernel Basis Network for Image Restoration
Yi Zhang, Dasong Li, Xiaoyu Shi, Dailan He, Kangning Song, Xiaogang Wang, Hongwei Qin, Hongsheng Li
arXiv, 2023
Paper / Code

A general-purpose backbone for image restoration tasks.

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi*, Zhaoyang Huang*, Dasong Li, Manyuan Zhang,
Kachun Cheung, Simon See, Hongwei Qin,
Jifeng Dai, Hongsheng Li
CVPR, 2023
Paper / Code

Ranked 1st on the Sintel Optical Flow benchmark as of Mar. 1st, 2023.

A Simple Baseline for Video Restoration with Spatial-temporal Shift
Dasong Li, Xiaoyu Shi, Yi Zhang, Kachun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li
CVPR, 2023
Project Page / Paper / Code

Our approach is based on grouped spatial-temporal shift, which is a lightweight technique that can implicitly capture inter-frame correspondences for multi-frame aggregation.

FlowFormer: A Transformer Architecture for Optical Flow
Zhaoyang Huang*, Xiaoyu Shi*, Chao Zhang, Qiang Wang,
Kachun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li
ECCV, 2022
Project Page / Paper / Code

Ranked 1st on the Sintel Optical Flow benchmark as of Mar. 17th, 2022.

Decoupled Spatial-Temporal Transformer for Video Inpainting
Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
arXiv, 2021
Paper / Code

We propose a decoupled spatial-temporal Transformer (DSTT) framework that improves video inpainting quality with higher running efficiency.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
ICCV, 2021
Paper / Code

A Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations.