About me

I am a Ph.D. student in the Department of Computer Science and Technology at Tsinghua University. I am supervised by Prof. Zhiyuan Liu and affiliated with the THUNLP Lab. My research interests lie at the intersection of Natural Language Processing and Information Retrieval.

I received my bachelor's degree from Tsinghua University in June 2021.

Projects

I’m the project leader and developer of MiniCPM-Embedding, MiniCPM-Reranker, MiniCPM-Embedding-Light, MiniCPM-Reranker-Light, and OpenMatch-v2.

Selected Preprints and Publications

Please refer to my Google Scholar for other preprints/publications.

* indicates equal contribution.

2025

Shi Yu*, Chaoyue Tang*, Bokai Xu*, Junbo Cui*, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun. VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents. The Thirteenth International Conference on Learning Representations (ICLR 2025). pdf GitHub slides

2024

Shi Yu*, Chenghao Fan*, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu. Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). pdf GitHub

2023

Shi Yu, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu. OpenMatch-v2: An All-in-one Multi-Modality PLM-based Information Retrieval Toolkit. The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023). pdf GitHub

2022

Xiaomeng Hu*, Shi Yu*, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu, Ge Yu. P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning. The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022). pdf GitHub

2021

Shi Yu*, Zhenghao Liu*, Chenyan Xiong, Tao Feng, Zhiyuan Liu. Few-Shot Conversational Dense Retrieval. The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). pdf GitHub

2020

Chenyan Xiong*, Zhenghao Liu*, Si Sun*, Zhuyun Dai*, Kaitao Zhang*, Shi Yu*, Zhiyuan Liu, Hoifung Poon, Jianfeng Gao, Paul Bennett. CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search. pdf

Shi Yu*, Jiahua Liu*, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu. Few-Shot Generative Conversational Query Rewriting. The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020). Best Short Paper Award. pdf GitHub