About me
I am a Ph.D. student in the Department of Computer Science and Technology at Tsinghua University. I am supervised by Prof. Zhiyuan Liu and affiliated with the THUNLP Lab. My research interests lie at the intersection of Natural Language Processing and Information Retrieval.
I received my bachelor's degree from Tsinghua University in June 2021.
Projects
I’m the project leader and developer of MiniCPM-Embedding, MiniCPM-Reranker, MiniCPM-Embedding-Light, MiniCPM-Reranker-Light, and OpenMatch-v2.
Selected Preprints and Publications
Please refer to my Google Scholar profile for other preprints and publications.
* indicates equal contribution.
2025
Shi Yu*, Chaoyue Tang*, Bokai Xu*, Junbo Cui*, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun. VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents. The Thirteenth International Conference on Learning Representations (ICLR 2025).
2024
Shi Yu*, Chenghao Fan*, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu. Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
2023
Shi Yu, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu. OpenMatch-v2: An All-in-one Multi-Modality PLM-based Information Retrieval Toolkit. The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023).
2022
Xiaomeng Hu*, Shi Yu*, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu, Ge Yu. P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning. The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022).
2021
Shi Yu*, Zhenghao Liu*, Chenyan Xiong, Tao Feng, Zhiyuan Liu. Few-Shot Conversational Dense Retrieval. The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021).
2020
Chenyan Xiong*, Zhenghao Liu*, Si Sun*, Zhuyun Dai*, Kaitao Zhang*, Shi Yu*, Zhiyuan Liu, Hoifung Poon, Jianfeng Gao, Paul Bennett. CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search.
Shi Yu*, Jiahua Liu*, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu. Few-Shot Generative Conversational Query Rewriting. The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020). Best Short Paper Award.