The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. In recent years, the resurgence of deep learning has greatly advanced this field and given rise to the active research area of neural information retrieval (NeuIR), in particular the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and large model sizes, pre-trained models can learn universal language representations from massive textual data, which benefit the ranking task in IR. Many recent works apply PTMs to IR in order to improve retrieval performance. Given the rapid progress in this direction, this survey provides a systematic review of pre-training methods in IR. Specifically, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. We also introduce PTMs designed specifically for IR and summarize available datasets and benchmark leaderboards. Finally, we discuss open challenges and highlight several promising directions, in the hope of inspiring and facilitating future research on these topics.
18 August 2022
Research Article
Pre-training Methods in Information Retrieval
Jiafeng Guo
ICT, CAS, China
* Yixing Fan and Xiaohui Xie contributed equally to this survey. Jiafeng Guo is the corresponding author.
Online ISSN: 1554-0677
Print ISSN: 1554-0669
© 2022 Y. Fan et al.
Licensed re-use rights only
Foundations and Trends in Information Retrieval (2022) 16 (3): 178–317.
Citation
Fan Y, Xie X, Cai Y, Chen J, Ma X, Li X, Zhang R, Guo J (2022), "Pre-training Methods in Information Retrieval". Foundations and Trends in Information Retrieval, Vol. 16 No. 3 pp. 178–317, doi: https://doi.org/10.1561/1500000100
