Abstract
Existing neural ranking models follow the text matching paradigm, where document-to-query relevance is estimated by predicting a matching score. Drawing from the rich literature on classical generative retrieval models, we introduce and formalize the paradigm of deep generative retrieval models, defined via the cumulative probabilities of generating the query terms. This paradigm offers a grounded probabilistic view on relevance estimation while still enabling the use of modern Transformer architectures such as BERT. In contrast to the matching paradigm, the probabilistic nature of these generative rankers readily offers a fine-grained measure of uncertainty, without imposing any computational overhead or requiring any model modification. We adopt several current neural generative models in our framework and also introduce a novel generative ranker (T-PGN), which combines the encoding capacity of Transformers with the Pointer Generator Network model. We conduct an extensive set of evaluation experiments on passage retrieval, leveraging the MS MARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking collections. Lastly, to demonstrate the potential benefits of neural generative retrieval models for downstream tasks, we leverage the uncertainty information they provide to significantly improve performance on the cut-off prediction task.
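To make the scoring rule behind this paradigm concrete: under query likelihood, a document d is ranked by the cumulative log-probability its conditioned generative model assigns to the query, log P(q | d) = Σ_t log P(q_t | q_<t, d), and the same per-term distributions yield an uncertainty signal for free. The sketch below illustrates this with an off-the-shelf encoder-decoder model from Hugging Face Transformers; the model choice (facebook/bart-base), the function name query_likelihood, and the use of per-term log-probability variance as the uncertainty proxy are illustrative assumptions, not the paper's exact setup (which uses trained generative rankers such as T-PGN).

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumption: any pretrained seq2seq LM stands in here for the trained
# generative rankers discussed in the paper.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base").eval()

def query_likelihood(document: str, query: str) -> tuple[float, float]:
    """Return (log P(query | document), a simple uncertainty proxy)."""
    enc = tokenizer(document, return_tensors="pt", truncation=True)
    labels = tokenizer(query, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        # Passing labels makes the decoder predict each query token
        # conditioned on the document and the preceding query tokens.
        logits = model(**enc, labels=labels).logits       # (1, |q|, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    token_lp = log_probs[0].gather(1, labels[0].unsqueeze(1)).squeeze(1)
    score = token_lp.sum().item()        # cumulative query log-likelihood
    uncertainty = token_lp.var().item()  # spread of per-term log-probs
    return score, uncertainty

# Usage: re-rank candidate passages for a query by generative score.
docs = ["MS MARCO contains real Bing queries.", "Transformers encode text."]
ranked = sorted(docs, reverse=True,
                key=lambda d: query_likelihood(d, "what is ms marco")[0])
```

Note that the relevance score and the uncertainty estimate fall out of the same forward pass, which is what allows generative rankers to expose uncertainty with no extra computation or model modification.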
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 7th ACM SIGIR International Conference on the Theory of Information Retrieval and the 11th International Conference on the Theory of Information Retrieval |
| Number of pages | 10 |
| Publication status | Published - 2021 |
Fields of science
- 202002 Audiovisual media
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
- 102015 Information systems
JKU Focus areas
- Digital Transformation