TY - GEN
T1 - Measuring Bias in Search Results Through Retrieval List Comparison
AU - Ratz, Linda
AU - Schedl, Markus
AU - Kopeinik, Simone
AU - Rekabsaz, Navid
PY - 2024
Y1 - 2024
N2 - Many IR systems project harmful societal biases, including gender bias, in their retrieved contents. Uncovering and addressing such biases requires grounded bias measurement principles. However, defining reliable bias metrics for search results is challenging, particularly due to the difficulties in capturing gender-related tendencies in the retrieved documents. In this work, we propose a new framework for search result bias measurement. Within this framework, we first revisit the current metrics for representative search result bias (RepSRB) that are based on the occurrence of gender-specific language in the search results. Addressing their limitations, we additionally propose a metric for comparative search result bias (ComSRB) measurement and integrate it into our framework. ComSRB defines bias as the skew in the set of retrieved documents in response to a non-gendered query toward those for male/female-specific variations of the same query. We evaluate ComSRB against RepSRB on a recent collection of bias-sensitive topics and documents from the MS MARCO collection, using pre-trained bi-encoder and cross-encoder IR models. Our analyses show that, while existing metrics are highly sensitive to the wordings and linguistic formulations, the proposed ComSRB metric mitigates this issue by focusing on the deviations of a retrieval list from its explicitly biased variants, avoiding the need for sub-optimal content analysis processes.
M3 - Conference proceedings
VL - 14612
T3 - Lecture Notes in Computer Science
BT - Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024)
ER -