Binary Losses for Density Ratio Estimation

Research output: Working paper and reportsPreprint

Abstract

Estimating the ratio of two probability densities from a finite number of observations is a central machine learning problem. A common approach is to construct estimators using binary classifiers that distinguish observations from the two densities. However, the accuracy of these estimators depends on the choice of the binary loss function, raising the question of which loss function to choose based on desired error properties. For example, traditional loss functions, such as logistic or boosting loss, prioritize accurate estimation of small density ratio values over large ones, even though the latter are more critical in many applications. In this work, we start with prescribed error measures in a class of Bregman divergences and characterize all loss functions that result in density ratio estimators with small error. Our characterization extends results on composite binary losses from (Reid & Williamson, 2010) and their connection to density ratio estimation as identified by (Menon & Ong, 2016). As a result, we obtain a simple recipe for constructing loss functions with certain properties, such as those that prioritize an accurate estimation of large density ratio values. Our novel loss functions outperform related approaches for resolving parameter choice issues of 11 deep domain adaptation algorithms in average performance across 484 real-world tasks including sensor signals, texts, and images.
Original languageEnglish
Number of pages33
DOIs
Publication statusPublished - 01 Jul 2024

Publication series

NamearXiv.org
No.2407.01371

Fields of science

  • 101019 Stochastics
  • 102003 Image processing
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 102001 Artificial intelligence
  • 202017 Embedded systems
  • 101016 Optimisation
  • 101015 Operations research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational intelligence
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data mining
  • 102 Computer Sciences
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU Focus areas

  • Digital Transformation

Cite this