How Much is an Augmented Sample Worth?

Hamid Eghbal-Zadeh, Gerhard Widmer

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

Data Augmentation (DA) methods are widely used in various areas of machine learning and have been associated with the generalization capabilities of deep neural networks. Data augmentation incorporates certain invariances and Inductive Biases (IBs) into models by applying transformations that are aligned with the task at hand, extending the training samples beyond the training set. Models trained on augmented data are then equipped with the priors incorporated by these IBs, allowing them to generalize better to unseen examples. In addition to inductive bias, data augmentation methods introduce randomness to increase the variety of augmented data and prevent overfitting. In the literature, however, the success of DA has been attributed mostly to the choice of IBs, while the role of randomness has been largely ignored. In this work, we investigate the role of randomness in the regularization effects of DA by taking into account the number of augmented samples required to achieve a certain performance improvement. We hypothesize that the regularization effects of DA are not only due to the IBs used, but that randomness has a causal effect in regularizing models trained with DA. Further, we provide an experimental protocol to test and validate our hypothesis, comparing different popular DA algorithms. Finally, using our proposed protocol, we evaluate different DAs under limited randomness, measuring the alignment of their IBs w.r.t. the data and the task at hand.
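
The notion of evaluating DA "under limited randomness" can be illustrated with a short sketch. The following Python snippet is a minimal illustration, not the authors' released code: the augmentation function and all names (make_limited_pool, flip_and_jitter, k) are hypothetical. It caps the randomness of DA by pre-drawing only k augmented variants per training sample, so the number of distinct augmented samples a model can ever see is bounded, while the inductive bias (the transformation itself) stays fixed.

    import random
    import numpy as np

    def make_limited_pool(x, augment_fn, k, seed=0):
        """Pre-draw k augmented variants of sample x with a fixed seed,
        bounding the randomness available during training."""
        rng = np.random.default_rng(seed)
        return [augment_fn(x, rng) for _ in range(k)]

    def flip_and_jitter(x, rng):
        """Example augmentation: random horizontal flip plus small Gaussian noise."""
        if rng.random() < 0.5:
            x = x[..., ::-1]  # flip along the last (width) axis
        return x + rng.normal(0.0, 0.01, x.shape)

    # Build the pool once; at training time, sample only from it, so this
    # example is limited to k = 4 possible augmented versions.
    x = np.ones((3, 8, 8))  # toy "image"
    pool = make_limited_pool(x, flip_and_jitter, k=4)
    augmented = random.choice(pool)

Under this kind of protocol, varying k while holding the transformation fixed separates the contribution of randomness (how many variants are drawn) from that of the inductive bias (which transformation is applied), which is the comparison the abstract describes.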
Original language: English
Title of host publication: Pre-registration workshop (NeurIPS 2021)
Number of pages: 8
Publication status: Published - 2021

Fields of science

  • 202002 Audiovisual media
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102015 Information systems

JKU Focus areas

  • Digital Transformation