Self-Normalizing Neural Networks

Research output: Working paper and reports › Preprint

Abstract

Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore, cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation function of SNNs is the "scaled exponential linear unit" (SELU), which induces self-normalizing properties. Using the Banach fixed-point theorem, we prove that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance -- even in the presence of noise and perturbations. This convergence property of SNNs allows us to (1) train deep networks with many layers, (2) employ strong regularization, and (3) make learning highly robust. Furthermore, for activations not close to unit variance, we prove an upper and lower bound on the variance; thus, vanishing and exploding gradients are impossible. We compared SNNs with standard FNNs and other machine learning methods, such as random forests and support vector machines, on (a) 121 tasks from the UCI machine learning repository, (b) drug discovery benchmarks, and (c) astronomy tasks. SNNs significantly outperformed all competing FNN methods on the 121 UCI tasks, outperformed all competing methods on the Tox21 dataset, and set a new record on an astronomy dataset. The winning SNN architectures are often very deep. Implementations are available at: github.com/bioinf-jku/SNNs.
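The SELU activation described in the abstract is straightforward to reproduce. Below is a minimal NumPy sketch (not the repository's reference implementation): the two constants are the fixed-point values reported in the paper, and the layer-propagation check at the end is an illustrative assumption that pairs SELU with zero-mean weights of variance 1/n (LeCun-normal initialization, as the paper recommends) to show activations staying near zero mean and unit variance.

```python
import numpy as np

# Fixed-point constants from the paper, chosen so that zero-mean,
# unit-variance inputs are mapped back to zero mean and unit variance.
_SCALE = 1.0507009873554805  # lambda
_ALPHA = 1.6732632423543772  # alpha

def selu(x):
    """SELU: scale * (x if x > 0 else alpha * (exp(x) - 1))."""
    x = np.asarray(x, dtype=np.float64)
    return _SCALE * np.where(x > 0, x, _ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0))

# Illustrative self-normalization check (an assumption for this sketch,
# not the paper's experiments): propagate standard-normal activations
# through 20 random layers with LeCun-normal weights and observe that
# the mean and variance remain close to 0 and 1.
rng = np.random.default_rng(0)
n, width = 2000, 512
x = rng.standard_normal((n, width))
for _ in range(20):
    w = rng.normal(0.0, np.sqrt(1.0 / width), size=(width, width))  # var = 1/n
    x = selu(x @ w)
print(f"mean={x.mean():.3f} var={x.var():.3f}")  # stays near 0 and 1
```

Without the SELU nonlinearity (e.g. with ReLU or tanh in the loop above), the mean and variance of the activations drift layer by layer, which is exactly the behavior the fixed-point argument in the paper rules out.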
Original language: English
Number of pages: 9
DOIs
Publication status: Published - 2017

Publication series

Name: arXiv.org
ISSN (Print): 2331-8422

Fields of science

  • 303 Health Sciences
  • 304 Medical Biotechnology
  • 304003 Genetic engineering
  • 305 Other Human Medicine, Health Sciences
  • 101004 Biomathematics
  • 101018 Statistics
  • 102 Computer Sciences
  • 102001 Artificial intelligence
  • 102004 Bioinformatics
  • 102010 Database systems
  • 102015 Information systems
  • 102019 Machine learning
  • 106023 Molecular biology
  • 106002 Biochemistry
  • 106005 Bioinformatics
  • 106007 Biostatistics
  • 106041 Structural biology
  • 301 Medical-Theoretical Sciences, Pharmacy
  • 302 Clinical Medicine

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Nano-, Bio- and Polymer-Systems: From Structure to Function
  • Medical Sciences (in general)
  • Health System Research
  • Clinical Research on Aging
