TY - GEN
T1 - Rectified Factor Networks and Dropout
AU - Clevert, Djork-Arné
AU - Unterthiner, Thomas
AU - Hochreiter, Sepp
PY - 2014/12
Y1 - 2014/12
N2 - The success of deep learning techniques is based on their robust, effective and abstract representations of the input. In particular, sparse representations that are obtained from rectified linear units and dropout increased classification performance at various tasks. Deep architectures are often constructed by unsupervised pretraining and stacking of either restricted Boltzmann machines (RBMs) or autoencoders. We propose rectified factor networks (RFNs) for pretraining of deep networks. In contrast to RBMs and autoencoders, RFNs (1) estimate the noise of each input component, (2) aim at decorrelating the hidden units (factors), (3) estimate the precision of hidden units by the posterior variance. In the E-step of an EM algorithm, RFN learning (i) enforces non-negative posterior means, (ii) allows dropout of hidden units, and (iii) normalizes the signal part of the hidden units. In the M-step, RFN learning applies gradient descent along the Newton direction to allow rectifying, dropout, and fast GPU implementations. RFN learning can be considered as a variational EM algorithm with unknown prior which is estimated during maximizing the likelihood. Using a fixed point analysis, we show RFNs explain the data variance like factor analysis.
UR - http://www.bioinf.jku.at/publications/
M3 - Other contribution
T3 - Workshop on Deep Learning and Representation Learning
ER -