Abstract
As neural networks grow in complexity, Batch Normalization (BN) has become essential for stabilizing training and improving generalization. However, BN introduces drawbacks such as increased memory usage, computational overhead, and reduced effectiveness at small batch sizes. This study modifies EfficientNet to function without normalization while maintaining stable signal propagation.
Signal Propagation Plots (SPPs) were used to visualize activations, gradients, and weights, identifying sources of instability and revealing BN's influence on training dynamics. Experiments on feed-forward neural networks showed that BN actively scaled gradients when its learnable parameters were enabled, which may explain its regularization effects.
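As a rough illustration of what an SPP captures, the sketch below records per-layer activation statistics of a PyTorch model with forward hooks and plots them against depth. The hook-based recording and the plotting layout are assumptions for illustration, not necessarily the thesis implementation (which also tracks gradients and weights).

```python
# Minimal SPP sketch (activations only), assuming a PyTorch model.
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def signal_propagation_plot(model: nn.Module, x: torch.Tensor):
    """Record mean and variance of every leaf module's output on one forward pass."""
    stats, hooks = [], []

    def record(module, inputs, output):
        if isinstance(output, torch.Tensor):
            stats.append((output.mean().item(), output.var().item()))

    # Hook only leaf modules so each layer is counted exactly once.
    for module in model.modules():
        if not list(module.children()):
            hooks.append(module.register_forward_hook(record))

    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()

    means, variances = zip(*stats)
    fig, (ax_mean, ax_var) = plt.subplots(2, 1, sharex=True)
    ax_mean.plot(means)
    ax_mean.set_ylabel("activation mean")
    ax_var.plot(variances)
    ax_var.set_ylabel("activation variance")
    ax_var.set_xlabel("layer depth")
    return stats

# Usage: feed unit-Gaussian input to a freshly initialized network, e.g.
# signal_propagation_plot(torchvision.models.efficientnet_b0(), torch.randn(8, 3, 224, 224))
```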
EfficientNet-B0 was redesigned without BN, primarily by using Scaled Exponential Linear Units (SELU). While this stabilized signal propagation, a performance trade-off was observed, along with variance fluctuations in the Squeeze-and-Excitation blocks.
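A minimal sketch of this kind of substitution, assuming PyTorch: the BN + activation pair of a standard convolutional block is replaced by SELU with LeCun-normal weight initialization, which SELU's self-normalizing property relies on. The block shapes and names are illustrative, not the thesis architecture.

```python
import torch.nn as nn

class ConvBNBlock(nn.Module):
    """Conventional block: convolution, Batch Normalization, activation."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()  # EfficientNet's default activation

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ConvSELUBlock(nn.Module):
    """Normalizer-free block: SELU keeps activations close to zero mean and
    unit variance without BN, provided weights are LeCun-normal initialized."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=True)
        # kaiming_normal_ with nonlinearity='linear' gives std = 1/sqrt(fan_in),
        # i.e. LeCun-normal initialization.
        nn.init.kaiming_normal_(self.conv.weight, nonlinearity='linear')
        nn.init.zeros_(self.conv.bias)
        self.act = nn.SELU()

    def forward(self, x):
        return self.act(self.conv(x))
```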
This study confirms that stable training without normalization is possible, though alternative regularization techniques are needed to retain performance. Future work could focus on designing novel regularization techniques that mimic how BN influences both the forward and backward pass through its learnable parameters. The full code implementation is available at: https://github.com/PDewyse/MT_Normalizer_free_Neural_Nets
| Original language | English |
|---|---|
| Qualification | Master |
| Awarding Institution | |
| Supervisors/Reviewers | |
| Publication status | Published - Apr 2025 |
Fields of science
- 102019 Machine learning
- 102018 Artificial neural networks
- 102032 Computational intelligence
- 202037 Signal processing
- 101016 Optimisation
- 101028 Mathematical modelling
- 101031 Approximation theory
- 101019 Stochastics
- 102003 Image processing
- 103029 Statistical physics
- 101018 Statistics
- 101017 Game theory
- 102001 Artificial intelligence
- 202017 Embedded systems
- 101015 Operations research
- 101014 Numerical mathematics
- 101029 Mathematical statistics
- 101026 Time series analysis
- 101024 Probability theory
- 102004 Bioinformatics
- 102013 Human-computer interaction
- 101027 Dynamical systems
- 305907 Medical statistics
- 101004 Biomathematics
- 305905 Medical informatics
- 102033 Data mining
- 102 Computer Sciences
- 305901 Computer-aided diagnosis and therapy
- 106007 Biostatistics
- 106005 Bioinformatics
- 202036 Sensor systems
- 202035 Robotics
JKU Focus areas
- Digital Transformation