Abstract
In this paper, we propose three criteria for efficient sample selection in data stream regression problems within an on-line active learning context. The selection becomes important whenever the target values, which guide the update of the regressors as well as the implicit model structures, are costly or time-consuming to measure, and also when very fast model updates are required to cope with the real-time demands of stream mining. Reducing the number of selected samples as much as possible while keeping the predictive accuracy of the models at a high level is thus a central challenge. Ideally, this should be achieved in an unsupervised and single-pass manner. Our selection criteria rely on
three aspects: (1) the extrapolation degree combined with the model’s non-linearity degree, which is measured in terms of a new specific homogeneity criterion among adjacent local approximators; (2) the uncertainty in model outputs, which can be measured in terms of confidence intervals using so-called adaptive local error bars, for which we integrate a weighted localization of an incremental noise level estimator and propose formulas for on-line merging of local error bars; and (3) the uncertainty in model parameters, which is estimated by the so-called A-optimality criterion based on the Fisher information matrix.
The selection criteria are developed in combination with evolving generalized Takagi-Sugeno (TS) fuzzy models (containing rules in arbitrarily rotated position), since previous publications have shown that these outperform conventional evolving TS models (containing axis-parallel rules). The results
based on three high-dimensional real-world streaming problems show that a model update based on only the 10%-20% of samples that are selected can still achieve accumulated model errors over time similar to those obtained with a full model update on all samples. This is achieved with negligible sensitivity to the size of the active learning latency buffer (ALLB).
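
As an illustration of the third criterion only, the following minimal sketch shows how an A-optimality-style test can drive single-pass sample selection. It is not the formulation used in the paper: it assumes a plain linear regressor with unit noise variance instead of a generalized TS fuzzy model, so the Fisher information matrix reduces to X^T X, and A-optimality amounts to keeping the trace of its inverse small. The function names, the ridge term `reg`, and the `threshold` value are hypothetical choices made for this example.

```python
import numpy as np

def a_optimality_gain(P, x):
    """Decrease of trace(P) if x were added, with P = (X^T X + reg*I)^{-1}.

    Assumption: P is symmetric, so x^T P P x equals ||P x||^2.
    """
    Px = P @ x
    return float(Px @ Px) / (1.0 + float(x @ Px))

def rank_one_update(P, x):
    """Sherman-Morrison update of the inverse information matrix after adding x."""
    Px = P @ x
    return P - np.outer(Px, Px) / (1.0 + float(x @ Px))

def select_stream(X, threshold, reg=1e-2):
    """Single pass over the rows of X; keep a sample only if it reduces
    trace(P) by more than `threshold` (hypothetical selection rule)."""
    d = X.shape[1]
    P = np.eye(d) / reg          # inverse of the initial ridge term reg*I
    selected = []
    for i, x in enumerate(X):
        if a_optimality_gain(P, x) > threshold:
            selected.append(i)   # only here would the target value be requested
            P = rank_one_update(P, x)
    return selected, P

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    selected, _ = select_stream(X, threshold=0.01)
    print(f"selected {len(selected)} of {len(X)} samples")
```

The test uses only the input vector, never the target, which matches the unsupervised requirement stated in the abstract. The fraction of selected samples depends entirely on the assumed `threshold` and is not comparable to the 10%-20% reported for the paper's experiments.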
Original language | English |
---|---|
Article number | 7820039 |
Pages (from-to) | 292-309 |
Number of pages | 18 |
Journal | IEEE Transactions on Fuzzy Systems |
Volume | 26 |
Issue number | 1 |
DOIs | |
Publication status | Published - 2018 |
Fields of science
- 101 Mathematics
- 101013 Mathematical logic
- 101024 Probability theory
- 102001 Artificial intelligence
- 102003 Image processing
- 102019 Machine learning
- 603109 Logic
- 202027 Mechatronics
JKU Focus areas
- Computation in Informatics and Mathematics
- Mechatronics and Information Processing
- Nano-, Bio- and Polymer-Systems: From Structure to Function