On-line Active Learning in Data Stream Regression using Uncertainty Sampling based on Evolving Generalized Fuzzy Models

Edwin Lughofer, Mahardhika Pratama

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we propose three criteria for efficient sample selection for data stream regression problems within an on-line active learning context. The selection becomes important whenever the target values, which guide the update of the regressors as well as the implicit model structures, are costly or time-consuming to measure, and also when very fast model updates are required to cope with the real-time demands of stream mining. Reducing the number of selected samples as much as possible while keeping the predictive accuracy of the models at a high level is thus a central challenge. This should ideally be achieved in an unsupervised, single-pass manner. Our selection criteria rely on three aspects: 1) the extrapolation degree combined with the model’s non-linearity degree, which is measured in terms of a new specific homogeneity criterion among adjacent local approximators; 2) the uncertainty in model outputs, which can be measured in terms of confidence intervals using so-called adaptive local error bars — we integrate a weighted localization of an incremental noise level estimator and propose formulas for on-line merging of local error bars; 3) the uncertainty in model parameters, which is estimated by the so-called A-optimality criterion, relying on the Fisher information matrix. The selection criteria are developed in combination with evolving generalized Takagi-Sugeno (TS) fuzzy models (containing rules in arbitrarily rotated position), since previous publications have shown that these outperform conventional evolving TS models (containing axis-parallel rules). The results on three high-dimensional real-world streaming problems show that a model update based on only 10%-20% of the samples can still achieve accumulated model errors over time similar to those obtained when performing full model updates on all samples. This can be achieved with negligible sensitivity to the size of the active learning latency buffer (ALLB).
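To illustrate the third criterion, the following is a minimal sketch (not taken from the paper) of A-optimality-based sample selection for a linear-in-parameters regressor: a candidate sample is scored by how much it would reduce the trace of the inverse Fisher information matrix, i.e., the overall parameter uncertainty. The rank-1 Sherman-Morrison update keeps the selection single-pass; function names and the threshold are illustrative.

```python
import numpy as np

def a_optimality_gain(F_inv, x):
    """Reduction in trace(F^-1) if sample x is added to the information
    matrix F (rank-1 update F + x x^T). A larger gain means x is more
    informative about the model parameters."""
    Fx = F_inv @ x
    # trace(F_inv) - trace(updated inverse) = ||F_inv x||^2 / (1 + x^T F_inv x)
    return (Fx @ Fx) / (1.0 + x @ Fx)

def update_inverse(F_inv, x):
    """Sherman-Morrison update of the inverse information matrix after
    accepting sample x, avoiding a full matrix inversion per stream sample."""
    Fx = F_inv @ x
    return F_inv - np.outer(Fx, Fx) / (1.0 + x @ Fx)

def select_sample(F_inv, x, threshold=0.01):
    """Accept x for (costly) target measurement only if its expected
    reduction of parameter uncertainty exceeds the threshold."""
    gain = a_optimality_gain(F_inv, x)
    if gain > threshold:
        return True, update_inverse(F_inv, x)
    return False, F_inv
```

In a streaming loop, `select_sample` would be called once per incoming regressor vector; only accepted samples trigger a target measurement and a model update, which is the mechanism by which the selected fraction can be driven down to the 10%-20% range mentioned above.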
Original language: English
Article number: 7820039
Pages (from-to): 292-309
Number of pages: 18
Journal: IEEE Transactions on Fuzzy Systems
Volume: 26
Issue number: 1
DOIs
Publication status: Published - 2018

Fields of science

  • 101 Mathematics
  • 101013 Mathematical logic
  • 101024 Probability theory
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102019 Machine learning
  • 603109 Logic
  • 202027 Mechatronics

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Mechatronics and Information Processing
  • Nano-, Bio- and Polymer-Systems: From Structure to Function