Abstract
This paper describes new dynamic split-and-merge operations for evolving cluster models, which are learned incrementally and expanded on the fly from data streams. These operations are necessary to resolve the effects of cluster fusion and cluster delamination, which may appear over time in data stream learning. We propose two new criteria for merging two ellipsoidal clusters: a touching criterion and a homogeneity criterion. The splitting criterion for an updated cluster applies a 2-means algorithm to its sub-samples and compares the quality of the split cluster with that of the original cluster using a penalized Bayesian information criterion; the cluster partition of higher quality is retained for the next incremental update cycle. The new approach is evaluated on two-dimensional and high-dimensional streaming clustering data sets in which feature ranges expand and clusters evolve over time, and on two large streams of classification data, each containing around 500K samples. The results show that the new split-and-merge approach (a) produces more reliable cluster partitions than conventional evolving clustering techniques and (b) reduces the impurity and entropy of the cluster partitions evolved on the classification data sets.
Original language | English |
---|---|
Pages (from-to) | 135-151 |
Number of pages | 17 |
Journal | Evolving Systems |
Volume | 3 |
Issue number | 3 |
Publication status | Published - 2012 |
Fields of science
- 101001 Algebra
- 101 Mathematics
- 102 Computer Sciences
- 101013 Mathematical logic
- 101020 Technical mathematics
- 102001 Artificial intelligence
- 102003 Image processing
- 202027 Mechatronics
- 101019 Stochastics
- 211913 Quality assurance
JKU Focus areas
- Computation in Informatics and Mathematics
- Mechatronics and Information Processing
- Nano-, Bio- and Polymer-Systems: From Structure to Function