Scalable teacher forcing network for semi-supervised large scale data streams

Mahardhika Pratama, Choiru Zain, Edwin Lughofer, Eric Pardede, Wenny Rahayu

Research output: Contribution to journal › Article › peer-review

Abstract

The large-scale data stream problem refers to a high-speed information flow that cannot be processed in a scalable manner under a traditional computing platform. This problem also imposes expensive labelling costs, making the deployment of fully supervised algorithms infeasible. On the other hand, the problem of semi-supervised large-scale data streams is little explored in the literature, because most works are designed for traditional single-node computing environments and are fully supervised. This paper offers the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is crafted under the distributed computing platform of Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems, while integrating a data augmentation, annotation and auto-correction (DA3) method to handle partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only 25% label proportions. It shows highly competitive performance even when compared with fully supervised learners with 100% label proportions.
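To make the parallel-train-then-fuse pattern described above more concrete, the following is a minimal, hypothetical PySpark sketch: each partition trains a local learner on its labelled fraction, self-labels high-confidence unlabelled samples, and the driver fuses the local models without revisiting the data. The logistic-regression learner, the confidence thresholds, the `train_partition` helper, and the parameter-averaging fusion are illustrative stand-ins only; they are not WeScatterNet's actual open network structure, DA3 method, or data-free fusion rule.

```python
# Hypothetical sketch of parallel per-partition training followed by a
# data-free fusion step on the driver (parameter averaging as a stand-in).
import numpy as np
from pyspark import SparkContext


def train_partition(rows):
    """Fit a tiny logistic-regression model on one partition's (x, y) pairs.

    y < 0 marks an unlabelled sample; high-confidence predictions are
    self-labelled once and reused (a crude stand-in for the DA3 idea).
    """
    data = list(rows)
    if not data:
        return []
    X = np.array([x for x, _ in data], dtype=float)
    y = np.array([lbl for _, lbl in data], dtype=float)
    labelled = y >= 0
    if not labelled.any():
        return []
    w = np.zeros(X.shape[1])

    def fit(Xl, yl, epochs=50, lr=0.1):
        nonlocal w
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-Xl @ w))
            w -= lr * Xl.T @ (p - yl) / len(yl)

    fit(X[labelled], y[labelled])                      # supervised pass
    p = 1.0 / (1.0 + np.exp(-X @ w))
    confident = (~labelled) & ((p > 0.9) | (p < 0.1))  # self-label confident samples
    if confident.any():
        X2 = np.vstack([X[labelled], X[confident]])
        y2 = np.concatenate([y[labelled], (p[confident] > 0.5).astype(float)])
        fit(X2, y2)                                    # semi-supervised refit
    return [w]


if __name__ == "__main__":
    sc = SparkContext(appName="parallel-train-then-fuse-sketch")
    rng = np.random.default_rng(0)
    # Synthetic stream batch: 25% labelled (label -1.0 means "unlabelled").
    batch = []
    for _ in range(2000):
        x = rng.normal(size=3)
        true = float(x.sum() > 0)
        batch.append((x, true if rng.random() < 0.25 else -1.0))
    local_models = (sc.parallelize(batch, numSlices=8)
                      .mapPartitions(train_partition)
                      .collect())
    fused = np.mean(local_models, axis=0)  # data-free fusion stand-in
    print("fused weights:", fused)
    sc.stop()
```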
Original language: English
Pages (from-to): 407-431
Number of pages: 25
Journal: Information Sciences
Volume: 576
DOIs
Publication status: Published - Oct 2021

Fields of science

  • 101 Mathematics
  • 101013 Mathematical logic
  • 101024 Probability theory
  • 102001 Artificial intelligence
  • 102003 Image processing
  • 102019 Machine learning
  • 102035 Data science
  • 603109 Logic
  • 202027 Mechatronics

JKU Focus areas

  • Digital Transformation
