Probabilistic Active Learning in Data Streams

by Daniel Kottke, Georg Krempl, and Myra Spiliopoulou.

In recent years, stream-based active learning has become an intensively investigated research topic. In this work, we propose a new algorithm for stream-based active learning that decides immediately, as each instance arrives, whether to acquire its label (selective sampling). It uses Probabilistic Active Learning (PAL) to measure the spatial usefulness of each instance in the stream. To determine whether a newly arrived instance is among the most useful instances (temporal usefulness) given a predefined budget, we propose BIQF, a Balanced Incremental Quantile Filter. BIQF uses a sliding window to represent the distribution of the most recent usefulness values and derives a labeling threshold from its quantiles. The balancing mechanism ensures that the predefined budget is met within a given tolerance window. We evaluate our approach against other stream-based active learning approaches on multiple datasets. The results confirm the effectiveness of our method.
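The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `QuantileBudgetFilter`, its parameters, and in particular the balancing rule (shifting the threshold in proportion to the gap between the expected and the actual number of acquired labels) are assumptions made for illustration, since the abstract does not state the formulas. The usefulness values, which PAL would supply, are replaced by random placeholders.

```python
import random
from collections import deque


class QuantileBudgetFilter:
    """Hypothetical sketch of a balanced sliding-window quantile filter."""

    def __init__(self, budget, window_size=100, tolerance=50):
        self.budget = budget                     # target fraction of labels to acquire
        self.window = deque(maxlen=window_size)  # most recent usefulness values
        self.tolerance = tolerance               # allowed deviation, in number of labels
        self.seen = 0
        self.acquired = 0

    def decide(self, usefulness):
        """Return True if the label of the current instance should be acquired."""
        self.window.append(usefulness)
        self.seen += 1
        ranked = sorted(self.window)

        # Threshold at the (1 - budget) quantile of the recent usefulness values.
        idx = min(int(len(ranked) * (1.0 - self.budget)), len(ranked) - 1)
        threshold = ranked[idx]

        # Balancing (assumed rule): lower the threshold when fewer labels than
        # budgeted have been acquired so far, raise it when more have been.
        spread = ranked[-1] - ranked[0]
        deficit = self.seen * self.budget - self.acquired
        threshold -= spread * deficit / self.tolerance

        acquire = usefulness >= threshold
        if acquire:
            self.acquired += 1
        return acquire


# Placeholder stream: random numbers stand in for PAL usefulness scores.
rng = random.Random(0)
flt = QuantileBudgetFilter(budget=0.1)
labeled = [flt.decide(rng.random()) for _ in range(5000)]
print(f"acquired {flt.acquired} of {flt.seen} labels")
```

In this sketch the window is re-sorted for every instance to keep the code short; an incremental implementation would maintain the ordering as values enter and leave the window.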

Published at the Fourteenth International Symposium on Intelligent Data Analysis (IDA), 2015, Saint-Etienne.

Slides: www.daniel.kottke.eu/talks/2015_IDA/slides

Supplemental Material: http://kmd.cs.ovgu.de/res/pals/

General Information about PAL: http://kmd.cs.ovgu.de/res/pal/