Toward Adaptive Disk Failure Prediction via Stream Mining
This paper presents StreamDFP, a general stream mining framework for disk failure prediction with concept-drift adaptation.
Problems to solve:
Concept Drift: the relationship between the input and output changes continuously over time, i.e., the conditional distribution p(y_t | x_t) drifts. Solution: change detection.
Learning Algorithms: commonly used decision tree and ensemble learning algorithms are used.
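To make "change detection" concrete, here is a minimal sketch of one classic detector, the Page-Hinkley test, applied to a stream of per-sample prediction errors. It is picked purely for illustration and is not necessarily the detector StreamDFP uses; the class name, the helper `first_alarm`, and the `delta` and `threshold` values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""
    delta: float = 0.005     # tolerated magnitude of change (assumed value)
    threshold: float = 5.0   # alarm threshold (assumed value)
    n: int = 0               # number of observations seen so far
    mean: float = 0.0        # running mean of the observations
    cum: float = 0.0         # cumulative deviation from the running mean
    cum_min: float = 0.0     # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(errors, detector=None):
    """Return the index of the first alarm in an error stream, or None."""
    if detector is None:
        detector = PageHinkley()
    for t, e in enumerate(errors):
        if detector.update(e):
            return t
    return None
```

In a stream-mining setting, an alarm like this would typically trigger retraining or replacing the affected model; feeding 0/1 misclassification indicators as `errors` is one common choice.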
The paper studies concept drift in p(y_t | x_t) by measuring how p(x_t) and p(y_t) change over time. Conclusion: concept drift likely exists.
Enabling concept-drift adaptation increases classification accuracy for different learning algorithms.
Online labeling improves the overall accuracy.
Compatible with both regression and classification.
Speed is viable for practical stream processing usage.
Validated on the Alibaba Cloud dataset, which is large.
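The drift study noted above (measuring p(x_t) and p(y_t)) can be approximated by comparing two time windows of the same disk attribute and the failure rate in each. The sketch below uses a two-sample Kolmogorov-Smirnov statistic for p(x_t) and a plain failure rate for p(y_t). Both function names and the choice of statistic are assumptions for illustration; the paper may measure drift differently. A shift in either quantity only suggests, and does not prove, a shift in p(y_t | x_t).

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        cdf_b = bisect_right(b, v) / len(b)  # fraction of b that is <= v
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def failure_rate(labels):
    """Empirical estimate of p(y = 1) in one window (labels are 0 or 1)."""
    return sum(labels) / len(labels)
```

A statistic near 0 means the two windows look alike for that attribute, while a value near 1 means they barely overlap. Tracking both measures window by window gives a rough picture of how the input and label distributions move over time.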
What’s the advantage of StreamDFP compared with its related work ORF [43]? How do the two works compare in performance, both speed and accuracy? ORF focuses on the aging issue in online learning, whereas this paper changes the perspective and views the workflow as a data stream. What exactly is the difference?
What about other datasets? Are those datasets available? (Need to figure this out.)