Online Change Point Detection Using Spark Streaming

Description

Scan statistics are a distribution-based methodology for detecting anomalies. This talk explores how to use scan statistics to analyze streaming data in real time with Spark Streaming.

Abstract

Scan statistics are a distribution-based methodology for detecting anomalous data. Simpler methods such as moving averages and exponential smoothing rely on previous data; scan statistics instead run a hypothesis test on the distribution of the data, so the analysis can be performed in real time. Spark Streaming is a framework that lends itself well to this use case. This talk introduces a Python package built for Spark Streaming that performs real-time anomaly detection using various distributions for count data.
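
To make the hypothesis-test idea concrete, here is a minimal sketch in plain Python. It is not the package presented in the talk, and it is a simplification: it applies a per-window Poisson tail test rather than the exact distribution of a scan statistic. The names `poisson_tail` and `scan_stream`, the baseline rate, the window size, and the significance level are all assumptions chosen for illustration.

```python
import math
from collections import deque


def poisson_tail(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu), using only the standard library."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)      # P(X = 0)
    cdf = term
    for i in range(1, k):     # accumulate P(X <= k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_stream(counts, baseline_rate, window=5, alpha=0.001):
    """Yield (index, window_sum, p_value, is_anomaly) for each full window.

    Null hypothesis (an assumption of this sketch): each count is Poisson
    with mean `baseline_rate`, so a window sum is Poisson with mean
    `baseline_rate * window`.
    """
    buf = deque(maxlen=window)
    for i, c in enumerate(counts):
        buf.append(c)
        if len(buf) < window:
            continue          # wait until the first window is full
        total = sum(buf)
        p = poisson_tail(total, baseline_rate * window)
        yield i, total, p, p < alpha


# Made-up counts: a quiet baseline followed by a burst.
counts = [2, 1, 3, 2, 2, 1, 2, 9, 11, 10, 12, 2, 1]
flags = [i for i, total, p, hit in scan_stream(counts, baseline_rate=2.0) if hit]
print(flags)
```

Each window is tested against the assumed baseline as soon as it fills, so a decision is available for every new observation without fitting a smoother to the history. In a streaming framework, the window sum would come from a windowed aggregation over the stream rather than from an in-memory `deque`.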
