In this talk, we describe how Python, in combination with Apache Spark, helps Avast fight the bad guys. We walk through several use cases in which we apply machine learning to security applications, ranging from anomaly detection on time series to clustering of malicious files.
Avast is dedicated to creating a world that provides safety and privacy for all. Every month we stop over 1.5 billion attacks and analyze 30 million new executable files. Robust big data pipelines are crucial for ensuring the safety of our customers. We use Apache Spark and machine learning frameworks, including TensorFlow, in areas such as network security, malware detection, and malware classification.
In the first part of the presentation, we describe our cluster environment and talk about how we analyze, cluster, and build classification models for malicious files. Clustering by itself is widely used in security applications, and Spark gives us a fast way to run our experiments. The pipeline is useful both for researching new algorithms and for evaluating the ones already in production.
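To make the clustering step concrete, here is a minimal sketch of k-means in plain Python. It is not the pipeline presented in the talk: the two features per file (size in MB and byte entropy), the sample values, and the function name `kmeans` are all hypothetical, and on a Spark cluster one would more likely reach for a distributed implementation such as `pyspark.ml.clustering.KMeans`.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=10):
    """Tiny k-means: returns (assignments, centroids).

    Initialisation is deterministic (the first k points), which keeps
    the sketch reproducible but is not what a production system would use.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignments, centroids


# Hypothetical feature vectors: (file size in MB, byte entropy).
files = [
    (0.2, 7.9), (0.3, 7.8), (5.0, 4.1),
    (5.2, 4.0), (0.25, 7.7), (4.9, 4.2),
]
assignments, centroids = kmeans(files, k=2)
print(assignments)
```

With these made-up values, the small high-entropy files end up in one cluster and the large low-entropy files in the other. Real feature extraction for executables is considerably richer than two numbers per file.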
In the second part, we show how we apply anomaly detection to time series. As an antivirus company, we receive thousands of different incident reports daily. We help malware experts analyze threats by notifying them about sudden changes in the incoming reports. We will walk you through our streaming application, which trains and serves multiple TensorFlow models in parallel.
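As a rough illustration of what "notifying about sudden changes" can mean, the sketch below flags points in a series of daily report counts that deviate strongly from a trailing window. It is a simple rolling z-score in plain Python, swapped in for the TensorFlow models used in the talk; the function name `detect_sudden_changes`, the window size, the threshold, and the sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev


def detect_sudden_changes(counts, window=7, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily incident-report counts with one spike.
daily_reports = [100, 104, 98, 101, 97, 103, 99, 102, 100, 250, 101, 98]
print(detect_sudden_changes(daily_reports))  # -> [9]
```

Note that the spike inflates the trailing window's standard deviation for the following days, so a burst of consecutive anomalies could be partly masked. A learned model per time series, as described above, is one way to handle seasonality and such effects more gracefully.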