PyData Amsterdam 2017
This talk gives a basic overview of machine learning with H2O and Spark and explains different ways to scale your tasks on these technologies to fit your use case(s).
Sparkling Water integrates H2O, an open source distributed machine learning platform, with the capabilities of Apache Spark. It lets users combine H2O’s machine learning algorithms with Apache Spark applications via Scala, Python, R or H2O’s Flow GUI, which makes Sparkling Water a great enterprise solution. Sparkling Water 2.0 was built to coincide with the release of Apache Spark 2.0 and introduces several new features. One of the newest and largest is the ability to configure Sparkling Water for different workloads, so that you can scale and optimize the platform according to your data and needs.
In this talk we will introduce the basic architecture of Sparkling Water, go over different scaling strategies and explain the pros and cons of each solution. We will also compare the approaches with regard to specific use cases and explain why each solution may or may not be a good fit for a given use case. The talk will finish with a live demo of the approaches discussed, which should give you a first-hand look at configuring and running Sparkling Water for your use case(s)!