A look at how PySpark "works" today and how we can make it better in the future + insert engine noises of a fast car +
This talk will introduce PySpark along with the magic that makes it work and be friends with the JVM. We will discuss why lazy evaluation makes a huge difference in PySpark, both in terms of the general optimizations it opens up and the Python-specific considerations. From there we will explore much of the future of Spark, DataFrames & Datasets, and what this means for PySpark. Most Spark DataFrame examples limit themselves to things written in the relational-style query language, but we will explore how to add more functionality through UDFs.
Hopefully no one is scared away from using Spark once they see the 300 small gnome-like creatures behind the curtain, but parental guidance is encouraged for those who still believe in magic, reliable distributed systems, and vendor marketing brochures.