Introduction to Zeppelin Notebooks and PySpark 2.0


Apache Zeppelin is an interactive data analytics environment for distributed data processing systems. This talk will give a brief overview of what Zeppelin is and where it fits into the larger data science/big data ecosystem, discuss how it differs from Jupyter, and cover several of Zeppelin's key features via a live demo using the integrated (and just released) PySpark 2.0 interpreter.

Apache Zeppelin is an interactive, multi-purpose data analytics environment for distributed data processing systems. It provides a beautiful interactive web-based interface, data visualization, a collaborative work environment, and many other nice features to make your data analytics more fun and enjoyable. This talk will provide a brief overview (via live demo) of some of Zeppelin's key features, such as its pluggable architecture for backend integration, drag-and-drop visualizations, dynamic forms, notebook persistence, Shiro authentication and notebook authorization, and its ability to share variables between contexts. For example, the results of a Flink paragraph can be passed to a Spark paragraph, so the best tool for the job can be used at each step in an analytics pipeline, and a data scientist who loves Scala Flink can easily work with a data scientist who loves PySpark.

