Solving Data Science Problems Using a Jupyter Notebook and SAP HANA's In-Database Machine Learning Libraries


Companies store their data in databases governed by strict access controls. Recent regulatory changes make it necessary to work on datasets inside this controlled environment without creating additional external copies. However, data scientists prefer to work with the tools they know best, such as Python, R, and Jupyter Notebooks, together with a large ecosystem of open-source packages (numpy, matplotlib, pandas, etc.). SAP HANA provides highly optimized in-database machine learning libraries. In this talk, we will show how data scientists can stay in the environment they are most familiar with while accessing data stored in SAP HANA and applying SAP HANA's machine learning libraries through a scikit-learn-style interface. The data remains in the database and is exposed as dataframes similar to pandas DataFrames. We will explain the underlying software architecture and walk through a complete end-to-end use case in a Jupyter Notebook.


