Python is the lingua franca of data analytics and machine learning, and its productivity makes it the preferred tool for prototyping. However, traditional Python packages are not necessarily designed to deliver high performance and scalability on large datasets. We begin the tutorial with a short introduction to getting close-to-native performance from Intel-optimized packages such as numpy, scipy, and scikit-learn.

The remainder of the tutorial focuses on achieving high performance and scalability, from multiple cores on a single machine to large clusters of workstations. We will demonstrate that Python can match the performance and scalability of hand-tuned C++/MPI code using:

- The Scalable Dataframe Compiler (SDC), which compiles analytics code written with pandas/Python and scales it to bare-metal cluster performance. It translates a subset of Python into efficient parallel binaries that use message passing for collective communications.
- daal4py, a convenient Python API to data analytics and machine learning primitives. While its interface is scikit-learn-like, its MPI-based engine scales machine learning algorithms to bare-metal cluster performance.

Finally, we will use SDC and daal4py together to build an end-to-end analytics pipeline that scales to clusters with only minimal code changes.
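As a first taste of why optimized numerical packages reach close-to-native speed, the sketch below contrasts an interpreted Python loop with the equivalent vectorized NumPy call, which dispatches to compiled (and, in Intel builds, MKL-accelerated) kernels. This is an illustrative micro-example, not taken from the tutorial materials.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100_000)
y = rng.random(100_000)

# Interpreted loop: one Python-level iteration per element.
dot_loop = 0.0
for a, b in zip(x, y):
    dot_loop += a * b

# Vectorized: a single call into an optimized native kernel.
dot_vec = x @ y

# Both compute the same dot product; the vectorized form is orders
# of magnitude faster on large arrays.
assert np.isclose(dot_loop, dot_vec)
```

Timing the two variants (e.g. with `timeit`) makes the gap between interpreted and native execution concrete, which is the motivation for the optimized packages covered in the first part.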
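A minimal sketch of the scikit-learn-like daal4py interface mentioned above, using its two-step k-means pipeline (centroid initialization, then clustering). The `kmeans_init`/`kmeans` names reflect the daal4py API; the plain-NumPy fallback is an assumption added here so the sketch stays runnable when daal4py is not installed.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 8))

try:
    import daal4py as d4p
    # daal4py: initialize centroids, then run k-means.
    init = d4p.kmeans_init(nClusters=4, method="plusPlusDense").compute(data)
    centroids = d4p.kmeans(nClusters=4, maxIterations=50).compute(
        data, init.centroids).centroids
except ImportError:
    # Fallback (hypothetical stand-in): one Lloyd iteration in plain NumPy
    # so the example still runs without daal4py.
    centroids = data[:4].copy()
    labels = np.argmin(((data[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.stack([
        data[labels == k].mean(0) if np.any(labels == k) else centroids[k]
        for k in range(4)
    ])

print(centroids.shape)  # one centroid row per cluster
```

For cluster runs, daal4py also provides an MPI-based distributed mode, which is the scaling path the tutorial builds on.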