Building Daft: Python + Rust = a better distributed query engine

Description

Python is a popular language for data engineering workloads. In data engineering, developers rely on a "query engine" to efficiently retrieve data, process it, and then write the results out to a destination storage system or application.

The Python API for Apache Spark (PySpark) is currently the most popular framework for data engineering at large scale. However, PySpark has a heavy dependency on the JVM, which causes high friction during the development process.

In this talk, we discuss our work on the Daft Python Dataframe (www.getdaft.io), a distributed Python query engine built with Rust. We will take a deep dive into Daft's architecture and talk about how the strong synergy between Python and Rust gives Daft key advantages as a query engine.
