How to Trick SQL into Doing All the Work for You

Description

Consider importing data into a SQL database with a SQL COPY command from within your Python app. As the data grows, it becomes increasingly important that your preprocessing steps are as efficient as possible.

You must validate and reformat the data before importing it. Because the database controls the import step, most people would assume that at least one additional pass over the data is needed to validate and format it before sending it to COPY. But what if you tricked the COPY step into doing the validating and formatting for you?

In this talk, I will demonstrate how to create a file-like object that COPY will use to validate and reformat the data as it is read. This removes the extra pass over the data when processing large data sets and keeps execution time from growing. While this talk explores sending a file-like object to a COPY command, the technique can be extended to any method that reads from an object.
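
As a rough illustration of the idea (not the code shown in the talk), a file-like object can be sketched as a class whose read() method pulls raw rows from an iterator and cleans them only when the consumer asks for more data. The class name ValidatingReader, the "name,age" input layout, and the validation rule are all hypothetical, chosen only for this example. A real database driver may require additional methods on the object, such as readline().

```python
from typing import Iterable, Iterator, Optional


class ValidatingReader:
    """Hypothetical file-like wrapper: validates and reformats rows lazily on read()."""

    def __init__(self, rows: Iterable[str]) -> None:
        self._rows: Iterator[str] = iter(rows)
        self._buffer = ""  # cleaned text not yet handed to the consumer

    def _clean(self, raw: str) -> Optional[str]:
        # Assumed rule for this sketch: input is "name,age" and age must be digits.
        parts = [p.strip() for p in raw.split(",")]
        if len(parts) != 2 or not parts[1].isdigit():
            return None  # invalid rows are skipped
        name, age = parts
        # Reformat into a tab-separated line, a common text format for COPY.
        return f"{name.title()}\t{int(age)}\n"

    def read(self, size: Optional[int] = -1) -> str:
        if size is None:
            size = -1
        # Consume source rows only until the request can be satisfied.
        while size < 0 or len(self._buffer) < size:
            raw = next(self._rows, None)
            if raw is None:
                break  # source exhausted
            line = self._clean(raw)
            if line is not None:
                self._buffer += line
        if size < 0:
            chunk, self._buffer = self._buffer, ""
        else:
            chunk, self._buffer = self._buffer[:size], self._buffer[size:]
        return chunk  # an empty string signals end of file


if __name__ == "__main__":
    reader = ValidatingReader(["alice, 30", "bad row", "bob,x", "carol,41"])
    # Any consumer that calls read() in chunks sees only validated, reformatted data.
    while True:
        chunk = reader.read(8)
        if not chunk:
            break
        print(repr(chunk))
```

Because validation happens inside read(), the data is traversed once, by whatever is reading the object. With a driver such as psycopg2, an object like this could in principle be passed where a file is expected (for example to cursor.copy_from), though that wiring is left out here since it needs a live database.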
