Boosting command line data manipulation with Python and AWK


It is often necessary to manipulate data as fast as possible, whether that means calculating a column average or doing a simple join and filter. Servers often lack convenient tools such as Python (numpy/pandas) or R, and the data might not fit into memory. This talk shows how to make fast but inconvenient command-line tools great again. It is based on work done at Yandex and makes it possible to process gigabytes of data with one-liner commands.
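To illustrate the kind of task described above, here is a minimal sketch of a streaming column average and row filter in plain Python. It is not the tool presented in the talk; the function names `column_mean` and `filter_rows`, the tab separator, and the sample data are assumptions made for this example. The point is that reading one line at a time keeps memory use constant, so the input does not need to fit into memory.

```python
import io


def column_mean(lines, column, sep="\t"):
    """Return the mean of one 0-based column, reading line by line."""
    total = 0.0
    count = 0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        total += float(fields[column])
        count += 1
    # An empty input has no mean; report NaN instead of dividing by zero.
    return total / count if count else float("nan")


def filter_rows(lines, column, threshold, sep="\t"):
    """Yield only the lines whose numeric column exceeds the threshold."""
    for line in lines:
        if float(line.rstrip("\n").split(sep)[column]) > threshold:
            yield line


# Hypothetical sample data standing in for a large tab-separated file.
sample = io.StringIO("a\t1\nb\t2\nc\t6\n")
mean = column_mean(sample, column=1)
print(mean)
```

Both functions accept any iterable of lines, so the same code could be pointed at `sys.stdin` and used at the end of a shell pipeline.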

