Using pandas and pyspark to address challenges in processing and storing time series instrument data


Time series data from scientific instruments used for fermentation, environmental sensing, or spectroscopy often comes in proprietary or unusual formats that require custom logic to process. In addition, processing this data at scale is a challenge, since enterprise laboratory information management systems (LIMS) typically rely on transactional, row-oriented databases that are not designed to handle millions of records at a time. However, with clever use of pandas for unusually formatted files, or pyspark (via Databricks) for large numbers of records, this data can be processed into cleaner, more useful forms for further analysis.
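As a rough illustration of the pandas side, the sketch below parses a made-up instrument export that has free-form metadata lines followed by a wide table of readings, and reshapes it into a long, one-reading-per-row form. The file layout, the `[Data]` marker, the column names, and the `parse_instrument_file` helper are all assumptions chosen for the example, not a format or method described in the talk.

```python
import io

import pandas as pd

# Hypothetical export: "key: value" metadata, a "[Data]" marker, then a wide CSV table.
RAW = """\
Instrument: Bioreactor-01
Run ID: R-0042
[Data]
timestamp,pH,temp_C
2024-01-01 00:00:00,7.01,30.1
2024-01-01 00:01:00,7.00,30.2
2024-01-01 00:02:00,6.98,30.2
"""


def parse_instrument_file(text):
    """Split the metadata header from the data block and return (metadata, tidy_frame)."""
    lines = text.splitlines()
    marker = lines.index("[Data]")  # assumed separator between header and table

    # Everything above the marker is treated as "key: value" metadata.
    metadata = {}
    for line in lines[:marker]:
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()

    # Everything below the marker is an ordinary CSV table.
    wide = pd.read_csv(
        io.StringIO("\n".join(lines[marker + 1:])),
        parse_dates=["timestamp"],
    )

    # Reshape from one column per measurement to one row per reading.
    tidy = wide.melt(id_vars="timestamp", var_name="measurement", value_name="value")
    tidy["run_id"] = metadata["Run ID"]
    return metadata, tidy


metadata, tidy = parse_instrument_file(RAW)
print(tidy)
```

The long layout keeps the schema stable when an instrument adds or drops a channel, which tends to make later loading into a columnar store or a Spark DataFrame simpler.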

