PySHED: a Python framework for Streaming Heterogeneous Event Data

Description

Data is naturally heterogeneous: data and metadata form a highly interrelated web. Financial data, where the goal may be to correlate stock prices with contextual metadata such as news stories, is a prime example. However, this class of data is very difficult to handle in a traditional pipeline, because each data type needs to be treated in its own bespoke way. Our new library, PySHED, aims to tackle these issues by providing a streaming data processing protocol for heterogeneous data. This simple, elegant, and flexible protocol lets developers handle each of their data types properly while retaining the full power of pipelines for combining, processing, and splitting streams of data. Furthermore, our approach automatically stores provenance information, enabling traceback, reanalysis of data, and data introspection. We will discuss the application of this framework to live X-ray experiment data analysis. Finally, we will discuss future integrations with parallel processing and feedback between data collection and analysis.
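To make the abstract's ideas concrete, here is a minimal, hypothetical sketch of a push-based stream in which each event carries a data-type tag and a provenance trail. It is not PySHED's API (the abstract does not show it); every name below (`Event`, `Node`, `Route`, `Map`, `CombineLatest`, `Sink`) is invented for illustration. It shows the three pipeline operations the abstract names: splitting a stream by data type, processing each type with its own function, and combining the branches, while each node stamps its name onto what it emits so a result can be traced back to the steps that produced it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str                         # data-type tag, e.g. "price" or "news" (hypothetical)
    payload: Any                      # the data itself
    provenance: Tuple[str, ...] = ()  # names of the nodes this data passed through


class Node:
    """Hypothetical push-based node; stamps its own name onto every event it emits."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: List["Node"] = []

    def connect(self, other: "Node") -> "Node":
        self.downstream.append(other)
        return other  # return the new node so connections can be chained

    def emit(self, kind: str, payload: Any, provenance: Tuple[str, ...]) -> None:
        event = Event(kind, payload, provenance + (self.name,))
        for node in self.downstream:
            node.update(event)

    def update(self, event: Event) -> None:
        # Default behaviour: pass the event through unchanged.
        self.emit(event.kind, event.payload, event.provenance)


class Route(Node):
    """Split: forward only events of a single data type."""

    def __init__(self, name: str, kind: str) -> None:
        super().__init__(name)
        self.kind = kind

    def update(self, event: Event) -> None:
        if event.kind == self.kind:
            self.emit(event.kind, event.payload, event.provenance)


class Map(Node):
    """Process: apply a type-specific function to each payload."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        super().__init__(name)
        self.func = func

    def update(self, event: Event) -> None:
        self.emit(event.kind, self.func(event.payload), event.provenance)


class CombineLatest(Node):
    """Combine: once every expected kind has arrived, emit the latest of each."""

    def __init__(self, name: str, kinds: Tuple[str, ...]) -> None:
        super().__init__(name)
        self.kinds = kinds
        self.latest: Dict[str, Event] = {}

    def update(self, event: Event) -> None:
        self.latest[event.kind] = event
        if all(k in self.latest for k in self.kinds):
            parts = [self.latest[k] for k in self.kinds]
            payload = {e.kind: e.payload for e in parts}
            # Flatten the branches' trails, in the order given by `kinds`.
            lineage = tuple(step for e in parts for step in e.provenance)
            self.emit("+".join(self.kinds), payload, lineage)


class Sink(Node):
    """Collect results so they (and their provenance) can be inspected."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.results: List[Event] = []

    def update(self, event: Event) -> None:
        self.results.append(event)


# Usage: split a mixed stream of prices and news, treat each type its own way, re-join.
source = Node("source")
combine = CombineLatest("combine", ("price", "news"))
source.connect(Route("prices", "price")).connect(Map("to_cents", lambda p: round(p * 100))).connect(combine)
source.connect(Route("headlines", "news")).connect(Map("lowercase", str.lower)).connect(combine)
sink = combine.connect(Sink("sink"))

source.update(Event("price", 101.25))
source.update(Event("news", "ACME Beats Estimates"))
print(sink.results[0].payload)     # {'price': 10125, 'news': 'acme beats estimates'}
print(sink.results[0].provenance)  # ('source', 'prices', 'to_cents', 'source', 'headlines', 'lowercase', 'combine')
```

Recording provenance as a flat tuple of node names is the simplest choice for a sketch; the shared `source` step appears once per branch. A system aimed at traceback and reanalysis would more plausibly store a graph of steps together with their parameters.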