PyData Amsterdam 2016
Are you looking for accessible, compressed, organized data? HDF5 might be the solution you’re looking for. HDF5 works like a file system within a file, designed for flexible and efficient storage and I/O for high volume, complex data. Come learn from a Pyentist how to leverage HDF5, get started with h5py, and see a real-world example of a processing pipeline utilizing HDF5.
Are you a Pyentist? Frequently ‘grep’-ing? Drowning in ASCII files? Extending filenames for each processing step? Looking for accessible, compressed, organized data? If you answered yes to any of these questions, then HDF5 might be the solution you’re looking for. HDF5 is entirely open source and supported by a variety of programming languages and tools, including Python (h5py). HDF5 not only supports large, complex, heterogeneous data but is also self-describing and supports data slicing. In this talk, you’ll learn about embracing HDF5 from a Pyentist.
This talk is aimed at data scientists who have large numerical datasets that need to be managed and stored but also accessed and processed efficiently. Basic knowledge of NumPy and UNIX is useful but not required. Attendees will learn how to get started with h5py, as well as how to leverage HDF5 to obtain accessible, compressed, and organized data.
HDF5 stands for Hierarchical Data Format, version 5. It is a file format, library, and data model for storing and managing data. More simply, HDF5 can be described as a file system within a file. An HDF5 file contains two kinds of objects: datasets and groups. Datasets work like NumPy arrays, while groups work like dictionaries that hold datasets and other groups. In addition, objects can carry attributes, which hold metadata. HDF5 is designed for flexible and efficient storage and I/O for high-volume, complex data. Data scientists will find HDF5 to be invaluable for managing, manipulating, and storing their data.
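The object model described above can be sketched in a few lines of h5py. This is a minimal illustration and not material from the talk itself: the file name, the group name `session_01`, the dataset name `signal`, and the attribute `sampling_rate_hz` are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, group, dataset, and attribute names, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # Groups behave like dictionaries: they hold datasets and other groups.
    session = f.create_group("session_01")
    # Datasets behave like NumPy arrays stored on disk.
    signal = session.create_dataset("signal", data=np.arange(10.0))
    # Attributes attach metadata to an object.
    signal.attrs["sampling_rate_hz"] = 1000

with h5py.File(path, "r") as f:
    # Dictionary-style access to the members of a group.
    names = list(f["session_01"].keys())
    # Slicing reads only the requested elements from disk.
    first_three = f["session_01/signal"][:3]
    rate = int(f["session_01/signal"].attrs["sampling_rate_hz"])
```

Objects are addressed with path-like names such as `session_01/signal`, which is what makes the "file system within a file" description apt.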
Part of this talk will demonstrate how to get started with HDF5. In this demo, attendees will learn how to: create and handle HDF5 files using h5py, manage and manipulate datasets, work with groups, and make use of attributes. A real-world example of a processing pipeline of brain recordings, utilizing HDF5 for storing and managing data at each processing step, will be presented. Attendees will have access to an IPython notebook to follow along during the demo and explore examples. After this talk, attendees will be able to begin using HDF5 to effortlessly store and manage their data.
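As a rough sketch of the kind of pipeline described above, the following stores the output of each processing step in its own group of a single HDF5 file. It is not the pipeline presented in the talk: the function `run_pipeline`, the step names `raw` and `processed`, the mean-removal step, and the synthetic data are assumptions made purely for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def run_pipeline(path, raw, sampling_rate_hz):
    """Hypothetical two-step pipeline that keeps every step in one file."""
    with h5py.File(path, "w") as f:
        # File-level metadata shared by all steps.
        f.attrs["sampling_rate_hz"] = sampling_rate_hz
        # Step 1: store the raw recording, gzip-compressed.
        # Intermediate groups in the name are created automatically.
        f.create_dataset("raw/recording", data=raw, compression="gzip")
        # Step 2 (illustrative): remove the per-channel mean.
        centered = raw - raw.mean(axis=1, keepdims=True)
        dset = f.create_dataset(
            "processed/centered", data=centered, compression="gzip"
        )
        # Record which dataset this step was derived from.
        dset.attrs["source"] = "raw/recording"


# Synthetic stand-in for a recording: 4 channels, 1000 samples each.
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 1000))
path = os.path.join(tempfile.mkdtemp(), "pipeline.h5")
run_pipeline(path, raw, sampling_rate_hz=1000)

with h5py.File(path, "r") as f:
    steps = sorted(f.keys())
    # Read only the first 100 samples of channel 0.
    window = f["processed/centered"][0, :100]
    source = f["processed/centered"].attrs["source"]
```

Keeping each step in a named group replaces the habit of extending filenames per processing step, and the `source` attribute records provenance alongside the data.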