An Open Source System for De-identification and Use of Medical Images for Research
Authors: Miller, Jeffrey, Center for Biomedical Informatics, The Children's Hospital of Philadelphia
Track: Medical Imaging
Medical images captured from X-ray, MRI, CT, and ultrasound modalities represent a wealth of data for clinical researchers. Direct access to imaging studies opens up more research opportunities than text-only records can offer. However, imaging data can be difficult to work with outside of clinical systems and can contain Protected Health Information (PHI) in diverse and unexpected locations, which presents a barrier to multi-institutional, collaborative research. Existing integration solutions, such as the Clinical Trials Processor, do not provide for manual curation of images to screen for relevancy and PHI, a crucial step before images can be used within a research application. To address these issues, we developed a system for the end-to-end provisioning of de-identified image studies. It consists of a Django app in which users review and record metadata for each study, a pipeline that anonymizes images and provisions them to a production image archive, and an application for viewing images in the browser as part of a research application. We use the Python Ruffus pipeline framework and the PyDICOM library to orchestrate the work of moving, anonymizing, and annotating millions of files in a repeatable and auditable manner. This workflow has been used to integrate images into AudGenDB (http://audgendb.chop.edu), a publicly available hearing impairment research database. The resulting AudGenDB image integration enables researchers to visualize and assess images in direct context with clinical and genetic variables for research subjects. The source code is available under a BSD license at http://github.com/cbmi/dicom-pipeline.
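The review-then-anonymize flow described above can be illustrated with a minimal sketch. This is not the code from the dicom-pipeline repository, and it deliberately uses neither Ruffus nor PyDICOM: plain dictionaries stand in for DICOM headers so the example stays self-contained. The field list `PHI_FIELDS`, the helper names `pseudonym`, `anonymize`, and `run_pipeline`, and the salt-based pseudonym scheme are all assumptions made for illustration. The sketch covers only the listed header fields; PHI elsewhere in a study is the kind of case the manual review step is meant to catch.

```python
import copy
import hashlib

# Hypothetical list of header fields treated as PHI (an assumption for this
# sketch; a real deployment would need a far more complete list).
PHI_FIELDS = ("PatientName", "PatientBirthDate", "InstitutionName",
              "ReferringPhysicianName")


def pseudonym(value, salt):
    """Derive a stable replacement identifier so reruns give the same result."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "ANON-" + digest[:12]


def anonymize(header, salt):
    """Return a cleaned copy of one header plus a list of the changes made."""
    clean = copy.deepcopy(header)  # never modify the source record
    changes = []
    for field in PHI_FIELDS:
        if field in clean:
            clean[field] = ""
            changes.append((field, "blanked"))
    if "PatientID" in clean:
        clean["PatientID"] = pseudonym(str(header["PatientID"]), salt)
        changes.append(("PatientID", "pseudonymized"))
    return clean, changes


def run_pipeline(studies, approved, salt):
    """Anonymize only the studies a reviewer approved; log every action.

    studies:  dict mapping a study identifier to a list of header dicts
    approved: set of study identifiers that passed manual review
    """
    provisioned = {}
    audit_log = []
    for study_id, headers in sorted(studies.items()):
        if study_id not in approved:
            audit_log.append((study_id, None, "skipped: not approved in review"))
            continue
        cleaned = []
        for index, header in enumerate(headers):
            clean, changes = anonymize(header, salt)
            cleaned.append(clean)
            for field, action in changes:
                audit_log.append((study_id, index, field + " " + action))
        provisioned[study_id] = cleaned
    return provisioned, audit_log
```

Because the pseudonym depends only on the salt and the original identifier, running the sketch twice over the same input yields the same output, which is one simple way to get the repeatability the abstract mentions. The audit log records one entry per change or skipped study, so each provisioned file can be traced back to the actions applied to it.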