lpEdit: An editor to facilitate reproducible analysis; SciPy 2013 Presentation

Summary

lpEdit: An editor to facilitate reproducible analysis via literate programming

Authors: Richards, Adam, Duke University, CNRS France; Kosinski, Andrzej, Duke University; Bonneaud, Camille,

Track: Reproducible Science

There is evidence to suggest that a surprising proportion of published experiments in science are difficult, if not impossible, to reproduce. Data sharing, audit trails and extensive documentation are essential to reproducible research, whether in the laboratory or as part of an analysis. In this work, we introduce a documentation tool that aims to make analyses more reproducible across the general scientific community.

The application, lpEdit, is a cross-platform editor written with PyQt4 that enables a broad range of scientists to carry out the analytic component of their work reproducibly through literate programming. Literate programming mixes code and prose to produce a final report that reads like an article or book. A major target audience of lpEdit is researchers who are getting started with statistics or programming, so the hurdles of setting up a proper pipeline are kept to a minimum, and the learning burden is reduced through templates and documentation. The documentation for lpEdit is centered on learning by example, and accordingly we use several increasingly involved examples to demonstrate the software's capabilities.
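To make the idea of mixing code and prose concrete, here is a minimal Python sketch of the "weave" step of literate programming. It assumes simplified Sweave/noweb-style delimiters (a line such as `<<name>>=` opens a code chunk and a line containing only `@` closes it) and Python code inside the chunks. It is not lpEdit's implementation; `split_chunks` and `weave` are hypothetical names. All chunks run in one shared namespace, and each chunk's printed output is placed in the report directly after its code, between the surrounding prose.

```python
import contextlib
import io
import re

# Simplified Sweave/noweb-style delimiters (an assumption for this sketch):
# "<<...>>=" on its own line opens a code chunk, a lone "@" closes it.
CHUNK_START = re.compile(r"^<<.*>>=\s*$")
CHUNK_END = re.compile(r"^@\s*$")


def split_chunks(source):
    """Split a literate document into ("prose", text) and ("code", text) pieces."""
    pieces, kind, buffer = [], "prose", []
    for line in source.splitlines():
        if kind == "prose" and CHUNK_START.match(line):
            pieces.append((kind, "\n".join(buffer)))
            kind, buffer = "code", []
        elif kind == "code" and CHUNK_END.match(line):
            pieces.append((kind, "\n".join(buffer)))
            kind, buffer = "prose", []
        else:
            buffer.append(line)
    pieces.append((kind, "\n".join(buffer)))
    # Drop empty pieces, e.g. when the document starts with a chunk.
    return [(k, text) for k, text in pieces if text.strip()]


def weave(source):
    """Run every code chunk and interleave prose, echoed code and captured output."""
    namespace = {}  # shared, so later chunks can use variables from earlier ones
    report = []
    for kind, text in split_chunks(source):
        report.append(text)  # prose is copied verbatim; code is echoed
        if kind == "code":
            captured = io.StringIO()
            with contextlib.redirect_stdout(captured):
                exec(text, namespace)
            if captured.getvalue():
                report.append(captured.getvalue().rstrip("\n"))
    return "\n".join(report)
```

For example, weaving a document whose single chunk contains `values = [2, 4, 6]` and `print(sum(values) / len(values))` yields the opening prose, the echoed code, the line `4.0`, and then the remaining prose, so the reported number is always regenerated from the code rather than pasted in by hand.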

Because Sweave is commonly used, we begin with a Sweave example in lpEdit and then show that Python can be embedded into LaTeX in the same way as R. Next, we demonstrate how both R and Python code may be embedded into reStructuredText (reST). Finally, we walk through a more complete example: a functional analysis of high-throughput sequencing data from the transcriptome of the butterfly species Pieris brassicae. LaTeX and reST provide substantial flexibility, which facilitates reproducibility through the creation of reports, presentations and web pages.
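Embedding more than one language in a document, as in the R and Python reST example, requires routing each chunk to the right interpreter. The sketch below assumes a hypothetical mapping (`INTERPRETERS`) from a chunk's language tag to a command, runs each chunk as a temporary script through `subprocess`, and renders the result as reST: the code goes into a docutils `code` directive and any printed output into a literal block. The function names are illustrative and are not lpEdit's API; R chunks assume `Rscript` is on the PATH.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical settings: the command that runs each chunk language and the
# suffix of the temporary script. R chunks need "Rscript" on the PATH;
# Python chunks reuse the interpreter running this sketch.
INTERPRETERS = {"python": [sys.executable], "r": ["Rscript"]}
SUFFIXES = {"python": ".py", "r": ".R"}


def run_chunk(language, code):
    """Run one chunk as a script in a fresh process and return its standard output."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=SUFFIXES[language], delete=False
    ) as handle:
        handle.write(code + "\n")
        path = handle.name
    try:
        result = subprocess.run(
            INTERPRETERS[language] + [path],
            capture_output=True, text=True, check=True,
        )
    finally:
        os.remove(path)  # clean up the temporary script even if the chunk fails
    return result.stdout


def indent(text):
    """Indent non-empty lines by three spaces, since reST directive content must be indented."""
    return "\n".join("   " + line if line else "" for line in text.splitlines())


def chunk_to_rest(language, code):
    """Render a chunk as a reST ``code`` directive followed by its output as a literal block."""
    parts = [f".. code:: {language}", "", indent(code)]
    output = run_chunk(language, code)
    if output.strip():  # skip the literal block when the chunk prints nothing
        parts += ["", "::", "", indent(output)]
    return "\n".join(parts) + "\n"
```

Calling `chunk_to_rest("python", "print(1 + 1)")` returns a `.. code:: python` directive containing the code, followed by a literal block containing `2`. One design trade-off: every chunk runs in a fresh process, so variables do not carry over between chunks, whereas Sweave evaluates all chunks of a document in a single R session; a fuller tool would keep one persistent interpreter per language.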