Cellular populations in biology are often heterogeneous, and aggregate assays such as expression arrays can obscure the small differences between these populations. Examples where these differences can be highly significant include the identification of antigen-specific immune cells, stem cells and circulating cancer cells. As the frequency of such cells in the blood can be vanishingly small, assays that detect signals at the single-cell level are essential. Flow cytometry is probably the best-established single-cell assay and has been an integral tool in immunology and biology for decades. It can measure marker levels for individual cells as well as population statistics over millions of cells.
Recent technological innovations in flow cytometry have increased the number of cell markers that can be resolved simultaneously, and visual analysis (gating) becomes increasingly difficult and error-prone as data dimensionality grows. Hence there is increasing demand for tools that automate the analysis and management of flow data, so as to increase accuracy and reproducibility. However, essentially all software used by flow cytometry laboratories is commercial and based on the visual analysis paradigm. With the exception of the R Bioconductor project, we are not aware of any full-featured open source tools for analyzing flow data. The few open source flow software modules that exist simply extract data from FCS (flow cytometry standard) files into tabular/CSV format, losing all metadata associated with the file, and provide no additional tools for analysis. We therefore decided to develop fcm, a Python library that provides a foundation for flow cytometry data management and analysis.
The fcm library provides functions to load FCS files, apply spectral compensation, and perform standard log and log-like transforms for visualization. The library also provides objects and methods for traditional gating-based analysis, including standard polygon, threshold, interval, and quadrant gates. Using fcm and other common Python libraries, one can quickly write scripts for large-scale batch analysis. In addition to gating-based analysis, fcm provides methods for model-based analysis, using GPU-optimized statistical models to identify cell subsets. These statistical models provide a data-driven way to construct generative probability models that scale well with the increasing dimensionality of flow data and do not require expert input to identify cell subsets. High-performance computational routines for fitting the statistical models are optimized using Cython and PyCUDA. More specialized tools for the analysis of flow data include a novel information measure for optimizing reagent panels and analysis strategies, and optimization methods for the automatic determination of positivity thresholds.
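To make the preprocessing steps named above concrete, the following is a minimal NumPy sketch of spectral compensation, a log-like (arcsinh) transform, and an interval gate. It does not use or reproduce the fcm API: the function names, the 2-channel spillover matrix, and the cofactor of 150 are all assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical helpers for illustration; these are NOT fcm functions.

def compensate(events, spillover):
    """Undo spectral spillover.

    Assumes the model observed = true @ spillover, where rows are events
    and columns are channels, so true = observed @ inv(spillover).
    """
    # Solve the transposed linear system instead of forming the inverse.
    return np.linalg.solve(spillover.T, events.T).T

def arcsinh_transform(events, cofactor=150.0):
    """Log-like transform that stays defined for zero and negative values."""
    return np.arcsinh(events / cofactor)

def interval_gate(events, channel, low, high):
    """Keep events whose value in one channel lies in [low, high)."""
    values = events[:, channel]
    return events[(values >= low) & (values < high)]

# Toy data: three events measured on two channels (made-up values).
true_signal = np.array([[1000.0, 0.0],
                        [0.0, 500.0],
                        [200.0, 300.0]])
spillover = np.array([[1.0, 0.1],
                      [0.05, 1.0]])
observed = true_signal @ spillover

recovered = compensate(observed, spillover)
gated = interval_gate(recovered, channel=0, low=100.0, high=2000.0)
display_values = arcsinh_transform(gated)
```

In this toy example, compensation recovers the simulated signal because the observed data were generated from the same spillover matrix. With real data the matrix would normally come from the FCS file metadata or from single-stain controls.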
We are currently using the fcm library to analyze tetramer assays for cancer immunotherapy, as well as intracellular expression of effector molecules in the NIAID-sponsored External Quality Assurance Program Oversight Laboratory (EQAPOL) program, which aims to standardize flow cytometry assays in HIV studies. An illustrative example is the use of fcm to build a pipeline for the Cytostream application that automates the analysis of 459 FCS files from 12 laboratories, reducing the analysis time from one month to a single evening.