Reproducibility is a common problem in science, especially in computationally expensive fields such as cancer research.
A comprehensive picture of the genomic aberrations that occur during tumour progression, and of the resulting intra-tumour heterogeneity, is essential for personalised and precise cancer therapies. As the tumour environment changes under treatment, heterogeneity gives the tumour additional ways to evolve resistance, making intra-tumour genomic diversity a cause of relapse and treatment failure. Earlier bulk sequencing technologies could not resolve this diversity within the tumour.
Single-cell DNA sequencing, a recent sequencing technology, offers resolution down to the level of individual cells and is playing an increasingly important role in this field.
We present a reproducible and scalable data analysis pipeline that employs a statistical model and an MCMC algorithm to infer the evolutionary history of copy number alterations of a tumour from single cells. The pipeline is built using Python, the Conda environment management system and the Snakemake workflow management system. It starts from the raw sequencing files and a settings file for parameter configuration. After running the data analysis, the pipeline produces a report and figures to inform the treatment decision for the cancer patient.
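
To give a flavour of the kind of MCMC sampling referred to above, here is a minimal, generic random-walk Metropolis-Hastings sketch in plain Python. It is not the pipeline's model or code: the function names, the one-dimensional toy target (a normal distribution with mean 2) and all parameter values are assumptions chosen purely for illustration. The fixed seed reflects the reproducibility theme, since it makes repeated runs return the same chain.

```python
import math
import random


def metropolis_hastings(log_density, initial, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings over a one-dimensional state (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same chain
    current = initial
    current_logp = log_density(current)
    samples = []
    for _ in range(n_steps):
        # Propose a new state by adding Gaussian noise to the current one.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_density(proposal)
        # Symmetric proposal: accept with probability min(1, p(proposal) / p(current)).
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


def toy_log_density(x):
    # Hypothetical target: unnormalised log-density of a normal with mean 2, sd 1.
    return -0.5 * (x - 2.0) ** 2


samples = metropolis_hastings(toy_log_density, initial=0.0, n_steps=20000)
burn_in = 2000  # discard early samples taken before the chain settles
kept = samples[burn_in:]
posterior_mean = sum(kept) / len(kept)
print(f"estimated mean: {posterior_mean:.2f}")
```

In a tumour phylogeny setting the state would be a tree with copy number events rather than a single number, and the proposal would modify that tree, but the accept/reject logic follows the same pattern.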