
Big data biology for pythonistas: getting in on the genomics revolution


Darya Vanichkina

In 2000 Bill Clinton unveiled "the most important, most wondrous map ever produced by humankind" - the human genome. This monumental endeavour cost $3 billion and took hundreds of scientists from all over the world 13 years. Today, a single person can generate such a map in ~2 days for $1,000. This dramatic drop in cost means that we now have data for hundreds of thousands of people - and other species - from all corners of the globe, and cohorts are available for every major disease under the sun. Petabytes of new data are also being generated every day.

Most of this data is publicly available, so anyone with an internet connection can try in silico biology from the comfort of their own home. In my talk, I'll walk through what this data looks like and how it's analysed - with a special focus on where Python fits into the workflow (tl;dr: the most interesting parts!). I will also highlight some common pitfalls that software engineers and developers face when getting into this space. Finally, I'll showcase several other facets of bioinformatics that sorely need contributions from good coders.

Genomics is rapidly entering the world of health care, in both public and private hospitals and in direct-to-consumer genetic testing. Understanding this data, and the challenges and limitations of its analysis, will help us all make better-informed health and medical decisions that affect our own quality of life and that of those we love.

