I provide an overview of the challenges we’ve tackled at UC Berkeley in deploying scientific compute environments in both educational and research contexts. After discussing how these needs can be served by devops tools like Docker and Ansible, I argue that a coherent, easy-to-understand philosophy around reproducible compute environments is fundamental.
As the line between developer and researcher becomes ever more blurred, the challenge of sharing your compute environment with students and colleagues becomes ever more complex. Large, private organizations have been grappling with this issue for a while, spawning a great deal of enthusiasm around tools like Docker, Puppet, Vagrant, and Packer. And let’s not forget the notable Python-based upstarts, Ansible and Salt! Once the initial excitement wears off, though, the question follows: “Why are we doing this?”
The problem is that researcher-developers can become overwhelmed by the complexity and variety inherent in devops tools, all the while losing sight of the real reason for using them: a philosophy of documenting your research compute environments in a reproducible fashion, with a focus on scripting as much as is reasonable.
At UC Berkeley, members of the D-Lab, the Statistical Compute Facility, Computer Science, and Research IT have organized a project to develop the Berkeley Common Environment (BCE). I’ll provide an overview of the challenges we’ve tackled in both educational and research contexts, and of the needs served by the devops tools mentioned above. In the end, I argue that a coherent, easy-to-understand philosophy around scientific compute environments is fundamental; the tools merely make the collaboration architecture a little easier to manage for the people who build these environments a few times a year. What we should focus on instead is end-user experience and research community buy-in.