Statistical learning provides a set of powerful principles and tools for interpreting data from many different domains, yielding insights about a range of phenomena through the construction of accurate models of the data. Applying these principles and tools to data from a specific scientific domain is often challenging, because it requires an understanding of both the phenomena measured and the properties of the measurement. In this talk, I will explore these challenges by focusing on data from measurements of the living human brain with MRI.
Diffusion MRI (dMRI) measures water diffusion in the brain. Because tissue compartments form boundaries to free diffusion, these measurements allow us to probe the structure of the tissue and to delineate the trajectories of bundles of nerve cell projections (axons) connecting different parts of the brain. dMRI can therefore be used to make inferences about brain structure and connectivity, and about the tissue properties of different parts of the brain, as well as their relation to health and to cognitive abilities.
Here, I focus on the use of cross-validation to compare different models of the dMRI signal. I will discuss the cross-validation API that we developed in the open-source Dipy project (http://dipy.org). This API was designed to match specific features of the data and the measurement, but also to generalize across different models. The data contain information at multiple spatial scales, so cross-validation can be applied at different levels: to evaluate and validate models of the microscopic distribution of fiber directions in small regions of the brain, as well as models of long-range connections between distant brain regions.
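To make the idea of cross-validating a signal model concrete, here is a minimal sketch in plain Python. It is not the Dipy API: the function names (fit_monoexp, predict_monoexp, kfold_predict, r_squared) are hypothetical, the data are synthetic, and the mono-exponential decay model S(b) = S0 * exp(-b * D) is only a stand-in for the richer dMRI models discussed in the talk. The sketch fits the model on a subset of the measurements, predicts the held-out measurements, and scores the predictions with the coefficient of determination.

```python
import math
import random


def fit_monoexp(bvals, signals):
    """Fit log(S) = log(S0) - b * D by ordinary least squares.

    Returns (S0, D). Assumes all signals are strictly positive.
    """
    n = len(bvals)
    ys = [math.log(s) for s in signals]
    mean_b = sum(bvals) / n
    mean_y = sum(ys) / n
    sxx = sum((b - mean_b) ** 2 for b in bvals)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(bvals, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_b
    return math.exp(intercept), -slope


def predict_monoexp(params, bvals):
    """Predict the signal at each b-value from fitted (S0, D)."""
    s0, d = params
    return [s0 * math.exp(-b * d) for b in bvals]


def kfold_predict(bvals, signals, folds, fit, predict, seed=0):
    """Return one out-of-sample prediction per measurement.

    Measurements are shuffled and split into `folds` groups; each group
    is predicted by a model fit only to the remaining groups.
    """
    n = len(bvals)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    predictions = [None] * n
    for k in range(folds):
        held_out = order[k::folds]
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        params = fit([bvals[i] for i in train], [signals[i] for i in train])
        for i, p in zip(held_out, predict(params, [bvals[i] for i in held_out])):
            predictions[i] = p
    return predictions


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic example (assumed values): S0 = 100, D = 0.001 mm^2/s,
# b-values from 0 to 2000 s/mm^2, with small multiplicative noise.
rng = random.Random(42)
bvals = [100.0 * i for i in range(21)]
signals = [100.0 * math.exp(-b * 0.001) * math.exp(rng.gauss(0.0, 0.02))
           for b in bvals]

predictions = kfold_predict(bvals, signals, 3, fit_monoexp, predict_monoexp)
r2 = r_squared(signals, predictions)
print(f"cross-validated R^2: {r2:.3f}")
```

Passing the fit and predict functions as arguments is one way to let the same cross-validation loop serve several competing models, so that their out-of-sample scores can be compared on identical folds.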
Materials available here: http://arokem.github.io/2015-pydatanw/#/