Description
One of the principal goals of the Janelia Farm Research Campus is the reconstruction of complete neuronal circuits. This involves 3D electron-microscopy (EM) volumes many microns across with better than 10 nm resolution, resulting in gigavoxel-scale images. From these, individual neurons must be segmented out. Although image segmentation is a well-studied problem, these data present unique challenges in addition to scale: neurons have an elongated, irregular branching structure, with processes as thin as 50 nm but hundreds of micrometers long; one neuron looks much like the next, with only a thin cellular boundary separating densely packed neurons; and internal neuronal structures can look similar to the cellular boundary. The first problem in particular means that small errors in segment boundary predictions can lead to large errors in neuron shape and neuronal network connectivity.
Our segmentation workflow has three main steps: a voxelwise edge classification, a fine-grained segmentation into supervoxels (which can reasonably be assumed to be atomic groups of voxels), and hierarchical region agglomeration.
For the first step, we use Ilastik, a pixel-level interactive learning program. Ilastik uses the output of various image filters as features to train a classifier on voxels labeled by the user, and then predicts an edge probability for every voxel. We then use the watershed algorithm on the resulting edge probability map to obtain supervoxels. For the last step, we developed a new machine learning algorithm (Nunez-Iglesias et al., in preparation).
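As a rough illustration of the supervoxel step, the sketch below runs a seeded watershed on an edge-probability map with scipy.ndimage. It is not Ray's implementation: the function name supervoxels_from_edge_map, the seed_threshold parameter, and the choice to seed from connected low-probability regions are all assumptions made for this example.

```python
import numpy as np
from scipy import ndimage as ndi


def supervoxels_from_edge_map(prob, seed_threshold=0.1):
    """Oversegment an edge-probability map (values in [0, 1]).

    Hypothetical helper for illustration only. Works on arrays of any
    dimensionality supported by scipy.ndimage.
    """
    # Seeds: connected components of voxels that are confidently
    # "not boundary" (assumed seeding rule, not taken from the source).
    seeds, n_seeds = ndi.label(prob < seed_threshold)
    # watershed_ift requires an unsigned 8- or 16-bit input image,
    # so quantize the probabilities to 0..255.
    quantized = np.round(prob * 255).astype(np.uint8)
    # Flood outward from the seeds; high-probability ridges are reached last.
    labels = ndi.watershed_ift(quantized, seeds.astype(np.int32))
    return labels, n_seeds


# Toy 2D example: a vertical high-probability ridge separating two basins.
toy_prob = np.zeros((5, 7))
toy_prob[:, 3] = 1.0
toy_labels, toy_n = supervoxels_from_edge_map(toy_prob)
```

On real EM data the seed threshold would need tuning: a lower value gives fewer, larger supervoxels, at the risk of merging across weak boundaries.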
Prior work has used the mean voxel-level edge-probability along the boundaries between regions to agglomerate them. This strategy works extremely well because boundaries get longer as agglomeration proceeds, resulting in ever-improving estimates of the mean probability. We hypothesized that we could improve agglomeration accuracy by using a classifier (which can use many more features than the mean). However, a classifier can perform poorly because throughout agglomeration we may visit a part of the feature space that has not yet been sampled. In our approach, we use active learning to ensure that we have examples from all parts of the space we are likely to encounter.
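The mean-boundary strategy from prior work can be sketched in a few lines of NumPy. This is a simplified illustration, not the code in Ray and not the classifier-based method described above: the function names boundary_stats and agglomerate_by_mean, the threshold parameter, and the definition of a boundary as the faces between touching voxels of different labels are assumptions. It recomputes all boundary statistics after every merge for clarity; a practical implementation would more likely maintain a region adjacency graph with a priority queue.

```python
from collections import defaultdict

import numpy as np


def boundary_stats(labels, prob):
    """Map each adjacent label pair to [sum of edge probability, face count]."""
    stats = defaultdict(lambda: [0.0, 0])
    for axis in range(labels.ndim):
        # Pair every voxel with its neighbor one step along this axis.
        lab = np.moveaxis(labels, axis, 0)
        p = np.moveaxis(prob, axis, 0)
        a, b = lab[:-1].ravel(), lab[1:].ravel()
        face_prob = 0.5 * (p[:-1].ravel() + p[1:].ravel())
        touching = a != b
        for u, v, fp in zip(a[touching], b[touching], face_prob[touching]):
            key = (int(min(u, v)), int(max(u, v)))
            stats[key][0] += float(fp)
            stats[key][1] += 1
    return stats


def agglomerate_by_mean(labels, prob, threshold=0.5):
    """Greedily merge the pair with the lowest mean boundary probability."""
    labels = labels.copy()
    while True:
        stats = boundary_stats(labels, prob)
        if not stats:
            break  # a single region remains
        means = {pair: s / n for pair, (s, n) in stats.items()}
        (u, v), best = min(means.items(), key=lambda kv: kv[1])
        if best >= threshold:
            break  # every remaining boundary looks like a true edge
        labels[labels == v] = u  # merge region v into region u
    return labels


# Toy example: three vertical strips; only the 2|3 boundary is strong.
toy_labels = np.repeat(np.array([[1, 1, 2, 2, 3, 3]]), 4, axis=0)
toy_prob = np.repeat(np.array([[0.0, 0.1, 0.1, 0.9, 0.9, 0.0]]), 4, axis=0)
toy_merged = agglomerate_by_mean(toy_labels, toy_prob, threshold=0.5)
```

In the toy example, regions 1 and 2 are merged because their shared boundary has a low mean probability, while region 3 stays separate. As merged regions grow, each mean is taken over more boundary faces, which is why the estimate improves as agglomeration proceeds.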
We implemented our algorithm in arbitrary dimensions in an open-source, MIT-licensed Python library, Ray (https://github.com/jni/ray). Ray combines leading scientific computing Python libraries, including NumPy, SciPy, NetworkX, and scikit-learn, to deliver state-of-the-art segmentation accuracy in Python.