Measuring and modeling the complexity of children's books

Summary

Researchers have been modeling text difficulty for over 50 years. Many models have been developed, but few have focused on books for emerging readers (Grades K-2). We used Python for nearly every aspect of the project, including collecting data from reading educators, analyzing text features and psychometric data, and building a predictive model. Our tools included scipy, scikit-learn, and pandas, and we relied extensively on the IPython Notebook, which is demonstrated in the talk.
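The general workflow of computing text features and fitting a predictive model can be sketched as follows. This is a minimal illustration only. It assumes three simple surface features (mean sentence length, mean word length, type-token ratio) and ordinary least squares from scikit-learn. The helper `text_features`, the toy texts, and the difficulty ratings are hypothetical placeholders. They are not the project's actual features, data, or model.

```python
import re

import pandas as pd
from sklearn.linear_model import LinearRegression


def text_features(text: str) -> dict:
    """Hypothetical helper: a few simple surface features of a text."""
    # Split into sentences on terminal punctuation, dropping empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "mean_sentence_len": len(words) / len(sentences),
        "mean_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


# Toy placeholder texts and difficulty ratings, NOT data from the project.
books = pd.DataFrame({
    "text": [
        "The cat sat. The cat ran.",
        "I see a dog. The dog sees me.",
        "Sam likes to play outside with his friends after school.",
        "The curious fox wandered through the quiet forest at dawn.",
    ],
    "difficulty": [1.0, 1.5, 3.0, 4.0],
})

# One row of features per book; the ratings are the regression target.
X = pd.DataFrame([text_features(t) for t in books["text"]])
y = books["difficulty"]

# Fit a linear model and predict difficulty for the same books.
model = LinearRegression().fit(X, y)
predicted = model.predict(X)
```

In practice a model like this would be evaluated on held-out books (for example with scikit-learn's cross-validation utilities) rather than on the texts it was fitted to.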