Using Python for authorship attribution in Renaissance drama.
About a third of the plays written for the London theatres of 400 years ago, when Shakespeare was writing, were published anonymously, so we don't know who wrote them. For the last century, investigators have counted features of these plays in an attempt to decide who wrote them. Such inquiries have been helped by advances in technology: we can now store these texts electronically and automate much of the counting. These advances have refined the methods for attributing authorship, and scholars now have a battery of reliable tests for determining the author or authors of a piece of writing.
Scholars who specialise in authorship attribution tend to give their attention almost exclusively to Shakespeare's writing, and few use automated means to produce their results. This talk, and the programs behind it, addresses both problems. It will show how several methods of authorship attribution can be automated in Python and yield worthwhile results for the study of Renaissance drama. Using Python's built-in data structures and a few libraries (such as pandas, NumPy, and the standard-library math module), we can write programs that find likely candidate authors for writing from the period.
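As a rough illustration of what such a program might look like, the sketch below compares the relative frequencies of a few common function words in an anonymous text with those in samples by known authors, and ranks the candidates by distance. This is only one simple counting approach, chosen here for illustration, and is not necessarily one of the methods the talk covers. The word list, the function names (profile, distance, rank_candidates) and the sample strings are all hypothetical placeholders rather than real play texts.

```python
import math
from collections import Counter

# Hypothetical marker list: a handful of common function words.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in")


def profile(text, words=FUNCTION_WORDS):
    """Return the relative frequency of each marker word in text."""
    # Naive tokenisation: lower-case and split on whitespace.
    # Real play texts would need punctuation and spelling handled first.
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in words]


def distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_candidates(anonymous_text, candidates):
    """Sort candidate authors from nearest to farthest profile."""
    target = profile(anonymous_text)
    scores = {
        name: distance(target, profile(sample))
        for name, sample in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder strings standing in for full play texts.
candidates = {
    "Author A": "the king and the queen of the land",
    "Author B": "to be in a room to sit in a chair",
}
ranking = rank_candidates("the duke and the lady of the court", candidates)
print(ranking[0][0])  # name of the nearest candidate
```

A small distance only marks a candidate as worth a closer look. With texts this short the result means nothing, and a real study would use whole plays, a much longer word list and several independent tests.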
Discovering more about authorship in the Renaissance is vital to our understanding of the period. Slowly, the idea of Shakespeare as a solitary genius is being replaced by models of a more collaborative theatre industry, in which people co-authored plays more frequently than previously thought. This talk will help to continue this work and show how researchers can automate their endeavours in a way that others can replicate and understand with only a little programming knowledge.