Contribute Media
A thank you to everyone who makes this possible: Read More

Designing spaCy: Industrial-strength NLP

Description

PyData Berlin 2016

The spaCy natural language processing (NLP) library features state-of-the-art performance, and a high-level Python API. Efficiency is crucial for NLP, because job sizes are constantly increasing. This talk describes how we’ve met these challenges in spaCy, by implementing the library in Cython.

The spaCy natural language processing (NLP) library features state-of-the-art performance, and a high-level Python API. Efficiency is crucial for NLP, because job sizes are constantly increasing. The key algorithms are also relatively complicated, and frequently subject to change, as new research is published. This talk describes how we’ve met these challenges in spaCy, by implementing the library in Cython. Unlike many Cython users, we did not write the library in Python first, and then optimize it. Instead, we designed the library as a C extension from the start, and added the Python API on top. This allows us to build the library on top of efficient, memory-managed data structures, without having to maintain a separate C or C++ codebase. The result is the fastest NLP library in the world, support for GIL-free multithreading, in a concise readable codebase, and with no compromise on user friendliness.

Details

Improve this page