In this talk, we present some of the benefits of writing extensions for Python in Rust, and illustrate this approach with the vtext project, which aims to be a high-performance library for text processing.

In the second part, we introduce the vtext project in more detail. In particular, we consider the problems of text tokenization and (parallel) token counting, which result in a sparse vector representation of documents. These representations can then be used as input to machine learning or information retrieval applications. We outline the approach taken in vtext and compare it with existing solutions to these problems in the Python ecosystem.
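To make the tokenize-then-count pipeline concrete, here is a minimal pure-Python sketch (not vtext's actual API): whitespace tokenization, a vocabulary mapping tokens to column indices, and per-document counts stored as `{column: count}` dicts, a simple sparse-vector representation. vtext implements faster equivalents of these steps in Rust.

```python
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenization; real tokenizers (e.g. Unicode
    # segmentation rules) are considerably more involved.
    return text.lower().split()

docs = ["The quick brown fox", "the lazy dog", "the quick dog"]

# Build a vocabulary mapping each distinct token to a column index.
vocab = {}
for doc in docs:
    for tok in tokenize(doc):
        vocab.setdefault(tok, len(vocab))

# Count tokens per document; a dict of {column: count} is a sparse
# representation of one row of the document-term matrix.
rows = [{vocab[t]: c for t, c in Counter(tokenize(d)).items()}
        for d in docs]
```

Libraries such as scikit-learn's `CountVectorizer` perform the same transformation and return a `scipy.sparse` matrix; the counting loop is also the natural place to introduce parallelism over documents.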