Description
RegEx Strikes Back: Regular Expressions for Text Mining - PyCon Italia 2022
A short time ago, in a galaxy not so far away, a regular expression was taking 5 days to run. Regular expressions have a bad reputation: they are said to be slow for text mining tasks. In this talk you will learn why a regex can be slow and how to use a Trie Regex to craft blazingly fast regular expressions with no effort. You will also see how regular expressions integrate smoothly with many libraries (pandas, spaCy, etc.) and how to use the regex module for common text cleaning tasks such as prefix finding, fuzzy matching and many more.
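To give a rough idea of what a trie regex looks like, here is a minimal sketch using only the standard library `re` module. It is not the speaker's implementation; the helper names `build_trie` and `trie_to_pattern` and the sample word list are made up for illustration. The idea is that words sharing a prefix are merged into one branch, so the engine does not have to retry every alternative of a long `word1|word2|...` pattern from scratch.

```python
import re


def build_trie(words):
    """Insert each word into a nested-dict trie; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_pattern(node):
    """Recursively turn a trie node into a regex fragment with shared prefixes."""
    is_end = "" in node
    children = sorted(key for key in node if key != "")
    if not children:
        return ""  # leaf: nothing left to match
    alternatives = [re.escape(ch) + trie_to_pattern(node[ch]) for ch in children]
    if len(alternatives) == 1 and not is_end:
        return alternatives[0]  # a single mandatory continuation needs no group
    body = "(?:" + "|".join(alternatives) + ")"
    # If a word ends at this node, everything after it is optional.
    return body + "?" if is_end else body


# Hypothetical vocabulary, chosen only to show prefix sharing.
words = ["foo", "foobar", "fool", "bar"]
pattern = trie_to_pattern(build_trie(words))
compiled = re.compile(r"\b" + pattern + r"\b")

print(pattern)
print(compiled.findall("a fool and a foobar at the bar, how foolish"))
```

For these four words the generated pattern factors out the common `foo` prefix instead of listing each word as a separate alternative. How much faster this is in practice depends on the vocabulary size and the text, so treat it as something to measure on your own data.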
Slides:
Speaker: Daniel Mesejo