Description
Speakers: Sebastian Wanner, Christopher Lennan
Track: PyData: Natural Language Processing
In this talk we will present a case study of how we used two open-source libraries, Sentence-Transformers and Facebook Faiss, to successfully cluster offers at idealo.de based on their text data.
Clustering text data is a well-studied problem, and we want to show how a state-of-the-art approach succeeded in a business setting and how relatively easy it is to realise such a project with current open-source tools.
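A minimal sketch of what such an embed-search-group pipeline can look like, under stated assumptions: the offer embeddings are hypothetical toy vectors standing in for Sentence-Transformers output, exact brute-force search stands in for a Faiss inner-product index, and grouping offers by a similarity threshold plus connected components (union-find) is an illustrative choice, not necessarily the rule used at idealo.de. The names `cluster_offers`, `k` and `threshold` are hypothetical.

```python
import math

# Hypothetical toy embeddings standing in for Sentence-Transformers output.
OFFERS = [
    [1.0, 0.0, 0.1],     # offer 0
    [0.98, 0.05, 0.12],  # offer 1: near-duplicate of offer 0
    [0.0, 1.0, 0.0],     # offer 2
    [0.05, 0.97, 0.02],  # offer 3: near-duplicate of offer 2
    [0.3, 0.3, 0.9],     # offer 4: unrelated
]


def normalize(v):
    """Scale a vector to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_neighbours(vectors, k):
    """Exact brute-force k-NN by inner product.

    Plays the role a Faiss inner-product index would play at scale;
    it is O(n^2) and only suitable for toy data.
    """
    result = []
    for i, query in enumerate(vectors):
        scores = [(dot(query, v), j) for j, v in enumerate(vectors) if j != i]
        scores.sort(reverse=True)
        result.append(scores[:k])
    return result


class UnionFind:
    """Disjoint-set structure used to merge linked offers into clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_offers(embeddings, k=2, threshold=0.9):
    """Link each offer to neighbours at or above `threshold`; return connected components."""
    vectors = [normalize(e) for e in embeddings]
    uf = UnionFind(len(vectors))
    for i, neighbours in enumerate(top_k_neighbours(vectors, k)):
        for score, j in neighbours:
            if score >= threshold:
                uf.union(i, j)
    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(uf.find(i), []).append(i)
    return sorted(clusters.values())


print(cluster_offers(OFFERS))  # [[0, 1], [2, 3], [4]]
```

Normalising the vectors first means inner-product search ranks neighbours by cosine similarity; the threshold then decides which neighbour links are trusted enough to merge two offers into one cluster.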
We will present our Transformer-based clustering approach in detail and compare its performance across different optimisation strategies (additive angular margin, contrastive, and triplet loss), as well as against other approaches, such as probabilistic record linkage.
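The three optimisation strategies named above are standard metric-learning objectives. A minimal sketch of each, written from their commonly published definitions rather than from the speakers' code: the Euclidean distance, the margins, the scale `s`, and all function names are illustrative assumptions, and `class_weights` (one vector per known cluster during training) is a hypothetical setup for the angular-margin case.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Require the negative to be at least `margin` further from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_loss(a, b, is_match, margin=1.0):
    """Pull matching pairs together; push non-matching pairs beyond `margin`.

    Uses the common 0.5 * (y * d^2 + (1 - y) * max(0, margin - d)^2) form.
    """
    d = euclidean(a, b)
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def additive_angular_margin_loss(embedding, class_weights, label, s=30.0, m=0.5):
    """ArcFace-style loss: add angle m to the true class, then softmax cross-entropy.

    Simplified: omits the fallback ArcFace uses when theta + m exceeds pi.
    """
    x = normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        # Clamp to [-1, 1] so acos never sees rounding error outside its domain.
        cos_t = max(-1.0, min(1.0, sum(a * b for a, b in zip(x, normalize(w)))))
        if j == label:
            cos_t = math.cos(math.acos(cos_t) + m)  # cos(theta_y + m)
        logits.append(s * cos_t)
    top = max(logits)  # log-sum-exp shift for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


print(triplet_loss([0.0, 0.0], [1.0, 0.0], [1.5, 0.0]))  # 0.5
print(contrastive_loss([0.0, 0.0], [2.0, 0.0], is_match=True))  # 2.0
```

Triplet and contrastive loss only need pairs or triplets labelled as matching or not, whereas the additive angular margin variant needs a class label per training example and learns one weight vector per class.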
Recorded at the PyConDE & PyData Berlin 2022 conference, April 11-13, 2022.
https://2022.pycon.de
More details at the conference page: https://2022.pycon.de/program/URDTCT
Twitter: https://twitter.com/pydataberlin
Twitter: https://twitter.com/pyconde