Description
PyData Berlin 2016
The goal of image-match was to support not only high write volumes of over 10k images per second but also fast lookup of similar images, in around 1-2 s, across more than 1B images. Although similar paid services and free image-hashing libraries exist, this may be the first complete, free, open-source solution. Available at: https://github.com/ascribe/image-match
image-match started as an internal project. We needed a way, given some target image, to find similar images downloaded by our web crawler (think TinEye).
So not only did we need to support fast, accurate lookup across millions or even billions of images, we also needed to handle very high-volume insertion, around 10k images per second.
In my talk, I will cover:
- The Problem: why is finding similar images hard?
- Algorithm: based on this paper
- Performance: but does it scale?
- Alternatives