One in a billion: finding matching images in very large corpora

Description

PyData Berlin 2016

The goal was to support not only high write volumes of over 10k images per second but also fast lookup of similar images, in around 1-2 seconds, across more than 1B images. Though similar paid services and free image hashing libraries exist, this may be the first complete free open-source solution. Available at: https://github.com/ascribe/image-match

image-match started as an internal project. We needed a way, given some target image, to find similar images downloaded by our web crawler (think TinEye).

So not only did we need to support fast, accurate lookup across millions or even billions of images, we also needed to sustain very high-volume insertion, around 10k images per second.

In my talk, I will cover:

  • The Problem: why is finding similar images hard?
  • Algorithm: based on this paper
  • Performance: but does it scale?
  • Alternatives
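
To make the lookup problem concrete, here is a minimal sketch of similarity search using a plain average hash and Hamming distance. This is a deliberately simpler technique chosen for illustration, not the algorithm image-match uses, and the names `average_hash`, `hamming`, and `find_similar` are hypothetical. Images are assumed to be already decoded into small grayscale grids.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale pixels in 0-255, row-major (assumed pre-decoded)


def average_hash(image: Image) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def find_similar(
    target: Image, index: Dict[str, int], max_distance: int
) -> List[Tuple[str, int]]:
    """Linear scan over stored hashes, returning the closest matches first."""
    target_hash = average_hash(target)
    hits = [(name, hamming(target_hash, h)) for name, h in index.items()]
    return sorted((hit for hit in hits if hit[1] <= max_distance), key=lambda hit: hit[1])


# Toy 2x2 "images": a brightened copy should match, an inverted one should not.
original = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]
inverted = [[245, 55], [35, 225]]

index = {"brighter": average_hash(brighter), "inverted": average_hash(inverted)}
print(find_similar(original, index, max_distance=1))  # [('brighter', 0)]
```

The scan in `find_similar` touches every stored hash, so its cost grows linearly with the corpus. That is fine for a toy index and is part of why lookup over a billion images needs more than a hash function.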

