Recognizing Cards - Finding the Match

In the previous two posts in this series (1, 2) I talked about capturing a reliable image of the art on a Magic card from a webcam, and how we can hash those images for an effective and mostly efficient comparison. Now we're left with a hash for the card we'd like to match, and an enormous table of hashes for all the possible matches (something like 18,000 cards). The first question is how to compare the hashes in a meaningful way, but thankfully this is made easy by the nature of pHash and its documentation. The Hamming distance, the number of positions at which two strings differ, is the metric of choice. This comparison is demonstrated step-wise below, first for the correct target, then for a random non-target image.

When compared to the actual match, the Hamming distance is 9. Mismatched characters are highlighted in red. While the cutoff value might take some fine tuning, for our purposes (and my camera), 9 is a relatively strong match.
[Figure: Hashing_example_2]
Compared to an incorrect image, the Hamming distance is 15. The single matched character is effectively within the noise.

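In code, this comparison is only a few lines. Here is a minimal sketch; the two hash strings in the example are made-up placeholders, not real pHash output:

```python
def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))

# Made-up 16-character hashes, for illustration only
print(hamming_distance("1111aaaabbbbcccc", "1111aaaabcbbcccd"))  # -> 2
```
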
Hamming distance is a reliable metric for this hashing approach, and it is relatively easy to compute once all the hashes have been calculated. That being said, we don't really want to do 18,000 x 16 character comparisons to determine which card we're looking at. Instead we can use a binary search tree. There is a lot already written on binary trees, but the short version is this: by splitting the space up, one can iteratively narrow down the possibilities rather than looking at each candidate individually. It does require that you take the time to build the tree, but individual searches become substantially faster.
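
For a sense of scale, this is roughly what the brute-force scan we're trying to avoid looks like; a minimal sketch assuming the database is just a dict mapping card names to hash strings (the real storage layout may differ):

```python
def nearest_card_linear(query_hash, card_hashes):
    """Brute-force search: compare the query against every stored hash.

    `card_hashes` is assumed to be a dict of card name -> hash string;
    with ~18,000 entries this means roughly 18,000 x 16 character comparisons.
    """
    best_name, best_distance = None, float("inf")
    for name, stored_hash in card_hashes.items():
        distance = hamming_distance(query_hash, stored_hash)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance
```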

But wait, the Hamming distance doesn't provide coordinates in some searchable space; it provides a measure of how closely two data are related (or not). The hashes can, however, be treated as points in a 16-dimensional metric space. The approach I've gone with is a vantage-point, or VP, tree, chosen primarily because Paul Harrison was kind enough to post his Python implementation. The idea behind VP trees is to pick a point in your space, e.g. the hash "1111aaaabbbbcccc", and then break your member-set into two parts: those "nearer" than some cut-off Hamming distance, and those farther out. By repeating this process a tree of relations can be built up, with adjacent 'branches' separated by smaller Hamming distances than 'far' branches. This means a hash you're trying to match can rapidly traverse the tree and only needs direct comparisons with a small fraction of the set members. The paper by Kumar et al. has an excellent explanation of how this compares with other binary-tree approaches, and while they were doing image-patch analysis, the content is still incredibly relevant and well presented. Figure 2 in that paper, not reproduced here, is perfect for visualizing the structure of VP trees!
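
To make the idea concrete, here is a stripped-down VP tree sketch built on the Hamming distance function from earlier. This is not Paul Harrison's implementation; the median split and the simple choice of vantage point are assumptions made for clarity rather than performance:

```python
class VPNode:
    def __init__(self, vantage, threshold, inside, outside):
        self.vantage = vantage      # the vantage-point hash for this node
        self.threshold = threshold  # Hamming distance used to split the set
        self.inside = inside        # subtree of hashes within the threshold
        self.outside = outside      # subtree of hashes beyond the threshold


def build_vp_tree(hashes):
    """Recursively split a list of hash strings into a VP tree."""
    if not hashes:
        return None
    vantage, rest = hashes[0], hashes[1:]
    if not rest:
        return VPNode(vantage, 0, None, None)
    distances = [hamming_distance(vantage, h) for h in rest]
    threshold = sorted(distances)[len(distances) // 2]  # median split
    inside = [h for h, d in zip(rest, distances) if d <= threshold]
    outside = [h for h, d in zip(rest, distances) if d > threshold]
    return VPNode(vantage, threshold, build_vp_tree(inside), build_vp_tree(outside))


def nearest(node, query, best=(None, float("inf"))):
    """Return (hash, distance) for the stored hash closest to `query`."""
    if node is None:
        return best
    d = hamming_distance(query, node.vantage)
    if d < best[1]:
        best = (node.vantage, d)
    # Descend first into the side of the split the query falls on...
    near, far = (node.inside, node.outside) if d <= node.threshold else (node.outside, node.inside)
    best = nearest(near, query, best)
    # ...and only cross the boundary if a closer match could still be hiding there.
    if abs(d - node.threshold) < best[1]:
        best = nearest(far, query, best)
    return best
```

In practice the tree is built once from the full hash table, and each webcam capture only has to call `nearest` against it.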

I'm still in the process of cleaning up the code, but I plan to follow up shortly with a video demonstration of the code in action, as well as a few snippets of particular interest.
