Recognizing Cards - Finding the Match

In the previous two posts in this series (1, 2) I talked about capturing a reliable image of the art on a magic card from a webcam, and how we can hash those images for an effective and mostly efficient comparison. Now we're left with a hash for the card we'd like to match, and an enormous table of hashes for all the possible matches (something like 18,000 cards). The first question is how to compare the hashes in a meaningful way, but thankfully this is made easy by the nature and documentation for phash. The Hamming distance, or number of dissimilar characters in the string, is the metric of choice. This method is demonstrated step-wise below, first for the correct target, then for a random non-target image.

When compared to the actual match, the Hamming distance is 9. Mismatched character are highlighted in red.
When compared to the actual match, the Hamming distance is 9. Mismatched characters are highlighted in red. While the cutoff value might take some fine tuning, for our purposes (and my camera), 9 is a relatively strong match.
Hashing_example_2
Compared to an incorrect image, the Hamming distance is 15. The single matched character is effectively within the noise.

Hamming distance is a reliable metric for this hashing approach, and is relatively easy to compute if the hashes are all already calculated. That being said, we don't really want to do 18,000 x 16 charter-character comparisons in order to determine which card we're looking at. Instead we can use a binary search tree. There is a lot already written on binary trees, but the short version is this: by splitting space up one can iteritively narrow down the possibilities, rather than look at each candidate individually. It does require that you take the time to build the tree, but individual searches become substantially faster.

But wait, the Hamming distance doesn't provide coordinates in some searchable space, it provides a measure describing how two data are related (or not), but this can be thought of as a 16 dimensional metric space. The approach I've gone with is a vantage-point, or VP tree, chosen primarily since Paul Harrison was kind enough to post his Python implementation. The idea behind VP trees is to pick a point in your space, e.g. the hash "1111aaaabbbbcccc", and then break your member-set into two parts: those "nearer" than some cut-off Hamming distance, and those further out. By repeating this process a tree of relations can be built up, with adjacent 'branches' having smaller hamming distances than 'far' branches. This means that a hash you're trying to match can rapidly traverse the tree and only run direct comparison with one or two actual set members. The paper by Kumar et.al has an excellent explanation of how this compares with other binary-tree approaches, and while they were doing image-patch analysis, the content is still incredible relevant and well presented. Figure 2 in that paper, not reproduced here, is perfect for visualizing the structure of VP trees!

I'm still in the process of cleaning up code, but plan to shortly follow up with a video demonstration of the code in action, as well a few snippets of particular interest.

Recognizing Cards - Effective Comparisons with Hashing

In the previous post we got as far as isolating and pre-processing the art from a card placed in front of the camera; now we come to the problem of effectively comparing it with all the possible matches. Given the possible "attacks" against the image we're trying to match, e.g. rotation, color balance, and blur, it's important to choose a comparison method that will be insensitive to the ones we can't control without losing the ability to clearly identify the correct match among thousands of impostors. A bit of googling led me to phash, a perceptual hashing algorithm that seemed ideal for my application. A good explanation of how the algorithm works can be found here, and illustrates how small attacks on the image can be neglected. I've illustrated the algorithm steps below using one of the cards from my testing group, Snowfall.

Illustration of the phash algorithm from left to right. DCT is the discrete cosine transform. Click for full-size.

The basic identification scheme is simple: calculate the hash for each possible card, then calculate the hash for the art we're identifying. These hashes are converted to ASCII strings and stored. For each hash in the collection, calculate the hamming distance (essentially how many characters in the hash string are dissimilar), and that number describes how different they are. The process of searching through a collection of hashes to find the best match in a reasonable amount of time will be the subject of the next post in this series (hint: it involves VP trees.) Obtaining hashes for all the possible card-arts is an exercise in web scrapping and loops, and isn't something I need to dive into here.

One of my first concerns upon seeing the algorithm spelled out was the discarding of color. The fantasy art we're dealing with is, in general, more colorful than most test image sets, so we might be discarding more information for less of a performance gain than usual. To that end, I decided to try a very simple approach, referred to as phash_color below: get a phash from each of the color channels and simply append them end-to-end. While it takes proportionally longer to calculate, I felt it should provide better discrimination. This expanded algorithm is illustrated below. While it is true that the results (far right column) appear highly similar across color channels, distinct improvements to identification were found across the entire corpus of images compared to the simpler (and faster) approach.

The color-aware extension of the phash algorithm. The rows correspond to individual color channels.
The color-aware extension of the phash algorithm. The rows correspond to individual color channels. Click for full-size.

I decided to make a systematic test of it, and chose four cards from my old box and grabbed images, shown below. Some small attempt was made to vary the color content and level of detail across the test images.

The four captured arts for testing the hashing algorithms.
The four captured arts for testing the hashing algorithms. The art itself is the property of Wizards of the Coast.

For several combinations of hash and pre-processing I found what I'm calling the SNR, after 'signal-to-noise ratio'. This SNR is essentially how well the hash matches the image it should, divided by the quality of the next best match. The ideal hash size was found to be 16 by a good deal of trial and error. A gallery of showing the matching strength for the four combinations (original phash, the color version, with equalized histograms, and without pre-processing) are shown below, but the general take-away is that histogram equalization makes matching easier, and including color provides additional protection against false positives.

This slideshow requires JavaScript.

If there is interest I can post the code for the color-aware phash function, but it really is as simple as breaking the image into three greyscale layers and using phash function provided by the imagehash package. Up next: VP trees and quickly determining which card it is we're looking at!