Recognizing Cards - Finding the Match

In the previous two posts in this series (1, 2) I talked about capturing a reliable image of the art on a Magic card from a webcam, and how we can hash those images for an effective and mostly efficient comparison. Now we're left with a hash for the card we'd like to match, and an enormous table of hashes for all the possible matches (something like 18,000 cards). The first question is how to compare the hashes in a meaningful way, but thankfully this is made easy by the nature of phash and its documentation. The Hamming distance, here the number of positions at which the two hash strings differ, is the metric of choice. This method is demonstrated step-wise below, first for the correct target, then for a random non-target image.

When compared to the actual match, the Hamming distance is 9. Mismatched characters are highlighted in red. While the cutoff value might take some fine tuning, for our purposes (and my camera), 9 is a relatively strong match.
Compared to an incorrect image, the Hamming distance is 15. The single matched character is effectively within the noise.
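As a minimal sketch of the comparison described above, the character-wise Hamming distance between two hash strings can be computed in a couple of lines. The function name and the example hashes are made up for illustration; note that this counts differing characters, as in the figures, rather than differing bits.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the positions at which two equal-length hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Made-up 16-character hashes, purely for illustration.
target = "1111aaaabbbbcccc"
capture = "1111aaaabbbbcc00"
print(hamming_distance(target, capture))  # 2
```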

Hamming distance is a reliable metric for this hashing approach, and is relatively easy to compute if the hashes are all already calculated. That being said, we don't really want to do 18,000 x 16 character-by-character comparisons in order to determine which card we're looking at. Instead we can use a binary search tree. There is a lot already written on binary trees, but the short version is this: by splitting the space up, one can iteratively narrow down the possibilities rather than look at each candidate individually. It does require that you take the time to build the tree, but individual searches become substantially faster.

But wait: the Hamming distance doesn't provide coordinates in some searchable space. It provides a measure describing how two data are related (or not). It does, however, define a metric space, which for our 16-character hashes can be thought of as 16-dimensional. The approach I've gone with is a vantage-point (VP) tree, chosen primarily because Paul Harrison was kind enough to post his Python implementation. The idea behind VP trees is to pick a point in your space, e.g. the hash "1111aaaabbbbcccc", and then break your member-set into two parts: those "nearer" than some cut-off Hamming distance, and those further out. By repeating this process a tree of relations can be built up, with adjacent 'branches' having smaller Hamming distances than 'far' branches. This means that a hash you're trying to match can rapidly traverse the tree and only run direct comparisons against a handful of actual set members. The paper by Kumar et al. has an excellent explanation of how this compares with other binary-tree approaches, and while they were doing image-patch analysis, the content is still incredibly relevant and well presented. Figure 2 in that paper, not reproduced here, is perfect for visualizing the structure of VP trees!
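To make the idea concrete, here is a small, self-contained VP tree sketch. It is not Paul Harrison's implementation; the class and function names, the random choice of vantage point, and the use of the median distance as the cut-off are all assumptions made for this illustration.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


class VPNode:
    """One node of a vantage-point tree built over a non-empty list of points."""

    def __init__(self, points, distance=hamming):
        points = list(points)
        # Pick a vantage point at random (an assumption; other choices work too).
        self.vantage = points.pop(random.randrange(len(points)))
        self.radius = 0
        self.inside = None   # points with distance <= radius from the vantage
        self.outside = None  # points with distance > radius from the vantage
        if not points:
            return
        dists = [(distance(self.vantage, p), p) for p in points]
        # Use the median distance as the cut-off between "near" and "far".
        self.radius = sorted(d for d, _ in dists)[len(dists) // 2]
        near = [p for d, p in dists if d <= self.radius]
        far = [p for d, p in dists if d > self.radius]
        if near:
            self.inside = VPNode(near, distance)
        if far:
            self.outside = VPNode(far, distance)


def nearest(node, query, distance=hamming, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = distance(query, node.vantage)
    if best is None or d < best[0]:
        best = (d, node.vantage)
    if d <= node.radius:
        # Query is inside the ball: search there first, and cross the
        # boundary only if a closer point could still lie outside.
        best = nearest(node.inside, query, distance, best)
        if d + best[0] > node.radius:
            best = nearest(node.outside, query, distance, best)
    else:
        best = nearest(node.outside, query, distance, best)
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, distance, best)
    return best


# Made-up hashes, purely for illustration.
hashes = ["1111aaaabbbbcccc", "1111aaaabbbb0000",
          "ffff000011112222", "0123456789abcdef"]
tree = VPNode(hashes)
print(nearest(tree, "1111aaaabbbbccc0"))  # (1, '1111aaaabbbbcccc')
```

The pruning tests rely on the triangle inequality, which is why the distance has to be a true metric; the Hamming distance qualifies.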

I'm still in the process of cleaning up code, but plan to shortly follow up with a video demonstration of the code in action, as well as a few snippets of particular interest.

Recognizing Cards - Effective Comparisons with Hashing

In the previous post we got as far as isolating and pre-processing the art from a card placed in front of the camera; now we come to the problem of effectively comparing it with all the possible matches. Given the possible "attacks" against the image we're trying to match, e.g. rotation, color balance, and blur, it's important to choose a comparison method that will be insensitive to the ones we can't control without losing the ability to clearly identify the correct match among thousands of impostors. A bit of googling led me to phash, a perceptual hashing algorithm that seemed ideal for my application. A good explanation of how the algorithm works can be found here, and illustrates how small attacks on the image can be neglected. I've illustrated the algorithm steps below using one of the cards from my testing group, Snowfall.

Illustration of the phash algorithm from left to right. DCT is the discrete cosine transform. Click for full-size.

The basic identification scheme is simple: calculate the hash for each possible card, then calculate the hash for the art we're identifying. These hashes are converted to ASCII strings and stored. For each hash in the collection, calculate the Hamming distance to the captured hash (essentially how many characters in the hash strings are dissimilar), and that number describes how different they are. The process of searching through a collection of hashes to find the best match in a reasonable amount of time will be the subject of the next post in this series (hint: it involves VP trees). Obtaining hashes for all the possible card-arts is an exercise in web scraping and loops, and isn't something I need to dive into here.

One of my first concerns upon seeing the algorithm spelled out was the discarding of color. The fantasy art we're dealing with is, in general, more colorful than most test image sets, so we might be discarding more information for less of a performance gain than usual. To that end, I decided to try a very simple approach, referred to as phash_color below: get a phash from each of the color channels and simply append them end-to-end. While it takes proportionally longer to calculate, I felt it should provide better discrimination. This expanded algorithm is illustrated below. While it is true that the results (far right column) appear highly similar across color channels, distinct improvements to identification were found across the entire corpus of images compared to the simpler (and faster) approach.

The color-aware extension of the phash algorithm. The rows correspond to individual color channels. Click for full-size.

I decided to make a systematic test of it, and chose four cards from my old box and grabbed images, shown below. Some small attempt was made to vary the color content and level of detail across the test images.

The four captured arts for testing the hashing algorithms. The art itself is the property of Wizards of the Coast.

For several combinations of hash and pre-processing I found what I'm calling the SNR, after 'signal-to-noise ratio'. This SNR is essentially how well the hash matches the image it should, divided by the quality of the next best match. The ideal hash size was found to be 16 by a good deal of trial and error. A gallery showing the matching strength for the four combinations (original phash and the color version, each with and without histogram equalization) is shown below, but the general take-away is that histogram equalization makes matching easier, and including color provides additional protection against false positives.
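One possible reading of that SNR, sketched under stated assumptions: since a smaller Hamming distance means a better match, the ratio below divides the distance to the best impostor by the distance to the correct card, so larger values mean a cleaner identification. The function name, the exact formula, and the impostor names are illustrative guesses; only the distances 9 and 15 echo the example figures in this series.

```python
def match_snr(distances, correct_name):
    """Ratio of the runner-up's distance to the correct card's distance.

    distances maps card name -> Hamming distance from the captured art.
    """
    correct = distances[correct_name]
    runner_up = min(d for name, d in distances.items() if name != correct_name)
    if correct == 0:
        return float("inf")  # a perfect match cannot be beaten
    return runner_up / correct


# Placeholder impostor names; distances are illustrative.
example = {"Snowfall": 9, "impostor_a": 15, "impostor_b": 16}
print(round(match_snr(example, "Snowfall"), 2))  # 1.67
```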

[Slideshow: matching strength for the four combinations of hash and pre-processing]

If there is interest I can post the code for the color-aware phash function, but it really is as simple as breaking the image into three greyscale layers and using the phash function provided by the imagehash package. Up next: VP trees and quickly determining which card it is we're looking at!

Recognizing Cards - Image Capture

Back in October I posted a short blurb on my first attempts at recognizing Magic cards through webcam imagery. A handful of factors have brought me back around to it, not the least of which is a still un-sorted collection. Also, it happened to be a good excuse to dig into image processing and search trees, things I’ve heard a lot about but never really dug into. Probably the biggest push to get back on this project was a snippet of Python I found for live display of the pre- and post-processed webcam frames in real time, here. There is real novelty in seeing your code in action in a very immediate way, and it also eliminated all of the frustration I was having with convincing the camera to stay in focus between captures. At present, the program appears to behave well and recognize cards reliably!

I plan to break my thoughts on this project into a few smaller posts focusing on the specific tasks and problems that came up along the way, so I can devote enough space to the topics I found most interesting.

  • Image Pre-Processing
  • Recognizing Blurry Images: Hashing and Performance
  • Finding Matches: Fancy Binary Trees

I should note here: a lot of the ideas used in this project were taken from code others posted online. Any time I directly used (or was heavily inspired by) a chunk of code, I’ll link out to the original source as well as include a listing at the bottom of each post in this series.

Pre-Processing

The goal here was to take the camera imagery and produce an image that was most likely to be recognized as "similar" by our hashing algorithm. First and foremost, we need to deal with the fact that (1) our camera is not perfect: the white balance, saturation, and focus of our acquired image may all be different from the image we're comparing with, and (2) the camera captures a lot more than the card alone. Let's focus on the latter problem first, isolating the card from the background.

The method I described in the previous post works sometimes, but not particularly well. It required exactly ideal lighting and a perfectly flat background. The algorithm I ended up settling on is:

  1. Convert a copy of the frame to grey-scale
  2. Store the absolute difference between that frame, and the background (more on that later)
  3. Threshold that difference-image to a binary image
  4. Find the contours present using cv2.findContours()
  5. Only look at the contours with a bounded area greater than 10k pixels (based on my camera)
  6. Find a bounding box for each of these contours and compute the aspect ratio.
  7. Throw out contours with a bounding box aspect ratio less than 0.65 or greater than 1.0
  8. If we've got exactly one contour left in the set, that's our card!

The next problem to tackle is that of perspective and rotation, which thankfully we can tackle simultaneously. In the previous steps we were able to find the contour of the card and the bounding rectangle for that contour, and we can use these.

  • Find the approximate bounding polygon for our contour using cv2.approxPolyDP().
  • If the result has more than four corners, we need to trim out the spurious corners by finding the ones closest to any other corner. These might result from a hand holding the card, for example.
  • Using the width of the bounding box, known aspect ratio of a real card, and the corners of the trapezoid bounding the card, we can construct the perspective transformation matrix.
  • Apply the perspective transform.

Camera input image. Card contour is shown in red, bounding rectangle is shown in green. The text labels are the result of the look-up process I'll explain in the coming posts.

The isolated and perspective-corrected card image.

Lastly, to isolate the art we simply rely on the consistency of the printed cards. By measuring the cards it was fairly easy to pick out the fractional width and height bounds for the art, and simply crop to those fractions. Now we're left with the first problem: the imperfect camera. Due to the way we're hashing images, which will be discussed in the next post in this series, we're not terribly worried about image sharpness, as the method does not preserve high frequencies. Contrast, however, is a big concern. After much experimentation I settled on a very simple histogram equalization, which essentially modifies the image such that the brightest color is white and the darkest color is black, without disrupting the ordering of the values in between. An example of this is given below.

Sample image showing (clockwise) the camera capture, the target image, the result of histogram equalizing the input, and the result of equalizing the target.
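For illustration, the crop and the equalization could look something like the NumPy sketch below. The function names are made up, and the fractions in the usage comment are placeholders rather than measured values. For 8-bit single-channel images, cv2.equalizeHist() does the same kind of job as the hand-written version.

```python
import numpy as np


def crop_fraction(image, left, right, top, bottom):
    """Crop an image to fractional bounds (each between 0 and 1)."""
    height, width = image.shape[:2]
    return image[int(top * height):int(bottom * height),
                 int(left * width):int(right * width)]


def equalize_histogram(gray):
    """Histogram-equalize an 8-bit greyscale image (2-D uint8 array)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # count of the darkest value present
    total = cdf[-1]
    if total == cdf_min:
        return gray.copy()  # flat image, nothing to stretch
    # Map the darkest value to 0 and the brightest to 255, keeping order.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


# Hypothetical usage; the fractions are placeholders, not measurements:
# art = crop_fraction(card_gray, 0.1, 0.9, 0.1, 0.55)
# art = equalize_histogram(art)
```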

So now we're at the point where we can capture convincing versions of the card art reliably from the webcam. In the next post I'll go over how I chose the hashing algorithm to compare each captured image against all the potential candidates, so we can tell which card we've actually got!