In the previous post we got as far as isolating and pre-processing the art from a card placed in front of the camera; now we come to the problem of effectively comparing it with all the possible matches. Given the possible "attacks" against the image we're trying to match, e.g. rotation, color balance, and blur, it's important to choose a comparison method that is insensitive to the ones we can't control, without losing the ability to clearly identify the correct match among thousands of impostors. A bit of googling led me to phash, a perceptual hashing algorithm that seemed ideal for my application. A good explanation of how the algorithm works can be found here; it illustrates how small attacks on the image can be neglected. I've illustrated the algorithm steps below using one of the cards from my testing group, Snowfall.
The basic identification scheme is simple: calculate the hash for each possible card, then calculate the hash for the art we're identifying. These hashes are converted to ASCII strings and stored. For each hash in the collection, calculate the Hamming distance to the hash of the art in question (essentially how many characters in the two hash strings differ); that number describes how different the two images are, and the card with the smallest distance is the best match. The process of searching through a collection of hashes to find the best match in a reasonable amount of time will be the subject of the next post in this series (hint: it involves VP trees). Obtaining hashes for all the possible card arts is an exercise in web scraping and loops, and isn't something I need to dive into here.
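The comparison step can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code used for the results in this post: the function names are hypothetical, the hash strings are made up, and the brute-force scan over every card is exactly the part the VP tree is meant to replace.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the positions at which two equal-length hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def best_match(query_hash: str, card_hashes: dict[str, str]) -> tuple[str, int]:
    """Brute-force scan: return the (card name, distance) closest to the query."""
    name = min(card_hashes, key=lambda n: hamming_distance(query_hash, card_hashes[n]))
    return name, hamming_distance(query_hash, card_hashes[name])


# Made-up 16-character hashes, for illustration only (not real phash output).
card_hashes = {
    "Snowfall": "c3a1f09b7e44d210",
    "Card B": "1f8e22c07ab3954d",
    "Card C": "c3a1f09b00000000",
}
query = "c3a1f09b7e44d211"  # differs from the "Snowfall" entry in one character

print(best_match(query, card_hashes))
```

Counting differing characters follows the description above; a bit-level comparison of the underlying hashes would be finer-grained, since two hex characters can differ in anywhere from one to four bits.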
One of my first concerns upon seeing the algorithm spelled out was the discarding of color. The fantasy art we're dealing with is, in general, more colorful than most test image sets, so we might be discarding more information for less of a performance gain than usual. To that end, I decided to try a very simple approach, referred to as phash_color below: get a phash from each of the color channels and simply append them end-to-end. While it takes proportionally longer to calculate, I felt it should provide better discrimination. This expanded algorithm is illustrated below. While it is true that the results (far right column) appear highly similar across color channels, distinct improvements to identification were found across the entire corpus of images compared to the simpler (and faster) approach.
I decided to make a systematic test of it, and chose four cards from my old box and grabbed images, shown below. Some small attempt was made to vary the color content and level of detail across the test images.
For several combinations of hash and pre-processing I found what I'm calling the SNR, after "signal-to-noise ratio". This SNR is essentially how well the hash matches the image it should, divided by the quality of the next best match, so a larger value means the correct card stands out more clearly from the impostors. The ideal hash size was found to be 16 by a good deal of trial and error. A gallery showing the matching strength for the four combinations (original phash or the color version, each with or without histogram equalization) is shown below, but the general take-away is that histogram equalization makes matching easier, and including color provides additional protection against false positives.
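One way to turn that description into a number is sketched below. The exact formula is not spelled out in this post, so this is an assumption: "match quality" is taken here to be the fraction of characters on which two hash strings agree, and the SNR is the quality of the correct match divided by the best quality among all the other cards. The function names and hash strings are hypothetical.

```python
def match_quality(hash_a: str, hash_b: str) -> float:
    """Assumed quality measure: fraction of positions where the hashes agree."""
    same = sum(a == b for a, b in zip(hash_a, hash_b))
    return same / len(hash_a)


def snr(query_hash: str, correct_name: str, card_hashes: dict[str, str]) -> float:
    """Quality of the correct match divided by that of the best impostor."""
    signal = match_quality(query_hash, card_hashes[correct_name])
    noise = max(
        match_quality(query_hash, h)
        for name, h in card_hashes.items()
        if name != correct_name
    )
    return signal / noise if noise > 0 else float("inf")


# Made-up 8-character hashes, for illustration only.
card_hashes = {
    "Snowfall": "abcd1235",  # agrees with the query in 7 of 8 characters
    "Card B": "abcd0000",    # agrees in 4 of 8 characters
    "Card C": "ffff9999",    # agrees in none
}
ratio = snr("abcd1234", "Snowfall", card_hashes)
print(ratio)
```

Under this reading, a value near 1 means the correct card is barely distinguishable from the runner-up, while larger values indicate a safer margin against false positives.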
If there is interest I can post the code for the color-aware phash function, but it really is as simple as breaking the image into three greyscale layers and using the phash function provided by the imagehash package on each. Up next: VP trees and quickly determining which card it is we're looking at!