Back in October I posted a short blurb on my first attempts at recognizing Magic cards through webcam imagery. A handful of factors have brought me back around to it, not the least of which is a still-unsorted collection. It was also a good excuse to dig into image processing and search trees, things I’ve heard a lot about but never really explored. Probably the biggest push to get back on this project was a snippet of Python I found, here, for displaying the pre- and post-processed webcam frames in real time. There is real novelty in seeing your code in action in a very immediate way, and it also eliminated all of the frustration I was having with convincing the camera to stay in focus between captures. At present, the program appears to behave well and recognize cards reliably!
I plan to break my thoughts on this project into a few smaller posts focusing on the specific tasks and problems that came up along the way, so I can devote enough space to the topics I found most interesting.
- Image Pre-Processing
- Recognizing Blurry Images: Hashing and Performance
- Finding Matches: Fancy Binary Trees
I should note here: a lot of the ideas used in this project were taken from code others posted online. Any time I directly used (or was heavily inspired by) a chunk of code, I’ll link out to the original source as well as include a listing at the bottom of each post in this series.
The goal here was to take the camera imagery and produce an image that was most likely to be recognized as "similar" by our hashing algorithm. First and foremost, we need to deal with two facts: (1) our camera is not perfect, so the white balance, saturation, and focus of the acquired image may all differ from the image we're comparing against, and (2) the camera captures a lot more than the card alone. Let's focus on the latter problem first: isolating the card from the background.
The method I described in the previous post works sometimes, but not particularly well: it required ideal lighting and a perfectly flat background. The algorithm I ended up settling on is:
- Convert a copy of the frame to grey-scale
- Store the absolute difference between that frame and the background (more on that later)
- Threshold that difference-image to a binary image
- Find the contours present using cv2.findContours()
- Only look at the contours with a bounded area greater than 10k pixels (based on my camera)
- Find a bounding box for each of these contours and compute the aspect ratio.
- Throw out contours with a bounding box aspect ratio less than 0.65 or greater than 1.0
- If we've got exactly one contour left in the set, that's our card!
The next problem to tackle is that of perspective and rotation, which thankfully we can tackle simultaneously. In the previous steps we found the contour of the card and the bounding rectangle for that contour, and we can reuse both here.
- Find the approximate bounding polygon for our contour using cv2.approxPolyDP().
- If the result has more than four corners, we need to trim out the spurious corners by finding the ones closest to any other corner. These might result from a hand holding the card, for example.
- Using the width of the bounding box, known aspect ratio of a real card, and the corners of the trapezoid bounding the card, we can construct the perspective transformation matrix.
- Apply the perspective transform.
Camera input image. Card contour is shown in red, bounding rectangle is shown in green. The text labels are the result of the look-up process I'll explain in the coming posts.
The isolated and perspective-corrected card image.
Lastly, to isolate the art we simply rely on the consistency of the printed cards. By measuring the cards it was fairly easy to pick out the fractional width and height bounds for the art, and simply crop to those fractions.
Now we're left with the first problem: the imperfect camera. Because of the way we're hashing images, which will be discussed in the next post in this series, we're not terribly worried about image sharpness, as the method does not preserve high frequencies. Contrast, however, is a big concern. After much experimentation I settled on a very simple histogram equalization: essentially, modifying the image so that the brightest color is white and the darkest color is black, without disrupting how the values in between relate to one another. An example of this is given below.
Sample image showing the camera capture, the target image, the result of histogram equalizing the input, and the result of equalizing the target.
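For reference, here is a small sketch of the crop and contrast steps using NumPy alone. The fractional art bounds are placeholder values rather than the measured ones, and both function names are invented for the example. The "brightest to white, darkest to black" operation described above is closer to a min-max contrast stretch than to a full histogram equalization, so that is what the sketch implements; `cv2.equalizeHist` on a grey-scale image would be the full-equalization alternative.

```python
import numpy as np

# Placeholder fractions (left, right, top, bottom) of the card occupied by the art.
# These are illustrative only; measure a real card for usable values.
ART_BOUNDS = (0.08, 0.92, 0.11, 0.56)


def crop_art(card, bounds=ART_BOUNDS):
    """Crop the art region out of a straightened card image."""
    height, width = card.shape[:2]
    left, right, top, bottom = bounds
    return card[int(top * height):int(bottom * height),
                int(left * width):int(right * width)]


def stretch_contrast(image):
    """Linearly rescale so the darkest value maps to 0 and the brightest to 255."""
    pixels = image.astype(np.float32)
    low, high = pixels.min(), pixels.max()
    if high == low:
        return image.copy()  # flat image, nothing to stretch
    scaled = (pixels - low) * (255.0 / (high - low))
    return scaled.round().astype(np.uint8)
```

The stretch here uses a single minimum and maximum across all channels, which keeps the color balance intact; stretching each channel separately is another option, at the cost of shifting hues.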
So now we're at the point where we can capture convincing versions of the card art reliably from the webcam. In the next post I'll go over how I chose the hashing algorithm to compare each captured image against all the potential candidates, so we can tell which card we've actually got!