Semi-Automatic Book Cataloguing in Python

Skip to the code: Git repository here

I got it into my head a little while back, the idea perhaps planted by a book-and-spreadsheet-loving friend, that I should catalogue our entire library. In part this was to provide friends with a handy reference of which books to avoid gifting us, since books are always a popular option around the holidays, but also to provide a place to store notes. It would be interesting to have jotted down when we read a particular book, what we thought of it at the time, and maybe who convinced us to give it a read.

Just one corner of the stacks

It would’ve been easy enough, if incredibly tedious, to simply type the details of every book into a spreadsheet and call it done. At this point I’m entirely convinced my efforts to automate the process have taken longer than typing them in would have, but I rationalize that adding new books will be easier, and the process of learning some new tools and writing a bit of code was far more enjoyable than filling my evenings with mindless typing.

The framework that first came to mind was simple enough: snap photos of every book’s barcode, use those barcodes to find the ISBN, use the ISBN to grab the meta-data, and populate a row of the spreadsheet with those strings. I would learn a few things along the way that complicated that flow, but the general skeleton of it survived the process intact.

The first step was to find a way to isolate the barcode from a photo and decode it into a string; luckily this already exists as a function in pyzbar. Next was to use this to find an ISBN, which turned out to be unnecessary (usually), since most books use the ISBN as the product identifier right in the barcode: the stored string literally is the ISBN. I'll get into the caveats to this later. Lastly was to use the ISBN to grab the meta-data, which conveniently is available through isbnlib, another freely available Python library. If all of those steps succeed, then all that's left to do is write a row to a CSV file. I did play around with a few database options and fancy libraries, but ended up settling on the barebones Python open() and write() functions for plaintext. The plan was to generate CSV files that could be imported into any spreadsheet software easily; I'm unlikely to own a million books, so there's no need to complicate this step further.
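Stripped of the GUI, the happy path looks roughly like this. The column order and the comma-joined row are just illustrative (real rows need escaping), but the pyzbar and isbnlib calls are the ones the program leans on:

from pyzbar.pyzbar import decode
from PIL import Image
import isbnlib

def catalogue_photo(photo_path, csv_path):
    symbols = decode(Image.open(photo_path))        # find any barcode in the photo
    if not symbols:
        return False
    isbn = symbols[0].data.decode("ascii")          # the EAN-13 on most books *is* the ISBN-13
    meta = isbnlib.meta(isbn)                       # e.g. {'Title': ..., 'Authors': [...], ...}
    row = [isbn,
           meta.get("Title", ""),
           "; ".join(meta.get("Authors", [])),
           meta.get("Publisher", ""),
           meta.get("Year", "")]
    with open(csv_path, "a", encoding="utf-8") as f:   # plain open()/write(), as described above
        f.write(",".join(row) + "\n")
    return True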

The core of the program was basically done; it worked on the twenty or so books I considered a representative sample (foreshadowing: it was not representative), so I powered ahead on building a GUI. Even with that small subset it was obvious I would want to be able to tweak the details before inserting each record into the file. Many times the data returned by isbnlib.meta() was incomplete, and there are several sources that function can be set to choose from, so manual review wasn't avoidable. Still far easier than typing every field by hand, though.

Honestly, I'd never bothered with complicated GUIs in Python before, and certainly not since Python 3 became the default. After reviewing the libraries, and there are so many, I settled on wxPython and hunted down some basic tutorial videos. It took a while, but eventually I had a tenuous handle on frames, panels, sizers, and the like. I'm sure any deft wxPython user would find my code miserable, but it does run, and with a little window resizing it worked for all the images I fed it.

At this point I had to decide what the workflow of this program was going to look like. I settled on the following: the user specifies an output CSV file for the results and points the program at a directory containing JPEG files, and the program populates a selectable list with those file names. Clicking a file name displays the image and fills in the ISBN field automatically. A button then triggers the meta-data look-up, which populates the remaining fields. After review, another click writes the row to the file, removes the selected file from the list, and moves the image into a /Success/ directory; repeat until the list is empty and all the books are in the CSV. I also added an option to skip a photo, moving it to a /Skip/ directory without writing any record, so unparsable images wouldn't continually end up at the top of the queue. Generally speaking, that worked great.
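For a sense of the structure, a bare-bones skeleton in the same shape as that workflow might look like the following. The widget names and layout here are placeholders rather than the code from the repository:

import os
import wx

class CatalogueFrame(wx.Frame):
    def __init__(self, image_dir):
        super().__init__(None, title="Book Cataloguer")
        panel = wx.Panel(self)
        jpegs = [f for f in os.listdir(image_dir) if f.lower().endswith(".jpg")]
        self.file_list = wx.ListBox(panel, choices=jpegs)    # the selectable queue of photos
        self.isbn_field = wx.TextCtrl(panel)                 # filled in when a photo is selected
        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.file_list, 1, wx.EXPAND | wx.ALL, 5)
        sizer.Add(self.isbn_field, 0, wx.EXPAND | wx.ALL, 5)
        panel.SetSizer(sizer)
        self.file_list.Bind(wx.EVT_LISTBOX, self.on_select)

    def on_select(self, event):
        filename = event.GetString()
        # ...display the image, decode the barcode, and populate the ISBN field here...
        self.isbn_field.SetValue("")

if __name__ == "__main__":
    app = wx.App(False)
    CatalogueFrame(".").Show()
    app.MainLoop()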

Program in action

Now for the snags. Firstly, there's a field in that screenshot that isbnlib.meta() does not provide: the subject tags. Interestingly enough, it's not trivial to discover whether a given ISBN is a mystery novel, a comic book, or even simply fiction or nonfiction. There are paid services that will return detailed tags (ISBNdb seemed to be roughly what I was after), but aside from that, I was left searching. Eventually I found that OpenLibrary has user-curated tags associated with each of their entries, but isbnlib.meta() doesn't retrieve them, so I wrote my own handler (just fetching the URL, parsing the response as JSON, and joining the bits grouped under 'subjects').
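That handler really is only a few lines. The endpoint and the 'subjects' field below are as I remember them from OpenLibrary, so treat this as a sketch rather than a spec:

import json
import urllib.request

def openlibrary_subjects(isbn):
    url = f"https://openlibrary.org/isbn/{isbn}.json"        # returns the edition record as JSON
    with urllib.request.urlopen(url) as resp:
        record = json.loads(resp.read().decode("utf-8"))
    return ", ".join(record.get("subjects", []))             # empty string when nothing is tagged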

This worked, but "user curated" inevitably means practically random. Some books would have no subjects tagged at all, others would have pages of tags, some were tagged in other languages, and some tags were control codes for someone else's scheme. It was, and remains, the wonkiest part of my flow. In practice I manually reviewed and cleaned up these tags before inserting each record, and even built a set of buttons to add the most common tags quickly. I have some thoughts about doing a housekeeping pass down the line, looking at the frequency of each tag and culling the nonsensical ones that snuck past me, but that's a task for some other weekend.

Most of the other stumbling blocks I hit had to do, first, with older books from before the ISBN standard was introduced around 1970 (though in reality there were plenty of books lacking an ISBN through the 1980s). Thankfully the switch from ISBN-10 to ISBN-13 that happened in 2007 is essentially transparent to my program's workflow; either works fine. The second issue was non-English books catalogued in pre-ISBN systems, or simply not catalogued at all. In both cases, these had to be entered manually, usually requiring a minute or two to search out the details of publication date, edition, etc.

Aside from those cases, a lot of failure states had to be accounted for in the flow. Was the photo subpar and a barcode couldn't be found? Was the entered ISBN invalid? Did the meta() function fail to return anything at all? I eventually added a status bar to the program to provide these sorts of messages in a less obtrusive way; even just using it myself, I found pop-ups insufferable.
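Those checks end up looking something like this, leaning on isbnlib's helpers for validation and the 10-to-13 conversion; the status strings are just examples of what winds up in the status bar:

import isbnlib

def lookup(isbn_text):
    isbn = isbnlib.canonical(isbn_text)          # strip dashes and stray characters
    if isbnlib.is_isbn10(isbn):
        isbn = isbnlib.to_isbn13(isbn)           # 10 vs 13 makes no difference downstream
    if not isbnlib.is_isbn13(isbn):
        return None, "Not a valid ISBN"
    try:
        meta = isbnlib.meta(isbn)
    except Exception:                            # network hiccups, unknown ISBNs, ...
        return None, "Metadata lookup failed"
    if not meta:
        return None, "No metadata returned"
    return meta, "OK"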

Records in the spreadsheet

While there is still some clean up to be done, adding new books is a breeze and I can maintain a live inventory on Google Sheets pretty painlessly. It also doesn't hurt to keep track of who got lent which book when, or who lent me a book that I've failed to read for ages!

As for the code: I'm not sure of the cleanest way to share it, as I've never really messed with GitHub before, but it seems cleaner than simply tossing an IPython notebook file up without any other info. Behold, my first git repository.

The next big project in the queue is building a set of custom bookshelves to house these ~1,500 books, which will be laborious in an entirely different and interesting way!

Recognizing Cards - Finding the Match

In the previous two posts in this series (1, 2) I talked about capturing a reliable image of the art on a Magic card from a webcam, and how we can hash those images for an effective and mostly efficient comparison. Now we're left with a hash for the card we'd like to match, and an enormous table of hashes for all the possible matches (something like 18,000 cards). The first question is how to compare the hashes in a meaningful way, but thankfully this is made easy by the nature of phash and its documentation. The Hamming distance, or number of dissimilar characters in the string, is the metric of choice. This method is demonstrated step-wise below, first for the correct target, then for a random non-target image.

When compared to the actual match, the Hamming distance is 9. Mismatched characters are highlighted in red. While the cutoff value might take some fine tuning, for our purposes (and my camera), 9 is a relatively strong match.
Compared to an incorrect image, the Hamming distance is 15. The single matched character is effectively within the noise.
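In code, the comparison is nothing more than counting the characters that disagree; the two hash strings below are made up for illustration:

def hamming(a, b):
    return sum(c1 != c2 for c1, c2 in zip(a, b))

print(hamming("d1c48f2a90be7731", "d1c08f3a90fe7631"))   # -> 4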

Hamming distance is a reliable metric for this hashing approach, and is relatively easy to compute if the hashes are all already calculated. That being said, we don't really want to do 18,000 x 16 character comparisons in order to determine which card we're looking at. Instead we can use a binary search tree. There is a lot already written on binary trees, but the short version is this: by splitting the space up, one can iteratively narrow down the possibilities rather than look at each candidate individually. It does require that you take the time to build the tree, but individual searches become substantially faster.

But wait: the Hamming distance doesn't provide coordinates in some searchable space, it provides a measure describing how two data points are related (or not). The set of 16-character hashes with this distance can, however, be treated as a metric space. The approach I've gone with is a vantage-point tree, or VP tree, chosen primarily because Paul Harrison was kind enough to post his Python implementation. The idea behind VP trees is to pick a point in your space, e.g. the hash "1111aaaabbbbcccc", and then break your member set into two parts: those "nearer" than some cut-off Hamming distance, and those farther out. By repeating this process a tree of relations can be built up, with adjacent 'branches' having smaller Hamming distances than 'far' branches. This means that a hash you're trying to match can rapidly traverse the tree and only run direct comparisons against one or two actual set members. The paper by Kumar et al. has an excellent explanation of how this compares with other binary-tree approaches, and while they were doing image-patch analysis, the content is still incredibly relevant and well presented. Figure 2 in that paper, not reproduced here, is perfect for visualizing the structure of VP trees!
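For concreteness, here is a minimal VP tree sketch of my own (it is not Paul Harrison's implementation, and a production version would handle ties and balancing more carefully). The build step splits on the median Hamming distance to a randomly chosen vantage point, and the search prunes whichever branch cannot possibly contain a closer match:

import random

def hamming(a, b):                       # character-level Hamming distance, as above
    return sum(c1 != c2 for c1, c2 in zip(a, b))

class VPNode:
    def __init__(self, point, threshold, inside, outside):
        self.point = point               # the vantage point (a stored hash string)
        self.threshold = threshold       # median distance used to split the remaining set
        self.inside = inside             # subtree of points within the threshold
        self.outside = outside           # subtree of points beyond the threshold

def build_vptree(points, dist=hamming):
    if not points:
        return None
    points = list(points)
    vantage = points.pop(random.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0, None, None)
    distances = [dist(vantage, p) for p in points]
    threshold = sorted(distances)[len(distances) // 2]           # median split
    inside = [p for p, d in zip(points, distances) if d <= threshold]
    outside = [p for p, d in zip(points, distances) if d > threshold]
    return VPNode(vantage, threshold,
                  build_vptree(inside, dist), build_vptree(outside, dist))

def nearest(node, query, dist=hamming, best=None):
    # Depth-first search that prunes branches which cannot hold a closer match.
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d <= node.threshold:
        best = nearest(node.inside, query, dist, best)
        if d + best[0] > node.threshold:                         # query ball crosses the boundary
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        if d - best[0] <= node.threshold:
            best = nearest(node.inside, query, dist, best)
    return best

# tree = build_vptree(all_card_hashes)
# distance, best_hash = nearest(tree, hash_to_match)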

I'm still in the process of cleaning up code, but plan to shortly follow up with a video demonstration of the code in action, as well as a few snippets of particular interest.

Recognizing Cards - Effective Comparisons with Hashing

In the previous post we got as far as isolating and pre-processing the art from a card placed in front of the camera; now we come to the problem of effectively comparing it with all the possible matches. Given the possible "attacks" against the image we're trying to match, e.g. rotation, color balance, and blur, it's important to choose a comparison method that will be insensitive to the ones we can't control without losing the ability to clearly identify the correct match among thousands of impostors. A bit of googling led me to phash, a perceptual hashing algorithm that seemed ideal for my application. A good explanation of how the algorithm works can be found here, and illustrates how small attacks on the image can be neglected. I've illustrated the algorithm steps below using one of the cards from my testing group, Snowfall.

Illustration of the phash algorithm from left to right. DCT is the discrete cosine transform. Click for full-size.

The basic identification scheme is simple: calculate the hash for each possible card, then calculate the hash for the art we're identifying. These hashes are converted to ASCII strings and stored. For each hash in the collection, calculate the Hamming distance (essentially how many characters in the hash string are dissimilar), and that number describes how different they are. The process of searching through a collection of hashes to find the best match in a reasonable amount of time will be the subject of the next post in this series (hint: it involves VP trees). Obtaining hashes for all the possible card-arts is an exercise in web scraping and loops, and isn't something I need to dive into here.
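With the imagehash package the per-image work is only a few lines; the file names below are placeholders, and hash_size is the knob I tuned later. Note that imagehash's subtraction operator returns a bit-level Hamming distance, whereas I store the hex strings and compare characters:

import imagehash
from PIL import Image

reference = imagehash.phash(Image.open("reference_art.png"), hash_size=8)
captured = imagehash.phash(Image.open("captured_art.png"), hash_size=8)

print(str(reference))        # the ASCII/hex form that goes into the stored collection
print(reference - captured)  # bit-level Hamming distance between the two hashes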

One of my first concerns upon seeing the algorithm spelled out was the discarding of color. The fantasy art we're dealing with is, in general, more colorful than most test image sets, so we might be discarding more information for less of a performance gain than usual. To that end, I decided to try a very simple approach, referred to as phash_color below: get a phash from each of the color channels and simply append them end-to-end. While it takes proportionally longer to calculate, I felt it should provide better discrimination. This expanded algorithm is illustrated below. While it is true that the results (far right column) appear highly similar across color channels, distinct improvements to identification were found across the entire corpus of images compared to the simpler (and faster) approach.

The color-aware extension of the phash algorithm. The rows correspond to individual color channels. Click for full-size.

I decided to run a systematic test, choosing four cards from my old box and grabbing images, shown below. Some small attempt was made to vary the color content and level of detail across the test images.

The four captured arts for testing the hashing algorithms. The art itself is the property of Wizards of the Coast.

For several combinations of hash and pre-processing I found what I'm calling the SNR, after 'signal-to-noise ratio'. This SNR is essentially how well the hash matches the image it should, divided by the quality of the next best match. The ideal hash size was found to be 16 by a good deal of trial and error. A gallery showing the matching strength for the four combinations (original phash, the color version, with equalized histograms, and without pre-processing) is shown below, but the general take-away is that histogram equalization makes matching easier, and including color provides additional protection against false positives.
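One way to compute such a ratio, assuming a smaller Hamming distance means a better match, is to divide the runner-up's distance by the correct card's distance, so that bigger numbers mean a cleaner identification:

def match_snr(distances, correct_index):
    correct = distances[correct_index]
    next_best = min(d for i, d in enumerate(distances) if i != correct_index)
    return next_best / correct

print(match_snr([9, 15, 21, 18], correct_index=0))   # -> 1.666...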


If there is interest I can post the code for the color-aware phash function, but it really is as simple as breaking the image into three greyscale layers and using the phash function provided by the imagehash package. Up next: VP trees and quickly determining which card it is we're looking at!
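A sketch of that color-aware function, with the channel split done by PIL:

import imagehash
from PIL import Image

def phash_color(image, hash_size=8):
    channels = image.convert("RGB").split()      # three greyscale layers: R, G, B
    return "".join(str(imagehash.phash(ch, hash_size=hash_size)) for ch in channels)

print(phash_color(Image.open("captured_art.png")))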

Recognizing Cards - Image Capture

Back in October I posted a short blurb on my first attempts at recognizing Magic cards through webcam imagery. A handful of factors have brought me back around to it, not the least of which is a still unsorted collection. It also happened to be a good excuse to dig into image processing and search trees, things I’ve heard a lot about but never really dug into. Probably the biggest push to get back on this project was a snippet of Python I found, here, for live display of the pre- and post-processed webcam frames in real time. There is real novelty in seeing your code in action in a very immediate way, and it also eliminated all of the frustration I was having with convincing the camera to stay in focus between captures. At present, the program appears to behave well and recognize cards reliably!

I plan to break my thoughts on this project into a few smaller posts focusing on the specific tasks and problems that came up along the way, so I can devote enough space to the topics I found most interesting.

  • Image Pre-Processing
  • Recognizing Blurry Images: Hashing and Performance
  • Finding Matches: Fancy Binary Trees

I should note here: a lot of the ideas used in this project were taken from code others posted online. Any time I directly used (or was heavily inspired by) a chunk of code, I’ll link out to the original source as well as include a listing at the bottom of each post in this series.

Pre-Processing

The goal here was to take the camera imagery and produce an image that was most likely to be recognized as "similar" by our hashing algorithm. First and foremost, we need to deal with the fact that our camera (1) is not perfect: the white balance, saturation, and focus of our acquired image may all differ from the image we're comparing against, and (2) captures a lot more than the card alone. Let's focus on the latter problem first, isolating the card from the background.

The method I described in the previous post works sometimes, but not particularly well; it required ideal lighting and a perfectly flat background. The algorithm I ended up settling on is below (a rough sketch in code follows the list):

  1. Convert a copy of the frame to grey-scale
  2. Store the absolute difference between that frame and the background (more on that later)
  3. Threshold that difference-image to a binary image
  4. Find the contours present using cv2.findContours()
  5. Only look at the contours with an enclosed area greater than 10k pixels (based on my camera)
  6. Find a bounding box for each of these contours and compute the aspect ratio
  7. Throw out contours with a bounding box aspect ratio less than 0.65 or greater than 1.0
  8. If we've got exactly one contour left in the set, that's our card!
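Roughly, that sequence looks like the following with OpenCV; this is a sketch assuming OpenCV 4's findContours signature and a greyscale background frame, and the threshold value is the sort of thing that needs tuning per setup:

import cv2

def find_card_contour(frame, background_grey):
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(grey, background_grey)                    # difference against the stored background
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)    # binary image; 30 is a tunable cutoff
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        if cv2.contourArea(c) < 10_000:                          # ignore small blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        if 0.65 <= w / h <= 1.0:                                 # card-like bounding box
            candidates.append(c)
    return candidates[0] if len(candidates) == 1 else None       # exactly one left: that's our card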

The next problem is that of perspective and rotation, which thankfully we can tackle simultaneously. In the previous steps we found the contour of the card and the bounding rectangle for that contour, and we can put both to use.

  • Find the approximate bounding polygon for our contour using cv2.approxPolyDP().
  • If the result has more than four corners, we need to trim out the spurious corners by finding the ones closest to any other corner. These might result from a hand holding the card, for example.
  • Using the width of the bounding box, known aspect ratio of a real card, and the corners of the trapezoid bounding the card, we can construct the perspective transformation matrix.
  • Apply the perspective transform (sketched below).
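A sketch of that last step with OpenCV, assuming the four trimmed corners are ordered top-left, top-right, bottom-right, bottom-left, and using the roughly 63 x 88 mm proportions of a physical card:

import cv2
import numpy as np

def straighten_card(frame, corners, out_width=400, card_aspect=63 / 88):
    out_height = int(out_width / card_aspect)
    src = np.array(corners, dtype=np.float32)                     # the trapezoid bounding the card
    dst = np.array([[0, 0], [out_width - 1, 0],
                    [out_width - 1, out_height - 1], [0, out_height - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)                # maps the trapezoid onto a rectangle
    return cv2.warpPerspective(frame, matrix, (out_width, out_height))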
Camera input image. Card contour is shown in red, bounding rectangle is shown in green. The text labels are the result of the look-up process I'll explain in the coming posts.
The isolated and perspective-corrected card image.

Lastly, to isolate the art we simply rely on the consistency of the printed cards. By measuring the cards it was fairly easy to pick out the fractional width and height bounds for the art, and simply crop to those fractions. Now we're left with the first problem: the imperfect camera. Due to the way we're hashing images, which will be discussed in the next post in this series, we're not terribly worried about image sharpness, as the method does not preserve high frequencies. Contrast, however, is a big concern. After much experimentation I settled on a very simple histogram equalization, essentially modifying the image so that the brightest color is white and the darkest color is black, without disrupting how the bits in the middle correspond. An example of this is given below.
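As a sketch, the crop-and-equalize step looks something like this; the fractional bounds are placeholders rather than the numbers I actually measured, and the equalization is applied per channel:

import cv2

def isolate_and_equalize(card_img, x_frac=(0.10, 0.90), y_frac=(0.11, 0.56)):
    h, w = card_img.shape[:2]
    art = card_img[int(h * y_frac[0]):int(h * y_frac[1]),
                   int(w * x_frac[0]):int(w * x_frac[1])]         # crop to the art box by fixed fractions
    channels = [cv2.equalizeHist(ch) for ch in cv2.split(art)]    # simple histogram equalization
    return cv2.merge(channels)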

Sample image showing (clockwise) the camera capture, the target image, the result of histogram equalizing the input, and the result of equalizing the target.

So now we're at the point where we can capture convincing versions of the card art reliably from the webcam. In the next post I'll go over how I chose the hashing algorithm to compare each captured image against all the potential candidates, so we can tell which card we've actually got!

Recognizing Cards - First Attempts

Sorting images has been a problem on my mind for years, but I never had a really good reason to sink time into it. Just recently, I finally found one: it occurred to me that it would be useful to have my webcam recognize Magic cards by their art and add them to a database for collection tracking. I've since learned that this wasn't as new an idea as I'd thought, and several complete software packages for this specific application have become available in the last six months. Nevertheless, it was an interesting excursion into image processing, admittedly not my home turf. It also worked much better than planned.

To my mind, the problem had three main tasks that I'd not undertaken before, in order of drastically increasing difficulty:

  1. Grabbing camera input from Python
  2. Isolating the card from the background
  3. Usefully comparing images

The first part was mostly just an exercise in googling and adapting the code to my webcam. Short version: use OpenCV's VideoCapture method, and be sure to chuck out a few frames while the camera is auto-focusing and tuning the white balance. The second bit proved to be slightly more interesting. Given a photo of a card on a plain white background, Canny edge detection can reliably pull out the edge (though I haven't tried this with white-bordered cards yet). The card-background contour is easily picked out; it has the largest area. Once we've cropped the image down to that, we can isolate the art by knowing the ratios used to lay out the cards. This process is shown below.
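In rough code, that capture-and-contour step looks like this; the number of discarded frames and the Canny thresholds are illustrative rather than tuned values:

import cv2

cap = cv2.VideoCapture(0)
for _ in range(10):                                   # throw away frames while focus/white balance settle
    cap.read()
ok, frame = cap.read()
cap.release()

edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
card_contour = max(contours, key=cv2.contourArea)     # the card/background edge has the largest area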

Original image
Contours
Isolated card
Isolated art

The third and most difficult step, effectively comparing the images, will have to wait until I've got more time to write. Soon!

Python: Filtering lists for value

Update: The way I was pulling price data (the Deckbrew API) has changed the format for data coming back, and I haven't updated my code, so the tool no longer functions. If you're looking for an intuitively laid out MTG price listing by set, I highly recommend the visualizers and listings over at mtg.dawnglare.com!

I've worked with Python for years now, but had never really pursued the idea of running it server-side to generate dynamic content. Recently I came across a problem that was well suited to it. I had a box of ~5000 Magic: The Gathering commons and uncommons from older sets, and I didn't have a good idea of which were worth something compared to the rest of the chaff. I found the Deckbrew API, and was able to make calls to it by grabbing a well-shaped URL. It was a short path to putting together a simple HTML form for specifying the filter and threshold values as well as displaying the results.
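The server-side piece boiled down to something like the following; the URL and the 'name'/'price' keys are stand-ins rather than the real Deckbrew schema (which has since changed anyway):

import json
import urllib.request

def cards_worth_keeping(url, threshold):
    with urllib.request.urlopen(url) as resp:
        cards = json.loads(resp.read().decode("utf-8"))   # one dict per card in the response
    # 'name' and 'price' here are illustrative keys, not the actual API fields
    return [c["name"] for c in cards if c.get("price", 0) >= threshold]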

The resulting tool can be found here!

There are certainly bugs, primarily that I've already encountered cards that are listed, but have no listed price for the printing of interest. I'm thinking to work around this by picking the value of an alternate printing, but haven't bothered to work that out yet.

Jones Vectors and Visualizing Polarization

I recently had an assignment wherein I was presented with a handful of polarization states and asked to estimate the Jones vector for each one. Usually this isn't so bad, as there is only so much variation between linear, elliptical, and circular; however, they also asked that I add the appropriate phase such that the vector indicates a specific point in the trajectory. I spent some time thinking it through, but finally decided to write a program to check my answers. Parametric plotting in Python turned out to be trivial, so it went quickly. The result of that is this script, which creates two functions. One of them, jones_plot(), will take in a two-element Jones vector and generate a 2D plot of the trajectory of the tip of the electric field vector. That is, circular polarization will draw circles, and so on. The second, jones_check(), takes in the same Jones vector, but returns a string describing the nature of the polarization.
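The plotting half is little more than a parametric plot. The sketch below captures the idea rather than reproducing the script itself, using the same decreasing-phase convention noted at the end of this post:

import numpy as np
import matplotlib.pyplot as plt

def jones_plot(jones, points=400):
    # Trace the tip of the E-field over one cycle: E = Re( J * exp(-i*phase) ).
    jx, jy = complex(jones[0]), complex(jones[1])
    phase = np.linspace(0, 2 * np.pi, points)
    ex = np.real(jx * np.exp(-1j * phase))
    ey = np.real(jy * np.exp(-1j * phase))
    plt.plot(ex, ey)
    plt.plot(ex[0], ey[0], 'ro')          # red dot marks phase = 0
    plt.gca().set_aspect('equal')
    plt.xlabel('$E_x$')
    plt.ylabel('$E_y$')
    plt.show()

jones_plot([np.exp(-1j * np.pi / 4), -0.5j])   # the right-elliptical example below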


jones_check([exp(-1j*pi/2), -0.5j])
=> 'Linear polarization at 26.565051 degrees CCW from x-axis'
jones_check([exp(-1j*pi/4), -0.5j])
=> 'Right elliptical polarization, rotated with respect to the axes'

 

The jones_plot() output for [ exp(-1j*pi/4), -0.5j ]. Red dot indicates phase = 0, green dot indicates phase = pi/5
The code is probably not bullet-proof, but it provided a good means for practicing manipulating the vectors and understanding how to interpret them. For the curious, the sign convention used is decreasing phase in the style of Hecht's Optics 4th edition.