Skip to the code: Git repository here
I got it into my head a little while back, the idea perhaps planted by a book-and-spreadsheet loving friend, that I should consider cataloguing our entire library. In part this was to provide friends with a handy reference of which books to avoid gifting us, since that’s an always popular option around the holidays, but also to provide a place to store notes. It would be interesting to have jotted down when we read a particular book, what we thought of it at the time, and maybe the recommender who convinced us to give it a read.

It would’ve been easy enough, if incredibly tedious, to simply type in the details of every book into a spreadsheet and call it done. At this point I’m entirely convinced my efforts to automate the process have taken longer than simply typing them in would have, but I rationalize that adding new books will be easier, and the process of learning some new tools and writing a bit of code was far more enjoyable than filling my evenings with mindless typing.
The framework that first came to mind was simple enough: snap photos of every book’s barcode, use those barcodes to find the ISBN, use the ISBN to grab the meta-data, and populate a row of the spreadsheet with those strings. I would learn a few things along the way that complicated that flow, but the general skeleton of it survived the process intact.
The first step was to find a way to isolate the barcode from a photo and decode it into a string; luckily, pyzbar already provides this. Next was to use that string to find an ISBN. This turned out to be (usually) unnecessary, since most books use the ISBN as the product identifier right in the barcode: the stored string literally is the ISBN. I'll get into the caveats to this later. The last step was to use the ISBN to grab the meta-data, which is conveniently available through isbnlib, another freely available Python library. If all of those steps succeed, then all that's left to do is write a row to a CSV file. I did play around with a few database options and fancy libraries, but ended up settling on the barebones Python open() and write() functions for plaintext. The plan was to generate CSV files that could be imported easily into any spreadsheet software. I'm unlikely to own a million books, so there's no need to complicate this step further.
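To make those steps concrete, here is a minimal sketch of the photo-to-CSV flow. It is not the code from the repository: the function names and the column list are made up for illustration, and it swaps in the standard csv module in place of bare write() calls so that commas inside titles get quoted. The pyzbar and isbnlib imports sit inside the functions so the CSV part runs even without those packages installed.

```python
import csv
from pathlib import Path

# Column order for the output file (an assumption; these are the keys
# that isbnlib.meta() returns).
FIELDS = ["ISBN-13", "Title", "Authors", "Publisher", "Year", "Language"]


def decode_isbn(image_path):
    """Return the first EAN-13 barcode string found in the photo, or None."""
    from PIL import Image
    from pyzbar.pyzbar import decode

    for symbol in decode(Image.open(image_path)):
        if symbol.type == "EAN13":
            # symbol.data is bytes; for most books it is the ISBN-13 itself.
            return symbol.data.decode("ascii")
    return None


def fetch_metadata(isbn):
    """Look up the metadata dictionary for an ISBN using isbnlib."""
    import isbnlib

    return isbnlib.meta(isbn)


def metadata_to_row(meta):
    """Flatten a metadata dictionary into one CSV row (a dict of strings)."""
    row = {key: meta.get(key, "") for key in FIELDS}
    # Authors arrives as a list; join it into a single cell.
    row["Authors"] = "; ".join(meta.get("Authors", []))
    return row


def append_row(csv_path, row):
    """Append one row, writing the header first if the file is new or empty."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

With the optional packages installed, one book would go through as `append_row("books.csv", metadata_to_row(fetch_metadata(decode_isbn("photo.jpg"))))`, with the file names here being placeholders.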
The core of the program was basically done. It worked on the twenty or so books I considered a representative sample (foreshadowing: it was not representative), so I powered ahead on building a GUI. Even with that small subset it was obvious I would want to be able to tweak the details before inserting each record into the file. Many times the data returned by isbnlib.meta() was incomplete, and there are several sources that function can be set to choose from. Manual review wasn't avoidable, though it was still easier than typing literally every bit by hand.
Honestly, I'd never bothered with complicated GUIs in Python before, and certainly not since Python 3 became the default. After reviewing the libraries, and there are so many, I settled on wxPython and hunted down some basic tutorial videos. It took a while, but eventually I had a tenuous handle on frames, panels, sizers and the like. I'm sure any deft wxPython user would find my code miserable, but it does run and, with a little window re-sizing, it worked for all the images I fed it.
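One way to soften the incomplete-data problem is to ask more than one source and keep the first non-empty value for each field. The sketch below is an assumption about how that could look, not what the program actually does. The service names "goob", "openl" and "wiki" are the ones isbnlib ships with; merged_metadata and default_fetch are hypothetical helpers.

```python
# Services to try, in order of preference (an arbitrary choice).
SERVICES = ("goob", "openl", "wiki")


def default_fetch(isbn, service):
    """Query a single isbnlib metadata service."""
    import isbnlib

    return isbnlib.meta(isbn, service=service)


def merged_metadata(isbn, fetch=default_fetch, services=SERVICES):
    """Combine results from several services, keeping the first non-empty value per field."""
    merged = {}
    for service in services:
        try:
            result = fetch(isbn, service) or {}
        except Exception:
            # A failed lookup on one service should not stop the others;
            # the broad catch is deliberate for this sketch.
            continue
        for key, value in result.items():
            if value and not merged.get(key):
                merged[key] = value
    return merged
```

Even with merging, the result still deserves a look before it is written to file, since the sources can disagree on things like edition and year.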
At this point I had to decide what the workflow of this program was going to look like. I settled on the following: the user specifies an output CSV file for the results and points the program at a directory containing jpeg files, and the program populates a selectable list with those file names. The user can then click on each file name, causing the program to display the image and populate the ISBN field automatically. A button then triggers the meta-data look-up, which populates the remaining fields. After review, they can click to write the row to file, which also removes the selected file from the list and moves the image file to a /Success/ directory. Repeat until the list is empty and all your books are in the CSV. I also added an option to skip a photo, moving it to a /Skip/ directory without writing any record, so unparsable images wouldn't continually end up at the top of the queue. Generally speaking, that worked great.
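The file-shuffling half of that workflow needs nothing beyond the standard library. This is a rough sketch with the GUI left out; list_photos and file_away are hypothetical names, and creating the Success and Skip folders inside the photo directory is an assumption.

```python
import shutil
from pathlib import Path

PHOTO_SUFFIXES = {".jpg", ".jpeg"}


def list_photos(directory):
    """Return the jpeg files waiting in the directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in PHOTO_SUFFIXES
    )


def file_away(photo, outcome):
    """Move a processed photo into a 'Success' or 'Skip' subdirectory."""
    photo = Path(photo)
    target_dir = photo.parent / outcome
    target_dir.mkdir(exist_ok=True)
    target = target_dir / photo.name
    shutil.move(str(photo), str(target))
    return target
```

Because handled photos leave the source directory, re-running list_photos after each record gives the remaining queue, and skipped images never come back to the top.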

Now for the snags. Firstly, there's a field in that screenshot that isbnlib.meta() does not provide: the subject tags. Interestingly enough, it's not trivial to discover if a given ISBN is a mystery novel, a comic book, or even simply fiction or nonfiction. There are paid services that will return detailed tags (ISBNdb seemed to be roughly what I was after), but aside from that, I was left searching. Eventually I found that OpenLibrary has user-curated tags associated with each of their entries, but isbnlib.meta() doesn't retrieve them, so I wrote my own handler (just grabbing the url to text, parsing it as JSON, and joining the bits grouped under 'subjects').
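For the curious, a handler along those lines can be quite short. The sketch below assumes the OpenLibrary Books API endpoint with jscmd=data, where each subject is an object with a "name" field; I'm not claiming this is the exact URL my own handler uses, and the function names are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://openlibrary.org/api/books"


def subjects_url(isbn):
    """Build the lookup URL for one ISBN."""
    query = urlencode({"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"})
    return f"{API}?{query}"


def parse_subjects(payload, isbn):
    """Pull the subject names out of the JSON text and join them into one string."""
    record = json.loads(payload).get(f"ISBN:{isbn}", {})
    names = [subject.get("name", "") for subject in record.get("subjects", [])]
    return ", ".join(name for name in names if name)


def fetch_subjects(isbn, timeout=10):
    """Grab the URL as text and return the joined subject tags (needs network access)."""
    with urlopen(subjects_url(isbn), timeout=timeout) as response:
        return parse_subjects(response.read().decode("utf-8"), isbn)
```

An ISBN that OpenLibrary doesn't know comes back as an empty JSON object, which this turns into an empty string.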
This worked, but "user curated" inevitably means practically random. Some books would have no subjects tagged at all, others would have pages of tags, some were tagged in other languages, and some tags were control codes for someone else's scheme. It was, and remains, the wonkiest part of my flow. In practice I manually reviewed and cleaned up these tags before inserting each record, and even built a set of buttons to add the most common tags quickly. I have some thoughts of doing some housekeeping down the line where I look at the frequency of each given tag and cull the nonsensical ones that snuck past me, but that's a task for some other weekend.
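That housekeeping pass could start from something as small as the sketch below, which counts how often each tag appears across the tag column and lists the rare ones for review. It assumes tags are stored comma-separated in a single cell; the function names and the min_count threshold are illustrative.

```python
from collections import Counter


def tag_counts(tag_cells, sep=","):
    """Count tags across all rows, ignoring case and surrounding whitespace."""
    counts = Counter()
    for cell in tag_cells:
        counts.update(tag.strip().lower() for tag in cell.split(sep) if tag.strip())
    return counts


def rare_tags(counts, min_count=2):
    """Return tags seen fewer than min_count times, as candidates for culling."""
    return sorted(tag for tag, n in counts.items() if n < min_count)
```

A tag that shows up once in ~1,500 books is not necessarily nonsense, so this only produces a shortlist to eyeball rather than deleting anything.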
Most of the other stumbling blocks I hit fell into two groups. The first was older books from before the ISBN standard was published in 1970, though in reality there were plenty of books lacking an ISBN through the 1980s. Thankfully the switch from ISBN-10 to ISBN-13 that happened in 2007 is essentially transparent to my program's workflow; either works fine. The second was non-English books catalogued in pre-ISBN systems, or simply not catalogued at all. In both cases, these had to be entered manually, usually requiring a minute or two to search up the details of publishing date, edition, etc.
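The two ISBN formats differ mainly in their prefix and check digit, which is why treating them interchangeably is cheap. Here is a small pure-Python sketch of the check-digit arithmetic, written for illustration; isbnlib offers its own helpers for this (is_isbn10, is_isbn13, to_isbn13), so in practice there is no need to hand-roll it.

```python
import re


def clean(isbn):
    """Strip hyphens and spaces and uppercase a trailing 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_isbn10(isbn):
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11 ('X' counts as 10)."""
    s = clean(isbn)
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def isbn13_check_digit(first12):
    """ISBN-13: the first twelve digits are weighted 1, 3, 1, 3, ... modulo 10."""
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_isbn13(isbn):
    s = clean(isbn)
    return bool(re.fullmatch(r"[0-9]{13}", s)) and isbn13_check_digit(s[:12]) == s[12]


def to_isbn13(isbn):
    """Return the ISBN-13 form of a valid ISBN-10 or ISBN-13."""
    s = clean(isbn)
    if is_isbn13(s):
        return s
    if is_isbn10(s):
        # Prefix 978, drop the old check digit, and compute a new one.
        body = "978" + s[:9]
        return body + isbn13_check_digit(body)
    raise ValueError(f"not a valid ISBN: {isbn!r}")
```

For example, `to_isbn13("0-306-40615-2")` gives `"9780306406157"`.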
Aside from those cases, a lot of failure states had to be accounted for in the flow. Was the photo subpar and a barcode couldn't be found? Was the ISBN entered invalid? Did the meta() function fail to return anything at all? I eventually added a status bar to the program to provide these sorts of messages in a less obtrusive way - even just using it myself I found pop-ups insufferable.
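Those questions map fairly directly onto a chain of checks that each produce a short message rather than a pop-up. The sketch below is a hypothetical arrangement, not the program's actual structure: the three stages are passed in as functions so the logic stays independent of pyzbar, isbnlib and the GUI, and the message wording is invented.

```python
def lookup_status(photo, decode, validate, fetch):
    """Run the lookup steps in order and return (metadata or None, status message)."""
    isbn = decode(photo)
    if not isbn:
        return None, "No barcode found in photo"
    if not validate(isbn):
        return None, f"Invalid ISBN: {isbn}"
    try:
        meta = fetch(isbn)
    except Exception as exc:
        # Any lookup error is reported as a message instead of crashing the GUI.
        return None, f"Lookup failed: {exc}"
    if not meta:
        return None, f"No metadata returned for {isbn}"
    return meta, f"Found: {meta.get('Title', isbn)}"
```

In a wxPython frame, the returned message could then be handed to the frame's SetStatusText() method so it appears in the status bar.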

While there is still some cleanup to be done, adding new books is a breeze and I can maintain a live inventory on Google Sheets pretty painlessly. It also doesn't hurt to keep track of who got lent which book when, or who lent me a book that I've failed to read for ages!
As for the code, I'm not sure of the cleanest way to share it. I've never really messed with GitHub before, but it seems cleaner than simply tossing an IPython notebook file up without any other info. Behold, my first git repository.
The next big project in the queue is building a set of custom bookshelves to house these ~1,500 books, which will be laborious in an entirely different and interesting way!











