Trinity College provides copies of its Hall menus online. These are in the form of PDF files, each representing the menu for one week as a table with one column per day.
It would be more convenient to be able to see just the menu choices for the current day, so I created a page to do this.
My first thought was that I could write a script to download the menu for the coming week, and then use ImageMagick to extract the rectangular region corresponding to each day. However, the widths and heights of the table cells change from week to week, so the positions of the lines in the table have to be found separately for each PDF. This task is greatly simplified by the fact that the lines are all horizontal or vertical, and each one spans a large proportion of the height or width of the image. This suggests that they can be found by looking for rows and columns in the image that contain a large number of black pixels.
To test whether this idea was feasible, I downloaded a menu and converted it to a PNG with ImageMagick:
convert -density 600x600 menu.pdf menu.png
I then did some investigating with Matlab:
I = imread('menu.png');   % Load the image
imggrey = I(:,:,1);       % Extract only the first (red) channel (the menu is B&W, so each pixel has equal values of R/G/B)
plot(sum(imggrey,1));     % Plot the sum of the values in imggrey down each column
figure;                   % Open a new figure so the first plot is not overwritten
plot(sum(imggrey,2));     % Plot the sum of the values in imggrey across each row
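Repeating this for a few menus and looking at the resulting plots confirmed that the lines can be located easily by summing the intensities of all pixels in each row and column, then identifying the sums that fall below a threshold. A row or column crossed by a long black line contains many dark (low-valued) pixels, so its sum is much smaller than that of a mostly white row or column.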
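The same row and column test can be sketched in C++ using only the standard library. This is a minimal illustration and not the program described in this post: the Image type, the function names and the fraction parameter are all hypothetical, and the image is assumed to be already decoded into an 8-bit greyscale grid where 0 is black and 255 is white.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory image: one inner vector per row, 0 = black, 255 = white.
using Image = std::vector<std::vector<std::uint8_t>>;

// Sum of intensities down each column (the equivalent of sum(imggrey,1)).
std::vector<long> columnSums(const Image& img) {
    std::vector<long> sums(img.empty() ? 0 : img[0].size(), 0);
    for (const auto& row : img) {
        assert(row.size() == sums.size());  // the image must be rectangular
        for (std::size_t x = 0; x < row.size(); ++x)
            sums[x] += row[x];
    }
    return sums;
}

// Sum of intensities across each row (the equivalent of sum(imggrey,2)).
std::vector<long> rowSums(const Image& img) {
    std::vector<long> sums;
    sums.reserve(img.size());
    for (const auto& row : img) {
        long s = 0;
        for (std::uint8_t p : row) s += p;
        sums.push_back(s);
    }
    return sums;
}

// Indices whose sum is below `fraction` of the sum of an all-white line.
// `pixelsPerLine` is the number of pixels that contributed to each sum:
// the image height for column sums, the image width for row sums.
std::vector<std::size_t> darkIndices(const std::vector<long>& sums,
                                     std::size_t pixelsPerLine,
                                     double fraction) {
    const double threshold = fraction * 255.0 * static_cast<double>(pixelsPerLine);
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < sums.size(); ++i)
        if (static_cast<double>(sums[i]) < threshold) out.push_back(i);
    return out;
}
```

Calling darkIndices(columnSums(img), img.size(), 0.5) would return the x positions of the vertical lines, and darkIndices(rowSums(img), img[0].size(), 0.5) the y positions of the horizontal ones. The value 0.5 is only a placeholder; a suitable threshold would have to be read off plots like the ones above.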
To actually do the processing, I wrote a small program in C++, using the CImg library, a “small, open source, C++ toolkit for image processing”. I chose this library because it is lightweight and portable: there is a single header file to include in my program, with no dependencies or library installation problems to contend with.
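Leaving the library-specific image loading aside, one step such a program plausibly needs is turning the detected pixel positions into cell boundaries. At 600 dpi a ruled line is likely to be several pixels thick, so adjacent positions have to be merged into a single line before the gaps between lines can be treated as columns or rows. The sketch below shows one way to do this in plain C++; the Span type and both function names are hypothetical, and it is an assumption about how the step could work rather than a description of the actual program.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical half-open pixel range [first, second).
using Span = std::pair<std::size_t, std::size_t>;

// Merge a sorted list of dark row/column indices into runs of consecutive
// values. Each run is taken to be one ruled line of the table.
std::vector<Span> groupRuns(const std::vector<std::size_t>& indices) {
    std::vector<Span> runs;
    for (std::size_t i : indices) {
        if (!runs.empty() && runs.back().second == i)
            runs.back().second = i + 1;   // extends the current line
        else
            runs.emplace_back(i, i + 1);  // starts a new line
    }
    return runs;
}

// The gaps between successive lines: the interiors of the table cells
// along one axis (for vertical lines, one span per column).
std::vector<Span> cellSpans(const std::vector<Span>& lines) {
    std::vector<Span> cells;
    for (std::size_t k = 1; k < lines.size(); ++k) {
        assert(lines[k - 1].second <= lines[k].first);  // lines must be sorted
        cells.emplace_back(lines[k - 1].second, lines[k].first);
    }
    return cells;
}
```

For example, the dark column indices 10, 11, 12, 50, 51 and 90 group into three lines, and the two gaps between them, [13, 50) and [52, 90), are the horizontal extents of two columns. Combining a column span with the row spans found in the same way gives a rectangle that could be cropped out for a single day.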