Coding a Middle Ground: ImageGrid

Openness is the sacred cow of the digital humanities. Making data publicly available, writing open-source code, and publishing in open-access journals are not just ideals but often the very glue that binds the field together. It’s one of the aspects of digital humanities that I find most appealing. Despite this, I have only slowly begun to put this ideal into practice. Earlier this year, for instance, I posted over one hundred book summaries I had compiled while studying for my qualifying exams. Now I’m venturing into the world of open-source software by releasing a program I used in a recent research project.

The program tries to tackle one of the fundamental problems facing many digital humanists who analyze text: the gap between manual “close reading” and computational “distant reading.” In my case, I was trying to study the geography within a large corpus of nineteenth-century Texas newspapers. First, I wrote Python scripts to extract place-names from the papers and calculate their frequencies. Although I had some success with this approach, I still ran into the all-too-familiar limit of historical sources: their messiness. Namely, nineteenth-century newspapers are extremely challenging to translate into machine-readable text. When performing Optical Character Recognition (OCR), the smorgasbord nature of newspapers poses real problems. Inconsistent column widths, a potpourri of advertisements, vast disparities in text size and layout, stories running from one page to another – the challenges go on and on and on. Consequently, extracting the word “Havana” from OCR’d text is not terribly difficult, but writing a program that identifies whether it occurs in a news story or in an advertisement is much harder. Given the quality of the OCR’d text in my particular corpus, deriving this kind of context proved next to impossible.
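For readers curious what the counting step looks like, here is a minimal sketch of a place-name frequency count over OCR’d text. It is not the script used in this project: the gazetteer (`GAZETTEER`) and the function name `count_place_names` are hypothetical, and a real project would need a far larger list of place-names. Note that it reports the same count whether “Havana” sits in a news story or a cigar advertisement, which is exactly the limitation described above.

```python
import re
from collections import Counter

# Hypothetical gazetteer for illustration; a real one would hold thousands of names.
GAZETTEER = ["Havana", "Chicago", "New Orleans", "Galveston", "Texas"]

def count_place_names(text, gazetteer=GAZETTEER):
    """Count occurrences of each gazetteer entry in a block of OCR'd text.

    Matching is case-sensitive and anchored at word boundaries, so "Texas"
    does not match inside "Texasville". Multi-word names are matched as
    phrases, allowing any whitespace (including line breaks) between words,
    since OCR output often splits a name across lines.
    """
    counts = Counter()
    for place in gazetteer:
        pattern = r"\b" + r"\s+".join(map(re.escape, place.split())) + r"\b"
        counts[place] = len(re.findall(pattern, text))
    return counts

# Usage with an invented snippet of OCR-like text.
sample = "Cigars direct from Havana! Sold in Galveston.\nNews from New\nOrleans and Havana."
print(count_place_names(sample).most_common(3))
```

The counts say nothing about the kind of content surrounding each match; recovering that context is what the rest of this post is about.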

The messy nature of digitized sources illustrates a broader criticism I’ve heard of computational distant reading: that it is too empirical, too precise, and too neat. Messiness, after all, is the coin of the realm in the humanities – we revel in things like context, subtlety, perspective, and interpretation. Computers are good at generating numbers, but not so good at generating all that other stuff. My computer program could tell me precisely how many times “Chicago” was printed in every issue of every newspaper in my corpus. What it couldn’t tell me was the context in which it occurred. Was it more likely to appear in commercial news? Political stories? Classified ads? Although I could read a sample of newspapers and manually track these geographic patterns, even this task proved daunting: the average issue contained close to one thousand place-names and ran to more than 67,000 words (longer than Mrs. Dalloway, Fahrenheit 451, or All Quiet on the Western Front).

I needed a middle ground. I decided to move backwards, from the machine-readable text of the papers to the images of the newspapers themselves. What if I could broadly categorize each column of text according to both its geography (local, regional, national, etc.) and its type of content (news, editorial, advertisement, etc.)? I settled on the idea of overlaying a grid onto the page image. A human reader could visually skim across the page and select cells in the grid to block off each chunk of content, whether it was a news column, a political cartoon, or a classified ad. Once the grid was divided into blocks, the reader could easily calculate the proportion of the page devoted to each kind of content.

My collaborator, Bridget Baird, used the open-source programming language Processing to develop a visual interface to do just that. We wrote a program called ImageGrid that overlays a grid onto an image, with each cell in the grid carrying attributes that the reader assigns. This “middle-reading” approach gave me a new access point into the meaning and context of the papers’ geography without laboriously reading every word of every page. A news story on the debate in Congress over the Spanish-American War, for example, could be categorized primarily as “News” and secondarily as both “National” and “International” geography. By repeating this process across a random sample of issues, I began to find spatial patterns.
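ImageGrid itself is written in Processing, and the sketch below does not reproduce its code or file format. It is only a minimal Python model of the idea described above, assuming equal-sized cells so that a count of cells can stand in for “page space.” The names `Cell`, `make_grid`, `tag_block`, and `primary_proportions` are hypothetical; the category labels come from the examples in this post.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set

@dataclass
class Cell:
    """One grid cell laid over a page image, tagged by a human reader."""
    primary: Optional[str] = None                     # content type, e.g. "News"
    secondary: Set[str] = field(default_factory=set)  # geography, e.g. {"National"}

Grid = List[List[Cell]]

def make_grid(rows: int, cols: int) -> Grid:
    """Create an untagged rows x cols grid."""
    return [[Cell() for _ in range(cols)] for _ in range(rows)]

def tag_block(grid: Grid, top: int, left: int, bottom: int, right: int,
              primary: str, secondary: Iterable[str] = ()) -> None:
    """Tag every cell in an inclusive rectangle as one chunk of content."""
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            grid[r][c].primary = primary
            grid[r][c].secondary = set(secondary)

def primary_proportions(grid: Grid) -> Dict[str, float]:
    """Share of tagged cells ("page space") assigned to each primary category.
    Untagged cells (e.g. blank margins) are left out of the denominator."""
    counts = Counter(cell.primary for row in grid for cell in row if cell.primary)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

# Usage: a 4 x 4 grid with two tagged blocks (hypothetical layout).
grid = make_grid(4, 4)
tag_block(grid, 0, 0, 1, 3, "News", {"National", "International"})
tag_block(grid, 2, 0, 3, 1, "Advertisement", {"Local"})
print(primary_proportions(grid))
```

Excluding untagged cells from the total is a design choice of this sketch; dividing by every cell on the page instead would treat blank space as its own category.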

Grid with primary categories as colors and secondary categories as letters

For instance, I discovered that a Texas paper from the 1840s dedicated proportionally more of its advertising “page space” to local geography (such as city grocers, merchants, or tailors) than did a later paper from the 1890s. This confirmed what we might expect, as a growing national consumer market by the end of the century gave rise to more and more advertisements originating outside of Texas. More surprising, however, was the pattern of international news. The earlier paper contained three times as much foreign news (relative “page space” categorized as news content and international geography) as did the later paper from the 1890s. This was entirely unexpected. The 1840s should have been a period of relative geographic parochialism compared to the ascendant imperialism of the 1890s that marked the United States’s noisy emergence as a global power. Yet the later paper dedicated proportionally less of its news to the international sphere than the earlier paper. This pattern would have remained hidden had I used either a close-reading or a distant-reading approach. Instead, a blended “middle-reading” through ImageGrid brought it into view.
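The comparison above rests on a simple ratio: the share of tagged page space whose primary category is News and whose secondary geography includes International, computed for each paper and then compared. The sketch below shows that arithmetic under stated assumptions. Each tagged cell is a hypothetical `(primary, secondary_tags)` pair, `combined_share` is an illustrative name, and the two samples are invented toy data, not figures from the study.

```python
from typing import Iterable, List, Set, Tuple

# One tagged grid cell: (primary content type, secondary geography tags).
TaggedCell = Tuple[str, Set[str]]

def combined_share(issues: Iterable[List[TaggedCell]], primary: str, tag: str) -> float:
    """Fraction of all tagged cells, pooled across a sample of issues, whose primary
    category is `primary` and whose secondary tags include `tag`."""
    total = hits = 0
    for cells in issues:
        for prim, tags in cells:
            total += 1
            if prim == primary and tag in tags:
                hits += 1
    return hits / total if total else 0.0

# Invented toy samples, one issue each; NOT data from the study.
early_paper = [
    [("News", {"International"})] * 2
    + [("News", {"Local"})] * 2
    + [("Advertisement", {"Local"})] * 4,
]
later_paper = [
    [("News", {"International"})] * 1
    + [("News", {"National"})] * 4
    + [("Advertisement", {"National"})] * 5,
]

early = combined_share(early_paper, "News", "International")  # 2 of 8 cells
later = combined_share(later_paper, "News", "International")  # 1 of 10 cells
print(f"early {early:.2f}, later {later:.2f}, ratio {early / later:.1f}")
```

Pooling cells across issues weights longer issues more heavily; averaging each issue’s share instead would count every sampled issue equally. The post does not say which was used, so treat this choice as an assumption.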

We realized that this “middle-reading” approach could be readily adapted not just to my project, but to other kinds of humanities research. A cultural historian studying American consumption might use the program to analyze dozens of mail-order catalogs and quickly categorize the various kinds of goods – housekeeping, farming, entertainment, etc. – marketed by companies such as Sears-Roebuck. A classicist could analyze hundreds of Roman mosaics to quantify the average percentage of each mosaic dedicated to religious or military figures and the different colors used to portray each one.

Inspired by the example set by scholars such as Bethany Nowviskie, Jeremy Boggs, Julie Meloni, Shane Landrum, Tim Sherratt, and many, many others, we released ImageGrid as an open-source program. A more detailed description of the program is on my website, along with a web-based applet that provides an interactive introduction to the ImageGrid interface. The program itself can be downloaded either from my website or from its GitHub repository, where it can be modified, improved, and adapted to other projects.

The Mobile Historian

The rocketing ascent of mobile technology was one of the fundamental shifts of 2008, and many market analysts predict it will only continue throughout 2009. Its rise seems to be following a two-track progression: individuals in developing countries are latching onto increasingly affordable mobile phones as a way to log in to a wider network, while wealthier consumers fascinated by the ability to take their online experience on the go are snatching up smartphones at a shocking rate (to the point where the smartphone industry appears to be recession-resistant). This environment creates an intriguing medium for historians to refine and improve their craft, and the time is ripe for innovation.

Some historians have been leading the charge in utilizing this technology. Bill Turkel has been a pioneer in applying new methods in place-based computing to the field of history. Most similar efforts, however, fall within the sphere of public history. Some museums have long been experimenting with “electronic curators,” hand-held audio devices that play information about an aspect of the exhibit depending on where the visitor carrying one is standing. Cultural heritage sites, particularly battlefields and national parks, have quickly recognized the potential of GPS-enabled devices that guide visitors through a site. Finally, some history educators are experimenting with ways to engage their students using portable technology, including fieldwork and site visits.

Dave Lester, of George Mason University’s CHNM, presented “Mobile Historical Landscapes: Exposing and Crowdsourcing Historical Landmarks” in early April at the American Association for History and Computing conference. Dave is currently working on a project called HistoryPlot to encourage user participation in exploring and contributing to a knowledge bank of historical places. The idea is that roving bands of history enthusiasts could visit a site, pull out their iPhones, learn about some of its history, and possibly add both information and multimedia by snapping pictures and/or uploading content – creating a kind of Yelp for the historically minded. Dave’s project draws upon two specific advantages: 1) the participatory culture of crowdsourcing, and 2) the increasing ubiquity of mobile technology.

Dan Cohen recently explored the advantages of crowdsourcing when, at the start of a presentation, he posted a historical puzzle on his blog that asked people to identify a picture using minimal clues.

He simultaneously sent out the puzzle via Twitter, asking his 1,600 followers to try to solve it within the next hour. The speed with which Dan got answers was impressive: the first correct answer came in just 9 minutes. Although he admits he should have made the puzzle a bit more difficult, the experiment succeeded in highlighting the immense advantages of crowdsourcing historical problems through a fluid and mobile platform such as Twitter.

The growth of a mobile culture in which users are constantly connected magnifies the power of crowdsourcing. Dan’s experiment rested on the assumption that a certain number of his followers would be online and checking their tweets, and that enough of them would then be able to use the internet to access his blog, read the clue, and search for the answer online. Two or three years ago, the chances of receiving an answer in 9 minutes would have been much, much slimmer. A mobile culture removes barriers to accessing information and simultaneously raises the expectations of users, many of whom no longer tolerate being shackled by outlets, ethernet cords, or the range of a wireless signal.

Consequently, mobile technology is redefining our social conception of space and place, and this has corresponding ramifications for historians. It reopens the fundamental relationship between a physical location and what happened there in the past, a relationship with which spatial and geographic historians continuously grapple. This shift is opening up a two-way street for historical researchers. On the one hand, a mobile culture allows efforts such as Dave Lester’s to shed light on previously inaccessible areas. Suddenly, a historian researching a faraway site might be able to “travel” there by looking at uploaded pictures and documents, trading emails or tweets with other researchers who have visited the place, or watching the video of a history enthusiast on vacation at the site.

On the other hand, the shifting expectations that accompany a mobile culture can also turn on historical researchers. A mobile society might question the reliability of a solitary historian writing abstractly about a place they have never actually been to. A constantly connected audience will start to expect the kind of intimate access and exploration that can only be gained from hands-on visitation. A readership conditioned to read reviews on Amazon or tourists’ travel blogs will increasingly dismiss the authority of a specialist who has never visited the location they describe, even if they are describing its past. Audiences will continue to tolerate a historian’s inability to time-travel; they will not continue to tolerate an inability to place-travel.

Fortunately, mobile technology can also create a mobile historian. Imagine a historian writing about shifting gender roles on the Oklahoma Chickasaw reservation during the Dust Bowl. Armed with a laptop, digital camera, and smartphone, the historian can travel to Oklahoma and go to the reservation itself. Once there, traditional archival research is greatly enhanced by technology. Instead of lugging around 3×5 index cards, the historian can use Zotero to speed up and digitize the note-taking process. The digital camera can capture documents for later perusal, allowing the historian to find more sources in a shorter amount of time. Is the researcher suddenly curious about gender demographics for a particular town near the reservation, or eager to understand the background to a religious ceremony referenced in a court record? They can use their smartphone to look up census data or send out queries to colleagues and likely receive a rapid answer to their question.

Leaving the archives, the historian can dip into oral history by interviewing locals and recording their memories on the smartphone or a digital recorder. The smartphone’s GPS capabilities allow the historian not only to locate the homes of the interviewees but also to flag locations and look for spatial patterns at a later date – what if all the traditional “male” venues on a reservation were located on a specific street, while “female” venues were spread over a greater area? A smartphone’s GPS can capture these on-the-ground patterns. Finally, the mobile historian can quickly send out updates on their progress, receiving feedback and suggestions from a remote crowd of like-minded researchers, students, assistants, or colleagues.
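To make the idea of “spread over a greater area” concrete, here is a minimal sketch of how flagged GPS points might be compared once they are back at the desk. It is an illustration under assumptions, not a method the essay prescribes: each flag is a hypothetical `(category, (latitude, longitude))` pair, “spread” is measured as the average great-circle (haversine, spherical-Earth) distance from each point to its group’s centroid, and the coordinates are invented.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_M = 6_371_000   # mean Earth radius in meters

def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def mean_spread_m(points: List[Point]) -> float:
    """Average distance (meters) from each point to the group's centroid.
    Averaging raw lat/lon for the centroid is only reasonable over a small area."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return sum(haversine_m((lat, lon), p) for p in points) / len(points)

def spread_by_category(flags: List[Tuple[str, Point]]) -> Dict[str, float]:
    """Group flagged locations by category and measure each group's spread."""
    groups: Dict[str, List[Point]] = {}
    for category, point in flags:
        groups.setdefault(category, []).append(point)
    return {cat: mean_spread_m(pts) for cat, pts in groups.items()}

# Invented coordinates for illustration only.
flags = [
    ("male venue", (34.500, -97.000)),
    ("male venue", (34.500, -97.001)),
    ("male venue", (34.500, -97.002)),
    ("female venue", (34.500, -97.000)),
    ("female venue", (34.510, -97.015)),
    ("female venue", (34.490, -96.990)),
]
print(spread_by_category(flags))
```

Mean distance to the centroid is just one simple dispersion measure; a GIS package would offer more robust alternatives, but the point is that tagged GPS flags can be compared numerically as well as visually.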

Mobile technology (like all technology) is not a magic pill that will suddenly transform the historical profession. There are certainly drawbacks. First and foremost is a steep economic barrier to entry. Already struggling for travel stipends and fellowship money, many historians won’t be able to afford a brand-new iPhone or a high-quality digital camera. Those who aren’t already comfortable with mobile technology will often feel overwhelmed or at an unfair disadvantage. On a more abstract level, technology and its inherent distractions can sometimes blind the historian to one of the most important advantages of visiting a place in person: the ability to feel the sense of place, to listen to the wind and hear the accents and taste the food, a decidedly fuzzy process that adds crucial depth and richness to the historian’s understanding of their subject.

As technology itself becomes more refined and more sophisticated, the possibilities for innovation and exploration will continue to expand. As with any new methodology, the traditional skills and strengths of a historian will not fade into obsolescence. Instead, they’ll be ever more critical to the process of responsibly incorporating new techniques and approaches into the broader historical fold. If this process is even moderately successful, the future of the mobile historian appears bright.

Open Letter to a Future Thesis Writer

Dear Junior History Major,

It’s that time of year again. You’ve probably returned from spring break, hopefully in one piece and with your liver only a little worse for wear. Maybe you’re terrified by the senior history majors gliding across campus like ghosts, baggy-eyed and shell-shocked by the prospect of finishing (or, in some cases, starting) their theses in the next four weeks. That will be you in one year’s time. But for now, you are just coming to the beginning of the thesis road, and wondering how to start walking down it. Here’s my step-by-step guide to making sure you get off on the right foot:

1. THINK about your interests

Treat it like an assignment – go to the library, the coffee shop, the bar, the gym, wherever it is that you get your best thinking done. Think back on the past couple of years, and write down every book, article, movie, lecture, discussion, or passing comment that has struck you as a topic you are really, truly interested in. Were you absolutely drawn into that lecture on Qing China? Captivated by reading King Leopold’s Ghost? An avid reader of Jane Austen? Presumably you became a history major for a reason – you enjoy studying history. Include everything. Don’t stop to think about whether “that article in Newsweek about ‘Dark Knight’ being a totally badass movie” is a plausible research topic – jot it down anyway, and move on to the next one. Keep that list handy, and add to it whenever you think of something else.

Finding an idea that interests you is half the battle in choosing a good thesis topic, and arguably the most important step you can take. The cliché that writing a thesis is like being in a relationship is largely true – you will be spending a ridiculous amount of time with this topic, and choosing one that you are passionate about will go a long way toward determining how much you enjoy writing your thesis. After you’ve got a decent list, sit down and narrow it to the 5-10 topics that most excite you and that you can imagine being consumed by for the next year.

2. TALK it over.

With everyone.

Start with your advisor. Email her or him your list of 5-10 topics and ask to set up a time to discuss them. They will (presumably) have a lot of experience in just this sort of advising, and are the single best resource for determining whether a topic is academically feasible. Ask them the following questions about each topic:

– Too narrow, too broad?

– Will there be enough (accessible) source material?

– Will it be difficult to do original research?

– Is this a realistic scope for a senior thesis?

If they don’t openly advocate for one or two of your topics, they should at least help you narrow down the list by eliminating topics that aren’t feasible. From there, talk with other professors in the field you’re looking at. Ask them the same questions. Talk to senior history majors. Talk to friends. Talk to family. As you spend more time talking about the couple of topics you’ve chosen, it will gradually emerge which one you are most passionate about, which one you can articulate most easily, and ultimately which one you should choose.

3. PLAN your research.

Again, your advisor should help you with this. If they are not going to be your primary reader, have them refer you to another professor. Meet with them and discuss how to begin tackling the topic. Everyone has different approaches to starting research, but many will likely recommend the following:

Start with some cursory investigative forays into your topic, especially if you don’t have a definitive set of primary sources. Familiarize yourself with the basics – both the surrounding historical context and at least a working knowledge of what major research (if any) has already been done. From there, ask your advisor or reader about avenues to take toward finding primary source material. One of the most important things to find out is location – is it available online? Can your school’s library give you access? Will you have to travel to any distant archives?

Think carefully about how you want to take notes and what has worked for you in the past. I’d stump mightily for the benefits of Zotero, but it ultimately comes down to what you are comfortable with using. Whatever it is, use it. Faithfully. Short of having a photographic memory, methodical note-taking is an absolute lifesaver throughout the entire process, and will end up saving you time and effort.

4. START working.

Like, now.

The upcoming summer is presumably your last summer vacation as an undergraduate, and possibly your last three-month summer vacation for the foreseeable future. You should do everything you can to enjoy and take advantage of it. But here is the trade-off. Starting to work on your thesis during the break is a really, really good idea. Especially if the bulk of your source material is not available on campus, it becomes imperative to get a head start on research over the summer.

Although some of your classmates may have the ability to smoothly conjure out of thin air a brilliant and deeply profound thesis in the last two months before its due date, the majority of us mortals are forced to rely on hard work. There is a surprisingly sticky correlation between the amount of time you spend on your research and the quality of your end product. Hard work on the front end of the process can not only mask other deficiencies but also save your future self an incredible amount of undue stress and despair. Having said that, take some time off to be a college kid during your last summer vacation. Your thesis will still be there lurking in the shadows like a voracious alien monster when you get back.

5. ENJOY it.

Writing a thesis will be the most difficult and rewarding accomplishment of your college career.

Have fun.



Best of luck,

Cameron