Historians of the American West have a county problem. It’s primarily one of geographic size: counties in the West are really, really big. A “List of the Largest Counties in the United States” might as well be titled “Counties in the Western United States (and a few others)” – you have to go all the way to #30 before you find one that falls east of the 100th meridian. The problem this poses to historians is that a lot of historical data was captured at a county level, including the U.S. Census.
San Bernardino County is famous for this – the nation’s largest county by geographic area, it includes the densely populated urban sprawl of the greater Los Angeles metropolis along with vast swathes of the uninhabited Mojave Desert. Assigning a single count of anything to San Bernardino County is to teeter on geographic absurdity. But, for nineteenth-century population counts in the national census, that’s all we’ve got.
Here’s a basic map of population figures from the 1870 census. You can see some general patterns: central California is by far the most heavily populated area, with some moderate settlement around Los Angeles, Portland, Salt Lake City, and Santa Fe. But for anything more detailed, it’s not terribly useful. What if there was a way to get a more fine-grained look at settlement patterns in these gigantic western counties? This is where my work on the postal system comes in. There was a post office in (almost) every nineteenth-century American town. And because the department kept records for all of these offices – the name of the office, its county and state, and the date it was established or discontinued – a post office becomes a useful proxy to study patterns over time and space. I assembled this data for a single year (1871) and then wrote a program to geocode each office, that is, to identify its location by looking it up in a large database of known place-names. I then supplemented it with the salaries of postmasters at each office for 1871. From there, I could finally put it all onto a map:
The result is a much more detailed regional geography than that of the U.S. Census. Look at Wyoming in both maps. In 1870, the territory was divided into five giant rectangular counties, all of them containing fewer than 5,000 people. But its distribution of post offices paints a different picture: rather than vertical units, it consisted largely of a single horizontal stripe along its southern border.
Similarly, our view of Utah changes from a population core of Salt Lake City to a line of settlement running down the center of the territory, with a cluster in the southwestern corner completely obscured in the census map.
Post offices can also reveal transportation patterns: witness the clear skeletal arc of a stage-line that ran from the Oregon/Washington border southeast to Boise, Idaho.
Connections that didn’t mirror the geographic unit of a state or county tended to get lost in the census. One instance of this was the major cross-border corridor running from central Colorado into New Mexico. A map of post offices illustrates its size and shape; the 1870 census map can only gesture vaguely at both.
The following question, of course, should be asked of my (and any) map: what’s missing? Well, for one, a few dozen post offices. This speaks to the challenges of geocoding more than 1,300 historical post offices, many of which might have only been in existence for a single year or two. I used a database of more than 2 million U.S. place-names and wrote a program that tried to account for messy data (spelling variations, altered state or county boundaries, etc.). The program found locations for about 90% of post offices, while the remaining offices I had to locate by hand. Not surprisingly, they were missing from the database for a reason: these post offices were extremely obscure. Finding them entailed searching through county histories, genealogy message boards, and ghost town websites – a process that is simply not scalable beyond a single year. By 1880, the number of post offices in the West had doubled. By 1890, it had doubled again. I could conceivably spend years trying to locate all of these offices. So, what are the implications of incomplete data? Is automated, 90% accuracy “good enough”?
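To make the geocoding step concrete, here is a minimal sketch of a gazetteer lookup that tolerates spelling variation. It is an illustration under stated assumptions, not the program described above: the `GAZETTEER` dictionary, the `geocode` function, the 0.8 similarity cutoff, and every place-name and coordinate are invented for the example, and the fuzzy matching uses Python’s standard `difflib` module.

```python
from difflib import get_close_matches

# Toy gazetteer keyed by state. Every name and coordinate is invented
# for illustration; a real gazetteer would hold millions of entries.
GAZETTEER = {
    "WY": {"Elk Crossing": (41.0, -105.5), "Fort Example": (41.3, -110.0)},
    "UT": {"Cedar Hollow": (37.7, -113.1)},
}


def geocode(office, state, cutoff=0.8):
    """Return (matched_name, (lat, lon)), or None if nothing is close enough."""
    places = GAZETTEER.get(state, {})
    if office in places:
        return office, places[office]
    # Tolerate spelling variation by taking the closest name in the same state.
    candidates = get_close_matches(office, list(places), n=1, cutoff=cutoff)
    if candidates:
        return candidates[0], places[candidates[0]]
    return None  # no match: this office would have to be located by hand


offices = [("Elk Crossing", "WY"), ("Ceder Hollow", "UT"), ("Lost Gulch", "WY")]
results = {office: geocode(*office) for office in offices}
unmatched = [office for office, hit in results.items() if hit is None]
```

In this toy run the misspelled “Ceder Hollow” resolves to “Cedar Hollow,” while “Lost Gulch” ends up in `unmatched`, the pile left for county histories and ghost town websites. Lowering the cutoff shrinks that pile but raises the risk of matching the wrong place.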
What else is missing? Differentiation. The salary of a postmaster partially addresses this problem, as the department used a formula to determine compensation based partially on the amount of business an office conducted. But it was not perfectly proportional. If it was, the map would be one giant circle covering everything: San Francisco conducted more business than any other office by several orders of magnitude. As it is, the map downplays urban centers while highlighting tiny rural offices. A post office operates in a kind of binary schema: no office, no people (well, at least very few). If there was an office, there were people there. We just don’t know how many. The map isn’t perfect, but it does start to tackle the county problem in the West.
October 15th marks Ada Lovelace Day, an annual celebration of the achievements of women in science, technology, engineering and maths. As I read through posts commemorating the day, it got me reflecting on my own experience. It’s not just that I admire Ada Lovelace and the women that followed after her. It’s that I quite literally wouldn’t be here without them.
My mom, Bridget Baird, went to an all-women’s college in the late 1960s where she considered majoring in philosophy before switching to mathematics. After getting her PhD, she took a job in the early 1980s at Connecticut College in the math department. She got interested in computer programming, and eventually moved into a joint appointment in the computer science department. Over a three-decade career, her curiosity led her (and her thousands of students along with her) to the intersection of computer science with disciplines as far afield as archaeology, music, dance, and art. Along the way she faced the kinds of systemic discrimination that plagued the entire cohort of women entering male-dominated fields in the 1970s and 1980s. In other ways, she was lucky to have grown up during a time of transition when women began carving out new possibilities to enter those fields. She has spent her entire career mentoring female students and colleagues while vocally pushing her institution and discipline to take a more active role in tackling gender equity.
Although I missed the boat entirely on my mom’s math gene, she did manage to impress on me her fascination with applying computers to solve problems. Five years ago I wrote personal statements for history graduate programs structured around my interest in using technology to study the past. My mom has since helped me learn how to program and we eventually ended up collaborating on a couple of projects. I’m one of the few graduate students I know who can call their mother to ask her about Thanksgiving plans and Python modules. I am, in ways I can’t even begin to articulate, a direct beneficiary of the legacy left by women like Ada Lovelace.
Which is why I oscillate between hope and discouragement when I look around my own disciplinary homes of history and the digital humanities. On the one hand, women have made significant inroads in both fields. There are roughly equal numbers of male and female graduate students in my department. Many of the thought leaders and rising stars of the digital humanities are women, with opportunities and support growing all the time. The kinds of daily overt sexism faced by my mom and other women in her generation have, for the most part, gone the way of transistor radios. But that’s the problem: what remains is an insidious, covert sexism that is much, much harder to uproot.
So, was the postal department always in such terrible fiscal shape? No, not at first. But from the 1840s onward, putting aside the 1990s and early 2000s, deficits were the norm. The next question: What was the geography of deficits? Which states paid more than others? Essentially, who picked up the check?
Because it’s only one table, I manually transcribed the columns into a spreadsheet. At this point, I could turn to ArcGIS to start analyzing the data, maybe merging the table with a shapefile of state boundaries provided by NHGIS. But ArcGIS is a relatively high-powered tool better geared for sophisticated geospatial analysis. What I’m doing doesn’t require all that much horsepower. And, in fact, quantitative spatial relationships (ex. measurements of distance or area) aren’t all that important for answering the questions I’ve posed. There are a number of different software packages for exploring data, but Tableau provides a quick-and-dirty, drag-and-drop interface. In keeping with the nature of data exploration, I’ve purposefully left the following visualizations rough around the edges. Below is a bar graph, for instance, showing the surplus or deficit of each state, grouped into rough geographic regions:
Or, in map form:
Between the map and the bar graph, it’s immediately apparent that:
a) Most states ran a deficit in 1871
b) The Northeast was the only region that emerged with a surplus
So who picked up the check? States with large urban, literate populations: New York, Pennsylvania, Massachusetts, Illinois. Who skipped out on the bill? The South and the West. But these are absolute figures. Maybe Texas and California simply spent more money than Arizona and Idaho because they had more people. So let’s normalize our data by analyzing it on a per-capita basis, using census data from 1870.
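As a rough sketch of what that normalization involves: divide each state’s receipts, expenditures, and balance by its population. The function name `per_capita` and the two example states are hypothetical, and the numbers are invented round figures, not values from the 1871 table or the 1870 census.

```python
def per_capita(receipts, expenditures, population):
    """Return receipts, expenditures, and balance per resident."""
    return {
        "receipts": receipts / population,
        "expenditures": expenditures / population,
        "balance": (receipts - expenditures) / population,
    }


# Invented round numbers for two hypothetical states.
states = {
    "State A": {"receipts": 400_000, "expenditures": 500_000, "population": 1_000_000},
    "State B": {"receipts": 20_000, "expenditures": 100_000, "population": 10_000},
}

table = {name: per_capita(**row) for name, row in states.items()}

# Rank states from largest per-person deficit to smallest.
ranked = sorted(table, key=lambda name: table[name]["balance"])
```

State A runs the larger deficit in absolute terms (100,000 against 80,000), but State B’s deficit per resident is far larger, so the per-capita ranking puts State B first. That reversal is exactly what absolute figures can hide.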
The South and the West may have both skipped out on the bill, but it was the West that ordered prime rib and lobster before it left the table. Relative to the number of its inhabitants, western states bled the system dry. A new question emerges: how? What was causing this extreme imbalance of receipts and expenditures in the West? Were westerners simply not paying into the system?
Actually, no. The story was a bit more complicated. On a per-capita basis, westerners were paying slightly more money into the system than any other region. The problem was that providing service to each of those westerners cost substantially more than in any other region: $38 per person, or roughly 4-5 times the cost of service in the east. For all of its lore of rugged individualism and a mistrust of big government, the West received the most bloated government “hand-out” of any region in the country. This point has been driven home by a generation of “New Western” historians who demonstrated the region’s dependence on the federal government, ranging from massive railroad subsidies to the U.S. Army’s forcible removal of Indians and the opening of their lands to western settlers. Add the postal service to that long list of federal largesse in the West.
But what made mail service in the West so expensive? The original 1871 table further breaks down expenses by category (postmaster salaries, equipment, buildings, etc.). Some more mucking around in the data reveals a particular kind of expense that dominated the western mail system: transportation.
High transport costs were partially a function of population density. Many western states like Idaho or Montana consisted of small, isolated communities connected by long mail routes. But there’s more to the story. Beginning in the 1870s, a series of scandals wracked the postal department over its “star” routes (designated as any non-steamboat, non-railroad mail route). A handful of “star” route carriers routinely inflated their contracts and defrauded the government of millions of dollars. These scandals culminated in the criminal trial of high-level postal officials, contractors, and a former United States Senator. In 1881, the New York Times printed a list of the ninety-three routes under investigation for fraud. Every single one of these routes lay west of the Mississippi.
The rest of the country wasn’t just subsidizing the West. It was subsidizing a regional communications system steeped in fraud and corruption. The original question – “Who picked up the check?” – leads to a final cliffhanger: why did all of these frauds occur in the West?
The digital humanities adore labs. Labs both symbolize and enable many of the field’s overarching themes: interdisciplinary teamwork, making/building, and the computing process itself. Labs give digital humanists a science-y legitimation that, whether we admit it or not, we find appealing. Labs aren’t necessary for doing digital humanities research, but in terms of infrastructure, collaboration, and institutional backing they certainly help. Along with “collaboration” and “open” (and possibly “nice”), “lab” is one of the field’s power words. With a period of accelerated growth over the past five years, world-wide digital humanities labs and centers now run into the hundreds. We overwhelmingly focus on labs in this kind of context: labs as physical research spaces. I’d like to move away from this familiar ground to discuss the role of lab assignments within a digital humanities curriculum. While reflecting on my own recent experience of designing and using labs in the classroom, I realized it spoke to many of the current issues facing the digital humanities.
Let me start with some background. This past autumn I taught my first college course, “The Digital Historian’s Toolkit: Studying the West in an Age of Big Data.” It was one of Stanford History Department’s Sources & Methods seminars, which are classes aimed at history majors to get them working intensively with primary sources. When I was designing my course a year ago, I decided to blend a digital humanities curriculum with more traditional historical pedagogy. Under the broad umbrella of the nineteenth-century American West, I used a specific historical theme each week (mining, communications, tourism, etc.) to tie together both traditional analysis and digital methodology. As part of this, over five different class periods students met in the Center for Spatial and Textual Analysis to complete a weekly lab assignment.
In designing the course, I wrestled with a problem that faces every digital humanist: the balancing of “traditional” (for lack of a better term) and “digital.” How much of my curriculum should follow a seminar model based on reading and discussion? How much should it follow a lab model based on technical tools and techniques? As is often the case, pragmatism partially informed my decision. Because my class was part of a required series of courses offered by the department, I couldn’t simply design a full-blown digital humanities methods course. It had to have a strong historical component in order to get approved. This juggling act is not uncommon for digital humanists. But more philosophically, I believed that digital tools were best learned in the context of historical inquiry. An overarching theme (in my case, the late nineteenth-century West) helped answer the question of why a student was learning a particular piece of software. Without it, digital pedagogy can stray into the bugaboo waved about by skeptics: teaching technology for technology’s sake.
I designed my labs with three goals in mind. First, I wanted my students to come away with at least an introduction to technical skills they wouldn’t otherwise get in a typical history course. Given my background, I focused largely on GIS, textual analysis, and visual design. I didn’t expect my students to become geospatial technicians in ten weeks, but I did want them to try out these kinds of methods and understand how they could be applied to historical problems. This first goal speaks to the alarmist rhetoric of a “crisis in the humanities,” of falling enrollments and shrinking budgets and growing irrelevance. In this caricature, the digital humanities often get remade as a life-boat for a sinking ship. This view is obviously overblown. But it is important to remember that the vast majority of our students are not going to end up as professors of history, literature, or philosophy. While there is a strong case to be made for the value of the humanities, I also think we need to do a better job of grafting other kinds of skills onto the field’s reading/writing/thinking foundation.
Second, I wanted students to learn technical skills as part of a larger intellectual framework. I pursued this in part by assigning specific techniques to answer larger questions. For instance, how does Mark Twain’s western novel Roughing It compare to other iconic nineteenth-century works of literature? Instead of assigning thousands of pages of text, I had my students use topic modeling to compare Roughing It to other books such as Uncle Tom’s Cabin and Little Women. But labs were also an effective way to concretize some of the contemporary issues swirling around technology. In one of the labs, students applied different kinds of OCR software to a sampling of pages from an Overland Trail diary they had read earlier in the week. This gave them a chance to peer behind the curtain of large-scale digitization projects. When you experience first-hand just how many words and characters the OCR process can miss, it makes you think more critically about resources like Google Books or LexisNexis. Teaching in the digital humanities should, in part, force students to think critically about the issues surrounding the tools we use: copyright, access, marginalization.
Finally, I wanted students to learn by doing. There’s a certain passive mode of learning endemic to so many humanities courses: go to lectures, write a few papers, study for an exam, make comments in discussion. Student passivity can be inherent to both the pedagogical form itself and how it’s practiced, as anyone who has sat in a lecture hall or watched a student coast through discussion can tell you. Don’t get me wrong: bad labs can be just as passive as lectures. But done right, they emphasize active learning based on immediate feedback. As much as I’ve soured on the term “hacking” and all the privileged baggage it can carry, it is a useful term to describe the type of learning I want my students to engage in. Try something out. If it doesn’t work, try something else. Under this rubric, mistakes are a necessary part of the process. Feedback is more immediate in a way that enables exploration, tinkering, tangents, and restarts. It’s a lot harder to do this with traditional assignments; trying out something new in a paper is riskier than trying out something new in a lab.
This last goal proved the hardest to meet and constitutes one of the major hurdles facing digital humanities pedagogy. We want to teach digital methods not for their own sake, but to fit them within a broader framework, such as how they help us understand the past. But to get to that point, students need to make a fairly substantial investment of time and energy into learning the basics of a particular tool or technique. I tried to scaffold my lab assignments so that they became less and less prescriptive and more and more open-ended with each passing week. The idea was that students needed heavy doses of step-by-step instruction when they were still unfamiliar with the technology. My first lab, for instance, spelled out instructions in excruciating detail. Unfortunately, this led to exactly the kind of passive learning I wanted to avoid. I liken it to the “tutorial glaze” – focusing so much on getting through individual tasks that you lose track of how they all fit together or how you would apply them beyond the dataset at hand. The ability to teach early-stage technical skills involves a litany of pedagogical challenges that humanities instructors are simply not used to tackling.
By contrast, my final lab gave students a dataset (a map of Denver and enumeration district data from the 1880 census) and asked them to formulate and then answer a historical question through GIS. By nearly any metric – enthusiasm, results, feedback – this proved to be the most effective lab. It forced students to engage in the messy process of digital history: exploring the data enough to formulate a question, returning to the data to answer that question, realizing the data can’t even begin to answer that question, formulating a different question, figuring out how to answer it, and deciding how to visualize an argument. I was even more satisfied with their reflections on the process. Some described the frustrations that came with discovering the limits or gaps in census data. Others remarked on how their own mapmaking decisions, such as changing classification breaks or using different symbology, could completely alter the presentation of their argument. It’s one thing for students to read an essay by J.B. Harley on the subjectivity of maps (which they did). It’s another for students to experience the subjective process of map-making for themselves. Learning by doing: this is what labs are all about.
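The point about classification breaks is easy to see in a few lines of code. The sketch below is not taken from the lab itself: the values are invented, the helper names are hypothetical, and it compares only two common schemes, equal-interval and quantile breaks, on the same skewed data.

```python
from bisect import bisect_left
from statistics import quantiles


def equal_interval_breaks(values, k):
    """Split the range from min to max into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Choose breaks so each class holds roughly the same number of values."""
    return quantiles(values, n=k)


def class_counts(values, breaks):
    """Count how many values fall into each class defined by the breaks."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts


# Invented, heavily skewed values: seven small districts and one large one.
values = [1, 2, 3, 4, 5, 6, 7, 100]

equal = class_counts(values, equal_interval_breaks(values, 4))
quantile = class_counts(values, quantile_breaks(values, 4))
```

With equal intervals, seven of the eight districts land in the lowest class and the map looks nearly uniform. With quantiles, each class holds two districts and the same data appears full of variation. Neither map is wrong, which is Harley’s point.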
To try and help others who want to integrate labs into their curriculum, I’ve made the labs and datasets available to download on the course website. Even as I posted them, though, I was reminded of one last problem facing the digital humanities: the problem of ephemerality. I spent hours and hours designing labs that will likely be unusable in a matter of years. Some of them require expensive software licenses, others rely on tools that could fall completely out of development. That’s one of the downsides of labs. Ten years from now, I’ll still be able to re-use my lesson plan for discussing Roughing It. The lab on topic-modeling Twain and other novelists? Doubtful. But ephemerality is one of the necessary costs of teaching digital humanities. Because labs, and the broader pedagogical ethos of the digital humanities they embody, are ultimately worth it.
A history PhD can be thought of as a collection of overlapping areas: coursework, teaching, qualifying exams, and the dissertation itself. The first three are fairly structured. You have syllabi, reading lists, papers, classes, deadlines. The fourth? Not so much. Once you’re advanced to candidacy there’s a sense of finally being cut loose. Go forth, conquer the archive, and return triumphantly to pen a groundbreaking dissertation. It’s exhilarating, empowering, and also terrifying as hell. I’ve been swimming through the initial research stage of the dissertation for the past several months and thought it would be a good time to articulate what, exactly, I’m trying to find. Note: if you are less interested in American history and more interested in maps and visualizations, I would skip to the end.
The Elevator Speech
I’m studying communications networks in the late nineteenth-century American West by mapping the geography of the U.S. postal system.*
The Elevator-Stuck-Between-Floors Speech
From the end of the Civil War until the end of the nineteenth century the U.S. Post steadily expanded into a vast communications network that spanned the continent. By the turn of the century the department was one of the largest organizational units in the world. More than 200,000 postmasters, clerks, and carriers were involved in shuttling billions of pounds of material between 75,000 offices at the cost of more than $100 million a year. As a spatial network the post followed a particular geography. And nowhere was this more apparent than in the West, where the region’s miners, ranchers, settlers, and farmers led their lives on the network’s periphery. My dissertation aims to uncover the geography of the post on its western periphery: where it spread, how it operated, and its role in shaping the space and place of the region.
My project rests on the interplay between center and periphery. The postal network hinged on the relationship between its bureaucratic center in Washington, DC and the thousands of communities that constituted the nodes of that network. In the case of the West, this relationship was a contentious one. Departmental bureaucrats found themselves buffeted with demands to rein in ballooning deficits. Yet they were also required by law to provide service to every corner of the country, no matter how expensive. And few regions were costlier than the West, where a sparsely settled population scattered across a huge area was constantly rearranged by the boom-and-bust cycles of the late nineteenth century. From the top-down perspective of the network’s center, providing service in the West was a major headache. From the bottom-up perspective of westerners the post was one of the bedrocks of society. For most, it was the only affordable and accessible form of long-distance communication. In a region marked by transience and instability, local post offices were the main conduits for contact with the wider world. Western communities loudly petitioned their Congressmen and the department for more offices, better post roads, and speedier service. In doing so, they redefined the shape and contours of both the network and the wider geography of the region.
The post offers an important entry point into some of the major forces shaping American society in the late nineteenth century. First, it helped define the role of the federal government. On a day-to-day basis, for many Americans the post was the federal government. Articulating the geographic size and scale of the postal system will offer a corrective to persistent caricatures of the nineteenth-century federal government as weak and decentralized. More specifically, a generation of “New Western” historians have articulated the omnipresent role of the state in the West. Analyzing the relationship between center and periphery through the post’s geography provides a means of mapping the reach of federal power in the region. With the postal system as a proxy for state presence, I can begin to answer questions such as: where and how quickly did the state penetrate the West? How closely did it follow on the heels of settler migration, railroad development, or mining industries? Finally, the post was deeply enmeshed in a system of political patronage, with postmasterships disbursed as spoils of office. What was the relationship between a communications network and the geography of regional and national politics?
Second, the post rested on an often contentious marriage between the public and private spheres. Western agrarian groups upheld the post as a model public monopoly. Nevertheless, private hands guided the system’s day-to-day operations on its periphery. Payments to mail-carrying railroad companies became the department’s single largest expenditure, and it doled out millions of dollars each year to private contractors to carry the mail in rural areas. This private/public marriage came with costs – in the early 1880s, for instance, the department was rocked by corruption scandals when it discovered that rural private contractors had paid kickbacks to department officials in exchange for lavish carrying contracts. How did this uneasy alliance of public and private alter the geography of the network? And how did the department’s need to extend service in the rural West reframe wider debates over monopoly, competition, and the nation’s political economy?
Getting Off The History Elevator
That’s the idea, at least. Rather than delve into even greater detail on historiography or sources, I’ll skip to a topic probably more relevant for readers who aren’t U.S. historians: methodology. Digital tools will be the primary way in which I explore the themes outlined above. Most obviously, I’m going to map the postal network. This entails creating a spatial database of post offices, routes, and timetables. Unsurprisingly, that process will be incredibly labor intensive: scanning and georeferencing postal route maps, or transcribing handwritten microfilmed records into a database of thousands of geocoded offices. But once I’ve constructed the database, there are any number of ways to interrogate it.
To demonstrate, I’ll start with lower-hanging fruit. The Postmaster General issues an annual report providing (among other information) data on how many offices were established and discontinued in each state. These numbers are fairly straightforward to put into a table and throw onto a map. Doing so provides a top-down view of the system from the perspective of a bureaucrat in Washington, D.C. For instance, by looking at the number of post offices discontinued each year it’s possible to see the wrenching reverberations of the Civil War as the postal system struggled to reintegrate southern states into its network in 1867:
Post Offices Discontinued By State, 1867
(Source: Annual Report of the Postmaster General, 1867)
The West, meanwhile, was arguably the system’s most unstable region. As measured by the percentage of its total offices that were either established or discontinued each year, states such as New Mexico, Colorado, and Montana were continually building and dismantling new nodes in the network.
Post Offices Established or Discontinued as a Percentage of Total Post Offices in State, 1882
(Source: Annual Report of the Postmaster General, 1882)
Of course, the broad brush strokes of national, year-by-year data only provide a generalized snapshot of the system. I plan on drilling down to far more detail by charting where and when specific post offices were established and discontinued. This will provide a much more fine-grained (both spatially and temporally) view of how the system evolved. Geographer Derek Watkins has employed exactly this approach:
Derek’s map demonstrates the power of data visualization: it is compelling, interactive, and conveys an enormous amount of information far more effectively than text alone. Unfortunately, it also relies on an incomplete dataset. Derek scraped the USPS Postmaster Finder, which the USPS built as a tool for genealogists to look up postmaster ancestors. The USPS historian adds to it on an ad-hoc basis depending on specific requests by genealogists. In a conversation with me, she estimated that it encompasses only 10-15% of post offices, and there is no record of what has been completed and what remains to be done. Derek has, however, created a robust data visualization infrastructure. In a wonderful demonstration of generosity, he has sent me the code behind the visualization. Rather than spending hours duplicating Derek’s design work, I’ll be able to plug my own, more complete, post office data into a beautiful existing interface.
Derek’s generosity brings me back to my ongoing personal commitment to scholarly sharing. I plan on making the dissertation process as open as possible from start to finish. Specifically, the data and information I collect has broad potential for applications beyond my own project. As the backbone of the nation’s communications infrastructure, the postal system provides rich geographic context for any number of other historical inquiries. Cameron Ormsby, a researcher in Stanford’s Spatial History Lab, has already used post office data I collected as a proxy for measuring community development in order to analyze the impact of land speculation and railroad construction in Fresno and Tulare counties.
To kick things off, I’ve posted the state-level data I referenced above on my website as a series of CSV files. I also used Tableau Public to generate a quick-and-dirty way for people to interact with and explore the data in map form. This is an initial step in sharing data and I hope to refine the process as I go. Similarly, I plan on occasionally blogging about the project as it develops. Rather than narrowly focusing on the history of the U.S. Post, my goal (at least for now) is to use my topic as a launchpad to write about broader themes: research and writing advice, discussions of digital methodology, or data and visualization releases.
*By far the most common response I’ve received so far: “Like the Pony Express?” Interestingly, the Pony Express was a temporary experiment that only existed for about eighteen months in 1860-1861. In terms of mail carried, cost, and time in existence, it was a tiny blip within the postal department’s operations. Yet it has come to occupy a lofty position in America’s historical memory and encapsulates a remarkable number of the contradictions and mythologies of the West.
Below is a Tableau visualization showing the number of post offices in each state for each year. Note that the visualization is a quick-and-dirty approach that uses modern political boundaries. This causes particular problems with western territories. North and South Dakota, for instance, were grouped as Dakota Territory until 1889 while Oklahoma peeled off from “Indian Territory” in 1891. Information for these pre-modern-statehood areas is not visualized on the map, but the full dataset can be downloaded here.
I am interested in what patterns you might find, so please post any and all observations in the comments section below!
– TotalOffices: number of post offices in that state for that year
– Established: number of post offices established during that year
– Discontinued: number of post offices discontinued during that year
– DisPercent: discontinued post offices as a percentage of total post offices
– EstabPercent: established post offices as a percentage of total post offices
– DisEstabPercent: combined established and discontinued post offices as a percentage of total post offices.
Source: The Annual Report of the Postmaster General, from 1867-1902. (Example page)
Date Created: 8/28/2012
Date Modified: 8/28/2012
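For anyone working with the CSV files, the three percentage fields can be recomputed from the raw counts. Here is a minimal sketch in Python, assuming the column names listed above and that each percentage is expressed on a 0-100 scale (the files themselves may store a fraction instead). The function name and the sample row are invented purely for illustration.

```python
def derived_percentages(total_offices, established, discontinued):
    """Recompute the percentage fields from the raw counts for one state-year."""
    if total_offices == 0:
        # No offices recorded that year, so the percentages are undefined.
        return {"DisPercent": None, "EstabPercent": None, "DisEstabPercent": None}
    return {
        "DisPercent": 100 * discontinued / total_offices,
        "EstabPercent": 100 * established / total_offices,
        "DisEstabPercent": 100 * (established + discontinued) / total_offices,
    }


# Invented example row, not taken from the dataset.
row = {"TotalOffices": 200, "Established": 30, "Discontinued": 10}
print(derived_percentages(row["TotalOffices"], row["Established"], row["Discontinued"]))
# → {'DisPercent': 5.0, 'EstabPercent': 15.0, 'DisEstabPercent': 20.0}
```

A high DisEstabPercent flags a state-year with a lot of churn, whether from openings, closures, or both.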
Openness is the sacred cow of the digital humanities. Making data publicly available, writing open-source code, or publishing in open-access journals are not just ideals, but often the very glue that binds the field together. It’s one of the aspects of digital humanities that I find most appealing. Despite this, I have only slowly begun to put this ideal into practice. Earlier this year, for instance, I posted over one hundred book summaries I had compiled while studying for my qualifying exams. Now I’m venturing into the world of open-source by releasing a program I used in a recent research project.
The program tries to tackle one of the fundamental problems facing many digital humanists who analyze text: the gap between manual “close reading” and computational “distant reading.” In my case, I was trying to study the geography within a large corpus of nineteenth-century Texas newspapers. First I wrote Python scripts to extract place-names from the papers and calculate their frequencies. Although I had some success with this approach, I still ran into the all-too-familiar limit of historical sources: their messiness. Namely, nineteenth-century newspapers are extremely challenging to translate into machine-readable text. When performing Optical Character Recognition (OCR), the smorgasbord nature of newspapers poses real problems. Inconsistent column widths, a potpourri of advertisements, vast disparities in text size and layout, stories running from one page to another – the challenges go on and on and on. Consequently, extracting the word “Havana” from OCR’d text is not terribly difficult, but writing a program that identifies whether it occurs in a news story versus an advertisement is much harder. Given the quality of the OCR’d text in my particular corpus, deriving this kind of context proved next-to-impossible.
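The extract-and-count step can be sketched in a few lines of Python. This is not the original script: the gazetteer list, the sample text, and the function name are hypothetical, and real OCR output would need fuzzier matching than the exact whole-word search used here.

```python
import re
from collections import Counter


def count_place_names(text, gazetteer):
    """Count case-insensitive, whole-word occurrences of each place-name."""
    counts = Counter()
    for place in gazetteer:
        # \b keeps "Havana" from matching inside a longer word; re.escape
        # protects names that contain punctuation, such as "St. Louis".
        pattern = r"\b" + re.escape(place) + r"\b"
        counts[place] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Invented sample text standing in for one OCR'd newspaper page.
sample = "Sugar from Havana arrived at Galveston. HAVANA cigars for sale in Houston."
places = ["Havana", "Galveston", "Houston", "Chicago"]
for place, n in count_place_names(sample, places).most_common():
    print(place, n)
```

Note what the counts leave out: the two mentions of Havana are tallied together, with nothing to say that one reads like shipping news and the other like an advertisement.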
The messy nature of digitized sources illustrates a broader criticism I’ve heard of computational distant reading: that it is too empirical, too precise, and too neat. Messiness, after all, is the coin of the realm in the humanities – we revel in things like context, subtlety, perspective, and interpretation. Computers are good at generating numbers, but not so good at generating all that other stuff. My computer program could tell me precisely how many times “Chicago” was printed in every issue of every newspaper in my corpus. What it couldn’t tell me was the context in which it occurred. Was it more likely to appear in commercial news? Political stories? Classified ads? Although I could read a sample of newspapers and manually track these geographic patterns, even this task proved daunting: the average issue contained close to one thousand place-names and stretched more than 67,000 words (or, longer than Mrs. Dalloway, Fahrenheit 451, and All Quiet on the Western Front). I needed a middle ground. I decided to move backwards, from the machine-readable text of the papers to the images of the newspapers themselves. What if I could broadly categorize each column of text according both to its geography (local, regional, national, etc.) and its type of content (news, editorial, advertisement, etc.)? I settled on the idea of overlaying a grid onto the page image. A human reader could visually skim across the page and select cells in the grid to block off each chunk of content, whether it was a news column or a political cartoon or a classified ad. Once the grid was divided up into blocks, the reader could easily calculate the proportions of each kind of content.
My collaborator, Bridget Baird, used the open-source programming language Processing to develop a visual interface to do just that. We wrote a program called ImageGrid that overlaid a grid onto an image, with each cell in the grid containing attributes. This “middle-reading” approach allowed me a new access point into the meaning and context of the paper’s geography without laboriously reading every word of every page. A news story on the debate in Congress over the Spanish-American War could be categorized primarily as “News” and secondarily as both “National” and “International” geography. By repeating this process across a random sample of issues, I began to find spatial patterns.
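To make the grid idea concrete, here is a minimal Python sketch of the underlying data structure and the proportion calculation. It is not ImageGrid's actual code (which was written in Processing); the class name, method names, category labels, and grid size are all assumptions for illustration.

```python
from collections import Counter


class PageGrid:
    """A rows x cols grid laid over a page image; each cell holds a set of labels."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        # Every cell starts untagged (an empty label set).
        self.cells = [[frozenset() for _ in range(cols)] for _ in range(rows)]

    def tag_block(self, top, left, bottom, right, labels):
        """Assign the labels to every cell in the inclusive rectangle."""
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                self.cells[r][c] = frozenset(labels)

    def proportions(self):
        """Share of all cells carrying each label (a cell can carry several)."""
        total = self.rows * self.cols
        counts = Counter(label for row in self.cells for cell in row for label in cell)
        return {label: n / total for label, n in counts.items()}


grid = PageGrid(rows=10, cols=6)
grid.tag_block(0, 0, 4, 2, {"News", "National", "International"})  # 5 x 3 = 15 cells
grid.tag_block(5, 0, 9, 5, {"Advertisement", "Local"})             # 5 x 6 = 30 cells
print(grid.proportions())
```

Because a cell can hold both a content type and one or more geographies, the shares do not sum to one. In this invented page, "News" and "International" each cover a quarter of the grid, while "Advertisement" and "Local" each cover half.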
For instance, I discovered that a Texas paper from the 1840s dedicated proportionally more of its advertising “page space” to local geography (such as city grocers, merchants, or tailors) than did a later paper from the 1890s. This confirmed what we might expect, as a growing national consumer market by the end of the century gave rise to more and more advertisements originating from outside of Texas. More surprising, however, was the pattern of international news. The earlier paper contained three times as much foreign news (relative “page space” categorized as news content and international geography) as did the later paper in the 1890s. This was entirely unexpected. The 1840s should have been a period of relative geographic parochialism compared to the ascendant imperialism of the 1890s that marked the United States’s noisy emergence as a global power. Yet the later paper dedicated proportionally less of its news to the international sphere than the earlier paper. This pattern would have been otherwise hidden if I had used either a close-reading or distant-reading approach. Instead, a blended “middle-reading” through ImageGrid brought it into view.
We realized that this “middle-reading” approach could be readily adapted not just to my project, but to other kinds of humanities research. A cultural historian studying American consumption might use the program to analyze dozens of mail-order catalogs and quickly categorize the various kinds of goods – housekeeping, farming, entertainment, etc. – marketed by companies such as Sears-Roebuck. A classicist could analyze hundreds of Roman mosaics to quantify the average percentage of each mosaic dedicated to religious or military figures and the different colors used to portray each one.
The provocative title of Stephen Marche’s Atlantic article, “Is Facebook Making Us Lonely?” invites immediate skepticism as the latest iteration in the sub-genre of technological alarmism about the internet. Like much of this literature, Marche’s writing is far more thoughtful and measured than his simplistic title would indicate. He admits, for instance, that “Loneliness is certainly not something that Facebook or Twitter or any of the lesser forms of social media is doing to us. We are doing it to ourselves.” He also makes the interesting point that Facebook requires a relentless and exhausting performative dance on a digital stage. But he also makes some problematic claims. A range of responses have critiqued Marche’s use of studies and statistics, but what caught my eye was Marche’s use of history. In one passage, worth quoting at length, Marche writes:
Loneliness is at the American core, a by-product of a long-standing national appetite for independence: The Pilgrims who left Europe willingly abandoned the bonds and strictures of a society that could not accept their right to be different. They did not seek out loneliness, but they accepted it as the price of their autonomy. The cowboys who set off to explore a seemingly endless frontier likewise traded away personal ties in favor of pride and self-respect. The ultimate American icon is the astronaut: Who is more heroic, or more alone? The price of self-determination and self-reliance has often been loneliness. But Americans have always been willing to pay that price.
Self-invention is only half of the American story, however. The drive for isolation has always been in tension with the impulse to cluster in communities that cling and suffocate. The Pilgrims, while fomenting spiritual rebellion, also enforced ferocious cohesion. The Salem witch trials, in hindsight, read like attempts to impose solidarity—as do the McCarthy hearings. The history of the United States is like the famous parable of the porcupines in the cold, from Schopenhauer’s Studies in Pessimism—the ones who huddle together for warmth and shuffle away in pain, always separating and congregating.
I always get annoyed when historians mount their high horses to harrumph about how Americans don’t know anything about history. But indulge me for one paragraph while I do just that. There are two major problems with Marche’s use of history here. First, it’s inaccurate. There’s a big difference between “loneliness,” “independence,” “self-determination,” and “self-reliance,” but Marche seems to conflate them all. The Pilgrims were more about religious reform than religious independence, and leaving one place for another place doesn’t make you lonely. Or alone. Or independent. Or self-reliant. As Marche himself admits, they also pursued their “spiritual rebellion” in an intensely communal manner.
Then there’s the cowboys. Oh boy. A generation of “New Western Historians” have pretty conclusively dispelled the idea of the self-reliant, independent wrangler. Cowboys were always deeply reliant on others: the federal government to remove Plains Indians and enforce ranching and riparian rights, or a host of merchants, storekeepers, and meat-packers that inextricably tied them to national and international markets. And I don’t understand what “traded away personal ties in favor of pride and self-respect” even means.
My problem is less with the accuracy of Marche’s history than with how he uses it. I don’t expect an article in the Atlantic to delve into the historiographical intricacies of the Puritans or the problematic nature of Frederick Jackson Turner’s frontier thesis. What Marche is talking about is American mythology, not some “core” of the American character or “actual” history. If he had made this distinction clearer, it would be a quite relevant and important point. Independence, self-reliance, self-determination: these are cherished ideals that undergird many of the stories Americans tell themselves about their past. And it’s fascinating to think about how these ideals interact with the separate (but related) reality of both loneliness and community in a present-day context.
Alexis de Tocqueville tackled this paradox between individualism and communalism nearly two centuries ago in Democracy in America. The French political thinker toured America in 1831 and wrote an expansive account of American institutions, history, society, and character. A major theme running through Democracy in America was the tension between the individualism produced by a society based on equality and the institutions and associations based on communal life. De Tocqueville argued that social equality had the downside of producing immensely self-centered people. In true de Tocqueville fashion, he penned one passage that has a ring of timelessness to it – Marche could have used it word-for-word in his characterization of present-day loneliness:
The first thing that strikes the observation is an innumerable multitude of men, all equal and alike, incessantly endeavoring to procure the petty and paltry pleasures with which they glut their lives. Each of them, living apart, is as a stranger to the fate of all the rest; his children and his private friends constitute to him the whole of mankind. As for the rest of his fellow citizens, he is close to them, but he does not see them; he touches them, but he does not feel them; he exists only in himself and for himself alone; and if his kindred still remain to him, he may be said at any rate to have lost his country.
But de Tocqueville goes on to describe how American society during the Jacksonian era combated the effects of isolation brought about by social equality, perhaps most importantly through associational life: “In no country in the world has the principle of association been more successfully used or applied to a greater multitude of objects than in America.” Americans in the 1820s and 1830s loved forming groups: political parties, religious sects, reform movements. This was the age of Joseph Smith and Mormonism, massive evangelical revivals, temperance movements, and the American Anti-Slavery Society. So what does it say that one of the most famous historical observers of American society highlighted the intense communalism of that society? My point is not that de Tocqueville was right or wrong, it’s that Americans and critics of American society have always wrestled with the balance between communalism and individualism.
A lack of historicity is my major problem with “Is Facebook Making Us Lonely?”. Marche uses history as a vague, unexamined point of departure for the present, oftentimes veering into the trope of a lost “Golden Age.” He cites some studies demonstrating, for instance, that the number of households with one inhabitant has increased from 1950, or that the number of personal confidants decreased from the 1980s to the present. Although Eric Klinenberg thoughtfully disputes Marche’s claim that “various studies have shown loneliness rising drastically over a very short period of recent history,” I’m less concerned with the accuracy of Marche’s claims than his treatment of history itself.
There’s a tendency when writing critiques of present-day society to make a direct implication that things are fundamentally new and are changing for the worse. And this tendency seems to be even more prevalent in diatribes against technology, which operate under an often-unexamined assumption that technology X (the telegraph, the automobile, the Internet, social media) has irrevocably reshaped our world. It’s useful to talk about the effects of technological changes: there are many ways in which Facebook and social media have, in fact, fundamentally changed our society. But too often these articles assume that any and every change is a) something fundamentally new, and b) directly attributable to the technology itself. Marche neatly encapsulates this lack of historicity in two sentences: “Nostalgia for the good old days of disconnection would not just be pointless, it would be hypocritical and ungrateful. But the very magic of the new machines, the efficiency and elegance with which they serve us, obscures what isn’t being served: everything that matters.”
Facebook isn’t magic and the “good old days of disconnection” only exist in our historical imagination. Not only do cowboys have an American Professional Rodeo Association, the group has its own Facebook page. As de Tocqueville reminds us, we’ve wrestled with the contradictions between loneliness, individualism, and communalism for a long, long time. What Facebook has done is change some of the channels and format of these tensions. Like any technology, it needs to be more thoughtfully placed in its historical context. History is not a golden age or a black box or a passive point of departure for a completely new paradigm. Critics of Facebook or Twitter or whatever new technology will be undermining the “American core” in twenty years should do a better job of keeping this in mind.
Earlier this year I uploaded a little over one hundred book summaries onto my website. The short summaries were in many ways the culmination of hundreds and hundreds of hours of reading, note-taking, and studying that I had done in preparation for my qualifying exams. At the end of the process I thought about all of that work that had gone into them and realized that it would be a shame to simply file them away for my own personal use. I might post them to Stanford’s internal history graduate archive, I might forward a few to colleagues or students, but that’s about it.
Book summaries of this kind represent an odd gray zone in the humanities. They’re a slightly more refined version of the scribbled notes that all of us take during classes, workshops, conferences, and colloquia. They aren’t critical reviews meant to stimulate evaluation and debate. They aren’t scholarship. They are certainly not anything I would put in a tenure file. Instead they’re the boring, nuts-and-bolts side of being an historian – ingesting several hundred pages of material and condensing it into a short, schematic summary. But this gray zone is also an important part of our profession. It’s what quals are all about: establishing a command (however tenuous) over the literature in the field. I hear laments all the time about how difficult it is to maintain that command, to keep all of the books in your field crammed into your head. My aim in making these summaries available is to try and aid that process on a superficial level. Nobody is going to be able to write a paper or teach a lecture using these summaries. But they do serve as a quick-and-dirty reference tool: “I keep seeing Kathy Peiss’s Cheap Amusements cited by urban historians. What exactly was it about?” Reading a short, four-hundred word schematic can be a lot faster and easier to digest than searching JSTOR for scholarly reviews, many of which offer only a brief summary of arguments before launching into historiographic debates or evaluative criticism.
I had some minor hesitation about putting the book summaries online. These were written for my own use and, I’m certain, contain the types of inevitable errors that originate from trying to churn out two books a day during the crunch-time of studying for quals. What if I had mischaracterized someone’s argument? What if I missed their point entirely? What if colleagues or job committees or (shudder) the actual author stumbled across those errors? All of these concerns are, of course, patently absurd. The idea of Amy Dru Stanley actually finding, reading, and taking offense at how I summarized her concept of postbellum contract ideology is laughable. More philosophical concerns involve a broader critique of the superficiality of the internet: when you can find everything at the click of a button, what incentive is there to actually learn things on your own? Despite my obvious embrace of technology, I have some sympathy for this viewpoint. The countless hours of reading and synthesizing that went into studying for quals were as valuable (if not more so) for developing the skill of reading books as they were for the content of the books themselves. Yes, a college freshman might take the lazy route and write a (poor) final paper based largely off of these summaries, thereby cheating themselves out of a valuable experience of learning to read and write effectively. But ultimately I had the same reaction to this concern as I did for related critiques of the digital “culture of distraction”: closing off content is not the answer to the problem.
Which brings me to the title of my post: “giving it away.” Like many in the digital humanities, I advocate for openness: open access, open data, open source. This commitment to openness is one of the hallmarks of a field dedicated to collaborative research and expanding the definition of scholarship. But I’m less concerned with humanists who know how to code or are self-professed THATCamp junkies than I am with people who will never attach the “digital” label to their title. While proselytizing digital humanists sometimes exaggerate the closed-off nature of “traditional” humanities into a straw-man, the fact remains: there isn’t a clear non-technical-humanities avenue for, say, uploading KML files onto GeoCommons or releasing a Ruby script on Sourceforge. A journal article or a monograph represents hundreds and hundreds of hours of work. Yet in the end, all of those notes and documents and transcripts and photographs usually end up just sitting on someone’s hard drive. We don’t have an established model for making them available.
For example, when I was doing research on free blacks in colonial New England, I photographed and transcribed dozens of property deeds from town vaults. I reference many of these in the endnotes of an article, but this is only useful to someone who is able to travel to the local archives of Haddam, Connecticut, pull out a volume of property deeds, and laboriously pore over the hand-written pages. What about a genealogist tracing family roots to central Connecticut? Or a researcher looking into colonial rural debtor patterns? Being able to see and read the documents on their computer is a lot more useful than trying to follow the breadcrumb trail of scholarly endnotes. And that doesn’t even touch on all of the transcriptions and sources that I didn’t end up using in the article. I see this kind of material in the same light as my book summaries: they aren’t original scholarship, but they are a crucial component of my work as an historian that should be shared.
I’m not calling for a robust online repository where people like me can upload humanistic material. I know juuuuust enough about digital archiving to know that this kind of proposal is far, far more complex than anyone (myself included) might imagine. Instead, I’m simply calling for humanities scholars to give more things away. I’m less concerned with the particulars of how it’s done than the broader culture that accompanies it. I don’t expect busy professors with limited technical expertise to spend hours and hours formatting their material, meticulously attaching metadata, or publicizing its availability. Archivists and open-access champions might shudder at the thought of individual people tossing poorly-documented material into isolated, worryingly ephemeral online silos. But I think the very process of making things available on an individual level, however ad-hoc or clumsily executed, is an important step for a discipline without a strong culture of openness. The broader humanities (not just the digital humanities) need to develop a stronger disciplinary habit of giving things away.
*This is part two of a series on preparing, studying for, and taking qualifying exams in a history PhD program. See Part I here. After taking my exams in December 2011, I decided to collect my thoughts on the process. The following advice is based on my own experience of taking Stanford’s qualifying oral exams for United States history. The format was a two-hour oral exam, with four faculty members testing four different fields: three standard American history fields (Colonial, Nineteenth Century, and Twentieth Century) and one specialty field (in my case, Spatial and Digital History). Bear in mind that other programs have different purposes, formats, and requirements.*
“Preparing for quals is a full-time job, but there is no reason to put in overtime.” This was one of the best pieces of advice I received when I was asking fellow graduate students about the process. More so than perhaps any other facet of graduate school, studying for quals should be managed like a job. This is for two reasons: to keep pace and to keep sane.
Quals can be thought of as a simple math problem with two main variables. One variable is the total number of books you need to read. The other is how much time you have to read them. If you have an exam date already set, work backwards to figure out how many books you need to read each week. If you have more control over scheduling the date of the exam, work forwards. Using a baseline of around 3-4 hours for each book, determine how many total hours you will need to read them. In either case, it’s crucial to factor in additional time for things like basic chronology, reviewing material, and meetings with professors (roughly 30-40 hours per field, in my case). Schedule in other commitments, weekends, vacations, or time off depending on your schedule. Finally, add in an additional 2-3 week buffer before the exam. This gives you crucial time to synthesize all of the material and, worst case scenario, a surplus buffer of time to dip into if you get behind on your reading schedule. Add it all up and you’ll get a rough sense for what your pace needs to be. In my case, I ended up having to read roughly 8-9 books a week, with around eight hours of additional preparation each week.
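The arithmetic can be written out as a short Python sketch. The function name and the sample inputs are hypothetical; the hours-per-book and hours-per-field defaults are midpoints of the ranges above, and the calculation assumes the reading is spread evenly over the weeks left once the buffer is set aside.

```python
def quals_pace(total_books, weeks_until_exam, buffer_weeks=3,
               hours_per_book=3.5, fields=4, extra_hours_per_field=35):
    """Return (books per week, total hours per week) for the reading period."""
    # Working backwards from the exam date: the buffer is off-limits for new reading.
    reading_weeks = weeks_until_exam - buffer_weeks
    if reading_weeks <= 0:
        raise ValueError("The buffer leaves no time for reading.")
    books_per_week = total_books / reading_weeks
    # Chronology, review, and meetings with professors, spread over the same weeks.
    extra_per_week = fields * extra_hours_per_field / reading_weeks
    hours_per_week = books_per_week * hours_per_book + extra_per_week
    return books_per_week, hours_per_week


# Hypothetical inputs: 160 books and 23 weeks until the exam.
books, hours = quals_pace(total_books=160, weeks_until_exam=23)
print(f"{books:.1f} books/week, {hours:.1f} hours/week")
# → 8.0 books/week, 35.0 hours/week
```

Vacations and other commitments are easiest to handle by subtracting those weeks from `weeks_until_exam` before running the numbers.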
Once you’ve figured out what your pace is, you need to keep track of your progress. I ended up creating a spreadsheet with all of my books and estimates for how much time I’d need on each book (usually 3-4 hours for a normal monograph, several more hours for a synthetic tome like Daniel Walker Howe’s What Hath God Wrought). This gave me a running tally of my progress and how much still remained – unsurprisingly, this was a daunting list in the beginning. But checking off books became a daily ritual that lent an all-important sense of moving forward. Having a schedule also gives you added structure for an experience that can otherwise be dangerously unaccountable. There are days when you will be tired, distracted, or just sick and tired of turning pages. These are the days when lack of daily accountability becomes a problem. Putting off a book one morning might seem trivial at the time, but it adds up quickly. Having a schedule forces you to keep working. It might not be pretty, you might not retain as much from that particular book, but knowing that you have to get through it to reach your “quota” for the week allows you to keep grinding.
Treating quals-studying like a job that you clock into and out of also helps to keep your sanity. Just reading and reading for hours every day is an isolating and tiring experience in a way that taking classes, teaching, or even research is not. It’s easy to get lost in the world of endless books, and while this can be rewarding in its own peculiar way it’s also not sustainable. Set a daily reading schedule and try to stick with it. By working consistently at the same times each day it will be much easier for you to “leave” your job. When you’re done for the day, actually be done for the day. I found studying for quals to be draining in a very different way from other aspects of graduate school. Whereas I have no problem answering emails from students at night or thinking about research while I cook dinner, it was much more exhausting to think about the two books I had read that day for quals. If possible, try to take at least one day off a week where you don’t touch a book. And all of the other rules about work/life balance apply: have a social life, exercise, think and talk about things other than history. Clock in, clock out.
Learn How to Not Read
Arguably the most important skill in studying for quals is learning how to not read. When you have to read two books a day, you don’t actually read them. You gut them. Graduate school has likely forced you to begin to do this already, but it will soon become a standard rather than an exception. For inspiration, read Larry Cebula’s “How to Read a Book in One Hour.” Although you will be spending more time on each book, the same general principles apply. Below is my own system for reading a book for quals.
1. Use a template. After much debate I ended up using Evernote as my note-taking medium. I created a basic template that I would use to create a new note for each book. This not only saves time but allows you to remember information more systematically. Finally, taking notes digitally also allows for a more robust catalog and search functionality, especially via tagging systems. By tagging summaries of books with their different subjects, I could quickly pull up, say, all the books on my 19th-century reading list having to do with slavery.
2. Use book reviews. Read 2-3 reviews of the book and take notes on them. If possible, try to find a mix of shorter (1-2 page) synopses and lengthier (5-10 page) reviews. You will quickly learn which journals are best for your particular field – in US History, for instance, Reviews in American History offers much more detailed reviews that oftentimes place the books within a broader historiographic context. I would usually pair one of these longer reviews with two shorter ones. By reading several different reviews you can usually glean what the “consensus” is on the book’s major themes and contributions and be on the look-out for these while reading.
3. Be an active reader. I’m aware people have different styles. But for quals, I found the best way to take notes was to sit at a desk with my computer and take notes on every chapter as I went. Whereas in classes I had often read books lying on a couch and used marginalia and underlining, I’ve since soured on this approach. Actively taking notes while you read is less enjoyable, but forces you to synthesize as you go. It’s easy to underline an important sentence without actually understanding it. Paraphrasing forces you to actually get what you read. As for content, start with a careful, word-by-word reading of the introduction and take detailed notes. Then move much more quickly through the book’s chapters, skimming and trying to pull out what’s most important.
Quals tend to privilege arguments over thematic content: few people are going to ask for the specific evidence an author used to support their argument in a particular chapter. However, jotting a sentence down that describes the general setting, actors, and subject of the chapter, separate from its argumentative thrust, allows you to recall it better in the future. It’s important to take notes on both arguments and content. Finally, move fast. Flip past pages that are simply listing additional evidence for an argument. Although these are often the most enjoyable parts of history books they are, unfortunately, tangential to why you’re reading the book. Unless the book was particularly long or particularly important, I tried to cap the reading part of the note-taking process at around three hours.
4. Synthesize. This is crucial. After reading every book I forced myself to take 20-30 minutes and write a careful two-to-three-paragraph summary of the book. This is much harder than simply taking notes because it forces you to distill a book into its barest bones. Perhaps not surprisingly, it’s difficult to write a summary of a book you don’t understand or remember, so doing this also makes sure you actually processed what the author was trying to do (or forces you to at least take a stab at it). As a supplement to this, as I was reading the book I would write major themes or concepts in a bullet list. Once I got to the end, I would go back and decide which of these were actually major themes or concepts and which ended up being auxiliary. The important themes gave me a basic skeleton from which I could then write a more elaborate summary. These write-ups proved invaluable. When you’re reading two books a day, even a book you read two weeks ago can dissolve into a distant memory. These summaries give you a fast and efficient means of recalling what the book was about. Finally, go back and revise them as you read other books. Oftentimes you don’t understand the broader significance of an author’s argument until you’re able to place it in a larger historiographic context.
5. Talk it out. This is probably the hardest step, especially in the beginning of the process. But it’s central to studying for quals. There is something about having to verbally articulate an answer that forces you to understand it in a way that simply writing answers or notes does not. Additionally, one of the most challenging parts of quals is to move beyond simply being able to regurgitate a specific author’s argument and move towards higher-level synthesis. It’s one thing to be able to answer: “What is Bernard Bailyn’s interpretation of the American Revolution?” or even “What are three different interpretations of the American Revolution?” It’s much harder to answer, “Was the American Revolution actually revolutionary?” Answering these higher-level questions out loud is hard, but it is a skill at which you can and will get better. Once again, rely on your fellow graduate students, particularly ones who have already taken their exams. Have them ask you practice questions, pretend you are in an actual exam, and give formal answers (rather than the easier route of making it conversational, as in “Well, I’d probably say something about…”). Practice your own answers, but also ask other students for clarifications about topics or books you don’t understand. Do this as early as possible and keep doing it throughout the process. I found it the most useful way to prepare for the exam itself.
6. Go back to the basics. My grasp of the more factual side of American history was surprisingly weak going into the process. It’s easy to spend all of your time learning about historiography and interpretations, but you need a factual framework to build on. Particularly important episodes demand a solid grounding in chronology – for example, the lead-up to the American Revolution or the Civil War. Memorize things like changing geography, presidential administrations, dynastic reigns, economic depressions, major legal cases, etc. Some books, like those in the Oxford Series in American History, offer more nuts-and-bolts information than others. When you come to one of these, take the time to read it in more detail, writing separate notes on basic chronology and events in addition to your notes on the more interpretive side of the book.