Making Numbers Legible

What do you do with numbers? I mean this in the context of writing, not research. How do you incorporate quantitative evidence into your writing in a way that makes it legible for your readers? I’ve been thinking more and more about this as I write my dissertation, which examines the role of the nineteenth-century Post in the American West. Much like today, the Post was massive. Its sheer size was part of what made it so important. And I find myself using the size of the Post to help answer the curmudgeonly “so what?” question that stalks the mental corridors of graduate students. On a very basic level, the Post mattered because so many Americans sent so many letters through such a large network operated by so many people. Answering the “so what?” question means that I have to incorporate numbers into my writing. But numbers are tricky.

Let’s begin with the amount of mail that moved through the U.S. Post. In 1880 Americans sent 1,053,252,876 letters. That number is barely legible for most readers. I mean this in two ways. In a mechanical sense we HATE having to actually read so many digits. A more conceptual problem is that a number this big doesn’t mean all that much. If I change 1,053,252,876 to 1,253,252,876, would it lead you, the reader, to a fundamentally different conclusion about the size of the U.S. Post? I doubt it, even though the difference of 200 million letters is a pretty substantial one. And if instead of adding 200 million letters I subtract 200 million letters – 1,053,252,876 down to 853,252,876 – the reader’s perception is more likely to change. But this is only because the number shed one of its digits and crossed the magic cognitive threshold from “billion” to “million.” It’s not because of an inherent understanding of what those huge numbers actually mean.

ActualPerceived
Actual and perceived differences between 853,252,876 vs. 1,053,252,876 vs. 1,253,252,876

One strategy for making a number like 1,053,252,876 legible is reduction: turning large numbers into much smaller ones. If we spread out those billion letters across the population over the age of ten, the average American sent roughly twenty-eight letters over the course of 1880, or one every thirteen days. A ten-digit monstrosity turns into something the reader can relate to. After all, it’s easier to picture writing a letter every two weeks than it is to picture a mountain of one billion letters. Numbers, especially big ones, are easier to digest when they’re reduced to a more personal scale.

1,053,252,876 letters / 36,761,607 Americans over the age of ten = 28.65 letters / person
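The same reduction can be checked in a few lines of Python. This is only a minimal sketch built on the two figures quoted above; the variable names are my own, and the 366-day year is an assumption (1880 was a leap year).

```python
# Figures quoted in the text for 1880.
letters_sent = 1_053_252_876
population_over_ten = 36_761_607

# Assumption: 1880 was a leap year, so the year had 366 days.
days_in_year = 366

# Reduce the ten-digit total to a per-person figure.
letters_per_person = letters_sent / population_over_ten

# Turn the per-person figure into a rhythm a reader can picture.
days_between_letters = days_in_year / letters_per_person

print(f"{letters_per_person:.2f} letters per person")
print(f"roughly one letter every {days_between_letters:.0f} days")
```

Rounding is doing real work here: the point of reduction is a figure the reader can picture, not one with more decimal places.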

A second way to make numbers legible is by comparison. The most direct counterpart to the U.S. Post was the telegraph industry. Put simply, the telegraph is a lot sexier than the Post, and nineteenth-century Americans and modern historians alike have lionized the technology. A typical account goes something like this: “News no longer traveled at the excruciatingly slow pace of ships, horses, feet, or trains. It now moved at 670 million miles per hour.” In essence, “the telegraph liberated information.” But the telegraph only liberated information if you could afford to pay for it. In 1880 the cost of sending a telegram through Western Union from San Francisco to New York was $2.50, or 125 times the price to mail a two-cent letter. Not surprisingly, Americans sent roughly 35 times as many letters as telegrams. The enormous size of the Post was in part a product of how cheap it was to use.
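The price comparison is simple arithmetic, sketched here in Python for anyone who wants to check it. The two prices are the ones quoted above; the variable names are my own, and I store the prices as whole cents (my choice) so the ratio comes out as an exact integer.

```python
# Prices quoted in the text, stored as whole cents so the
# division below is exact integer arithmetic.
telegram_cents = 250  # Western Union, San Francisco to New York, 1880
letter_cents = 2      # price of mailing one letter, as quoted in the text

# How many letters could be mailed for the price of one telegram?
price_ratio = telegram_cents // letter_cents

print(f"One telegram cost as much as {price_ratio} letters")
```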

telegraphvspost
Cost of Telegram vs. Letter, San Francisco to New York (1880)

This points to a third strategy to make numbers legible: visualization. In the above case the chart acts as a rhetorical device. I’m less concerned with the reader being able to precisely measure the difference between $2.50 and $0.02 than I am with driving home the point that the telegraph was really, really expensive and the U.S. Post was really, really cheap. A more substantive comparison can be made by looking at the size of the Post Office Department’s workforce. In 1880 it employed an army of 56,421 postmasters, clerks, and contractors to process and transport the mail. Just how large was this workforce? In fact, the “postal army” was more than twice the size of the actual U.S. Army. Fifteen years removed from the Civil War there were now more postmasters than soldiers in American society. Readers are a lot better at visually comparing different bars than they are at doing mental arithmetic with large, unwieldy numbers.

PostOffice_Military

Almost as important as the sheer size of the U.S. Post was its geographic reach. Most postal employees worked in one of 43,012 post offices scattered across the United States. A liberal postal policy meant that almost any community could successfully petition the department for a new post office. Wherever people moved, a post office followed close on their heels. This resulted in a sprawling network that stretched from one corner of the country to the other. But what did the nation’s largest spatial network actually look like?

1880_PostOffices

Mapping 43,012 post offices gives the reader an instant sense of both the size and scope of the U.S. Post. The map serves an illustrative purpose rather than an argumentative one. I’m not offering interpretations of the network or even pointing out particular patterns. It’s simply a way for readers to wrap their minds around the basic geography of such a vast spatial system. But the map is also a useful cautionary tale about visualizing numbers. If anything, the map undersells the size and extent of the Post. It may seem like a whole lot of data, but it’s actually missing around ten thousand post offices, or 22% of the total number that existed in 1880. Some of those offices were so obscure or had such a short existence that I wasn’t able to automatically find their locations. And these missing post offices aren’t evenly distributed: about 99% of Oregon’s post offices appear on the map compared to only 47% of Alabama’s.

Disclaimers aside, compare the map to a sentence I wrote earlier: “Most postal employees worked in one of 43,012 post offices scattered across the United States.” In that context the specific number 43,012 doesn’t make much of a difference – it could just as well be 38,519 or 51,933 – and therefore doesn’t contribute all that much weight to my broader point that the Post was ubiquitous in the nineteenth-century United States. A map of 43,012 post offices is much more effective at demonstrating my point. The map also has one additional advantage: it beckons the reader to not only appreciate the size and extent of the network, but to ask questions about its clusters and lines and blank spaces.* A map can spark curiosity and act as an invitation to keep reading. This kind of active engagement is a hallmark of good writing and one that’s hard to achieve using numbers alone. The first step is to make numbers legible. The second is to make them interesting.

* Most obviously: what’s going on with Oklahoma? Two things. Mostly it’s a data artifact – the geolocating program I wrote doesn’t handle Oklahoma locations very well, so I was only able to locate 19 out of 95 post offices. I’m planning to fix this problem at some point. But even if every post office appeared on the map, Oklahoma would still look barren compared to its neighbors. This is because Oklahoma was still Indian Territory in 1880. Mail service didn’t necessarily stop at its borders but postal coverage effectively fell off a cliff; in 1880 Indian Territory had fewer post offices than any other state/territory besides Wyoming. The dearth of post offices is especially telling given the ubiquity of the U.S. Post in the rest of the country, showing how the administrative status of the territory and decades of federal Indian policy directly shaped communications geography.

Who Picked Up The Check?

Adventures in Data Exploration

In November 2012 the United States Postal Service reported a staggering deficit of $15.9 billion. For the historian, this raises the question: was it always this bad? Others have penned far more nuanced answers to this question, but my starting point is a lot less sophisticated: a table of yearly expenses and income.

SurplusDeficitByYear
US Postal Department Surplus (Gray) or Deficit (Red) by Year

So, was the postal department always in such terrible fiscal shape? No, not at first. But from the 1840s onward, putting aside the 1990s and early 2000s, deficits were the norm. The next question: What was the geography of deficits? Which states paid more than others? Essentially, who picked up the check?

Every year the Postmaster General issued a report containing a table of receipts and expenditures broken down by state. Let’s take a look at 1871:

AnnualReportTableReceiptsExpenditruesByState
1871 Annual Report of the Postmaster General – Receipts and Expenditures

Because it’s only one table, I manually transcribed the columns into a spreadsheet. At this point, I could turn to ArcGIS to start analyzing the data, maybe merging the table with a shapefile of state boundaries provided by NHGIS. But ArcGIS is a relatively high-powered tool better suited to sophisticated geospatial analysis. What I’m doing doesn’t require all that much horsepower. And, in fact, quantitative spatial relationships (e.g., measurements of distance or area) aren’t all that important for answering the questions I’ve posed. There are a number of different software packages for exploring data, but Tableau provides a quick-and-dirty, drag-and-drop interface. In keeping with the nature of data exploration, I’ve purposefully left the following visualizations rough around the edges. Below is a bar graph, for instance, showing the surplus or deficit of each state, grouped into rough geographic regions:

SurplusDeficitBar_Crop
Postal Surplus or Deficit by State – 1871

Or, in map form:

SurplusDeficitMap_Crop
Postal Surplus (Black) or Deficit (Red) by State – 1871

Between the map and the bar graph, it’s immediately apparent that:
a) Most states ran a deficit in 1871
b) The Northeast was the only region that emerged with a surplus

So who picked up the check? States with large urban, literate populations: New York, Pennsylvania, Massachusetts, Illinois. Who skipped out on the bill? The South and the West. But these are absolute figures. Maybe Texas and California simply spent more money than Arizona and Idaho because they had more people. So let’s normalize our data by analyzing it on a per-capita basis, using census data from 1870.
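For readers who would rather script this step than drag and drop it, here is a minimal Python sketch of the per-capita normalization. I did this work in Tableau, so the code is only an illustration of the arithmetic. The `StateRecord` class and its field names are hypothetical, and the three records are placeholder numbers I made up for the example, not figures from the 1871 report or the 1870 census.

```python
from dataclasses import dataclass


@dataclass
class StateRecord:
    """One row of a receipts-and-expenditures table (hypothetical layout)."""
    name: str
    receipts: float      # dollars paid into the postal system
    expenditures: float  # dollars spent providing service
    population: int      # census population used for normalizing

    @property
    def balance(self) -> float:
        """Positive means surplus, negative means deficit."""
        return self.receipts - self.expenditures

    @property
    def balance_per_capita(self) -> float:
        """Surplus or deficit divided across every inhabitant."""
        return self.balance / self.population


# PLACEHOLDER numbers for illustration only; NOT from the 1871 report.
records = [
    StateRecord("State A", receipts=900_000, expenditures=600_000, population=1_500_000),
    StateRecord("State B", receipts=50_000, expenditures=250_000, population=20_000),
    StateRecord("State C", receipts=300_000, expenditures=500_000, population=1_000_000),
]

# List states from the largest per-person deficit to the largest surplus.
for rec in sorted(records, key=lambda r: r.balance_per_capita):
    label = "surplus" if rec.balance >= 0 else "deficit"
    print(f"{rec.name}: {label} of ${abs(rec.balance_per_capita):.2f} per person")
```

In this made-up example, State B and State C run identical absolute deficits, but the per-person figures are wildly different because State B has so few people. That is the kind of gap that absolute figures hide.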

SurplusDeficitBar_PerCapita_Crop
Postal Surplus or Deficit per Person by State – 1871

The South and the West may have both skipped out on the bill, but it was the West that ordered prime rib and lobster before it left the table. Relative to the number of its inhabitants, western states bled the system dry. A new question emerges: how? What was causing this extreme imbalance of receipts and expenditures in the West? Were westerners simply not paying into the system?

ReceiptsExpendituresByRegion
Postal Receipts and Expenditures per Person by Region – 1871

Actually, no. The story was a bit more complicated. On a per-capita basis, westerners were paying slightly more money into the system than any other region. The problem was that providing service to each of those westerners cost substantially more than in any other region: $38 per person, or roughly 4-5 times the cost of service in the east. For all of its lore of rugged individualism and a mistrust of big government, the West received the most bloated government “hand-out” of any region in the country. This point has been driven home by a generation of “New Western” historians who demonstrated the region’s dependence on the federal government, ranging from massive railroad subsidies to the U.S. Army’s forcible removal of Indians and the opening of their lands to western settlers. Add the postal service to that long list of federal largesse in the West.

But what made mail service in the West so expensive? The original 1871 table further breaks down expenses by category (postmaster salaries, equipment, buildings, etc.). Some more mucking around in the data reveals a particular kind of expense that dominated the western mail system: transportation.

TransportationMap_PerCapita_Crop
Transportation Expenses per Person by State (State surplus in black, deficit in red) – 1871

High transport costs were partially a function of population density. Many western states like Idaho or Montana consisted of small, isolated communities connected by long mail routes. But there’s more to the story. Beginning in the 1870s, a series of scandals wracked the postal department over its “star” routes (designated as any non-steamboat, non-railroad mail route). A handful of “star” route carriers routinely inflated their contracts and defrauded the government of millions of dollars. These scandals culminated in the criminal trial of high-level postal officials, contractors, and a former United States Senator. In 1881, the New York Times printed a list of the ninety-three routes under investigation for fraud. Every single one of these routes lay west of the Mississippi.

1881_StarRouteFrauds_Crop
Annual Cost of “Star” Routes Under Investigation for Fraud – 1881 (Locations of Route Start/End Termini)

The rest of the country wasn’t just subsidizing the West. It was subsidizing a regional communications system steeped in fraud and corruption. The original question – “Who picked up the check?” – leads to a final cliffhanger: why did all of these frauds occur in the West?

Scattered Links – 3/16/2009

I’ve been closely following the history blogging roundtable examining Judith Bennett’s History Matters: Patriarchy and the Challenge of Feminism. Notorious Ph.D., Girl Scholar kicked things off with Should politics be historical? Should history be political? Then Historiann kept the ball rolling with Who indeed is afraid of the distant past (and who says it’s distant, anyway)? A call to arms. This week Claire Potter at Tenured Radical posted part three, Teach This Book!, with part four appearing soon at Blogenspiel. I’ve found the series instructive, given my embarrassing lack of knowledge of historiography in general, and feminist (not to mention medieval feminist) historiography in particular. A lively comment-debate about generational issues followed Notorious Ph.D.’s posting, which Historiann expounded upon in part two (and included an interesting suggestion of social history’s potential for comparative women’s studies). Tenured Radical delves into why feminist historians might gravitate towards more recent history, while championing queer history as a partial solution to some issues that Bennett raises. The history/academia blogosphere could benefit from more roundtables such as these.

Deviant Art supplies an amusing cartographic comic on the progression of World War II. My favorite part? “We talked about this before, mon ami.”

Lisa Spiro at Digital Scholarship in the Humanities gives a great two-part wrap-up of Digital Humanities developments in 2008. Part One sounds a triumphant note, including “Emergence of Digital Humanities” and “Community and collaboration,” while Part Two is more sobering, discussing continued resistance to open access and other new scholarly models, along with the erroneous and Grinch-like litigation by EndNote against Zotero.

Scientists compiled a clickstream map of “scientific activity” (along with other disciplines) that creates a visualization of how users moved from one academic journal to another. The visualization shows how different disciplines tend to cluster around one another, and I was impressed at the degree of interaction in the humanities and social sciences (although I would have loved to see more fluidity between humanities and more “hard” disciplines).

It reminded me of Sterling Fluharty’s insightful take on using quantitative methods to rank history journals based on citations, which the clickstream map avoided due to the inconsistent nature of citations across disciplines.

Finally, the Economist’s Technology Quarterly profiles Brewster Kahle in “The Internet’s Librarian” and his quest to build “Alexandria 2.0,” a free digital archive of human knowledge.

Revisiting Charles Tilly: “How (and What) are Historians Doing?”

In April of 2008, noted historian and sociologist Charles Tilly died of lymphoma. In addition to the 51 books or monographs and over 600 articles, he left behind a legion of friends and admirers. By all accounts, Tilly was a prolific and highly influential scholar whose work encompassed a staggering breadth of subjects. Back in April I decided I should at least acquaint myself with some of his writing, and filed away a recommended article for later reading: “How (and What) Are Historians Doing?”

The article covers several different topics, but begins by addressing the question: what distinguishes history from other disciplines in the humanities and social sciences? Tilly lists a six-part answer:

1. Time and place are fundamental characteristics
2. Historians specialize in specific times and places
3. The most dominant historical questions are rooted in national politics
4. The blurry line between amateurs and professionals
5. A strong focus on written documents and sources
6. A narrative style that focuses on the motivations of characters as supported by textual evidence

I think some of these characteristics already feel dated, eighteen years after he wrote them. For instance, the popularity of transnational history weakens the case for numbers 2 and 3. Historians are increasingly straying outside the traditional specialties bound by time period and geographic location, and into more thematic realms such as diaspora and women’s studies. In many ways, this is a good thing. Without the constraints of specific time and place, scholars are able to tackle problems and questions from a radically different perspective.

On the other hand, I would contend that number 4 remains stronger than ever – historical writing consistently maintains a level of common popularity that is largely out of reach for many other disciplines (although a similar case could be made for sociology and, especially today, economics). Journalists, genealogists, librarians, curators, reenactors, armchair enthusiasts – all are contributing to the field of history as much as those within the academy. Finally, even though digitization projects are making sources such as print media, maps, video footage, and material culture more accessible, I would agree with Tilly that the solid majority of historical research is conducted using written documents and sources.

Later in the article, Tilly starts to systematically analyze common approaches to historical study. To do so, he uses a simple chart outlining these different approaches:

picture-11

On the vertical axis is the scope, or the size of the lens being used by the historian. On the horizontal axis is a continuum of methodological/philosophical approaches. Like any good educator, Tilly then uses specific examples of historical monographs to illustrate the “Four Corners” of historical approach. For his sample, he selects Carlo Ginzburg’s (1980) The Cheese and the Worms, E.P. Thompson’s (1963) The Making of the English Working Class, E.A. Wrigley and R.S. Schofield’s (1981) The Population History of England, 1541-1871, and Olivier Zunz’s (1982) The Changing Face of Inequality: Urbanization, Industrial Development, and Immigrants in Detroit, 1880-1920. After discussing each one individually, he charts them visually:

picture-2

Although I have not read any of the works, Tilly did a great job of explaining them in terms of four different approaches to historical analysis. I was left with the urge to create my own charts using works I had actually read in the past year or two:

republic-of-suffering2

Drew Gilpin Faust, This Republic of Suffering: Death and the American Civil War

The graph for Drew Faust’s This Republic of Suffering is largely concentrated in the large-scale, humanistic quadrant, as the book delves into the traumatic effects on the American psyche caused by the Civil War’s unprecedented slaughter. Its line extends down to the small-scale, humanistic corner in order to represent Faust’s use of individual experiences (such as those of Walt Whitman and Ambrose Bierce) to illustrate her broader points.

polio1

David Oshinsky, Polio: An American Story

David Oshinsky’s Polio covers a wide area of scope and approach, but it leans toward the large-scale, social-scientific quadrant, given his detailed narrative covering America’s medical struggle against polio in the twentieth century.

slave-no-more

David Blight, A Slave No More: Two Men Who Escaped to Freedom

David Blight’s A Slave No More maintains a relatively tight focus on the small-scale and humanistic approach. Blight uses the narratives of two escaped slaves to explore their decisions, motivations, and actions – all within the larger context of slavery’s death throes and national emancipation.

dwelling-place

Erskine Clarke, Dwelling Place: A Plantation Epic

Erskine Clarke’s Dwelling Place takes a small-scale, more social-scientific approach to chronicling several generations of a slave family and a slaveholding family on a Georgia plantation. The scope is kept relatively small, with a huge amount of detail that draws upon a range of quantitative and qualitative sources.

I enjoyed the process of creating these graphs. Information visualization is a growing field, and one that I believe will become ever more important to historians. Although the above graphs are imprecise and subjective, they forced me to really consider the content of the books from a new perspective. Most of these books probably touched every corner of the graph at some point during the course of the monograph, but visually plotting their content necessitates a careful aggregation of the author’s central themes and points. Where do the books cross the two axes of scope and approach? Where is their “centroid” on the graph? Just how far along each axis do they go? It’s a refreshing exercise, and one that I think could work in the classroom as a visual and quantitative supplement to a traditional book review.
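The “centroid” question can be made literal if you are willing to put numbers on the axes. What follows is a rough Python sketch, not anything Tilly proposes: I am assuming each axis runs from -1 to 1 (humanistic to social-scientific on the horizontal, small-scale to large-scale on the vertical), and the outline coordinates are invented for the example rather than measured from any of my graphs. The function uses the standard shoelace formula for the area-weighted centroid of a simple polygon.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    `vertices` is a list of (x, y) pairs listed in order around the outline.
    """
    twice_area = 0.0
    cx_sum = 0.0
    cy_sum = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the outline
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx_sum += (x0 + x1) * cross
        cy_sum += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("outline has zero area")
    # The centroid sums are divided by 6 * area, i.e. 3 * twice_area.
    return cx_sum / (3 * twice_area), cy_sum / (3 * twice_area)


# HYPOTHETICAL outline for a book drawn mostly in the large-scale,
# humanistic quadrant (x < 0, y > 0) under the assumed -1..1 scale.
book_outline = [(-0.8, 0.2), (-0.2, 0.2), (-0.2, 0.9), (-0.8, 0.9)]

approach, scope = polygon_centroid(book_outline)
print(f"approach: {approach:.2f}, scope: {scope:.2f}")
```

The numbers are no less subjective than the pencil sketch they come from, but computing a single point makes it easier to compare two readers’ graphs of the same book.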

I’d encourage anyone reading this to sit down with a pencil and paper, channel your inner Charles Tilly, and try to sketch out a graph for a favorite historical monograph. For those of you who have read the above books, I’d welcome any and all criticisms and disagreements regarding their graphs.