Text Analysis of Martha Ballard’s Diary (Part 2)

Given Martha Ballard’s profession as a midwife, it is no surprise that she carefully recorded the 814 births she attended between 1785 and 1812. She gave these events precedence over more mundane occurrences by noting them in a separate column from the main entry. Doing so allowed her to keep track not only of the births but also of the payments and restitution for her work. These hundreds of births constituted one of the bedrocks of Ballard’s experience as a skilled and prolific midwife, and this is reflected in her diary.

As births were such a consistent and methodically recorded theme in Ballard’s life, I decided to begin my programming with a basic examination of the deliveries she attended. This examination would take the form of counting the number of deliveries throughout the course of the diary and grouping them by various time-related characteristics, namely: year, month, and day of the week.

Process and Results

The first basic step in performing a more detailed text analysis of Martha Ballard’s diary was to clean up the data. One step was to (temporarily) convert every uppercase letter to lowercase, which kept Python from seeing “Birth” and “birth” as two separate words. For the purposes of this particular program, it was more important to distill words into a basic unit than to maintain the complexity of capitalized characters.
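That normalization step is a one-liner in Python; the word list here is just an illustration, not the actual diary data:

```python
# Lowercasing collapses "Birth" and "birth" into a single token.
words = ["Birth", "birth", "Brt", "born"]
normalized = [w.lower() for w in words]
print(normalized)  # ['birth', 'birth', 'brt', 'born']
```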

Once the data was scrubbed, we could turn to writing a program that would count the number of deliveries recorded in the diary. The program we wrote does the following:

  1. Checks whether Ballard wrote anything in the “birth” column (the first column of the entries, which she also used to keep track of deliveries)
  2. If she did, checks whether that column contains any of the words “birth”, “brt”, or “born”
  3. Prints the remaining entries that contain text in the “birth” column but none of the above words. From this short list I manually added seven entries to the program in which she appeared to have attended a delivery but did not record it using those words.
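In outline, those three steps might look something like the following Python sketch. The entry structure, the field names, and the dates in `MANUAL_ADDITIONS` are hypothetical stand-ins, not the actual program or data:

```python
BIRTH_WORDS = ("birth", "brt", "born")

# Dates (placeholder values) where Ballard recorded a delivery without using
# one of the standard words, added by hand after inspecting leftover entries.
MANUAL_ADDITIONS = {"1787-05-14", "1791-09-02"}

def is_delivery(entry):
    """Return True if the entry's birth column appears to record a delivery."""
    column = entry.get("birth_column", "").lower()
    if not column:                                      # step 1: anything there?
        return False
    if any(word in column for word in BIRTH_WORDS):     # step 2: keyword match
        return True
    return entry["date"] in MANUAL_ADDITIONS            # step 3: manual additions

entries = [
    {"date": "1785-07-09", "birth_column": "Birth. XX Birth 55."},
    {"date": "1785-07-10", "birth_column": ""},
]
print(sum(is_delivery(e) for e in entries))  # 1
```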

Using these parameters, the program could iterate through the text and recognize the occurrence of a delivery. Now we could begin to organize these births.

First, we returned the birth counts for each year of the diary, which were then inserted into a table and charted in Excel:

[Table and chart: deliveries per year, 1785-1812]

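Grouping the detected deliveries by year is straightforward with a counter; the date strings below are invented for illustration:

```python
from collections import Counter

# Count deliveries per year from ISO-format date strings (illustrative data).
delivery_dates = ["1785-07-09", "1785-08-21", "1786-01-03"]
per_year = Counter(date[:4] for date in delivery_dates)
print(sorted(per_year.items()))  # [('1785', 2), ('1786', 1)]
```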
At the risk of turning my analysis into a John Henry-esque woman vs. machine, I compared my figures to the chart that Laurel Ulrich created in A Midwife’s Tale that tallied the births Ballard attended (on page 232 of the soft-cover edition). The two charts follow the same broad pattern:


Note: I reverse-built her chart by creating a table from the printed chart, then making my own bar graph. Somewhere in the translation I seem to have misplaced one of the deliveries (Ulrich lists 814 total, whereas I keep counting 813 on her graph). Sorry!

However, a closer look reveals small discrepancies in the numbers for individual years. I calculated each year’s deviation as the difference between the two counts divided by Ulrich’s count, treating her numbers as the “true” figures (she is the acting President of the AHA, after all) from which my own deviated, and found that the average deviation for a given year was 4.86%. Apologies for the poor formatting; I had trouble inserting tables into WordPress:

Year  Deliveries: Manual (Ulrich)  Deliveries: Computer Program  Count Difference  Deviation (from Ulrich)
1785 28 24 4 14.29%
1786 33 35 2 6.06%
1787 33 33 0 0.00%
1788 27 28 1 3.70%
1789 40 43 3 7.50%
1790 34 35 1 2.94%
1791 39 39 0 0.00%
1792 41 43 2 4.88%
1793 53 50 3 5.66%
1794 48 48 0 0.00%
1795 50 55 5 10.00%
1796 59 56 3 5.08%
1797 54 55 1 1.85%
1798 38 38 0 0.00%
1799 50 51 1 2.00%
1800 27 23 4 14.81%
1801 18 14 4 22.22%
1802 11 12 1 9.09%
1803 19 18 1 5.26%
1804 11 11 0 0.00%
1805 8 8 0 0.00%
1806 10 11 1 10.00%
1807 13 13 0 0.00%
1808 3 3 0 0.00%
1809 21 22 1 4.76%
1810 17 18 1 5.88%
1811 14 14 0 0.00%
1812 14 14 0 0.00%
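The deviation column above can be reproduced from the table’s first few rows like so, treating Ulrich’s manual counts as the “true” figures:

```python
# Recompute the deviation for the first rows of the table: (year, Ulrich, program).
rows = [(1785, 28, 24), (1786, 33, 35), (1787, 33, 33)]
for year, ulrich, program in rows:
    deviation = abs(ulrich - program) / ulrich * 100
    print(f"{year}: {abs(ulrich - program)} births, {deviation:.2f}%")
# 1785: 4 births, 14.29%
# 1786: 2 births, 6.06%
# 1787: 0 births, 0.00%
```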

Keeping the knowledge in the back of my mind that my birth analysis differed slightly from Ulrich’s, I went on to compare my figures with other factors, including the frequency of deliveries by month over the course of the diary.


If we extend the results of this chart and assume a standard nine-month pregnancy, we can also determine roughly which months Ballard’s neighbors were most likely to be having sex. Unsurprisingly, the warmer period between May and August appears to be a particularly fertile time:


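The nine-month shift behind this estimate can be sketched as simple modular arithmetic on month numbers:

```python
import calendar

# Shift a birth month back nine months to estimate the conception month.
# Months are numbered 1-12; the arithmetic wraps around the year boundary.
def conception_month(birth_month):
    return (birth_month - 9 - 1) % 12 + 1

for m in (1, 3, 12):
    print(calendar.month_name[m], "->", calendar.month_name[conception_month(m)])
# January -> April
# March -> June
# December -> March
```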
Finally, I looked at how often births occurred on different days of the week. There wasn’t a strong pattern, beyond the fact that Sunday and Thursday seemed to be abnormally common days for deliveries. I’m not sure why that was the case, but would love to hear speculation from any readers.
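A tally by weekday can be sketched with Python’s standard library; the dates below are invented placeholders, not actual deliveries from the diary:

```python
import calendar
from collections import Counter
from datetime import date

# Tally delivery dates (hypothetical examples) by day of the week.
delivery_dates = [date(1785, 7, 9), date(1785, 7, 10), date(1785, 8, 14)]
per_day = Counter(calendar.day_name[d.weekday()] for d in delivery_dates)
print(per_day.most_common())
```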



The discrepancies between the program’s tally of deliveries and Ulrich’s delivery count speak to broader issues in “digital” text mining versus “manual” text mining:

Data Quality

Ulrich’s analysis is the result of countless hours spent eye-to-page with the original text. And as every history teacher drills into their students, working directly with the primary documents minimizes the layers of interpretation that can distort them. In comparison, my analysis is the result of the original text going through several levels of transformation, like a game of telephone:

Original text -> Typed transcription -> HTML tables -> Python list -> Text file -> Excel table/chart

Each level increases the chance of a mistake.  For instance, a quick manual examination using the online version of the diary for 1785 finds an instance of a delivery (marked by ‘Birth’) showing up in the online HTML, but which does not appear in the “raw” HTML files our program is processing and analyzing.

On the other hand, a machine doesn’t get tired and miscount a word tally or accidentally skip an entry.

Context

Ulrich brings to bear on her textual analysis years of historical training and experience along with a deeply intimate understanding of Ballard’s diary. This allows her to take into account one of the most important aspects of reading a document: context. Meanwhile, our program’s ability to understand context is limited quite specifically to the criteria we use to build it. If Ballard attended a delivery but did not mark it in the standard “birth” column like the others, she might mention it more subtly in the main body of the entry. Whereas Ulrich could recognize this and count it as a delivery, our program cannot (at least with the current criteria).

Where the “traditional” skills of a historian come into play with data mining is in the arena of defining these criteria. Using her understanding of the text on a traditional level, Ulrich could create far, far superior criteria than I could for counting the number of deliveries Martha Ballard attends. The trick comes in translating a historian’s instinctual eye into a carefully spelled-out list of criteria for the program.

Flexibility

One area that is advantageous for digital text mining is that of revising the program. Hypothetically, if I realized at a later point that Ballard was also tallying births using another method (maybe a different abbreviated word), it’s fairly simple to add this to the program’s criteria, hit the “Run” button, and immediately see the updated figures for the number of deliveries. In contrast, it would be much, much more difficult to do so manually, especially if the realization came at, say, entry number 7,819. The prospect of re-skimming thousands of entries to update your totals would be fairly daunting.

8 thoughts on “Text Analysis of Martha Ballard’s Diary (Part 2)”

  1. What a great idea to perform the same analysis (birth count by year) that Ulrich did, and compare your results to hers! I’m curious if you tried the manual method for the years with the widest divergence to see exactly why your program disagreed with Ulrich 5 times in 1795. That might offer an opportunity to improve the algorithm, but would more likely illustrate the limitations of data mining via text searches, with some concrete examples of why some analysis is non-computable.

    Given the content of the diary, I wonder if you could look for correlations between births and other events that Ballard mentioned in the text of her entries. For example, she mentions the weather in the early pages I’ve examined, so you might be able to parse out her descriptions (looking for strings like “fine” or “snow”), assign weather values to dates, then look for correlations between the weather and the deliveries she attended. Other events may be harder to identify: Ballard mentions her own health, but also comments on other people who are unwell. This might make it impossible to correlate deliveries to Ballard’s health.

    Thanks for posting such a detailed description of your work.

    1. Ben,

      Interesting ideas – comparing my program to Ulrich’s analysis certainly reinforced some of the limitations of data mining, but gave me hope in that it’s not so difficult to tweak the program and make it more effective.

      I really like the idea of looking for correlations between births and other events. I think the next step for an in-depth and systematic analysis of the text would be to create first a dictionary of unique words, then start grouping the words together under different categories (Religion, Death, Marriage, etc.). From there it would be really cool to then look for patterns using those groupings. Unfortunately Ballard’s unique spelling system presents a challenge – she spells each word about 3-4 different ways, and has an incredible use of shorthand that contributes to around 37-38,000 “unique” words that would need to be cataloged. But if that gets done, the possibilities really become endless.

      Thanks again for the support!


      1. I’ve encountered similar challenges editing the Julia Brumfield diaries, where proper names are spelled inconsistently — sometimes within the same page. Because I’m identifying terms for indexing/analysis as I transcribe the text from scanned images, I can resolve the spelling irregularities during transcription/editing. However, I’ve still found that full-text searches will identify terms I missed during the mark-up phase, so I can’t say that my technique for data extraction is substantially better.

        I like your idea of extracting words from the text to identify variant spellings. I presume you’d do a frequency count over the entire corpus and sort the word/count pairs alphabetically to look for variants.
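Sketched in Python (with an invented sample text), that frequency-count approach might look like:

```python
from collections import Counter

# Build word frequencies, then sort alphabetically so variant spellings
# (e.g. "cloudy" / "cloudey") land next to each other in the list.
text = "clear cloudy cloudey clear rain raine"
counts = Counter(text.lower().split())
for word in sorted(counts):
    print(word, counts[word])
```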

        Another possibility is to manually pull all the variant spellings of something you’re interested in (say, weather) from one year’s worth of entries. You could then execute the search against a different year and then manually identify missed variants there to see how representative your original sample was. Someone with more statistics than I command could probably come up with reasonable figures for your extraction algorithm’s accuracy. At any rate, you’d then be able to extract the data about that subject for your analysis.

        The downside of my approach is that you can block yourself from discovering subjects to investigate. Because you set out with one topic of analysis in mind (weather, say), you might miss the sort of things that a high-frequency word list could suggest. In my own project, I did not identify clothes washing as a domestic activity worthy of analysis until I was around 500 pages in. Fixing this will not be easy.

  2. Thank you for an interesting and helpful post. And yet it seems to confirm my fairly uninformed and perhaps knee-jerk reaction to a lot of these text mining projects--the conclusions are quite modest compared to the effort that went into the process. You have shown that Mainers had sex more often in the summer. No, you have shown that Mainers who hired Martha Ballard to midwife their babies were more likely to have had sex in the summer.

    I hope I don’t come off as snarky, I truly am impressed by your technical abilities. I think that is exactly what makes me read your conclusions and say “Is that it?”

    1. Larry, I’m afraid that you’re confusing the technique (parsing and extracting data from the text) with the analysis (what you do with the data you’ve extracted, and whether you attempt to extract more data from the text). In this case, Cameron’s extracted births and dates from the text — a single fact (Ballard’s attendance at births) with a single dimension (the date the birth occurred). There’s not very much analysis he can perform on this data, since there are only a limited set of questions to be asked from it. So of course the conclusions are modest.

      However, I’d wager that those conclusions--the graph of births per year--were determined through far less effort by Cameron than the effort spent by Ulrich to manually tabulate births by year. The fact that he’s able to compare his low-cost effort to Ulrich’s and find such minor deviation lets us know the quality-to-cost trade-offs of his methodology. That in itself is worth knowing.

      The real question is–having exhausted the interesting questions he can ask of the data he’s extracted–is there other data to be extracted from this text that might lend itself to more interesting analysis? How often was Ballard paid, and how many clients stiffed her? What were the geographic limits of her practice? If you had the misfortune to enter labor during a snowstorm, did that reduce the likelihood that you’d be attended by a midwife? If so, does weather explain the trough in births Ballard records between November and January, thus canceling out the summer-conception effect Cameron’s initial analysis finds?

      Some of these may simply not be extractable from the kind of full-text search Cameron’s performing here — geography in particular requires contextual information about where her clients lived that is not internal to the text, and which we’re unlikely to have elsewhere. But it’s a useful exercise to figure out which of these questions are answerable, which are impossible, and why.

    2. Larry,

      Thanks for the feedback, you raise some good points. One fundamental issue with text mining in the humanities has been a gulf between promise and delivery – there seems to be so many things that could potentially be done, but that in the end prove to be either impossible or involve even more work than doing it by hand. There’s also the issue of what I believe you termed “parlour tricks” on your blog, of analysis that may be superficially interesting or catchy, but adds little substantive value to the investigation. Both of these are fair criticisms.

      In response to “Is that it?”, I’d say it’s a valid question to ask since the analysis I’ve done so far isn’t particularly deep, but that it’s a bit like watching a ten-year-old learning how to play basketball and saying “Okay, but can they dunk?” Much like a ten-year-old struggling to learn a new sport, the process for me (admittedly somewhat selfishly) has been more about the learning experience than about producing earth-shattering results.

      Having said that, even my limited experience so far has affirmed for me the potential and ability of text mining to study history. I’m fascinated by ways it can be applied that would be either impractical or impossible to accomplish manually. What I’ve done here can be done (and has, obviously) without the magic of computers. But in the hands of a more skilled programmer than I, text mining offers up the real ability for both deep analysis and a degree of flexibility that goes beyond the typical scope of traditional methodology. When paired with the massive digitization projects already underway, which lower the barrier to processing digital data (and, I fully admit, present their own issues and problems), I think the tradeoff between the quality of results vs. time/effort is going to continue to shift in favor of text mining.

  3. Cameron: Thanks for taking my comments in the friendly spirit in which they are intended.

    As a profession, we have been here before. In the 1960s the term Cliometrics was coined. Historians created punch cards based on census data and city directories and so on. It was going to REVOLUTIONIZE EVERYTHING. But nothing much ever came from it so far as I know. The one book title that pops up in my mind is Fogel and Engerman’s Time on the Cross–a controversial book.

    And yet–I am pretty sure that the application of digital technology is actually going to revolutionize everything–eventually. I want it to work.

    Can you point me towards some historical text mining scholarship that has produced unique and compelling insights?

    1. Larry, I think you make a fair point here. Text searching (not necessarily text mining) has informed some parts of some scholarship, according to Patrick Leary’s Googling the Victorians, but in most cases it’s probably harder to figure out the questions to ask than to do the programming.

      In my own project, for example, writing an analysis tool to look for correlations among subjects I’d already extracted was the matter of a single evening’s hacking. But is it really that insightful to see that stripping tobacco occurs alongside clouds and rain? Or that mentions of the tenant farmer are common next to plowing? So far the most use I’ve gotten out of the tool has been in identifying unfamiliar names during the annotation process by looking for the context in which they’re mentioned. Which is nice, but that’s only happened twice in a few hundred pages. Web searches for unfamiliar names have worked just as often.

      Despite those modest–even disappointing–results, I’m not sorry I built the tool, not least because it required such a modest effort. I think that perhaps we’re moving beyond the model of large scale, resource-intensive text mining projects with unrealistic expectations to a model in which text mining is just another tool in the humanist’s chest. Like a set of Allen wrenches: you may not need them very often, but they only cost a couple of bucks so you don’t mind the expense.
