December 31, 2010

I’ve been playing around with the new Google Ngram Viewer, an amazing application that allows searches within the text of the 5 million or so books Google has scanned to date. The Ngram Viewer allows users to enter multiple words or phrases and a date range, and then returns a graph of the use of those words or phrases relative to all the words published during the years specified. Though the tool has been reviewed as something of a time-suck or toy, I think it shows itself to be a pretty darn cool thing when applied to the very specific language of the social application of biology.

The chart above compares the frequency of the use of the word ‘eugenics’ in all books scanned by Google between the years 1905 and 1970 (blue line) with the relative priority of the topic of eugenics in American classrooms based on this author’s study of 80 high school biology textbooks (orange line).

A few fast searches turned up some interesting correlations and relationships: a crossover in the popularity of the words ‘eugenics’ and ‘genetics’ in 1934; the rise and decline of the eugenic-era terms ‘euthenics,’ ‘dysgenic’ and ‘feeble minded,’ followed by the post-World War II popularity of the phrase ‘population explosion’; and the relative frequency of the phrases ‘Kallikak family,’ ‘Juke family’ and ‘Nam family,’ a comparison that revealed in a click data scholars might have spent years painstakingly counting.

The ‘eugenics’ curve was particularly interesting to me, as I had published a graph just last February based on the results of a survey that tracked the relative priority of the topic of eugenics in 80 American high school biology textbooks. Frankly, I was somewhat amazed by how closely my graph and Google’s paralleled one another.

Let me offer a couple of caveats before drawing any conclusions.

First, the vertical axis for both graphs is arbitrary. I ‘normalized’ the relative heights. Second, my graph is based on a subjective analysis of the importance of eugenics in biology textbooks, while the Google graph is based on a hard word count of all the books Google has scanned. Still, I think these parallel lines offer some interesting, if only suggestive, insights into a couple of questions asked about biology textbooks.

Assuming it’s okay to ‘normalize’ the vertical axis, what first jumps out is the obvious shift of 5 to 7 years between the height of popularity of the topic of eugenics in all texts and its popularity in biology textbooks. The second thing that pops out is the apparent reluctance by textbook authors to let go of eugenics, even as general interest in the topic began to wane dramatically starting in the late 1930s.
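For anyone who wants to try this kind of comparison at home, here is a rough Python sketch of one way to do it. It assumes that ‘normalizing’ simply means scaling each curve so its highest point equals 1, which may not match exactly how the chart above was drawn, and it measures the shift as the gap between the two peak years. The function names are hypothetical, and the numbers are made-up placeholders, not the Ngram data or the textbook survey results.

```python
def normalize_to_peak(series):
    """Scale a {year: value} mapping so that its largest value becomes 1.0."""
    peak = max(series.values())
    if peak <= 0:
        raise ValueError("series needs a positive peak to be normalized")
    return {year: value / peak for year, value in series.items()}


def peak_year(series):
    """Return the year at which the series reaches its largest value."""
    return max(series, key=series.get)


def peak_lag(leading, trailing):
    """Years by which the peak of `trailing` follows the peak of `leading`."""
    return peak_year(trailing) - peak_year(leading)


# Placeholder values for illustration only (NOT real Ngram or survey data).
all_books = {1915: 2.0, 1920: 4.0, 1925: 3.0, 1930: 2.0, 1935: 1.0}
textbooks = {1915: 10.0, 1920: 30.0, 1925: 50.0, 1930: 40.0, 1935: 35.0}

print(normalize_to_peak(all_books))
print(normalize_to_peak(textbooks))
print(peak_lag(all_books, textbooks))
```

Scaling to the peak throws away the original units, so only the shapes and timing of the two curves can be compared, not their absolute heights.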

I hesitate to make too much of this, as a similar Ngram built using the word ‘evolution’ did not match my survey of the relative value of that topic in high school biology textbooks quite as neatly.

Still, something, don’t you think?