Tuesday, August 9, 2011

Word Play: Google's Ngram Viewer

For those who enjoy tracking name trends on the Baby Name Wizard, there's a similar offering for words in books: Google’s Ngram Viewer (via VSL).

Google has digitized millions of books, and Ngram Viewer lets you follow the evolution and occasional extinction of words. "Hella"? Its usage spiked around 1810 to 1820, when writers used it to refer to a town along the Tigris with big religious significance. Did you know that "hobo," most popular in the 1930s, is back on the rise (same goes for "tuberculosis," which was off the charts in usage at the turn of the century)? Or that references to "Internet" actually show up prior to 1950? (Oops, explains the Google crew, wisely ruling out time-traveling software engineers. That "usage" is credited to optical character recognition (OCR) errors that couldn't be filtered out.) Oh, and for the kids in the back of the class: Yes, you can search dirty words. Not surprisingly, their usage spiked in the 1960s.

Of course, Ngram Viewer could prove useful for genuine research (when did "Latino" gain popularity, or when did usage shift from "Negro" to "African American"?). The Google Labs crew highlights another interesting word trend: usage of "nursery school," "kindergarten," and "child care" from 1950 to 2000:
What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are "nursery school" or "child care"? Of all the unigrams, what percentage of them are "kindergarten"? Here, you can see that use of the phrase "child care" started to rise in the late 1960s, overtaking "nursery school" around 1970 and then "kindergarten" around 1973. It peaked shortly after 1990 and has been falling steadily since.
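For the curious, here is a rough Python sketch of the arithmetic behind that y-axis: count how often a phrase appears among all n-grams of the same length, then express it as a percentage. The function names (`ngrams`, `ngram_share`) and the tiny two-year "corpus" are made up for illustration; this is not Google's code or data, and the real Ngram Viewer presumably handles tokenization and smoothing far more carefully.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_share(texts, phrase):
    """Percentage of all n-grams in `texts` that equal `phrase`.

    n is the number of words in `phrase`, so "child care" is measured
    against all bigrams and "kindergarten" against all unigrams.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    counts = Counter()
    for text in texts:
        # Naive whitespace tokenization (an assumption of this sketch).
        counts.update(ngrams(text.lower().split(), n))
    total = sum(counts.values())
    return 100.0 * counts[target] / total if total else 0.0


# Toy sentences invented for illustration; NOT Google Books data.
toy_books_by_year = {
    1960: ["the nursery school opened today", "kindergarten starts in fall"],
    1990: ["child care costs rose", "affordable child care is scarce"],
}

for year, texts in sorted(toy_books_by_year.items()):
    print(year, round(ngram_share(texts, "child care"), 2))
```

Plotting that percentage year by year for each phrase would give a chart shaped like the one Google describes, just with a much smaller library.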
And for people wanting to use data from the Ngram Viewer for scholarly research, the staff at Harvard University's Cultural Observatory offers some tips.

Sources: VSL, Google (Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science [Published online ahead of print: 12/16/2010])
