Entering a bunch of data by hand is better if it’s systematized, so here are a few tips for consistency. These are all judgment calls based on what seems to make data entry easiest.
When in doubt, encode it and flag it! We can always delete it later.
Made up places
Every place in a novel is, in its own way, “made up” by the author, so they should all be encoded. For example, Skuytercliff, the Hudson River Valley estate of the van der Luydens, is most likely wholly Edith Wharton’s invention, and we cannot be even “mostly” sure of where it is, but it should still be encoded. When adding the place, just set the confidence to 0.
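As a rough illustration, the record for an invented place might look like the sketch below. The `Place` class and all of its field names are hypothetical and are not the project's actual schema. The only detail taken from the guideline is that the confidence is set to 0. Leaving the coordinates empty is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    """Hypothetical place record; field names are illustrative only."""
    name: str
    latitude: Optional[float]   # assumed to be optional for invented places
    longitude: Optional[float]
    confidence: float           # 0 means we cannot say where the place is
    note: str = ""


# Skuytercliff is Wharton's invention, so it is still recorded,
# but with the confidence set to 0.
skuytercliff = Place(
    name="Skuytercliff",
    latitude=None,
    longitude=None,
    confidence=0,
    note="van der Luyden estate in the Hudson River Valley",
)
```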
Noun modifiers (“New York Times”)
Unlike adjectives (see below), noun modifiers should be encoded. Here are a few examples:
- Alençon lace and Waterford crystal. Both are named after their places of origin.
- New York Times (but not New Yorker Magazine)
- Kentucky Derby (but do not add a reference to Churchill Downs)
However, if the noun modifier is part of a specific place that can be identified, then the whole entity counts as one instance:
- New York University
- Boston Garden
- London Bridge
Enclosing places (“Manhattan, New York”) and other doubling and addresses
Places given as a set of enclosing objects (like an address, which in English moves from small to larger geographical entity) have each specific place encoded. “Manhattan, New York,” then, is a reference both to Manhattan and to New York (state). More precisely, it is two separate instances. Neither instance, however, is a reference to New York City.
There is at least one exception to this rule. For convenience, “Washington, D.C.” is a single instance of “Washington, D.C.” and not two separate references to Washington (the city) and the District of Columbia.
On the other hand, an expression like “please go to Sana’a Deli at 31 Rivington Street” is two instances, one for Sana’a Deli and one for 31 Rivington St. NYC, though they are both references to the same place. Note, however, that there is no reference here to Rivington St. by itself.
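The three cases above can be summarized as a small lookup from phrase to instances. This is only a sketch: the `Instance` class, its fields, and the `ENCODINGS` table are hypothetical names made up for illustration, and the place labels simply repeat the examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical record: one mention in the text linked to one place."""
    text: str    # the words as they appear in the novel
    place: str   # the place the mention is encoded as


ENCODINGS = {
    # Enclosing places: one instance per place, none for New York City.
    "Manhattan, New York": [
        Instance("Manhattan", "Manhattan"),
        Instance("New York", "New York (state)"),
    ],
    # The exception: treated as a single instance for convenience.
    "Washington, D.C.": [
        Instance("Washington, D.C.", "Washington, D.C."),
    ],
    # Name plus address: two instances, none for Rivington St. alone.
    "Sana'a Deli at 31 Rivington Street": [
        Instance("Sana'a Deli", "Sana'a Deli"),
        Instance("31 Rivington Street", "31 Rivington St. NYC"),
    ],
}

for phrase, instances in ENCODINGS.items():
    print(f"{phrase}: {len(instances)} instance(s)")
```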
Places as genitives (“University of Chicago”)
When the place is a genitive (“n of place”), it is a single instance, referring either to the place mentioned or to the entity as a whole. For example:
- University of Chicago
- University of California - Davis (not 2 or 3 separate instances)
- shores of the Atlantic
- smells of Paris
- easternmost district of Mirzapur
- North Side of Chicago [note capitals!]
- Lucknow is considered the Paris of India
“Paris of India” could conceivably be an alternate name for Lucknow, but it makes more sense semantically as references to two specific places.
Clearly, “We had lunch at Sardi’s” or “We went out for Mamoun’s Falafel” should be encoded. But what about “We went to Sandy’s apartment”? Because “apartment” is not capitalized, it should not be encoded. However, “we told him we were at Sandy’s” should be, because it is equivalent to “Sardi’s” above.
Corners (“82nd and Park”)
Corners should be found on the map and marked clearly as one instance, not two (one for each street). By contrast, a sentence like “She stood on Park between 82nd and 83rd” would be three different instances.
Metonymy (“Wall Street,” “Hollywood”)
Often, if not usually, a term like “Wall Street” is not a reference to the specific, short street in downtown Manhattan but is, rather, a reference to the entire financial industry, including people scattered around the world. This is an instance of metonymy, and it is to be encoded. That little street holds a lot of semantic weight that it has accrued over time, but it has carried it well, and can continue to do so.
Personal names (“Indiana Jones,” “U.S.S. Arizona”)
These are sometimes encoded and sometimes not. See below.
Do not encode…
Places that are not capitalized (“the park,” “the ocean”)
Generally, if a place is not capitalized, you do not encode it, even though you know the park is Central Park or the ocean is the Atlantic. On the other hand, “he led her to the Park” would be encoded, if you can make a good guess as to what park it is. Do not hurry to assume, though.
Hence, a reference to a place like “north India” would be a reference to India, but not to a new place called “North India.” The same holds true for “north side of Chicago.” But capitalization is key, because “North Side of Chicago” is separate from just Chicago.
Adjectives and demonyms (“French”)
Adjectives, such as “French,” “American,” or “English” are not to be encoded. The same goes for languages based on the adjectives and for demonyms (such as “Spaniard”).
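The capitalization rule and the adjective rule can be combined into a rough first-pass check, sketched below. The function name `is_candidate` and the `NOT_PLACES` list are hypothetical, and the list is deliberately incomplete. The check only narrows down candidates: a word capitalized at the start of a sentence would pass it wrongly, and a person still has to decide what a term like “the Park” refers to.

```python
# Hypothetical, deliberately incomplete list of adjectives and demonyms.
NOT_PLACES = {"French", "American", "English", "Spaniard"}


def is_candidate(term: str) -> bool:
    """First-pass check only: capitalized and not a known adjective/demonym."""
    if not term or not term[0].isupper():
        return False  # "park", "ocean", "north" are skipped
    return term not in NOT_PLACES


for term in ["park", "Park", "India", "French", "Spaniard"]:
    print(term, is_candidate(term))
```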
Workplaces, leisure (“Sotheby’s,” “the Opera”)
These are a bit of a gray area. Because Sotheby’s has multiple locations, “she works at Sotheby’s” should not be encoded. However, “he works at the McDonald’s on Broadway” should be encoded twice, once as the McDonald’s (Washington Pl. & Broadway) and once for Broadway (but see “Corners,” above).
Similarly tricky is how the word “Opera” is used in a novel like The Age of Innocence. First, it is a “nickname” of the Academy of Music, which was located at 14th and Irving Place. So in a sentence like “To come to the Opera in a Brown coupé was almost as honourable a way of arriving as in one’s own carriage,” we encode for the Academy of Music. On the other hand, a sentence like “in metropolises it was ‘not the thing’ to arrive early at the Opera” would not typically be encoded (“Opera” is always capitalized in that novel). This is not easy!
Personal names (“Indiana Jones,” “U.S.S. Arizona”)
Personal names that have a geographical resonance should not be encoded unless the name is tied to a specific location, as in a title (“the Duke of York”) or in a nickname (“Pecos Bill”). An exception can be made if it is clear in the work that the place was important in naming the person, as in The Joy Luck Club, where Waverly is named after the street the family was living on when she was born.
Toponyms affixed to things like boats, however, should be noted, as they are clearly named with the place in mind.