Return a consistent version of city names using stringr::str_*() functions. Letters are converted to upper case, hyphens and underscores are replaced with whitespace, other punctuation is removed, numbers are removed, and excess whitespace is trimmed and squished. Optionally, geographic abbreviations ("MT") can be replaced with their long form ("MOUNT"). Invalid city names (possibly from the invalid_city vector) can be converted to NA, as can strings made of a single repeating character ("XXXXXX").
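The cleaning steps listed above can be sketched with stringr alone. This is a minimal illustration, not the package's implementation: it assumes the steps run in the order described, and normal_city_sketch() is a hypothetical name. Abbreviation expansion, state removal, and the na_rep check are left out here.

```r
library(stringr)

# Hypothetical helper illustrating the described steps; NOT the package's own code.
normal_city_sketch <- function(city, na = c("", "NA")) {
  x <- str_to_upper(city)                # convert letters to upper case
  x <- str_replace_all(x, "[-_]", " ")   # hyphens and underscores become spaces
  x <- str_remove_all(x, "[[:punct:]]")  # drop any remaining punctuation
  x <- str_remove_all(x, "[[:digit:]]")  # drop numbers
  x <- str_squish(x)                     # trim the ends and collapse inner whitespace
  x[x %in% str_to_upper(na)] <- NA       # listed invalid values become NA
  x
}

normal_city_sketch(c("winston-salem", "st. louis 2", "  new_york  ", "n/a"))
# returns c("WINSTON SALEM", "ST LOUIS", "NEW YORK", NA)
```

Note that "n/a" only becomes NA because removing the slash turns it into "NA", which is in the default na vector.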

normal_city(city, abbs = NULL, states = NULL, na = c("", "NA"), na_rep = FALSE)

Arguments

city

A vector of city names.

abbs

A named vector or data frame of abbreviations passed to expand_abbrev(); see expand_abbrev() for the format of its abb argument, or use the usps_city tibble.
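To show what whole-word abbreviation expansion does, here is a minimal sketch for the named-vector form only (names are abbreviations, values are long forms). expand_abbs_sketch() is hypothetical and is not how expand_abbrev() is implemented; it assumes abbreviations should match whole words only and contain no regex metacharacters.

```r
library(stringr)

# Hypothetical helper; assumes abbreviations match whole words only.
expand_abbs_sketch <- function(x, abbs) {
  # Wrap each abbreviation in word boundaries so "ST" does not match inside "STOWE".
  patterns <- paste0("\\b", names(abbs), "\\b")
  # str_replace_all() accepts a named vector of pattern = replacement pairs.
  str_replace_all(x, setNames(abbs, patterns))
}

expand_abbs_sketch(c("ST JOHNSBURY", "MT HOLLY", "STOWE"),
                   c("ST" = "SAINT", "MT" = "MOUNT"))
# returns c("SAINT JOHNSBURY", "MOUNT HOLLY", "STOWE")
```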

states

A vector of state abbreviations ("VT") to remove from the end (and only the end) of city names ("STOWE VT").
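Removing a state only from the end of a name comes down to an end-anchored regular expression. The sketch below assumes the input is already upper case and squished, and that the abbreviations contain only letters. remove_state_sketch() is a hypothetical name, not a package function.

```r
library(stringr)

# Hypothetical helper; strips a state abbreviation only when it is the last word.
remove_state_sketch <- function(x, states) {
  # Require whitespace before the state and anchor at the end of the string.
  pattern <- paste0("\\s+(", paste(states, collapse = "|"), ")$")
  str_remove(x, pattern)
}

remove_state_sketch(c("STOWE VT", "VT CITY", "BURLINGTON"), states = c("VT", "NH"))
# returns c("STOWE", "VT CITY", "BURLINGTON")
```

A leading "VT" is left alone because the pattern only matches at the end of the string.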

na

A vector of values to convert to NA (useful with the invalid_city vector).

na_rep

Logical; if TRUE, replace all strings made of a single repeating character ("XXX") with NA.
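One way to detect "single repeating character" strings is a backreference regex. This is a sketch under the assumption that the rule means one character repeated two or more times, so a lone "X" is not matched; is_repeated_char() is a hypothetical name, and the package may define the rule differently.

```r
library(stringr)

# Hypothetical helper: TRUE when a string is one character repeated 2+ times.
is_repeated_char <- function(x) {
  # (.) captures the first character; \\1+ requires the rest to be copies of it.
  str_detect(x, "^(.)\\1+$")
}

x <- c("XXXXXX", "AAA", "ABBA", "X", "STOWE")
is_repeated_char(x)
# returns c(TRUE, TRUE, FALSE, FALSE, FALSE)

# Apply the rule; which() drops any NA results from str_detect().
x[which(is_repeated_char(x))] <- NA
```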

Value

A vector of normalized city names.

See also

Other geographic normalization functions: abbrev_full(), abbrev_state(), check_city(), expand_abbrev(), expand_state(), fetch_city(), normal_address(), normal_state(), normal_zip(), str_normal()

Examples

normal_city(
  city = c("Stowe, VT", "UNKNOWN CITY", "Burlington", "ST JOHNSBURY", "XXX"),
  abbs = c("ST" = "SAINT"),
  states = "VT",
  na = invalid_city,
  na_rep = TRUE
)
#> [1] "STOWE"           NA                "BURLINGTON"      "SAINT JOHNSBURY"
#> [5] NA