Geocoding Best Practices

<Updated 3/2/2015>

This webpage explains what geocoding is, and best practices to follow when performing geocoding.  For detailed instructions on using the latest geocoding services at Harvard, click here.

[compiled by Jeff Blossom]

Introduction

The ability to assign specific geographic locations to descriptive information (the process known as geocoding) is available to anyone with a computer and internet access. The relative ease of geocoding and the resulting accuracy can vary widely depending on a number of factors. What is the nature of the data? How ‘clean’ is it and what format is it in? What geocoding technique will be used? It is not always clear which geocoding strategy best suits a particular need.

The process of geocoding begins with comparing data in text or tabular form to a reference data table in geographic format. The reference table is a dataset that has already been mapped, with established map coordinates. When matches between the input data and the reference data are found, the corresponding map coordinates are assigned from the reference data to the input features, thus geocoding them. A geocoding service is a program that allows a user to input a batch of data contained in a table, search for matches against a reference table, and output the result in a map or GIS layer format. The key to confidently geocoding data lies in understanding the reference table to which the data is being matched, how a match is found, and the resulting locational accuracy. Before geocoding, have a clear idea of your geocoding purpose, and determine the level of accuracy you need: e.g. address, street, ZIP, city, county, state, or some other geographic unit.

Preparing address data for geocoding

Because a geocoding service compares input data to a table in an existing GIS layer in search of an identical match, it is critically important to eliminate misspellings, special characters (such as “ ? # ‘ \ %), and nonstandard abbreviations in the input dataset. Apartment, suite, unit, or other additional information should be removed or stored in a separate field, as this information confuses most geocoders. To be geocoded successfully, address data should contain the Street Number, Street Name, Street Type, and Street Suffix Direction (if applicable; many street names don’t have a Suffix Direction). Most address geocoders work on street intersections as well; the standard format for these is the intersecting Street Names and Street Types separated by an & in one field. City and State are also required. See the example below for a properly formatted U.S. address:

UNIQUE_ID | ADDRESS              | CITY    | STATE | ZIP   | STORE_NAME
1         | 1171 PIEDMONT AVE NE | Atlanta | GA    | 30309 | Ace Market
2         | MAIN ST & 41ST AVE   | Atlanta | GA    | 30309 | Andrew’s Gasoline

Notice for the first address that the Street Number (1171), Street Name (PIEDMONT), Street Type (AVE), and Street Suffix Direction (NE) are all separated by a space, and there aren’t any periods or other special characters that may lead to confusion. Because most address geocoders refer to a streets GIS layer that uses standard postal service abbreviations, following these conventions is also necessary in order to ‘match’ addresses when geocoding. For U.S. addresses, the U.S. Postal Service’s standard Street Type (or suffix) and State abbreviations are listed here: https://www.usps.com/send/official-abbreviations.htm
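The cleanup rules above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular geocoder: the function name clean_address and the small abbreviation table are assumptions made for this example, and the table covers just a handful of the entries on the USPS list.

```python
import re

# Illustrative subset of USPS street type and directional abbreviations.
# The full official list is on the USPS page linked above.
ABBREVIATIONS = {
    "AVENUE": "AVE", "STREET": "ST", "BOULEVARD": "BLVD",
    "ROAD": "RD", "DRIVE": "DR",
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW",
    "SOUTHEAST": "SE", "SOUTHWEST": "SW",
}

def clean_address(raw):
    """Hypothetical helper: upper-case, strip punctuation, abbreviate."""
    text = raw.upper()
    # Drop periods and apostrophes outright so "N.E." becomes "NE".
    text = text.replace(".", "").replace("'", "")
    # Turn any other special character into a space; keep '&',
    # which separates the two streets of an intersection.
    text = re.sub(r"[^A-Z0-9& ]", " ", text)
    # Abbreviate word by word. Caveat: this naive pass would also
    # shorten a street *name* such as "NORTH AVE" to "N AVE".
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)

print(clean_address("1171 Piedmont Avenue N.E."))
print(clean_address("Main Street & 41st Avenue #"))
```

Because the word-by-word substitution cannot tell a street name from a street type, results should still be reviewed by eye before geocoding.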

ZIP or postal codes are good to have as well, but addresses can usually be matched without them. Notice as well that an additional attribute, STORE_NAME, is included in the table. This field and all other fields will be retained in the output GIS table after the geocoding is performed. In addition, the UNIQUE_ID field simply contains a sequential number that is unique to that row. It is a good idea to have this identifier, so the table of geocoded information can be linked back to the original table if necessary.

Geocoding services require address information to be formatted properly, in one of two ways. The first format is shown in the table above, where the Address, City, State, and ZIP/postal code are in individual fields. The second format is shown below, where all of the address information is in one field, separated by commas, like this:

UNIQUE_ID | ADDRESS                                  | STORE_NAME
1         | 1171 PIEDMONT AVE NE, Atlanta, GA, 30309 | Ace Market
2         | MAIN ST & 41ST AVE, Atlanta, GA, 30309   | Andrew’s Gasoline

Know the format that is required by the geocoder you are using, and format your data that way prior to input. In MS Excel (and other spreadsheet and database applications) the CONCATENATE function is an easy way to combine multiple columns of data. The Data > Text to Columns command is a convenient way to split one column into many.
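For readers who prepare their tables with a script rather than a spreadsheet, the same two conversions can be sketched in Python. The function names combine_fields and split_single_line are made up for this example, and the field names simply follow the sample tables above.

```python
def combine_fields(row):
    """Join separate fields into the single-field, comma-separated format."""
    return ", ".join([row["ADDRESS"], row["CITY"], row["STATE"], row["ZIP"]])

def split_single_line(line):
    """Split a single-field address back into separate fields.

    Assumes exactly four comma-separated parts and no commas inside
    the street address itself.
    """
    address, city, state, zip_code = [part.strip() for part in line.split(",")]
    return {"ADDRESS": address, "CITY": city, "STATE": state, "ZIP": zip_code}

row = {
    "ADDRESS": "1171 PIEDMONT AVE NE",
    "CITY": "Atlanta",
    "STATE": "GA",
    "ZIP": "30309",  # kept as text so ZIP codes with a leading zero survive
}

single = combine_fields(row)
print(single)
print(split_single_line(single))
```

Storing ZIP codes as text rather than numbers matters for ZIP codes that begin with 0, which a numeric column would silently shorten.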

Locational accuracy in geocoding results

Most geocoders are able to locate addresses extremely accurately, to the exact property or building of the address. But this is not always the case. Often addresses will be located using address range interpolation along a street. For example, the address 170 Main St. will be placed 70% of the way along the block of Main St. that ranges from 100 to 200. Additionally, the fact that an address is located on the map does not necessarily mean that address has been platted to a municipal address. For example, the addresses 170, 171, 171.5, and 172 Main St. may all be mapped by a geocoder, but an actual property may only exist at 170 Main St.
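The interpolation arithmetic in the 170 Main St. example can be written out as a short Python sketch. It assumes the block is a straight line segment between two known end points and ignores refinements that real geocoders may apply (such as odd/even street sides or setbacks); the function names and the sample coordinates are hypothetical.

```python
def interpolate_fraction(house_number, range_start, range_end):
    """Fraction of the way along a block for a given house number."""
    if range_end == range_start:
        raise ValueError("address range must span more than one number")
    return (house_number - range_start) / (range_end - range_start)

def interpolate_point(house_number, range_start, range_end, start_xy, end_xy):
    """Place the address on the straight line between the block's end points."""
    t = interpolate_fraction(house_number, range_start, range_end)
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return (x, y)

# 170 Main St. on the block that ranges from 100 to 200:
# (170 - 100) / (200 - 100) = 0.7, i.e. 70% of the way along the block.
fraction = interpolate_fraction(170, 100, 200)
print(fraction)

# Made-up block end points in arbitrary map units.
print(interpolate_point(170, 100, 200, (0.0, 0.0), (100.0, 0.0)))
```

The sketch makes the limitation visible: the result is a position implied by the address range, not a surveyed building location, so any house number inside the range will be placed somewhere on the block whether or not a property exists there.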

Geocoders may also match address information to less precise locations, like ZIP or city centers. For example, if the address “170 Main St., Cleveland, OH” is searched for, and no such address exists in Cleveland, the resulting location may be placed in a nearby town, or in the geographic center of Cleveland. (As of this writing, the Google geocoder placed this fictitious address in Berea, OH, and the Yahoo geocoder placed this address in Westlake, OH.) Knowing this level of locational precision is important! Depending on what your geocoding purpose is, city center or nearby town accuracy may not be acceptable. To establish a confidence level in the locational precision of geocoded results, check the locations visually against a base map.

General tips to keep in mind when geocoding

Clearly determine your geocoding purpose, and the accuracy level required to meet this need.  If you just need to geocode your data to the city or country level, don’t complicate things with address information.  However, if address level accuracy is necessary, take the proper steps to ensure confidence in your result.

Have an ‘iterative’ mindset.  Geocoding may reveal errors or typos in your data, or expose the shortcomings of the geocoding method used.  Be prepared to re-geocode, and refine your data and geocoding process accordingly – several geocoding iterations may be necessary to achieve the desired result.

Be skeptical of your geocoding results.  Inspect actual address match locations against other data sources, like street basemaps.  Compare your results to more than one basemap if possible.  For example, if geocoded in ArcMap, import the results to Google Earth to see if they match GE’s basemap.
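A visual check against a basemap can be supplemented by comparing the coordinates returned by two different geocoders for the same records. The Python sketch below uses the haversine great-circle formula and flags records whose two locations are far apart. The function names, the 0.5 km threshold, and the sample coordinates are all assumptions chosen for illustration; results are keyed by the UNIQUE_ID field discussed earlier.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_disagreements(results_a, results_b, threshold_km=0.5):
    """Return (unique_id, distance_km) for records the two geocoders
    place more than threshold_km apart. Inputs map UNIQUE_ID to (lat, lon)."""
    flagged = []
    for uid, (lat_a, lon_a) in results_a.items():
        if uid not in results_b:
            continue  # record was not matched by the second geocoder
        lat_b, lon_b = results_b[uid]
        distance = haversine_km(lat_a, lon_a, lat_b, lon_b)
        if distance > threshold_km:
            flagged.append((uid, distance))
    return flagged

# Made-up coordinates standing in for the output of two geocoders.
geocoder_a = {1: (33.78, -84.38), 2: (33.75, -84.39)}
geocoder_b = {1: (33.78, -84.38), 2: (33.90, -84.39)}
for uid, km in flag_disagreements(geocoder_a, geocoder_b):
    print(f"UNIQUE_ID {uid}: results differ by {km:.1f} km")
```

Agreement between two geocoders does not prove a location is correct, since both may rely on similar reference data, but a large disagreement is a cheap signal that a record deserves manual inspection.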

Conclusion

There are many Desktop and Online geocoders available, with different levels of free and for-pay access. As more businesses, government agencies, and other entities realize the value of geocoding, demand has increased the number of available geocoders and has also commoditized geocoding to some degree, making the cost of geocoding batches of addresses something to consider. Below is a brief comparison of the pros (+) and cons (-) of three geocoders, one Desktop and two Online:


ArcGIS Desktop:

+ Geocodes upwards of 500,000 addresses per hour (desktop), and can handle millions of input data records.

+ Uses the comprehensive ESRI Streets Premium reference dataset.

+ Allows for geocoding to any geographic reference layer (for example census tracts, parcels, etc.) through building a custom geocoder.

+ Contains a review/rematch interface for refining results.

+ Highly selective in address placement.

+ Reports the accuracy level achieved with each geocoded address.

- Expensive software with a steep learning curve.

Batchgeo.com online:

+ Free and easy to use.

+ Uses the comprehensive, currently updated Google Maps reference dataset.

+ Makes a nice shareable, editable map of the results.

+ Allows for saving results in KML format

- Somewhat slow, and restricted to a maximum of 250 addresses allowed daily (unless you pay).

- Does not allow for geocoding against a custom GIS dataset (e.g. tracts, parcels).

- Less selective in address placement, potentially leading to more errors.

Google for work:

+ Free, and uses the comprehensive, currently updated Google Maps reference dataset.

+ Returns latitude, longitude locations of geocoded addresses, along with a precision report.

+ Allows up to 2,500 addresses per day.

- Does not produce a map of the results

- Requires running a Python script to access the API. 

- Does not allow for geocoding against a custom GIS dataset (e.g. tracts, parcels).

- Less selective in address placement, potentially leading to more errors.

 

And finally, ESRI offers a lot of information on geocoding as well.
