Let's map, baby!

"There's probably a library for that..."

                                  Rawan Hassunah


It's common practice to do a thorough analysis of a location before opening up a retail store - or at least it should be. This is something I'm currently working on, and I will be sharing a few snippets of my process for working with location data (for the first time, may I add!). The nerd inside of me is squealing...

 

Although many datasets were used for this analysis, including demographic data and transaction data, the main (and most challenging) component was the trade area. The simplest way to define a trade area is through distance, e.g. a 10km radius around a specific point.

 

"As The Crow Flies" Distance

I was coding out the formula for the Vincenty distance between two geo points when I had a thought: Python probably has a library for that. Google confirmed. :) 

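Geopy is a Python client for several popular geocoding web services, and it also includes functions for calculating the distance between two points. See the Geopy documentation for details.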

There are two distance metrics that are commonly used, both available on Geopy:

  • Great-circle distance. This assumes the Earth's surface is a perfect sphere and measures the shortest distance on the sphere's surface.
    • Read more about it here.
  • Vincenty distance. This is an iterative method developed by geodesist Thaddeus Vincenty that assumes the Earth's surface is an oblate (squashed) spheroid. Because that shape is closer to the real Earth than a perfect sphere, it's much more accurate than the great-circle distance, and it is the distance metric I chose to use.
    • Read more about it here.
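
To make the first metric concrete, here is a minimal sketch of the great-circle distance using the haversine formula and nothing but the standard library. The function name great_circle_km and the mean Earth radius constant are chosen for illustration and are not part of Geopy; treat this as a sanity check rather than a replacement for the library.

```python
import math

# Mean Earth radius in km; an assumed constant for this sketch.
EARTH_RADIUS_KM = 6371.0088

def great_circle_km(point_a, point_b):
    """Great-circle (haversine) distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points on a sphere.
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Because the Earth is treated as a sphere here, the result will differ slightly from the Vincenty distance for the same pair of points.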

Note: if you would rather not depend on Geopy, I found a class that calculates the Vincenty distance between two points, written by machine learning engineer Nathan Rooy, which is excellent.


Steps


Step 1: Import all necessary packages, classes and functions

import pandas as pd
# Note: vincenty was removed in geopy 2.0; on newer versions, use geodesic instead.
from geopy.distance import vincenty
from geopy.geocoders import Nominatim

Step 2: Find the latitude and longitude of an address

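def geolocate(address):
    '''
    INPUT: address - street number street name, city, state
    OUTPUT: lat, long
    Returns the latitude and longitude for a specific address.
    '''
    # Recent Geopy versions require a user_agent; this name is a placeholder.
    geolocator = Nominatim(user_agent='trade-area-analysis')
    location = geolocator.geocode(address)
    # geocode returns None when the address cannot be resolved.
    if location is None:
        raise ValueError('Could not geocode address: ' + address)
    return location.latitude, location.longitude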

Step 3: Find the distance between two points

I used a dataframe of all zip codes in the US and their corresponding latitude and longitude, and computed the distance between each zip code and the address I was defining a trade area for.

def vincenty_distance(df, centroid):
    '''
    INPUT: dataframe of lat, lons and centroid (lat, lon)
    OUTPUT: dataframe with a new column, 'distance',
    which measures the distance in km between
    the centroid and the datapoints.
    '''
    # Assumes the coordinate columns are named 'lat' and 'lon'; adjust to your data.
    df['distance'] = df.apply(
        lambda row: vincenty(centroid, (row['lat'], row['lon'])).km, axis=1)
    return df

Step 4: Filter on distance & plot geolocations on map

I filtered my dataframe based on the distance column, which is by default calculated in km. 
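
The filter itself is a one-liner in pandas. Below is a minimal sketch, assuming the dataframe already has a 'distance' column in km. The sample zip codes and distances are made up for illustration, and trade_area is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample: zip codes with a precomputed 'distance' column in km.
zips = pd.DataFrame({
    'zip': ['10001', '10002', '07030', '11201'],
    'distance': [1.2, 3.8, 12.5, 9.9],
})

def trade_area(df, radius_km=10):
    """Keep only the rows whose distance to the centroid is within radius_km."""
    return df[df['distance'] <= radius_km].reset_index(drop=True)

# Zip codes that fall inside a 10km trade area.
nearby = trade_area(zips, radius_km=10)
```

Changing radius_km is then all it takes to compare trade areas of different sizes for the same location.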

 

Because my team is refusing to become a reporting team, we are in the process of building a dashboard tool that will allow other teams to input any location to be assessed. The map will be included in the dashboard.

Trade area example using Articque.


Technically, you could show each zip code in the trade area as a point on the map. Luckily, my team had previously produced a polygon shape for each zip code. See an example location above.

 

Other Distance Metrics (Driving, Walking, Etc)

If you want to look at other distance metrics, e.g. driving or walking, you can use the Google Maps Directions API, which allows you to compute both time and distance. To refine the trade areas further, we used a mix of walking and driving times depending on the location of the store, the type of store, etc.
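
As a rough illustration of what that looks like, here is a sketch that builds a Directions API request URL and reads the distance and duration out of a parsed JSON response. It is not the code we used: the helper names are hypothetical, the API key is a placeholder, and the sample response is a made-up fragment containing only the fields the sketch reads. No network call is made here.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

def directions_url(origin, destination, mode, api_key):
    """Build a Directions API request URL; mode is e.g. 'driving' or 'walking'."""
    params = {"origin": origin, "destination": destination, "mode": mode, "key": api_key}
    return DIRECTIONS_URL + "?" + urlencode(params)

def distance_and_duration(response):
    """Return (km, minutes) for the first leg of the first route in a parsed response."""
    leg = response["routes"][0]["legs"][0]
    # The API reports distance in meters and duration in seconds.
    return leg["distance"]["value"] / 1000, leg["duration"]["value"] / 60

url = directions_url("40.7128,-74.0060", "40.7306,-73.9866", "walking", "YOUR_API_KEY")

# Made-up response fragment, shaped like the fields this sketch reads.
sample = {"routes": [{"legs": [{"distance": {"value": 4200}, "duration": {"value": 780}}]}]}
km, minutes = distance_and_duration(sample)
```

Swapping the mode between walking and driving for the same origin and destination gives the two travel times that can then be mixed per store.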