At work we have a copy of the zip-codes.com Business database, which I tend to reference somewhat frequently due to the nature of projects I work on.
Today I needed a list of postal codes from major metro areas, which would be used to drive a test data generator. The data generator has access to a simpler, non-commercial zip code list, and can do radius searches, but thats about it. I needed one code per metro area, and it worked out best to have the one closest to the center of the metro areas.
What I ended up with was a script to select the zip code closest to the approximate center of the top N CBSA, looking only at populated zip codes.
Some notes:
PrimaryRecord ='P'
: As it comes from the data provider, the data is largely de-normalized. A zip code could have multiple records, when it is part of a place that has multiple names.- This query uses the faster pythagorean distance, which works ok over small distances and lower latitudes. To be more accurate, a great circle distance would be better.
WITH topPopulationAreas AS ( SELECT TOP 50 CBSA_Name, avg(Latitude) AS lat, avg(Longitude) AS lon FROM zipcode WHERE PrimaryRecord = 'P' AND CBSA_Name != '' AND Population > 0 GROUP BY CBSA_Name ORDER BY sum(Population) DESC ), zipDistance AS ( SELECT DISTINCT ZipCode, zc.CBSA_Name, sqrt(square(lat - latitude) + square(lon - longitude)) AS dist FROM zipcode zc JOIN topPopulationAreas iq ON iq.CBSA_Name = zc.CBSA_Name ), shortest AS ( SELECT CBSA_Name, min(dist) AS mindist FROM zipDistance GROUP BY CBSA_Name ) SELECT min(zd.ZipCode), zd.CBSA_Name, dist, Latitude, Longitude FROM zipDistance zd INNER JOIN shortest ON zd.dist = shortest.mindist AND zd.CBSA_Name = shortest.CBSA_Name INNER JOIN zipcode z ON z.ZipCode = zd.ZipCode AND z.PrimaryRecord = 'P' GROUP BY zd.CBSA_Name, dist, Latitude, Longitude
