Sql Quickie: Find zip codes near the center of a region

At work we have a copy of the zip-codes.com Business database, which I tend to reference somewhat frequently due to the nature of projects I work on.

Today I needed a list of postal codes from major metro areas, which would be used to drive a test data generator. The data generator has access to a simpler, non-commercial zip code list, and can do radius searches, but thats about it. I needed one code per metro area, and it worked out best to have the one closest to the center of the metro areas.

What I ended up with was a script to select the zip code closest to the approximate center of the top N CBSA, looking only at populated zip codes.

Some notes:

  • PrimaryRecord ='P' : As it comes from the data provider, the data is largely de-normalized. A zip code could have multiple records, when it is part of a place that has multiple names.
  • This query uses the faster pythagorean distance, which works ok over small distances and lower latitudes. To be more accurate, a great circle distance would be better.
WITH
  topPopulationAreas AS (
   SELECT TOP 50 CBSA_Name, avg(Latitude) AS lat, avg(Longitude) AS lon
   FROM zipcode
   WHERE PrimaryRecord = 'P' AND CBSA_Name != '' AND Population > 0
   GROUP BY CBSA_Name
   ORDER BY sum(Population) DESC
 ),
  zipDistance AS (
   SELECT DISTINCT ZipCode, zc.CBSA_Name, sqrt(square(lat - latitude) + square(lon - longitude)) AS dist
   FROM zipcode zc
   JOIN topPopulationAreas iq ON iq.CBSA_Name = zc.CBSA_Name
 ),
  shortest AS (
   SELECT CBSA_Name, min(dist) AS mindist
   FROM zipDistance
   GROUP BY CBSA_Name
 )
SELECT min(zd.ZipCode), zd.CBSA_Name, dist, Latitude, Longitude
FROM zipDistance zd
 INNER JOIN shortest ON zd.dist = shortest.mindist AND zd.CBSA_Name = shortest.CBSA_Name
 INNER JOIN zipcode z ON z.ZipCode = zd.ZipCode AND z.PrimaryRecord = 'P'
GROUP BY zd.CBSA_Name, dist, Latitude, Longitude
cbsaCenterPoints

Leave a Reply