Thursday, September 26, 2013

Postgresql Radius Based Search using Latitude and Longitude for a Zip Code

2 comments
We have a table called "image_info" as follows. We have to find the images that are nearest to a given zip code.

\d image_info
                                    Table "public.image_info"
      Column       |       Type        |                         Modifiers                         
-------------------+-------------------+------------------------------------------------------------
 id                | bigint            | not null default nextval('image_info_id_seq'::regclass)
 content_type      | character varying |
 image_name        | character varying |
 zip_code          | character varying |
 Indexes:
    "image_info_pkey" PRIMARY KEY, btree (id)
    "zip_code_5digit_idx" btree (substr(zip_code::text, 1, 5))
    "zip_code_idx" btree (zip_code)
To achieve this, we used  data from commercial version of Maxmind's GeoIP Software.
Using Maxmind's GeoIP software, we can narrow down the LONG/LAT of an IP address to relative accuracy within 25 miles around 80% of the time. For more details, please visit
 http://stackoverflow.com/questions/4982125/matching-closest-longitude-latitude

In crawledphoto database, we created a table named "zip_code_based_lng_lat"
\d zip_code_based_lng_lat
Table "public.zip_code_based_lng_lat"
Column | Type | Modifiers
--------+------------------------+-----------
zip | character varying(50) |
state | character varying(2) |
city | character varying(100) |
type | character varying(2) |
lat | double precision |
lng | double precision |
Indexes:
"zip_code_based_lat_lng_gist_idx" gist (ll_to_earth(lat, lng))
"zip_code_based_lng_lat_zip_idx" btree (zip)

Here,
ZIP — ZIP Codes identify specific geographic delivery areas. A ZIP code can represent an area within a state, one which
crosses state boundaries, or a single building.
State — The USPS® standard 2-letter state or territory abbreviation.
City — The city, community, station or other name by which a 5-digit ZIP area is known.
Type — Defines the ZIP Code for delivery purposes.
      P = A ZIP code used only for P.O. Boxes.
      U = A unique* ZIP code.
      M = Identifies an APO/FPO ZIP Code.
      Blank = A standard ZIP Code.

*These are assigned to specific organizations and not to a city as a whole. Unique ZIP's are assigned to organizations such as Reader's Digest, educational institutions and government facilities. Use the UNQ.DAT file to find the organization name.
County FIPS — A Federal Information Processing Standard number is assigned to each county in the United States. Use this number to find the county name in the CNTY.DAT file. FIPS numbers begin with the 2-digit state code and are followed by the 3-digit county code.

Latitude — The geographic coordinate of a point measured in degrees north or south of the equator.
Longitude — The geographic coordinate of a point measured in degrees east or west of the Greenwich meridian

This table had two indexes.
zip_code_based_lng_lat_zip_idx - used to match the exact zip code
zip_code_based_lat_lng_gist_idx - a GIST index using "ll_to_earth" functionality of "earth_distance" extension.

CREATE INDEX zip_code_based_lat_lng_gist_idx ON zip_code_based_lng_lat USING gist(ll_to_earth(lat::float, lng::float)) ;


Here, "ll_to_earth" returns the location of a point on the surface of the Earth given its latitude (argument 1) and longitude (argument 2) in degrees.

Example data:

  zip  | state |         city         | type |   lat   |   lng  
-------+-------+----------------------+------+---------+---------
 00501 | NY    | HOLTSVILLE           | U    | 40.8172 | 73.0451
 00501 | NY    | I R S SERVICE CENTER | U    | 40.8172 | 73.0451
 ...


Now, we created a function which will return data using two parameters as zip code and radius distance in miles.

CREATE OR REPLACE FUNCTION get_zip_code_based_records_fn(prm_zip_code varchar, prm_distance_in_miles int default 10) RETURNS SETOF RECORD AS $$
DECLARE
 cur1 cursor ( vlat float, vlng float, v_area_km float) IS
SELECT g.*, z.state, z.city, z.type
,
earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float))::float/(1000*1.609) as distance_in_mile
FROM image_info as g,
zip_code_based_lng_lat as z
WHERE substr(g.zip_code,1,5) = z.zip
AND earth_box(ll_to_earth(vlat, vlng), v_area_km) @> ll_to_earth(lat::float, lng::float)
AND earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float)) < v_area_km
;
vlat float;
vlng float;
v_area_km float;
BEGIN
  SELECT lat, lng
  INTO vlat, vlng
  FROM zip_code_based_lng_lat
  WHERE zip = prm_zip_code
  LIMIT 1;
  v_area_km := (prm_distance_in_miles*1000)::float*1.609;
  FOR rec IN cur1(vlat, vlng, v_area_km)
  LOOP
    RETURN NEXT rec;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE 'plpgsql';

Here two parameters:
  • prm_zip_code - takes the zip code around which we want to get the data
  • prm_distance_in_miles  - takes an integer data denoting the circle distance radius around the "prm_zip_code" in miles; default 10 miles
In the function, we used a cursor using the query:

SELECT g.*, z.state, z.city, z.type,
earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float))::float/(1000*1.609) as distance_in_mile
FROM image_info as g,
          zip_code_based_lng_lat as z
WHERE
substr(g.zip_code,1,5) = z.zip
AND earth_box(ll_to_earth(vlat, vlng), v_area_km) @> ll_to_earth(lat::float, lng::float)
AND earth_distance(ll_to_earth(vlat, vlng), ll_to_earth(lat::float, lng::float)) < v_area_km

This cursor takes parameters such as
  • vlat - latitude of the given zip code at "prm_zip_code"
  • vlng - longitudeof the given zip code at "prm_zip_code"
  • v_area_km - circle area around "prm_zip_code"", user passes "prm_distance_in_miles" and the function converts it to KM and uses in the cursor

And uses two functions -
  • earth_box - Returns a box suitable for an indexed search using the cube @> operator for points within a given great circle distance of a location. Some points in this box are further than the specified great circle distance from the location, so a second check using earth_distance should be included in the query.
  • earth_distance - Returns the great circle distance between two points on the surface of the Earth
For more information, please visit http://www.postgresql.org/docs/9.2/static/earthdistance.html
Now, to get the data around 10 miles distance of zip code 18938, we use -
SELECT  *
FROM get_zip_code_based_records_fn('18938') as t
(
id bigint, content_type character varying,
          image_name character varying,
zip_code character varying ,
          distance_in_mile float
)
ORDER BY distance_in_mile
LIMIT 50

It uses the indexes that described above and give result within a second.


2 Responses so far

  1. Anonymous says:

    Good Blog..Thanks for saving my time. May I know how to get datas & distance in kilometers and as well as within 50 kilometers.

  2. AI Rumman says:

    I worked in this project almost 4 years back. However, reading the blog, it reminds me that we collected the data from
    https://www.maxmind.com/en/geoip2-services-and-databases

    For 50 kilometers, we may convert that into mile and pass as a parameter in the function.

Leave a Reply

Labels