Getting Started with Geo-Data: A Practical Guide

Ever wondered how geo-data works? If you have ever used any of the following on a device or PC – maps for navigation, ride-sharing or food delivery services, then likely you have come across geo-data from the lens of a consumer. But how does geo-data work from the perspective of someone in a data role such as a Data Analyst, Engineer or Scientist? Once it is understood, geo-data becomes a powerful mechanism to answer many location based questions like Which customers sit within a specific delivery radius? What happens if this expands by x distance? Do sales territories overlap?

This article is an explainer on getting started with geo-data and aims to cover:

Storage: What exactly is geo-data? How is it stored in a database? What does it look like?
Querying: What functions are available to query and analyse geo-data?
Visualizing: Mini case study demonstrating each of the above points.

There are two broad types of Geo-data,

Vector – This is coordinate‑based
Raster – This is grid and value based

Raster is gridded imagery and often used for satellite photos, elevation models but is out of scope here. This article covers the Vector data type from here on in.

Storage

What exactly is Geo-data?

Geo-data in its simplest form is a set of coordinates. Think of a treasure map drawn containing gridlines. It features:

Dots marked with an “X” indicating where the treasure is.
Lines trace the path to get there.
Shapes outline the objects that can be expected along the way.

This map is a good, basic representation of geo-data in visual form. Behind the visual are a set of numbers that represent coordinates along a 2-dimensional grid. The ‘X’ marks a specific point or set of coordinates on the map, the line marks a series of coordinates specified in an order and the shape is a line that loops back to its starting point.

Figure 1 – Geo-data is simply numbers representing coordinates on a map

Black and white globe illustration with grid lines.

Figure 2 – Lat/long coordinates on Earth form the foundation of geo-data

Working with geo-data is very similar to this treasure map. Except the grid lines are located across Earth and the coordinates are widely known as Latitude and Longitude. The 3 common types of Vector geo-data were described earlier and they are Points, Lines and Shapes.

Geo-data type	Description	Example
Point	A single set latitude/longitude coordinates.	Location of building
Line	Coordinates in a sequence.	River
Polygon (shape)	Coordinated in a sequence chained together that covers an area	Delivery radius

Table 1- Three commonly used types of geo-data

How is geo-data stored in a database?

Now that we have learnt that geo-data is essentially numbers, how is it stored in a database? It’s not stored as plain text or even integer/decimals. Most database engines have geospatial extensions that facilitate the storage and analysis of geo-data. One such database engine is PostgreSQL, an open-source engine with the PostGIS (PostgreSQL Geographic Information System) framework that allows for spatial objects. This means it knows how to read certain objects and interpret them as latitude/longitude coordinates. PostGIS is unique to PostgreSQL but the functions used in this guide are fairly universal across most geospatial equivalent frameworks in other database engines. Across the 3 types of geo-data, they are stored as:

Geo-data type	Stored in database
Point	POINT(151.2093 ‑33.8688)
Line	LINESTRING(151.2485 ‑33.1423, 149.1209 ‑35.2809)
Polygon (shape)	POLYGON((144.9543 ‑37.8101, 144.9758 ‑37.8090, 144.9843 ‑37.8222, 144.9518 ‑37.8248, 144.9535 ‑37.8650))

Table 2 – How geo-data types are stored in a database

In the table above, lat/lon coordinates are specified within some keywords (POINT, LINESTRING, POLYGON). These are type labels and are part of a broader format of WKT (well known text) that informs the database engine the object that is being dealt with. ie, If POINT(lon lat), then WKT is informing the database of a single coordinate, if POLYGON((lon1 lat1, … lonN latN)), then it’s being informed of shape made up of multiple coordinates that enclose an area. Coordinates also require additional configuration and all examples in this article use WGS‑84 (SRID 4326).

Why do decimals matter?

From the examples above, each set of coordinates were rounded off to 4 decimal places. Each extra decimal place makes a significant difference to how ‘precise’ the location is. The following table is an approximation on this:

Decimals	Approximate precision
2	~1km
4	~10m
6	~10cm

Table 3 – a couple of decimals can be the difference between a few centimeters and 1km

What does Geo-data look like?

The open source database engine, PostgreSQL, is used in the examples mentioned in this article configured with the PostGIS extension.

The PostGIS extension needs to be installed and created in PostgreSQL as an initial set up step.

-- Enable PostGIS once per database
CREATE EXTENSION IF NOT EXISTS postgis;

The code below is an example of creating a table and inserting geo-data:

DROP TABLE IF EXISTS geo_demo;
CREATE TABLE geo_demo (
    id         SERIAL PRIMARY KEY,
    name       TEXT,
    loc_point  GEOMETRY(POINT,     4326),   -- now geometry
    path_line  GEOMETRY(LINESTRING,4326),
    area_poly  GEOMETRY(POLYGON,   4326)
);

INSERT INTO geo_demo (name, loc_point, path_line, area_poly)
VALUES (
    'Sydney Example',

    -- POINT
    ST_GeomFromText('POINT(151.2131 -33.8568)', 4326),

    -- LINESTRING
    ST_GeomFromText('LINESTRING(151.2131 -33.8568, 151.1958 -33.8523)', 4326),

    -- POLYGON  (first vertex repeats at the end)
    ST_GeomFromText(
      'POLYGON((151.1984 -33.8600,
                151.2300 -33.8600,
                151.2300 -33.8750,
                151.1984 -33.8750,
                151.1984 -33.8600))',
      4326)
);

Observe that the format of the geo-data fields is Geometry followed by the geometry types mentioned earlier (point, linestring and polygon). 4326 is the SRID – or ID number indicating which coordinate system it is interpreted as.

Selecting the results directly produces a result in binary form which isn’t human readable. This is how PostGIS stores the data in its most raw form and is the default result format due to its efficiency for retrieval.

SELECT loc_point,
       path_line,
       area_poly
FROM geo_demo;

Results:

loc_point	path_line	area_poly
0101000020E6100000E25817B7D1E662403D9B559FABED40C0	0102000020E610000002000000E25817B7D1E662403D9B559FABED40C0D1915CFE43E66240BE30992A18ED40C0	0103000020E61000000100000005000000D8F0F44A59E66240AE47E17A14EE40C08FC2F5285CE76240AE47E17A14EE40C08FC2F5285CE762400000000000F040C0D8F0F44A59E662400000000000F040C0D8F0F44A59E66240AE47E17A14EE40C0

Parsing the data using one of the spatial type (ST_…”) functions generates the same data which was inserted into the table.

SELECT name,
       ST_AsText(loc_point) AS point_wkt,
       ST_AsText(path_line) AS line_wkt,
       ST_AsText(area_poly) AS poly_wkt
FROM geo_demo;

name	point_wkt	line_wkt	poly_wkt
Sydney Example	POINT(151.2131 -33.8568)	LINESTRING(151.2131 -33.8568,151.1958 -33.8523)	POLYGON((151.1984 -33.86,151.23 -33.86,151.23 -33.875,151.1984 -33.875,151.1984 -33.86))

Now that it’s understood how geo-data is stored, the next section goes into how it can be queried.

Querying

Core spatial functions

Storing the data only gets us coordinates sitting in the database; querying them is when analysis can be done. In this section spatial functions are covered. One spatial function (ST_AsText) was introduced in the previous section as a way to parse binary data into human readable geo-data. Some other commonly used spatial functions are:

Function	Description	Example use case
ST_Contains(A, B)	Does shape A fully surround B?	Assign customers to delivery zones
ST_Within(A, B)	Is shape A inside B (inverse of ST_contains)	Same as above, just reads nicer in SQL
ST_Distance(A, B)	How far apart are A & B?	Find the nearest depot to an order destination
ST_DWithin(A, B, d)	Are A & B within d metres?	Return customers inside a 5km radius
ST_Buffer(A, d)	Draw a d-metre circle around A	Create a 1km radius around a location

Table 4 – commonly used functions to query and analyse geo-data

The next section demonstrates the use of these functions with some examples and how the results look both in a table as well as in PostGIS’s geometric map overlay. Querying can help extract results, but how can this be visualized geographically? This is explored in the next section in an example case study.

Visualising

Case study

The Operations team in a logistics company are looking to expand into a new location. One consideration is a 2hr delivery service within the CBD of Sydney and they have gathered a database of potential customers. One of the questions is, “What percentage of potential customers will be covered within this delivery zone?” Rather than manually searching each customer on a map, PostGIS can help answer that (using the ST functions from earlier) with a visual overlay.

The first queries to execute are to ensure that PostGIS is enabled and resetting the creation of the following tables – depots, delivery_zones and customers.

/*---------------------------------------------------------------
0) Make sure PostGIS is available in this database
----------------------------------------------------------------*/
CREATE EXTENSION IF NOT EXISTS postgis;

/*----------------------------------------------------------------
1) Drop old demo tables so the script is idempotent
----------------------------------------------------------------*/
DROP TABLE IF EXISTS depots         CASCADE;
DROP TABLE IF EXISTS customers      CASCADE;
DROP TABLE IF EXISTS delivery_zones CASCADE;

/*----------------------------------------------------------------
2) Build demo schema — geometry, SRID 4326 (WGS‑84)
----------------------------------------------------------------*/
CREATE TABLE depots (
    depot_id SERIAL PRIMARY KEY,
    name     TEXT,
    loc      GEOMETRY(POINT, 4326)
);

CREATE TABLE delivery_zones (
    zone_id  SERIAL PRIMARY KEY,
    zone_name TEXT,
    area      GEOMETRY(POLYGON, 4326)
);

CREATE TABLE customers (
    cust_id  SERIAL PRIMARY KEY,
    name     TEXT,
    loc      GEOMETRY(POINT, 4326)
);

The next section creates some dummy data and inserts it into each of the tables.

The data in the depot table represents the geo-data point of the warehouse where inventory is physically stored.

The Delivery_zones table is the polygon shape of the delivery radius around the depot where 2hr delivery will be possible.

The final table contains data on potential customers as well as the point location of their delivery addresses.

/*----------------------------------------------------------------
3) Insert sample data
   • One depot near Sydney CBD
   • One polygon delivery zone (CBD rectangle)
   • 3 customers: inside, on the edge, outside
----------------------------------------------------------------*/
-- Depot
INSERT INTO depots (name, loc)
VALUES ('Sydney Depot',
        ST_GeomFromText('POINT(151.2070 -33.8675)', 4326));

-- Delivery Zone: rough rectangle around CBD
INSERT INTO delivery_zones (zone_name, area)
VALUES ('CBD Zone',
  ST_GeomFromText('POLYGON((
     151.1984 -33.8600,
     151.2300 -33.8600,
     151.2300 -33.8750,
     151.1984 -33.8750,
     151.1984 -33.8600
  ))', 4326));

-- Three customers
INSERT INTO customers (name, loc)
VALUES
  ('Alice Inside',
    ST_GeomFromText('POINT(151.2100 -33.8650)', 4326)),  -- inside polygon
  ('Bob Edge',
    ST_GeomFromText('POINT(151.1984 -33.8650)', 4326)),  -- on left edge
  ('Charlie Outside',
    ST_GeomFromText('POINT(151.2500 -33.8800)', 4326));  -- outside

/*---------------------------------------------------------------
4) Create spatial indexes (important for speed)
----------------------------------------------------------------*/
CREATE INDEX depots_loc_gix   ON depots        USING GIST (loc);
CREATE INDEX zones_area_gix   ON delivery_zones USING GIST (area);
CREATE INDEX cust_loc_gix     ON customers     USING GIST (loc);

The next few queries will not only help determine the answers of the original business question but also uncover additional analysis.

ST_Contains is used to check which customers are within the delivery zone. The first parameter used in this function is the polygon shape and the 2nd are the coordinates of the customers. This query alone answers theoriginal business question on what percentage of potential customers would be in the 2hr delivery zone. 33% or ⅓ of customers are within the delivery zone.

-- 5.1  ST_CONTAINS / ST_WITHIN  (point‑in‑polygon)
SELECT c.name AS customer,
       z.zone_name,
       ST_Contains(z.area, c.loc) AS inside
FROM customers c
CROSS JOIN delivery_zones z
ORDER BY customer;

Results:

customer	zone_name	inside
Alice Inside	CBD Zone	true
Bob Edge	CBD Zone	false
Charlie Outside	CBD Zone	false

An alternative way to answer this question is only selecting those customers within the delivery zone instead of listing all customers and flagging each one like the last example. The function for this is ST_Within which uses the same parameters (polygon and point) but in reverse order to ST_Contains.

-- Equivalent: ST_WITHIN(point, poly)
SELECT name AS customer
FROM customers, delivery_zones
WHERE ST_Within(loc, area);      -- returns Alice Inside & Bob Edge (on boundary)

Result:

Map of Sydney city center and surroundings