Quick Start
The bulk of the PgOSM Flex user guide is written for *nix
users.
The Quick Start, however, includes an additional section Running on Windows
provides guidance for Windows users. Windows users are advised to read
this section before continuing to the Windows specific steps, as many concepts
are not repeated in the Windows instructions.
TLDR;
The following code block shows the commands needed to run PgOSM Flex. The sections below provide explanations of these commands.
mkdir ~/pgosm-data
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
# Ensure you have the latest Docker image
docker pull rustprooflabs/pgosm-flex
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_USER=$POSTGRES_USER \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
PgOSM via Docker
The PgOSM Flex Docker image is hosted on Docker Hub. The image includes all the pre-requisite software and handles all of the options, logic, and post-processing steps required. Features include:
- Automatic data download from Geofabrik and validation against checksum
- Custom Flex layers built in Lua
- Mix and match layers using Layersets
- Loads to Docker-internal Postgres, or externally defined Postgres
- Supports
osm2pgsql-replication
andosm2pgsql --append
mode - Export processed data via
pg_dump
for loading into additional databases
Docker usage
This section outlines a typical import using Docker to run PgOSM Flex.
Prepare
Create a directory to use a common location to share with the Docker container.
This is used to link to the internal path where the .osm.pbf
file, .md5
file,
and (optional) output .sql
files are saved.
mkdir ~/pgosm-data
Run
Set environment variables for the temporary Postgres connection in Docker. These are required for the Docker container to run.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
Start the pgosm
Docker container. At this point, Postgres / PostGIS
is available on port 5433
.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
Check Docker container running
It is worth verifying the Docker container is successfully running with docker ps -a
.
Check for a STATUS
similar to Up 4 seconds
shown in the example output below.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7f80926a823 rustprooflabs/pgosm-flex "docker-entrypoint.s…" 5 seconds ago Up 4 seconds 0.0.0.0:5433->5432/tcp, :::5433->5432/tcp pgosm
Execute PgOSM Flex
Use docker exec
to run the processing for the Washington D.C subregion.
This example uses three (3) parameters to specify the total system RAM (8 GB)
along with a region/subregion.
- Total RAM for osm2pgsql, Postgres and OS (
8
) - Region (
north-america/us
) - Sub-region (
district-of-columbia
) (Optional)
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
The above command takes roughly 1 minute to run if the PBF for today has already been downloaded. If the PBF is not downloaded it will depend on how long it takes to download the 17 MB PBF file + ~ 1 minute processing.
After processing
The processed OpenStreetMap data is also available in the Docker container on port 5433
.
You can connect and query directly in the Docker container.
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"
┌───────┐
│ count │
╞═══════╡
│ 39865 │
└───────┘
The ~/pgosm-data
directory has two (2) files from a typical single run.
The PBF file and its MD5 checksum have been renamed with the date in the filename.
This enables loading the file downloaded today
again in the future, either with the same version of PgOSM Flex or the latest version. The docker exec
command uses the PGOSM_DATE
environment variable
to load these historic files.
If --pg-dump
option is used the output .sql
is also saved in
the ~/pgosm-data
directory.
This .sql
file can be loaded into any other database with PostGIS and the proper
permissions.
ls -alh ~/pgosm-data/
-rw-r--r-- 1 root root 18M Jan 21 03:45 district-of-columbia-2023-01-21.osm.pbf
-rw-r--r-- 1 root root 70 Jan 21 04:39 district-of-columbia-2023-01-21.osm.pbf.md5
-rw-r--r-- 1 root root 163M Jan 21 16:14 north-america-us-district-of-columbia-default-2023-01-21.sql
Meta table
PgOSM Flex tracks processing metadata in the osm.pgosm_flex
table. The initial import
has osm2pgsql_mode = 'create'
, the subsequent update has
osm2pgsql_mode = 'append'
.
SELECT osm_date, region, srid,
pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode
FROM osm.pgosm_flex
;
┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐
│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │
╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡
│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │
│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │
└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘
Explore data loaded
A peek at some of the tables loaded.
This query requires the
PostgreSQL Data Dictionary (PgDD) extension,
use \dt+ osm.*
in psql
for similar details.
SELECT s_name, t_name, rows, size_plus_indexes
FROM dd.tables
WHERE s_name = 'osm'
ORDER BY t_name LIMIT 10;
┌────────┬──────────────────────┬────────┬───────────────────┐
│ s_name │ t_name │ rows │ size_plus_indexes │
╞════════╪══════════════════════╪════════╪═══════════════════╡
│ osm │ amenity_line │ 7 │ 56 kB │
│ osm │ amenity_point │ 5796 │ 1136 kB │
│ osm │ amenity_polygon │ 7593 │ 3704 kB │
│ osm │ building_point │ 525 │ 128 kB │
│ osm │ building_polygon │ 161256 │ 55 MB │
│ osm │ indoor_line │ 1 │ 40 kB │
│ osm │ indoor_point │ 5 │ 40 kB │
│ osm │ indoor_polygon │ 288 │ 136 kB │
│ osm │ infrastructure_point │ 884 │ 216 kB │
│ osm │ landuse_point │ 18 │ 56 kB │
└────────┴──────────────────────┴────────┴───────────────────┘
One table to rule them all
From the perspective of database design, the osm.unitable
option is the worst!
This table violates all sorts of best practices established in this project
by shoving all features into a single unstructured table.
This style included in PgOSM Flex is intended to be used for troubleshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know where you are looking.
Unitable is loaded with the everything
layerset. Feel free to create your own
customized layerset if needed.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--layerset=everything
The
unitable.lua
script included in this project was adapted from the unitable example from osm2pgsql. This version usesJSONB
instead ofHSTORE
, and takes advantage ofhelpers.lua
to easily customize SRID.
JSONB support
PgOSM-Flex uses JSONB
in Postgres to store the raw OpenStreetMap
key/value data (tags
column) and relation members (member_ids
).
The tags
column only exists in the osm.tags
and osm.unitable
tables.
The member_ids
column is included in:
osm.place_polygon
osm.poi_polygon
osm.public_transport_line
osm.public_transport_polygon
osm.road_line
osm.road_major
osm.road_polygon
Additional resources
Blog posts covering various details and background information.