Quick Start
The bulk of the PgOSM Flex user guide is written for *nix users.
The Quick Start, however, includes an additional section Running on Windows
provides guidance for Windows users. Windows users are advised to read
this section before continuing to the Windows specific steps, as many concepts
are not repeated in the Windows instructions.
TLDR;
The following code block shows the commands needed to run PgOSM Flex. The sections below provide explanations of these commands.
mkdir ~/pgosm-data
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
# Ensure you have the latest Docker image
docker pull rustprooflabs/pgosm-flex
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_USER=$POSTGRES_USER \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
PgOSM via Docker
The PgOSM Flex Docker image is hosted on Docker Hub. The image includes all the pre-requisite software and handles all of the options, logic, and post-processing steps required. Features include:
- Automatic data download from Geofabrik and validation against checksum
- Custom Flex layers built in Lua
- Mix and match layers using Layersets
- Loads to Docker-internal Postgres, or externally defined Postgres
- Supports
osm2pgsql-replicationandosm2pgsql --appendmode - Export processed data via
pg_dumpfor loading into additional databases
Docker usage
This section outlines a typical import using Docker to run PgOSM Flex.
Prepare
Create a directory to use a common location to share with the Docker container.
This is used to link to the internal path where the .osm.pbf file, .md5 file,
and (optional) output .sql files are saved.
mkdir ~/pgosm-data
Run
Set environment variables for the temporary Postgres connection in Docker. These are required for the Docker container to run.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
Start the pgosm Docker container. At this point, Postgres / PostGIS
is available on port 5433.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
Check Docker container running
It is worth verifying the Docker container is successfully running with docker ps -a.
Check for a STATUS similar to Up 4 seconds shown in the example output below.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7f80926a823 rustprooflabs/pgosm-flex "docker-entrypoint.s…" 5 seconds ago Up 4 seconds 0.0.0.0:5433->5432/tcp, :::5433->5432/tcp pgosm
Execute PgOSM Flex
Use docker exec to run the processing for the Washington D.C subregion.
This example uses three (3) parameters to specify the total system RAM (8 GB)
along with a region/subregion.
- Total RAM for osm2pgsql, Postgres and OS (
8) - Region (
north-america/us) - Sub-region (
district-of-columbia) (Optional)
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
The above command takes roughly 1 minute to run if the PBF for today has already been downloaded. If the PBF is not downloaded it will depend on how long it takes to download the 17 MB PBF file + ~ 1 minute processing.
After processing
The processed OpenStreetMap data is also available in the Docker container on port 5433.
You can connect and query directly in the Docker container.
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"
┌───────┐
│ count │
╞═══════╡
│ 39865 │
└───────┘
The ~/pgosm-data directory has two (2) files from a typical single run.
The PBF file and its MD5 checksum have been renamed with the date in the filename.
This enables loading the file downloaded today
again in the future, either with the same version of PgOSM Flex or the latest version. The docker exec command uses the PGOSM_DATE environment variable
to load these historic files.
If --pg-dump option is used the output .sql is also saved in
the ~/pgosm-data directory.
This .sql file can be loaded into any other database with PostGIS and the proper
permissions.
ls -alh ~/pgosm-data/
-rw-r--r-- 1 root root 18M Jan 21 03:45 district-of-columbia-2023-01-21.osm.pbf
-rw-r--r-- 1 root root 70 Jan 21 04:39 district-of-columbia-2023-01-21.osm.pbf.md5
-rw-r--r-- 1 root root 163M Jan 21 16:14 north-america-us-district-of-columbia-default-2023-01-21.sql
Meta table
PgOSM Flex tracks processing metadata in the osm.pgosm_flex table. The initial import
has osm2pgsql_mode = 'create', the subsequent update has
osm2pgsql_mode = 'append'.
SELECT osm_date, region, srid,
pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode
FROM osm.pgosm_flex
;
┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐
│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │
╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡
│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │
│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │
└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘
Explore data loaded
A peek at some of the tables loaded.
This query requires the
PostgreSQL Data Dictionary (PgDD) extension,
use \dt+ osm.* in psql for similar details.
SELECT s_name, t_name, rows, size_plus_indexes
FROM dd.tables
WHERE s_name = 'osm'
ORDER BY t_name LIMIT 10;
┌────────┬──────────────────────┬────────┬───────────────────┐
│ s_name │ t_name │ rows │ size_plus_indexes │
╞════════╪══════════════════════╪════════╪═══════════════════╡
│ osm │ amenity_line │ 7 │ 56 kB │
│ osm │ amenity_point │ 5796 │ 1136 kB │
│ osm │ amenity_polygon │ 7593 │ 3704 kB │
│ osm │ building_point │ 525 │ 128 kB │
│ osm │ building_polygon │ 161256 │ 55 MB │
│ osm │ indoor_line │ 1 │ 40 kB │
│ osm │ indoor_point │ 5 │ 40 kB │
│ osm │ indoor_polygon │ 288 │ 136 kB │
│ osm │ infrastructure_point │ 884 │ 216 kB │
│ osm │ landuse_point │ 18 │ 56 kB │
└────────┴──────────────────────┴────────┴───────────────────┘
One table to rule them all
From the perspective of database design, the osm.unitable option is the worst!
This table violates all sorts of best practices established in this project
by shoving all features into a single unstructured table.
This style included in PgOSM Flex is intended to be used for troubleshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know where you are looking.
Unitable is loaded with the everything layerset. Feel free to create your own
customized layerset if needed.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--layerset=everything
The
unitable.luascript included in this project was adapted from the unitable example from osm2pgsql. This version usesJSONBinstead ofHSTORE, and takes advantage ofhelpers.luato easily customize SRID.
JSONB support
PgOSM-Flex uses JSONB in Postgres to store the raw OpenStreetMap
key/value data (tags column) and relation members (member_ids).
The tags column only exists in the osm.tags and osm.unitable tables.
The member_ids column is included in:
osm.place_polygonosm.poi_polygonosm.public_transport_lineosm.public_transport_polygonosm.road_lineosm.road_majorosm.road_polygon
Additional resources
Blog posts covering various details and background information.