PgOSM Flex
PgOSM Flex (GitHub) provides high quality OpenStreetMap datasets in PostGIS using the osm2pgsql Flex output. This project provides a curated set of Lua and SQL scripts to clean and organize the most commonly used OpenStreetMap data, such as roads, buildings, and points of interest (POIs).
Running PgOSM Flex is easy via the PgOSM Docker image hosted on Docker Hub.
- The quick start shows how easy it is to get started
- Change how PgOSM Flex runs with common customizations
- Customize layersets to change what data you load
- Configure connection to external database, and use replication
Project goals
- High quality spatial data
- Reliable
- Easy to customize
- Easy to use
Project decisions
A few decisions made in this project:
- ID column is
osm_id
- Geometry column named
geom
- Defaults to same units as OpenStreetMap (e.g. km/hr, meters)
- Data not included in a dedicated column is available from
osm.tags.tags
(JSONB
) - Points, Lines, and Polygons are not mixed in a single table
- Tracks latest Postgres, PostGIS, and osm2pgsql versions
This project's approach is to do as much processing in the Lua styles passed along to osm2pgsql, with post-processing steps creating indexes, constraints and comments.
Versions Supported
Minimum versions supported:
- Postgres 12
- PostGIS 3.0
This project will attempt, but not guarantee, to support PostgreSQL 12 until it reaches it EOL support.
The osm2pgsql version requirement is no longer relevant. Users of the Docker image naturally use the latest version of osm2pgsql at the time the Docker image was created.
Minimum Hardware
RAM
osm2pgsql requires at least 2 GB RAM.
Storage
Fast SSD drives are strongly recommended. It should work on slower storage devices (HDD, SD, etc), however the osm2pgsql-tuner package used to determine the best osm2pgsql command assumes fast SSDs.
PgOSM Flex Community Code of Conduct
Why have a Code of Conduct?
Online communities include people from many different backgrounds. The PgOSM Flex contributors are committed to providing a friendly, safe and welcoming environment for all, regardless of age, disability, gender, nationality, ethnicity, religion, sexuality, or similar personal characteristic.
The first goal of the Code of Conduct is to specify a baseline standard of behavior so that people with different social values and communication styles can talk about PgOSM Flex effectively, productively, and respectfully.
The second goal is to provide a mechanism for resolving conflicts in the community when they arise.
The third goal of the Code of Conduct is to make our community welcoming to people from different backgrounds. Diversity is critical to the project; for PgOSM Flex to be successful, it needs contributors and users from all backgrounds.
With that said, a healthy community must allow for disagreement and debate. The Code of Conduct is not a mechanism for people to silence others with whom they disagree.
Where does the Code of Conduct apply?
If you participate in or contribute to the PgOSM Flex ecosystem in any way, you are encouraged to follow the Code of Conduct while doing so.
Explicit enforcement of the Code of Conduct applies to the PgOSM Flex GitHub project and code reviews.
Values
These are the values to which people in the PgOSM Flex should aspire.
- Be friendly and welcoming
- Be patient
- Remember that people have varying communication styles and that not everyone is using their native language. (Meaning and tone can be lost in translation.)
- Be thoughtful
- Productive communication requires effort. Think about how your words will be interpreted.
- Remember that sometimes it is best to refrain entirely from commenting.
- Be respectful
- In particular, respect differences of opinion.
- Be charitable
- Interpret the arguments of others in good faith, do not seek to disagree.
- When we do disagree, try to understand why.
- Avoid destructive behavior:
- Derailing: stay on topic; if you want to talk about something else, start a new conversation.
- Unconstructive criticism: don't merely decry the current state of affairs; offer—or at least solicit—suggestions as to how things may be improved.
- Snarking (pithy, unproductive, sniping comments)
- Discussing potentially offensive or sensitive issues unless directly technically relevant; this all too often leads to unnecessary conflict.
- Microaggressions: brief and commonplace verbal, behavioral and environmental indignities that communicate hostile, derogatory or negative slights and insults to a person or group.
People are complicated. You should expect to be misunderstood and to misunderstand others; when this inevitably occurs, resist the urge to be defensive or assign blame. Try not to take offense where no offense was intended. Give people the benefit of the doubt. Even if the intent was to provoke, do not rise to it. It is the responsibility of all parties to de-escalate conflict when it arises.
Unwelcome behavior
These actions are explicitly forbidden in PgOSM Flex spaces:
- Insulting, demeaning, hateful, or threatening remarks.
- Discrimination based on age, disability, gender, nationality, ethnicity, religion, sexuality, or similar personal characteristic.
- Bullying or systematic harassment.
- Revealing private information about other participants without explicit permission ("doxxing").
- Unwelcome sexual advances.
- Incitement to any of these.
Moderation
The PgOSM Flex spaces are not free speech venues; they are for discussion about PgOSM Flex.
When using the PgOSM Flex spaces you should act in the spirit of the values. If you conduct yourself in a way that is explicitly forbidden by the CoC, you will be warned and asked to stop. If you do not stop, you will be removed from our community spaces temporarily. Repeated, willful breaches of the CoC will result in a permanent ban.
Reporting issues
The PgOSM Flex maintainers are responsible for handling conduct-related issues. Their goal is to de-escalate conflicts and try to resolve issues to the satisfaction of all parties.
If you encounter a conduct-related issue, you should report it to the maintainers by sending them all an email. In the event that you wish to make a complaint against a maintainer, you may instead contact the other maintainers.
Note that the goal of the Code of Conduct and the maintainers is to resolve conflicts in the most harmonious way possible. We hope that in most cases issues may be resolved through polite discussion and mutual agreement. Bans and other forceful measures are to be employed only as a last resort.
Changes to the Code of Conduct should be proposed as pull requests.
Summary
- Treat everyone with respect and kindness.
- Be thoughtful in how you communicate.
- Don’t be destructive or inflammatory.
- If you encounter an issue, please mail the maintainers.
Acknowledgements
This document is based on the OpenStreetMap Carto Code of Conduct, which in turn has parts derived from the Code of Conduct documents of the Go Community, Django, FreeBSD, and Rust projects.
This document is licensed under the Creative Commons Attribution 3.0 License.
Contributing
We encourage pull requests (PRs) from everyone.
Fork the project into your own repo, create a topic branch there and then make
one or more pull requests back to the main repository targeting the dev
branch.
Your PR can then be reviewed and discussed.
Helpful: Run make
in the project root directory and ensure tests pass. If tests are not passing and you need help resolving, please mention this in your PR.
Adding new feature layers
Checklist for adding new feature layers:
- Create
flex-config/style/<feature>.lua
- Create
flex-config/sql/<feature>.sql
- Update
flex-config/run-no-tags.lua
- Update
flex-config/run-no-tags.sql
- Update
db/qc/features_not_in_run_all.sql
- Add relevant
tests/sql/<feature_queries>.sql
- Add relevant
tests/expected/<feature_queries>.out
Quick Start
See the Docker Usage section below for an explanation of these commands.
mkdir ~/pgosm-data
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
# Ensure you have the latest Docker image
docker pull rustprooflabs/pgosm-flex
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
PgOSM via Docker
The PgOSM Flex Docker image is hosted on Docker Hub. The image includes all the pre-requisite software and handles all of the options, logic, an post-processing steps required. Features include:
- Automatic data download from Geofabrik and validation against checksum
- Custom Flex layers built in Lua
- Mix and match layers using Layersets
- Loads to Docker-internal Postgres, or externally defined Postgres
- Supports
osm2pgsql-replication
andosm2pgsql --append
mode - Export processed data via
pg_dump
for loading into additional databases
Docker usage
This section outlines a typical import using Docker to run PgOSM Flex.
Create directory for the .osm.pbf
file, output .sql
file, log output, and
the osm2pgsql command ran.
mkdir ~/pgosm-data
Set environment variables for the temporary Postgres connection in Docker. These are required for the Docker container to run.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
Start the pgosm
Docker container. At this point, Postgres / PostGIS
is available on port 5433
.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
Use docker exec
to run the processing for the Washington D.C subregion.
This example uses three (3) parameters to specify the total system RAM (8 GB)
along with a region/subregion.
- Total RAM for osm2pgsql, Postgres and OS (
8
) - Region (
north-america/us
) - Sub-region (
district-of-columbia
) (Optional)
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
The above command takes roughly 1 minute to run if the PBF for today has already been downloaded. If the PBF is not downloaded it will depend on how long it takes to download the 17 MB PBF file + ~ 1 minute processing.
After processing
The processed OpenStreetMap data is also available in the Docker container on port 5433
.
You can connect and query directly in the Docker container.
psql -h localhost -p 5433 -d pgosm -U postgres -c "SELECT COUNT(*) FROM osm.road_line;"
┌───────┐
│ count │
╞═══════╡
│ 39865 │
└───────┘
The ~/pgosm-data
directory has two (2) files from a typical single run.
The PBF file and its MD5 checksum have been renamed with the date in the filename.
This enables loading the file downloaded today
again in the future, either with the same version of PgOSM Flex or the latest version. The docker exec
command uses the PGOSM_DATE
environment variable
to load these historic files.
If --pg-dump
option is used the output .sql
is also saved in
the ~/pgosm-data
directory.
This .sql
file can be loaded into any other database with PostGIS and the proper
permissions.
ls -alh ~/pgosm-data/
-rw-r--r-- 1 root root 18M Jan 21 03:45 district-of-columbia-2023-01-21.osm.pbf
-rw-r--r-- 1 root root 70 Jan 21 04:39 district-of-columbia-2023-01-21.osm.pbf.md5
-rw-r--r-- 1 root root 163M Jan 21 16:14 north-america-us-district-of-columbia-default-2023-01-21.sql
Meta table
PgOSM Flex tracks processing metadata in the osm.pgosm_flex
table. The initial import
has osm2pgsql_mode = 'create'
, the subsequent update has
osm2pgsql_mode = 'append'
.
SELECT osm_date, region, srid,
pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode
FROM osm.pgosm_flex
;
┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐
│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │
╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡
│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │
│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │
└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘
Explore data loaded
A peek at some of the tables loaded.
This query requires the
PostgreSQL Data Dictionary (PgDD) extension,
use \dt+ osm.*
in psql
for similar details.
SELECT s_name, t_name, rows, size_plus_indexes
FROM dd.tables
WHERE s_name = 'osm'
ORDER BY t_name LIMIT 10;
┌────────┬──────────────────────┬────────┬───────────────────┐
│ s_name │ t_name │ rows │ size_plus_indexes │
╞════════╪══════════════════════╪════════╪═══════════════════╡
│ osm │ amenity_line │ 7 │ 56 kB │
│ osm │ amenity_point │ 5796 │ 1136 kB │
│ osm │ amenity_polygon │ 7593 │ 3704 kB │
│ osm │ building_point │ 525 │ 128 kB │
│ osm │ building_polygon │ 161256 │ 55 MB │
│ osm │ indoor_line │ 1 │ 40 kB │
│ osm │ indoor_point │ 5 │ 40 kB │
│ osm │ indoor_polygon │ 288 │ 136 kB │
│ osm │ infrastructure_point │ 884 │ 216 kB │
│ osm │ landuse_point │ 18 │ 56 kB │
└────────┴──────────────────────┴────────┴───────────────────┘
One table to rule them all
From the perspective of database design, the osm.unitable
option is the worst!
This table violates all sorts of best practices established in this project
by shoving all features into a single unstructured table.
This style included in PgOSM Flex is intended to be used for troubleshooting and quality control. It is not intended to be used for real production workloads! This table is helpful for exploring the full data set when you don't really know what you are looking for, but you know where you are looking.
Unitable is loaded with the everything
layerset. Feel free to create your own
customized layerset if needed.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--layerset=everything
The
unitable.lua
script include in in this project was adapted from the unitable example from osm2pgsql. This version uses JSONB instead of HSTORE and takes advantage ofhelpers.lua
to easily customize SRID.
JSONB support
PgOSM-Flex uses JSONB
in Postgres to store the raw OpenStreetMap
key/value data (tags
column) and relation members (member_ids
).
The tags
column only exists in the osm.tags
and osm.unitable
tables.
The member_ids
column is included in:
osm.place_polygon
osm.poi_polygon
osm.public_transport_line
osm.public_transport_polygon
osm.road_line
osm.road_major
osm.road_polygon
Additional resources
Blog posts covering various details and background information.
- Book Release! Mastering PostGIS and OpenStreetMap
- Better OpenStreetMap places in PostGIS
- Improved OpenStreetMap data structure in PostGIS
- Hands on with osm2pgsql's new Flex output.
Customize PgOSM Flex
Common Customizations
A major goal of PgOSM Flex is support a wide range of use cases for using OpenStreetMap data in PostGIS. This chapter explores a few ways PgOSM Flex can be customized.
Selecting region and subregion
The most used customization is the region and subregion selection.
The examples throughout this project's documentation use
the --region=north-america/us
and --subregion=district-of-columbia
because it is a small region that downloads and imports quickly.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
By default PgOSM Flex will attempt to download the necessary data files
from Geofabrik's download server.
Navigate the Region/Sub-region structure on Geofabrik to determine
exactly what --region
and --subregion
options to choose.
This can be a bit confusing as larger subregions can contain smaller subregions.
Feel free to start a discussion if you need help figuring this part out!
If you want to load the entire United States subregion, instead of
the District of Columbia subregion, the docker exec
command is changed to the
following.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america \
--subregion=us
For top-level regions, such as North America, leave off the --subregion
option.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america
Specific input file
The automatic Geofabrik download can be overridden by providing PgOSM Flex
with the path to a valid .osm.pbf
file using --input-file
.
This option overrides the default file handling, archiving, and MD5
checksum validation. With --input-file
you can use a custom osm.pbf
you created, or use it to simply remove the need for an internet connection
from the instance running the processing.
Note: The
--region
option is always required, the--subregion
option can be used with--input-file
to put the information in thesubregion
column ofosm.pgosm_flex
.
Small area / custom extract
Some of the smallest subregions provided by Geofabrik are quite large compared
to the area of interest for a project.
The osmium
tool makes it quick and easy to
extract a bounding box.
The following example extracts an area roughly around Denver, Colorado.
It takes about 3 seconds to extract the 3.2 MB denver.osm.pbf
output from
the 239 MB input.
osmium extract --bbox=-105.0193,39.7663,-104.9687,39.7323 \
-o denver.osm.pbf \
colorado-2023-04-18.osm.pbf
The PgOSM Flex procesing time for the smaller Denver region takes less than 20 seconds on a typical laptop, versus 11 minutes for all of Colorado.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=custom \
--subregion=denver \
--input-file=denver.osm.pbf \
--layerset=everything
Customize load to PostGIS
There are a few ways to customize exactly how data is loaded to PostGIS / Postgres.
SRID
PgOSM Flex defaults to SRID 3857 matching the default osm2pgsql behavior.
This can be customized using --srid 4326
or any other SRID supported by
osm2pgsql and PostGIS.
Language
The --language
option enables defining a preferred language for OpenStreetMap
names. If --language=en
is defined, PgOSM Flex's helper.get_name()
function will use name:en
if it exists. The usage and effect
of this option is shown in this comment.
Using -e PGOSM_LANGUAGE=kn
for U.S. West results in most state labels picking
up the Kannada language option. The states without a name:kn
default
to the standard name selection logic.
Data only
The --data-only
option skips creating optional data structures in the target
database. This includes the helper tables in the pgosm
schema and the
QGIS layer style table.
Skip nested place polygons
The nested place polygon calculation (explained in this post) adds minimal overhead to smaller regions, e.g. Colorado with a 225 MB PBF input file. Larger regions, such as North America (12 GB PBF), are impacted more severely as a difference in processing time. Calculating nested place polygons for Colorado adds less than 30 seconds on an 8 minute process, taking about 5% longer. A larger region, such as North America, can take 33% longer adding more than an hour and a half to the total processing time. See the performance section for more details.
Use --skip-nested
to bypass the calculation of nested admin polygons.
Use --pg-dump
to export data
The
--pg-dump
option was added in 0.7.0. Prior versions defaulted to usingpg_dump
and provided a--skip-dump
option to override. The default now is to only usepg_dump
when requested. See #266 for more.
A .sql
file can be created using pg_dump
as part of the processing
for easy loading into one or more external Postgres databases.
Add --pg-dump
to the docker exec
command to enable this feature.
The following example
creates an empty myosm
database to load the processed and dumped OpenStreetMap
data.
psql -d postgres -c "CREATE DATABASE myosm;"
psql -d myosm -c "CREATE EXTENSION postgis;"
psql -d myosm \
-f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-default-2023-01-21.sql
The above assumes a database user with
superuser
permissions is used. See the Postgres Permissions section for a more granular approach to permissions.
Use --help
The PgOSM Docker image can provide command line help.
The Python script that controls PgOSM Flex's behavior is built using the
click
module, providing built-in --help
.
Use docker exec
to show the full help.
docker exec -it pgosm python3 docker/pgosm_flex.py --help
The first portion of the --help
output is shown here.
Usage: pgosm_flex.py [OPTIONS]
Run PgOSM Flex within Docker to automate osm2pgsql flex processing.
Options:
--ram FLOAT Amount of RAM in GB available on the machine
running the Docker container. This is used to
determine the appropriate osm2pgsql command via
osm2pgsql-tuner recommendation engine. [required]
--region TEXT Region name matching the filename for data sourced
from Geofabrik. e.g. north-america/us. Optional
when --input-file is specified, otherwise
required.
--subregion TEXT Sub-region name matching the filename for data
sourced from Geofabrik. e.g. district-of-columbia
--data-only When set, skips running Sqitch and importing QGIS
Styles.
Layersets
A layerset in PgOSM Flex defines one or more layers, where each layer
includes one or more tables. For example, the
minimal
layerset (see flex-config/layerset/minimal.ini)
is defined as shown in the following snippet.
[layerset]
place=true
poi=true
road_major=true
In the above example, place
, poi
and road_major
are the included
Layers. This results in nine (9) total tables being loaded.
There is the standard
meta table osm.pgosm_flex
, plus eight (8)
tables for the three (3) layers. The place layer has four tables,
poi has three (3) and road major has one (1).
┌────────┬──────────────────────┬───────┬───────────────────┐
│ s_name │ t_name │ rows │ size_plus_indexes │
╞════════╪══════════════════════╪═══════╪═══════════════════╡
│ osm │ pgosm_flex │ 1 │ 32 kB │
│ osm │ place_line │ 128 │ 168 kB │
│ osm │ place_point │ 124 │ 128 kB │
│ osm │ place_polygon │ 217 │ 496 kB │
│ osm │ place_polygon_nested │ 22 │ 304 kB │
│ osm │ poi_line │ 255 │ 128 kB │
│ osm │ poi_point │ 10876 │ 2360 kB │
│ osm │ poi_polygon │ 12413 │ 6456 kB │
│ osm │ road_major │ 8097 │ 2504 kB │
└────────┴──────────────────────┴───────┴───────────────────┘
Included layersets
PgOSM Flex includes a few layersets to get started as examples.
These layersets are defined under flex-config/layerset/
.
If the --layerset
is not defined, the default
layerset is used.
basic
default
everything
minimal
Using a built-in layerset other than default
is done by defining
the --layerset
option. The following example uses the minimal
layerset
shown above.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--layerset=minimal \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
The output from running PgOSM Flex indicates which layers are being loaded.
2023-01-29 08:47:12,191:INFO:pgosm-flex:helpers:Including place
2023-01-29 08:47:12,192:INFO:pgosm-flex:helpers:Including poi
2023-01-29 08:47:12,192:INFO:pgosm-flex:helpers:Including road_major
Custom layerset
A layerset including the poi
and road_major
layers would look
like:
[layerset]
poi=true
road_major=true
To use the --layerset-path
option for custom layerset
definitions, link the directory containing custom styles
to the Docker container in the docker run
command.
If the custom-layerset
directory is in the home (~
) directory, adding
-v ~/custom-layerset:/custom-layerset \
to the docker run
command will make the layerset definition available to the Docker container.
The custom styles will be available inside the container under
/custom-layerset
.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-v ~/custom-layerset:/custom-layerset \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex
Define the layerset name (--layerset=poi
) and path
(--layerset-path
) to the docker exec
command.
The value provided to --layerset-path
must match the path linked in the
docker exec
command.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--layerset=poi \
--layerset-path=/custom-layerset/ \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
Excluding layers
To exclude layers from a layerset they can be simply omitted from the
.ini
file. They can also be set explicitly to false
such as road_major=false
.
Layers
This section documents the layers created by PgOSM Flex. The
layerset defined at runtime (to docker exec
)
determines which tables are loaded, based on layer_group
.
The amenity
layer has each of the three types of geometry
commonly associated, so has three tables:
osm.amenity_line
osm.amenity_point
osm.amenity_polygon
The definitive answer to "what is in a layer" is defined by the
associated Lua code under flex-config/style/<layer group>.lua
Layer definitions
The layers are determined by the .lua
files available
in the flex-config/style
directory. Each .lua
file in the style
folder has a matching .sql
file in the flex-config/sql
directory. For example,
the road layer is defined by flex-config/style/road.lua
and
flex-config/sql/road.sql
, and creates three (3) tables (see Tables section).
Tables
Using --layerset=everything
creates 42 tables, 2 views, and 2
materialized views. The following table lists the groups of
tables created with the types of layer it is.
Layer | Geometry Types |
---|---|
amenity | line, point, polygon |
building | point, polygon |
indoor | line, point, polygon |
infrastructure | line, point, polygon |
landuse | point, polygon |
leisure | point, polygon |
natural | line, point, polygon |
place | line, point, polygon, polygon_nested |
poi | line, point, polygon |
public_transport | line, point, polygon |
road | line, point, polygon |
road_major | line (table name is non-standard, osm.road_major ) |
shop | point, polygon |
tags | Provides full JSONB tags |
traffic | line, point, polygon |
unitable | generic geometry |
water | line, point, polygon |
Inclusion by OpenStreetMap tags
Data is included in layers based on the tags from OpenStreetMap.
Amenity
OpenStreetMap tags included:
- amenity
- bench
- brewery
Building
OpenStreetMap tags included:
- building
- building:part
- door
- office
Plus: Address only locations.
See issue #97 for more details about Address only locations.
Indoor
OpenStreetMap tags included:
- indoor
- door
- entrance
Infrastructure
OpenStreetMap tags included:
- aeroway
- amenity
- emergency
- highway
- man_made
- power
- utility
Landuse
OpenStreetMap tags included:
- landuse
Leisure
OpenStreetMap tags included:
- leisure
Natural
OpenStreetMap tags included:
- natural
Excludes water/waterway values. See Water section.
Place
OpenStreetMap tags included:
- admin_level
- boundary
- place
POI (Points of Interest)
The POI layer overlaps many of the other existing layers, though with slightly different definitions. e.g. only buildings with either a name and/or operator are included.
OpenStreetMap tags included:
- building
- shop
- amenity
- leisure
- man_made
- tourism
- landuse
- natural
- historic
Public Transport
OpenStreetMap tags included:
- public_transport
- aerialway
- railway
Additional important tags considered, but not used for primary selection:
- bus
- railway
- lightrail
- train
- highway
Road
OpenStreetMap tags included:
- highway
Additional important tags considered, but not used for primary selection:
- maxspeed
- layer
- tunnel
- bridge
- access
- oneway
Shop
OpenStreetMap tags included:
- shop
- amenity
Tags
The osm.tags
table stores all tags for all items loaded.
Traffic
OpenStreetMap tags included:
- highway
- railway
- barrier
- traffic_calming
- amenity
- noexit
Unitable
All data is stuffed into a generic GEOMETRY
column.
Water
OpenStreetMap tags included:
- natural
- waterway
Uses specific natural
types, attempts to avoid overlap
with the Natural layer. See the Natural section.
Views
The need for views is diminishing as PgOSM Flex matures along with osm2pgsql's Flex output.
The materialized view that will likely remain is:
osm.vplace_polygon_subdivide
The other views currently created in PgOSM Flex 0.8.x will be removed in v0.9.0, see issue #320.
osm.vbuilding_all
osm.vpoi_all
osm.vshop_all
Configure Postgres inside Docker
Add customizations with the -c
switch, e.g. -c shared_buffers=1GB
,
to customize Postgres' configuration at run-time in Docker.
See the osm2pgsql documentation
for recommendations on a server with 64 GB of RAM.
This docker run
command has been tested with 16GB RAM and 4 CPU (8 threads) with the Colorado
subregion. Configuring Postgres in-Docker runs 7-14% faster than the default
Postgres in-Docker configuration.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex \
-c shared_buffers=512MB \
-c work_mem=50MB \
-c maintenance_work_mem=4GB \
-c checkpoint_timeout=300min \
-c max_wal_senders=0 -c wal_level=minimal \
-c max_wal_size=10GB \
-c checkpoint_completion_target=0.9 \
-c random_page_cost=1.0
The docker exec
command used for the timings.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=colorado \
--layerset=basic \
--pgosm-date=2021-10-08
Monitoring the import
You can track the query activity in the database being loaded using the
pg_stat_activity
view from pg_catalog
. Database connections use
application_name = 'pgosm_flex'
.
SELECT *
FROM pg_catalog.pg_stat_activity
WHERE application_name = 'pgosm-flex'
;
Processing Time
The purpose of this page is to provide a rough guideline of what to expect for how long PgOSM Flex processing will take. Two server sizes are used for this testing hosted by Digital Ocean. The larger size server has 8 vCPU and 64 GB RAM to match the target server size outlined in the osm2pgsql manual. The current matching Digital Ocean resource class is the Memory-Optimized with dedicated CPU resources. This comes with a 200 GB SSD. The cost for this class of instance is $0.500 / hour, or $336 / month. A good number of production Postgres instances can run on this hardware.
The smaller server size is a budget friendly 2 AMD vCPU and 2 GB RAM on shared CPU resources. The cost for this class of instance is $0.031 / hour, or $21 / month.
Versions Tested
Versions used for testing: PgOSM Flex 0.7.1 Docker image, based on the official PostGIS image with Postgres 15.2 / PostGIS 3.3. osm2pgsql 1.8.1.
Note: Postgres 15 made GIST indexes faster to create. Using an version prior to Postgres 14 will likely take longer.
Methodology
Create instance, Ubuntu 22.04.
sudo apt update \
&& sudo apt upgrade -y \
&& sudo apt autoremove -y \
&& sudo apt install -y apt-transport-https ca-certificates curl software-properties-common \
&& curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null \
&& sudo apt update \
&& sudo apt install docker-ce \
&& sudo reboot -h now
The timing for the first docker exec
for each region was discarded as
it included the timing for downloading the PBF file.
Timings are an average of multiple recorded test runs over more than one day.
For example, the Norway region for the minimal
layerset had two times: 5 min 35 seconds
and 5 minutes 37 seconds for an average of 5 minutes 36 seconds.
Time for the import step is reported using the Linux time
command on the docker exec
step as shown in the following commands.
PostGIS Size
reported is according to the meta-data in Postgres using this query.
SELECT d.oid, d.datname AS db_name,
pg_size_pretty(pg_database_size(d.datname)) AS db_size
FROM pg_catalog.pg_database d
WHERE d.datname = current_database()
Commands
Set environment variables and start pgosm
Docker container with configurations
set per the osm2pgsql tuning guidelines.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex:0.7.1 \
-c shared_buffers=1GB \
-c work_mem=50MB \
-c maintenance_work_mem=10GB \
-c autovacuum_work_mem=2GB \
-c checkpoint_timeout=300min \
-c max_wal_senders=0 -c wal_level=minimal \
-c max_wal_size=10GB \
-c checkpoint_completion_target=0.9 \
-c random_page_cost=1.0 \
-c full_page_writes=off \
-c fsync=off
WARNING: Setting
full_page_writes=off
andfsync=off
is part of the expert tuning for the best possible performance. This is deemed acceptable in this Docker container running--rm
, obviously this container will be discarded immediately after processing. DO NOT use these configurations unless you understand and accept the risks of corruption.
Run PgOSM Flex within Docker. The first run time is discarded because the first run time includes time downloading the PBF file. Subsequent runs only include the time running the processing.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=64 \
--region=north-america/us \
--subregion=colorado \
--layerset=minimal
Layerset: Minimal
The minimal
layer set only loads major roads, places, and POIs.
Timings with nested admin polygons and dumping the processed data to a .sql
file.
Sub-region | PBF Size | PostGIS Size | .sql Size | Import Time |
---|---|---|---|---|
District of Columbia | 18 MB | 36 MB | 14 MB | 15.3 sec |
Colorado | 226 MB | 181 MB | 129 MB | 1 min 23 sec |
Norway | 1.1 GB | 618 MB | 489 MB | 5 min 36 sec |
North America | 12 GB | 9.5 GB | 7.7 GB | 3.03 hours |
Timings skipping nested admin polygons the dump to .sql
. This adds
--skip-dump --skip-nested
to the docker exec process
. The following
table compares the import time using these skips against the full times reported
above.
Sub-region | Import Time (full) | Import Time (skips) |
---|---|---|
District of Columbia | 15.3 sec | 15.0 sec |
Colorado | 1 min 23 sec | 1 min 21 sec |
Norway | 5 min 36 sec | 5 min 12 sec |
North America | 3.03 hours | 1.25 hours |
Layerset: Default
The default
layer set....
Timings with nested admin polygons and dumping the processed data to a .sql
file.
Sub-region | PBF Size | PostGIS Size | .sql Size | Import Time |
---|---|---|---|---|
District of Columbia | 18 MB | 212 MB | 160 MB | 53 sec |
Colorado | 226 MB | 2.1 GB | 1.9 GB | 8 min 20 sec |
Norway | 1.1 GB | 7.2 GB | 6.5 GB | 33 min 44 sec |
North America | 12 GB | 98 GB | 55 GB | 8.78 hours |
Timings skipping nested admin polygons the dump to .sql
. This adds
--skip-dump --skip-nested
to the docker exec process
. The following
table compares the import time using these skips against the full times reported
above.
Sub-region | Import Time (full) | Import Time (skips) |
---|---|---|
District of Columbia | 53 sec | 51 sec |
Colorado | 8 min 20 sec | 7 min 55 sec |
Norway | 33 min 44 sec | 32 min 18 sec |
North America | 8.78 hours | 6.58 hours |
Querying with PgOSM Flex
Nested admin polygons
Nested admin polygons are stored in the table osm.place_polygon_nested
.
The osm.build_nested_admin_polygons()
to populate the table is defined in flex-config/place.sql
,
the Docker process automatically runs it.
Can run quickly on small areas (Colorado), takes significantly longer on larger
areas (North America).
The Python script in the Docker image has a --skip-nested
option to skip
running the function to populate the table. It can always be populated
at a later time manually using the function.
CALL osm.build_nested_admin_polygons();
When this process is running for a while it can be monitored with this query.
SELECT COUNT(*) AS row_count,
COUNT(*) FILTER (WHERE nest_level IS NOT NULL) AS rows_processed
FROM osm.place_polygon_nested
;
Quality Control Queries
Features not Loaded
The process of selectively load specific features and not others always has the chance of accidentally missing important data.
Running and examine tags from the SQL script db/qc/features_not_in_run_all.sql
.
Run within psql
(using \i db/qc/features_not_in_run_all.sql
) or a GUI client
to explore the temp table used to return the aggregated results, osm_missing
.
The table is a TEMP TABLE
so will disappear when the session ends.
Example results from initial run (v0.0.4) showed some obvious omissions from the current layer definitions.
┌────────────────────────────────────────┬────────┐
│ jsonb_object_keys │ count │
╞════════════════════════════════════════╪════════╡
│ landuse │ 110965 │
│ addr:street │ 89482 │
│ addr:housenumber │ 89210 │
│ name │ 47151 │
│ leisure │ 25351 │
│ addr:state │ 19051 │
│ power │ 16933 │
│ addr:unit │ 13973 │
│ building:part │ 13773 │
│ golf │ 13427 │
│ railway │ 13032 │
│ addr:city │ 12426 │
│ addr:postcode │ 12358 │
│ height │ 12113 │
│ building:colour │ 11124 │
│ roof:colour │ 11115 │
Unroutable routes
The helpers.lua
methods are probably not perfect.
routable_foot()
routable_cycle()
routable_motor()
SELECT * FROM osm.road_line
WHERE NOT route_foot AND NOT route_motor AND NOT route_cycle
;
Not all rows returned are errors.
highway = 'construction'
is not necessarily determinate...
Relations missing from unitable
SELECT t.*
FROM osm.tags t
WHERE t.geom_type = 'R'
AND NOT EXISTS (
SELECT 1
FROM osm.unitable u
WHERE u.geom_type = t.geom_type AND t.osm_id = u.osm_id
);
Points of Interest (POIs)
PgOSM Flex loads an range of tags into a materialized view (osm.poi_all
) for
easily searching POIs.
Line and polygon data is forced to point geometry using
ST_Centroid()
. This layer duplicates a bunch of other more specific layers
(shop, amenity, etc.) to provide a single place for simplified POI searches.
Special layer included by layer sets run-all
and run-no-tags
.
See style/poi.lua
for logic on how to include POIs.
The topic of POIs is subject and likely is not inclusive of everything that probably should be considered
a POI. If there are POIs missing
from this table please submit a new issue
with sufficient details about what is missing.
Pull requests also welcome! See the contributing section
for more information.
Counts of POIs by osm_type
.
SELECT osm_type, COUNT(*)
FROM osm.vpoi_all
GROUP BY osm_type
ORDER BY COUNT(*) DESC;
Results from Washington D.C. subregion (March 2020).
┌──────────┬───────┐
│ osm_type │ count │
╞══════════╪═══════╡
│ amenity │ 12663 │
│ leisure │ 2701 │
│ building │ 2045 │
│ shop │ 1739 │
│ tourism │ 729 │
│ man_made │ 570 │
│ landuse │ 32 │
│ natural │ 19 │
└──────────┴───────┘
Includes Points (N
), Lines (L
) and Polygons (W
).
SELECT geom_type, COUNT(*)
FROM osm.vpoi_all
GROUP BY geom_type
ORDER BY COUNT(*) DESC;
┌───────────┬───────┐
│ geom_type │ count │
╞═══════════╪═══════╡
│ W │ 10740 │
│ N │ 9556 │
│ L │ 202 │
└───────────┴───────┘
Meta table
PgOSM Flex tracks processing metadata in the osm.pgosm_flex
table. The initial import
has osm2pgsql_mode = 'create'
, the subsequent update has
osm2pgsql_mode = 'append'
.
SELECT osm_date, region, srid,
pgosm_flex_version, osm2pgsql_version, osm2pgsql_mode
FROM osm.pgosm_flex
;
┌────────────┬───────────────────────────┬──────┬────────────────────┬───────────────────┬────────────────┐
│ osm_date │ region │ srid │ pgosm_flex_version │ osm2pgsql_version │ osm2pgsql_mode │
╞════════════╪═══════════════════════════╪══════╪════════════════════╪═══════════════════╪════════════════╡
│ 2022-11-04 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ create │
│ 2022-11-25 │ north-america/us-colorado │ 3857 │ 0.6.2-e1f140f │ 1.7.2 │ append │
└────────────┴───────────────────────────┴──────┴────────────────────┴───────────────────┴────────────────┘
Routing with PgOSM Flex
This page provides a simple example of using OpenStreetMap roads
loaded with PgOSM Flex for routing.
The example uses the D.C. PBF included under tests/data/
.
cd ~/pgosm-data
wget https://github.com/rustprooflabs/pgosm-flex/raw/main/tests/data/district-of-columbia-2021-01-13.osm.pbf
wget https://github.com/rustprooflabs/pgosm-flex/raw/main/tests/data/district-of-columbia-2021-01-13.osm.pbf.md5
Loaded using docker exec
command below for specific date.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--pgosm-date=2021-01-13
Prepare data
Create the pgrouting
extension.
CREATE EXTENSION IF NOT EXISTS pgrouting;
CREATE SCHEMA IF NOT EXISTS routing;
Prepare roads for routing using pgrouting functions.
CREATE TABLE routing.road_line AS
WITH a AS (
SELECT osm_id, osm_type, maxspeed, oneway, layer,
route_foot, route_cycle, route_motor, access,
ST_LineMerge(geom) AS geom
FROM osm.road_line
), extra_cleanup AS (
SELECT osm_id, osm_type, maxspeed, oneway, layer,
route_foot, route_cycle, route_motor, access,
(ST_Dump(geom)).geom AS geom
FROM a
WHERE ST_GeometryType(geom) = 'ST_MultiLineString'
), combined AS (
SELECT osm_id, osm_type, maxspeed, oneway, layer,
route_foot, route_cycle, route_motor, access,
geom
FROM a
WHERE ST_GeometryType(geom) != 'ST_MultiLineString'
UNION
SELECT osm_id, osm_type, maxspeed, oneway, layer,
route_foot, route_cycle, route_motor, access,
geom
FROM extra_cleanup
WHERE ST_GeometryType(geom) != 'ST_MultiLineString'
)
SELECT ROW_NUMBER() OVER (ORDER BY geom) AS id, *
FROM combined
ORDER BY geom
;
SELECT pgr_nodeNetwork('routing.road_line', .1, 'id', 'geom');
SELECT pgr_createTopology('routing.road_line_noded', 0.1, 'geom');
SELECT pgr_analyzeGraph('routing.road_line_noded', 0.1, 'geom');
These commands create two (2) new tables usable by pgrouting.
routing.road_line_noded
routing.road_line_noded_vertices_pgr
Add simple cost_length
column to the routing.road_line_noded
table
as a generated column to use for routing costs.
ALTER TABLE routing.road_line_noded
ADD cost_length DOUBLE PRECISION NOT NULL
GENERATED ALWAYS AS (ST_Length(geom))
STORED;
Note: This is for non-directional routing. See the Routing
oneway
section below for more on directional routing.
Start/end points
The following query identifies the vertex IDs for a start and end point to use
for later queries. Use the start_id
and end_id
values from this query
in subsequent queries.
WITH s_point AS (
SELECT v.id AS start_id
FROM routing.road_line_noded_vertices_pgr v
INNER JOIN (SELECT
ST_Transform(ST_SetSRID(ST_MakePoint(-77.0211, 38.92245), 4326), 3857)
AS geom
) p ON v.the_geom <-> geom < 10
ORDER BY v.the_geom <-> geom
LIMIT 1
), e_point AS (
SELECT v.id AS end_id
FROM routing.road_line_noded_vertices_pgr v
INNER JOIN (SELECT
ST_Transform(ST_SetSRID(ST_MakePoint(-77.0183, 38.9227), 4326), 3857)
AS geom
) p ON v.the_geom <-> geom < 10
ORDER BY v.the_geom <-> geom
LIMIT 1
)
SELECT s_point.start_id, e_point.end_id
FROM s_point, e_point
;
┌──────────┬────────┐
│ start_id │ end_id │
╞══════════╪════════╡
│ 14630 │ 14686 │
└──────────┴────────┘
Picked vertex IDs 14630
and 14686
, they span a particular segment
of road that is tagged as highway=residential
and access=private
.
This was picked to illustrate how the calculated access control columns, route_motor
, route_cycle
and route_foot
,
can influence route selection.
Note: The vertex IDs in my test database will not necessary match the vertex IDs in your database!
SELECT *
FROM osm.road_line
WHERE osm_id = 6062791
;
See
flex-config/helpers.lua
functions (e.g.routable_motor()
) for logic behind access control columns.
Simple route
Using pgr_dijkstra()
and no additional filters will
use all roads from OpenStreetMap without regard to mode of travel
or access rules.
This query picks a route that traverses the access=private
section
of road.
SELECT d.*, n.the_geom AS node_geom, e.geom AS edge_geom
FROM pgr_dijkstra(
'SELECT id, source, target, cost_length AS cost,
geom
FROM routing.road_line_noded
',
14630, 14686, directed := False
) d
INNER JOIN routing.road_line_noded_vertices_pgr n ON d.node = n.id
LEFT JOIN routing.road_line_noded e ON d.edge = e.id
;
Route motorized
The following query modifies the query passed in to pgr_dijkstra()
to join to osm.road_line
. The join clause includes a filter on
the route_motor
column.
SELECT d.*, n.the_geom AS node_geom, e.geom AS edge_geom
FROM pgr_dijkstra(
'SELECT n.id, n.source, n.target, n.cost_length AS cost,
n.geom
FROM routing.road_line_noded n
INNER JOIN routing.road_line r ON n.old_id = r.id
AND r.route_motor
',
14630, 14686, directed := False
) d
INNER JOIN routing.road_line_noded_vertices_pgr n ON d.node = n.id
LEFT JOIN routing.road_line_noded e ON d.edge = e.id
;
Route oneway
The oneway
column in the road tables uses
osm2pgsql's direction
data type which resolves to int2
in Postgres.
Valid values are:
0
: Not one way1
: One way, forward travel allowed-1
: One way, reverse travel allowedNULL
: It's complicated. See #172.
Assuming a noded roads table routing table, bring over the oneway
detail
ALTER TABLE routing.road_line_noded
ADD oneway INT2 NULL
;
UPDATE routing.road_line_noded rn
SET oneway = r.oneway
FROM routing.road_line r
WHERE rn.old_id = r.id AND rn.oneway IS NULL
;
Forward and reverse costs
Calculate forward cost.
ALTER TABLE routing.road_line_noded
DROP COLUMN IF EXISTS cost_length
;
-- Cost with oneway considerations
ALTER TABLE routing.road_line_noded
ADD cost_length NUMERIC
GENERATED ALWAYS AS (
CASE WHEN oneway IN (0, 1)
THEN ST_Length(geom)
WHEN oneway = -1
THEN -1 * ST_Length(geom)
END
)
STORED
;
Reverse cost.
-- Reverse cost with oneway considerations
ALTER TABLE routing.road_line_noded
ADD cost_length_reverse NUMERIC
GENERATED ALWAYS AS (
CASE WHEN oneway IN (0, -1)
THEN ST_Length(geom)
WHEN oneway = 1
THEN -1 * ST_Length(geom)
END
)
STORED
;
New undirected route
Found new start point slightly to the north that will require one-way aware for proper routing. The route from this query incorrectly goes the wrong way on 6th Street Northwest.
SELECT d.*, n.the_geom AS node_geom, e.geom AS edge_geom
FROM pgr_dijkstra(
'SELECT n.id, n.source, n.target, n.cost_length AS cost,
n.cost_length_reverse AS reverse_cost,
n.geom
FROM routing.road_line_noded n
INNER JOIN osm.road_line r
ON n.old_id = r.id
AND route_motor
AND n.cost_length IS NOT NULL
',
14624, 14686,
directed := False
) d
INNER JOIN routing.road_line_noded_vertices_pgr n ON d.node = n.id
LEFT JOIN routing.road_line_noded e ON d.edge = e.id
;
Directed
Route, now directed. This respects the one-way rules.
SELECT d.*, n.the_geom AS node_geom, e.geom AS edge_geom
FROM pgr_dijkstra(
'SELECT n.id, n.source, n.target, n.cost_length AS cost,
n.cost_length_reverse AS reverse_cost,
n.geom
FROM routing.road_line_noded n
INNER JOIN routing.road_line r
ON n.old_id = r.id
AND route_motor
AND n.cost_length IS NOT NULL
',
14624, 14686,
directed := True
) d
INNER JOIN routing.road_line_noded_vertices_pgr n ON d.node = n.id
LEFT JOIN routing.road_line_noded e ON d.edge = e.id
;
Postgres permissions for PgOSM Flex
The quick start section showed how to get up and running using the Postgres instance within the PgOSM Flex docker image. Many production usage cases of PgOSM Flex prefer to connect the PgOSM Flex processing within Docker directly to an already running Postgres instances.
The first step to using PgOSM Flex with your own Postgres instance is to have a database already created, and a login role with proper permissions. The steps in this page prepare for the steps outlined in the Using External Postgres Connection section.
Create database and PostGIS
These first steps require elevated permissions within Postgres.
CREATE DATABASE
requires the CREATEDB
permission.
Creating the PostGIS extension requires
Postgres superuser permissions.
In the target Postgres instance, create your database.
CREATE DATABASE your_db_name;
Connect to your_db_name
and create the PostGIS extension.
This is done along with the CREATE DATABASE
since both steps
require the superuser role.
CREATE EXTENSION postgis;
Runtime permissions
Your target database needs to have an osm
schema and the database user
requires the ability to create and populate tables in osm
.
The following commands show one approach to granting permissions required for PgOSM Flex to run on an external database. Do not simply run this assuming this is the proper approach for your database security!
CREATE ROLE pgosm_flex WITH LOGIN PASSWORD 'mysecretpassword';
CREATE SCHEMA osm AUTHORIZATION pgosm_flex;
GRANT CREATE ON DATABASE your_db_name
TO pgosm_flex;
GRANT CREATE ON SCHEMA public
TO pgosm_flex;
These permissions should allow the full PgOSM Flex process to run.
GRANT CREATE ON DATABASE
is required to allow the sqitch process to run and create the pgosm
schema.
GRANT CREATE ON SCHEMA public
is required for Postgres 15 and newer.
Reduced permissions
GRANT CREATE
gives the pgosm_flex
role far more permissions than
it really needs in many cases.
Running docker exec
with --data-only
skips these steps and would make the GRANT CREATE
permission unnecessary for the pgosm_flex
role.
It also is often desired to not make a login role the owner of database objects. This example reduces the scope of permissions.
CREATE ROLE pgosm_flex;
CREATE SCHEMA osm AUTHORIZATION pgosm_flex;
GRANT pgosm_flex TO your_login_role;
Using External Postgres Connection
Prepare the database and permissions as described in Postgres Permissions.
Set environment variables to define the connection. Create a file with the configuration options.
touch ~/.pgosm-db-myproject
chmod 0700 ~/.pgosm-db-myproject
nano ~/.pgosm-db-myproject
Put in the contents specific to your database connection.
export POSTGRES_USER=your_login_role
export POSTGRES_PASSWORD=mysecretpassword
export POSTGRES_HOST=your-host-or-ip
export POSTGRES_DB=your_db_name
export POSTGRES_PORT=5432
Env vars can be loaded using source
.
source ~/.pgosm-db-myproject
Run the container with the additional environment variables.
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_USER=$POSTGRES_USER \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-e POSTGRES_HOST=$POSTGRES_HOST \
-e POSTGRES_DB=$POSTGRES_DB \
-e POSTGRES_PORT=$POSTGRES_PORT \
-p 5433:5432 -d rustprooflabs/pgosm-flex
The docker exec
command is the same as when using the internal Postgres instance.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia
Notes
The POSTGRES_HOST
value is in relation to the Docker container.
Using localhost
refers to the Docker container and will use the Postgres instance
within the Docker container, not your host running the Docker container.
Use ip addr
to find your local host's IP address and provide that.
Setting POSTGRES_HOST
to anything but localhost
disables the drop/create database step. This means the target database must be created prior to running PgOSM Flex.
Stay Updated with Replication
The --replication
option of PgOSM Flex enables osm2pgsql-replication
to provide an easy and quick way to keep your OpenStreetMap data refreshed.
The
--replication
mode is stable as of 0.7.0. It was added as an experimental feature in 0.4, originally under the--append
option.
PgOSM Flex's --replication
mode wraps around the osm2pgsql-replication
package
included with osm2pgsql
. The first time running an import with --replication
mode runs osm2pgsql normally, with --slim
mode and without --drop
.
After osm2pgsql completes, osm2pgsql-replication init ...
is ran to setup
the DB for updates.
This mode of operation results in larger database as the intermediate osm2pgsql
tables (--slim
) must be left in the database (no --drop
).
Important: The original
--append
option is now under--replication
. The--append
option was removed in PgOSM Flex 0.7.0. See #275 for context.
When using replication you need to pin your process to a specific PgOSM Flex version
in the docker run
command. When upgrading to new versions,
be sure to check the release notes for manual upgrade steps for --replication
.
The release notes for
PgOSM Flex 0.6.1
are one example.
The notes discussed in the release notes have reference SQL scripts
under db/data-migration
folder.
WARNING - Due to the ability to configure custom layersets these data-migration scripts need manual review, and possibly manual adjustments for your specific database and process.
The other important change when using replication is to increase Postgres' max_connections
.
See this discussion on osm2pgsql
for why this is necessary.
If using the Docker-internal Postgres instance this is done with -c max_connections=300
in the docker run
command. External database connections must update this
in the appropriate postgresql.conf
file.
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 \
-d rustprooflabs/pgosm-flex:0.8.0 \
-c max_connections=300
Run the docker exec
step with --replication
.
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--pgosm-date 2022-12-30 \
--replication
Running the above command a second time will detect that the target database
has osm2pgsql-replication
setup and load data via the defined replication
service.
Note: The
--pgosm-date
parameter is ignored during subsequent imports using--replication
.
PgOSM Flex Update Mode
Running in experimental Update mode enables using osm2pgsql's --append
option.
Note: This is not the
--append
option that existed in PgOSM Flex 0.6.3 and prior.
Testing steps
Important -- Needs higher max connections!
docker stop pgosm && docker build -t rustprooflabs/pgosm-flex .
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex \
-c max_connections=300
Run fresh import w/ D.C. using --update create
. This ensures osm2pgsql
uses --slim
w/out --drop
. Tested from commit 672d9fd
.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--skip-nested --skip-dump \
--update create
...
2022-12-27 09:02:37,654:INFO:pgosm-flex:helpers:PgOSM-Flex version: 0.6.3 672d9fd
...
real 0m43.904s
user 0m0.020s
sys 0m0.012s
Run with a second sub-region using --update append
.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=new-hampshire \
--skip-nested --skip-dump \
--update append
...
2022-12-27 10:14:26,792:INFO:pgosm-flex:helpers:2022-12-27 10:14:26 osm2pgsql took 1420s (23m 40s) overall.
2022-12-27 10:14:26,832:INFO:pgosm-flex:pgosm_flex:osm2pgsql completed
2022-12-27 10:14:26,832:INFO:pgosm-flex:pgosm_flex:Running with --update append: Skipping post-processing SQL
2022-12-27 10:14:26,832:INFO:pgosm-flex:pgosm_flex:Skipping pg_dump
2022-12-27 10:14:26,832:INFO:pgosm-flex:pgosm_flex:PgOSM Flex complete!
real 23m47.564s
user 0m0.083s
sys 0m0.025s
It seems to work, new output at the end: Skipping post-processing SQL
.
Verified that both New Hampshire and D.C. regions were loaded in osm.place_polygon
.
Smaller test
Put the following into ~/pgosm-data/extracts/colorado-extract.json
:
{
"directory": "/home/ryanlambert/pgosm-data/",
"extracts": [
{
"output": "colorado-boulder-latest.osm.pbf",
"description": "Area extracted around Boulder, Colorado",
"bbox": {
"left": -105.30,
"right": -105.20,
"top": 40.07,
"bottom": 39.98
}
},
{
"output": "colorado-longmont-latest.osm.pbf",
"description": "Area extracted around Longmont, Colorado",
"bbox": {
"left": -105.15,
"right": -105.05,
"top": 40.21,
"bottom": 40.12
}
}
]
}
Create Boulder and Longmont extracts.
osmium extract -c extracts/colorado-extracts.json colorado-2022-12-27.osm.pbf
ryanlambert@tag201:~/pgosm-data$ ls -alh | grep boulder
-rw-rw-r-- 1 ryanlambert ryanlambert 2.4M Dec 27 14:31 colorado-boulder-latest.osm.pbf
ryanlambert@tag201:~/pgosm-data$ ls -alh | grep longmont
-rw-rw-r-- 1 ryanlambert ryanlambert 988K Dec 27 14:31 colorado-longmont-latest.osm.pbf
Takes 11 seconds.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=colorado-longmont --input-file colorado-longmont-latest.osm.pbf \
--skip-dump --update create
Takes 2 minutes.
time docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=colorado-boulder --input-file colorado-boulder-latest.osm.pbf \
--skip-dump --update append
QGIS Styles for PgOSM Flex
If you use QGIS to visualize OpenStreetMap, there are a few basic
styles using the public.layer_styles
table created by QGIS.
This data is loaded by default. Run PgOSM Flex with --data-only
to skip loading
this data.
QGIS can save its styling information directly in a table in the Postgres database
using a table public.layer_styles
.
SELECT f_table_catalog, f_table_schema, f_table_name, stylename,
useasdefault, description
FROM public.layer_styles
;
Projects using PgOSM Flex
Do you have a project using PgOSM Flex? Submit a PR with details!
- geoshiny - create highly customized maps - github repo
- eBook - Mastering PostGIS and OpenStreetMap by Ryan Lambert - website
- Geo Faker - documentation - github
PgOSM Flex Docker
This page outlines how and when images are built and pushed to Docker Hub.
Docker image background
The PgOSM Flex Docker image uses main Postgres image via the main PostGIS image as starting point. Those repositories have detailed instructions on using and customizing the core Postgres functionality.
Images on Docker Hub
There are three main types of images pushed to Docker Hub.
latest
x.x.x
dev
Which branch is best for you depends on how you use the data from PgOSM Flex.
When to use tagged (x.x.x
) release
Tagged releases are the most stable option and are recommended if you are
using --replication
or --update
mode.
Tagged releases are built with the latest versions of all key software, e.g. Postgres,
PostGIS and osm2pgsql, and their dependencies. These tagged images (e.g. 0.6.2
)
are typically built at the time the tag is added to GitHub, and are (typically)
not rebuilt.
PgOSM Flex is still evolving on a regular basis. This means new tagged releases are coming out as activity happens in the project.
When to use latest
If you run PgOSM Flex without --replication
or --update
mode this image is
generally stable and includes the latest features.
The latest
Docker image could include changes that
require manual changes in Postgres. Those changes are documented in the release notes,
for example, see "Notes for --append
users" in 0.6.1 release notes.
When to use dev
The dev
image exists when there's something worth testing. Typically the dev
image is deleted from Docker Hub as functionality is worked into the latest
image.
Building the image
Build latest. Occasionally run with --no-cache
to force some software updates.
docker build -t rustprooflabs/pgosm-flex .
Tag with version.
docker build -t rustprooflabs/pgosm-flex:0.8.0 .
Push to Docker Hub.
docker push rustprooflabs/pgosm-flex:latest
docker push rustprooflabs/pgosm-flex:0.8.0
Ensure updates
To be certain the latest images are being used and latest
software is installed, pull the latest PostGIS image and build
the PgOSM Flex image using --no-cache
.
docker pull postgis/postgis:15-3.3
docker build --no-cache -t rustprooflabs/pgosm-flex:dev .
Testing PgOSM Flex
The Makefile
at the root of this project tests many core aspects of
PgOSM Flex's functionality. It builds the Docker image, tests a few usage
scenarios (including --input-file
) and runs both Python unit tests
and Data Import tests. The data import tests verify row counts by
osm_type
and osm_subtype
of many tables.
Run all tests
To run all tests run make
from the project's root directory.
make
A simplified usage for quicker testing during development.
make docker-exec-default unit-tests
Python unit tests
The Python unit tests are under pgosm-flex/docker/tests/
. These tests use
Python's unittest
module. The make
process runs these using
coverage run ...
and coverage report ...
.
See the Makefile
for exact implementation.
Data import tests
Under pgosm-flex/tests
. The run-output-tests.sh
script is ran by
running make
. The script loops over the .sql
scripts under
pgosm-flex/tests/sql/
, runs the queries via psql
using
--no-psqlrc -tA
and compares the output from the query against the
expected output saved under pgosm-flex/tests/expected
.
FIXME: At this time the
run-extra-loads.sh
script is not ran automatically. There are not any usage notes covering those random side tests.
What is not tested
Functionality of osm2pgsql-replication
to actually update data. Challenge
is that to test this it requires having a recent .osm.pbf
file for the initial
import. Attempting to use the test D.C. file used for all other testing
(from January 13, 2021), the initial import works, however a subsequent
refresh fails.
2023-01-29 08:11:35,553:INFO:pgosm-flex:helpers:2023-01-29 08:11:35 [INFO]: Using replication service 'http://download.geofabrik.de/north-america/us/district-of-columbia-updates'. Current sequence 2856 (2021-01-13 14:42:03-07:00).
2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:Traceback (most recent call last):
2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 556, in <module>
2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:sys.exit(main())
2023-01-29 08:11:36,866:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 550, in main
2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:return args.handler(conn, args)
2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:File "/usr/local/bin/osm2pgsql-replication", line 402, in update
2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:endseq = repl.apply_diffs(outhandler, seq + 1,
2023-01-29 08:11:36,867:INFO:pgosm-flex:helpers:File "/usr/local/lib/python3.9/dist-packages/osmium/replication/server.py", line 177, in apply_diffs
2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:diffs = self.collect_diffs(start_id, max_size)
2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:File "/usr/local/lib/python3.9/dist-packages/osmium/replication/server.py", line 143, in collect_diffs
2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:left_size -= rd.add_buffer(diffdata, self.diff_type)
2023-01-29 08:11:36,868:INFO:pgosm-flex:helpers:RuntimeError: gzip error: inflate failed: incorrect header check
2023-01-29 08:11:36,890:WARNING:pgosm-flex:pgosm_flex:Failure. Return code: 1
2023-01-29 08:11:36,890:INFO:pgosm-flex:pgosm_flex:Skipping pg_dump
2023-01-29 08:11:36,890:WARNING:pgosm-flex:pgosm_flex:PgOSM Flex completed with errors. Details in output
Developing QGIS Styles
This page explains how to maintain QGIS layer styles.
Add/Update existing records
The QGIS table does not include UNIQUE
constraints, so using Postgres' UPSERT
is
not available by default.
Add new records from staging, based on object names.
INSERT INTO public.layer_styles
(f_table_catalog, f_table_schema, f_table_name,
f_geometry_column, stylename, styleqml, stylesld,
useasdefault, description, "owner", ui, update_time)
SELECT new.f_table_catalog, new.f_table_schema, new.f_table_name,
new.f_geometry_column, new.stylename, new.styleqml, new.stylesld,
new.useasdefault, new.description, new."owner", new.ui, new.update_time
FROM public.layer_styles_staging new
LEFT JOIN public.layer_styles ls
ON new.f_table_catalog = ls.f_table_catalog
AND new.f_table_schema = ls.f_table_schema
AND new.f_table_name = ls.f_table_name
AND new.stylename = ls.stylename
WHERE ls.id IS NULL
;
To update existing styles.
UPDATE public.layer_styles ls
SET f_geometry_column = new.f_geometry_column,
styleqml = new.styleqml,
stylesld = new.stylesld,
useasdefault = new.useasdefault,
description = new.description,
"owner" = new."owner",
ui = new.ui,
update_time = new.update_time
FROM public.layer_styles_staging new
WHERE new.f_table_catalog = ls.f_table_catalog
AND new.f_table_schema = ls.f_table_schema
AND new.f_table_name = ls.f_table_name
AND new.stylename = ls.stylename
;
Cleanup the staging table.
DELETE FROM public.layer_styles_staging;
Updating Style .sql
To update (or create new) the .sql file with styles.
Load into _staging
table so restoring the data puts it back in the same place.
Optionally add a WHERE
clause to only export certain styles.
You may want to update the owner
field.
INSERT INTO public.layer_styles_staging
SELECT * FROM public.layer_styles;
UPDATE public.layer_styles_staging
SET owner = 'rustprooflabs'
WHERE owner != 'rustprooflabs'
;
pg_dump --no-owner --no-privileges --data-only --table=public.layer_styles_staging \
-d pgosm \
-f layer_styles.sql
Cleanup the staging table.
DELETE FROM public.layer_styles_staging;
Troubleshoot errors in osm2pgsql processing
This section contains rough notes about how to troubleshoot errors in PgOSM Flex.
Reduce --ram
If you encounter an unusual failure during the osm2pgsql
step of PgOSM Flex,
try reducing the --ram
value. Choosing a --ram
option too high can cause
the process to fail with a variety of unexpected errors. If that isn't the problem,
continue reading.
Docker logs
Output such as this.
2023-02-26 22:14:31,760:INFO:pgosm-flex:helpers:Processing: Node(10k 10.0k/s) Way(0k 0.00k/s) Relation(0Processing: Node(84760k 277.9k/s) Way(0k 0.00k/s) Relation(0 0.0/s)
2023-02-26 22:14:31,774:ERROR:pgosm-flex:pgosm_flex:Failed to run osm2pgsql. Return code: -9
Failed to run osm2pgsql. Return code: -9 - Check the log output for details
Checking logs from Docker might shed light on issue.
docker logs pgosm
2023-02-26 22:14:31.777 UTC [114] LOG: incomplete message from client
2023-02-26 22:14:31.777 UTC [114] CONTEXT: COPY tags, line 1
2023-02-26 22:14:31.777 UTC [114] STATEMENT: COPY "osm"."tags" ("geom_type","osm_id","tags") FROM STDIN
2023-02-26 22:14:31.807 UTC [114] ERROR: unexpected EOF on client connection with an open transaction
2023-02-26 22:14:31.807 UTC [114] CONTEXT: COPY tags, line 1
2023-02-26 22:14:31.807 UTC [114] STATEMENT: COPY "osm"."tags" ("geom_type","osm_id","tags") FROM STDIN
2023-02-26 22:14:31.812 UTC [114] FATAL: terminating connection because protocol synchronization was lost
2023-02-26 22:14:31.822 UTC [114] LOG: could not send data to client: Broken pipe
Troubleshoot within Docker
Enter the docker container into /bin/bash
.
docker exec -it pgosm /bin/bash
Set environment variables required for PgOSM Flex's operation.
export PGOSM_CONN=postgresql://postgres:mysecretpassword@localhost:5432/pgosm?application_name=pgosm-flex
export PGOSM_REPLICATION=False
export PGOSM_IMPORT_UUID=this-is-not-a-real-uuid
export PGOSM_LAYERSET=minimal
Run osm2pgsql
manually. Start with a simple operation shown below,
consider adding adding -v
and/or --log-sql-data
to the osm2pgsql
command
to dig deeper.
osm2pgsql -d $PGOSM_CONN \
--create --output=flex --style=./run.lua \
/app/output/district-of-columbia-latest.osm.pbf
Configure more things
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
-v /etc/localtime:/etc/localtime:ro \
-e POSTGRES_PASSWORD=$POSTGRES_PASSWORD \
-p 5433:5432 -d rustprooflabs/pgosm-flex:0.8.0 \
-c shared_buffers=1GB \
-c work_mem=50MB \
-c maintenance_work_mem=10GB \
-c autovacuum_work_mem=2GB \
-c checkpoint_timeout=300min \
-c max_wal_senders=0 -c wal_level=minimal \
-c max_wal_size=10GB \
-c checkpoint_completion_target=0.9 \
-c random_page_cost=1.0 \
-c full_page_writes=off \
-c fsync=off \
-c log_statement=all \
-c log_duration=on