Processing Time

This page gives a rough guideline for how long PgOSM Flex processing takes. Testing used two server sizes hosted by Digital Ocean. The larger server has 8 vCPU and 64 GB RAM, matching the target server size outlined in the osm2pgsql manual. The matching Digital Ocean resource class is currently Memory-Optimized with dedicated CPU resources, which comes with a 200 GB SSD. This class of instance costs $0.500 / hour, or $336 / month. Many production Postgres instances can run on this hardware.

The smaller server is a budget-friendly instance with 2 AMD vCPU and 2 GB RAM on shared CPU resources. This class of instance costs $0.031 / hour, or $21 / month.
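Both monthly prices equal the hourly rate multiplied by 672 hours (28 days of 24 hours). We assume that figure reflects Digital Ocean's monthly billing cap for these instance classes. The minimal bash sketch below shows the conversion. The `monthly_cost` helper and the 672-hour constant are illustrative assumptions and are not part of PgOSM Flex.

```shell
#!/usr/bin/env bash
# Sketch: convert an hourly instance rate to a monthly cost.
# Assumption: billing is capped at 672 hours per month (28 days x 24 hours).
HOURS_PER_MONTH=672

# monthly_cost RATE -> prints RATE * HOURS_PER_MONTH rounded to whole dollars
monthly_cost() {
    awk -v rate="$1" -v hours="$HOURS_PER_MONTH" \
        'BEGIN { printf "%.0f\n", rate * hours }'
}

echo "8 vCPU / 64 GB (Memory-Optimized): \$$(monthly_cost 0.500) / month"
echo "2 vCPU / 2 GB (shared CPU): \$$(monthly_cost 0.031) / month"
```

The smaller instance works out to $20.83, which rounds to the $21 quoted above.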

Versions Tested

Testing used the PgOSM Flex 0.7.1 Docker image. This image is based on the official PostGIS image with Postgres 15.2 / PostGIS 3.3 and includes osm2pgsql 1.8.1.

Note: Postgres 14 made GiST indexes faster to create. Using a version prior to Postgres 14 will likely take longer.

Methodology

Create an instance running Ubuntu 22.04, then install Docker.

# Update the OS, add Docker's apt repository, install Docker, then reboot
sudo apt update \
    && sudo apt upgrade -y \
    && sudo apt autoremove -y \
    && sudo apt install -y apt-transport-https ca-certificates curl software-properties-common \
    && curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null \
    && sudo apt update \
    && sudo apt install -y docker-ce \
    && sudo reboot

The time for the first docker exec run in each region was discarded because it included downloading the PBF file.

Reported timings are the average of several test runs recorded over more than one day. For example, the Norway region with the minimal layerset had two runs of 5 min 35 sec and 5 min 37 sec, which average to 5 min 36 sec.
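The averaging in the Norway example is plain arithmetic on durations. The bash sketch below assumes run times are recorded as M:SS strings. It converts each time to seconds, averages with integer division (fractional seconds are truncated), and formats the result. `to_seconds`, `average_seconds`, and `format_duration` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: average several recorded run times given as M:SS strings.

# to_seconds M:SS -> total seconds; one leading zero is stripped so that
# values like "08" are not parsed as invalid octal numbers
to_seconds() {
    local min=${1%%:*} sec=${1##*:}
    min=${min#0}; sec=${sec#0}
    echo $(( ${min:-0} * 60 + ${sec:-0} ))
}

# average_seconds S1 S2 ... -> integer mean, truncated toward zero
average_seconds() {
    local total=0 t
    for t in "$@"; do
        total=$(( total + t ))
    done
    echo $(( total / $# ))
}

# format_duration SECONDS -> "M min S sec"
format_duration() {
    printf '%d min %d sec\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Norway, minimal layerset: two recorded runs
avg=$(average_seconds "$(to_seconds 5:35)" "$(to_seconds 5:37)")
format_duration "$avg"   # prints: 5 min 36 sec
```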

Import time is measured by running the docker exec step under the Linux time command, as shown in the commands below.

PostGIS Size is the database size reported by Postgres metadata, using this query.

SELECT d.oid, d.datname AS db_name,
        pg_size_pretty(pg_database_size(d.datname)) AS db_size
    FROM pg_catalog.pg_database d
    WHERE d.datname = current_database();

Commands

Set environment variables and start the pgosm Docker container, with Postgres configured according to the osm2pgsql tuning guidelines.

export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=mysecretpassword

docker run --name pgosm -d --rm \
    -v ~/pgosm-data:/app/output \
    -v /etc/localtime:/etc/localtime:ro \
    -e POSTGRES_USER="$POSTGRES_USER" \
    -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
    -p 5433:5432 rustprooflabs/pgosm-flex:0.7.1 \
    -c shared_buffers=1GB \
    -c work_mem=50MB \
    -c maintenance_work_mem=10GB \
    -c autovacuum_work_mem=2GB \
    -c checkpoint_timeout=300min \
    -c max_wal_senders=0 -c wal_level=minimal \
    -c max_wal_size=10GB \
    -c checkpoint_completion_target=0.9 \
    -c random_page_cost=1.0 \
    -c full_page_writes=off \
    -c fsync=off

WARNING: Setting full_page_writes=off and fsync=off is part of the expert tuning for the best possible performance. It is acceptable here only because the container runs with --rm and is discarded after processing. DO NOT use these configurations unless you understand and accept the risk of data corruption.

Run PgOSM Flex within Docker. Only runs after the first are timed, because the first run also downloads the PBF file.

time docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=64 \
    --region=north-america/us \
    --subregion=colorado \
    --layerset=minimal
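You can script the methodology of discarding the first run and averaging the rest instead of repeating it by hand. The minimal bash sketch below assumes that the whole-second resolution of bash's built-in `SECONDS` counter is precise enough. `time_runs` is a hypothetical helper and is not a PgOSM Flex feature. It measures wall-clock time, which is comparable to the `real` value that `time` reports.

```shell
#!/usr/bin/env bash
# Sketch: run a command N times, discard the first run, and report the
# elapsed wall-clock seconds of the remaining runs plus their average.
# Resolution is whole seconds (bash's SECONDS variable).

# time_runs N COMMAND [ARGS...]
time_runs() {
    local runs=$1 i start elapsed total=0 kept=0
    shift
    for (( i = 1; i <= runs; i++ )); do
        start=$SECONDS
        "$@" || return 1          # stop on the first failed run
        elapsed=$(( SECONDS - start ))
        if (( i == 1 )); then
            echo "run $i: ${elapsed}s (discarded, includes PBF download)"
            continue
        fi
        echo "run $i: ${elapsed}s"
        total=$(( total + elapsed ))
        kept=$(( kept + 1 ))
    done
    if (( kept > 0 )); then
        echo "average of $kept runs: $(( total / kept ))s"
    fi
}

# Hypothetical usage: three runs of the Colorado minimal import.
# time_runs 3 docker exec pgosm python3 docker/pgosm_flex.py \
#     --ram=64 --region=north-america/us --subregion=colorado --layerset=minimal
```

The docker exec example is commented out so the helper can be sourced safely. The example omits -it because the loop does not need an interactive terminal.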

Layerset: Minimal

The minimal layerset loads only major roads, places, and POIs.

Timings with nested admin polygons enabled and the processed data dumped to a .sql file.

| Sub-region | PBF Size | PostGIS Size | .sql Size | Import Time |
|---|---|---|---|---|
| District of Columbia | 18 MB | 36 MB | 14 MB | 15.3 sec |
| Colorado | 226 MB | 181 MB | 129 MB | 1 min 23 sec |
| Norway | 1.1 GB | 618 MB | 489 MB | 5 min 36 sec |
| North America | 12 GB | 9.5 GB | 7.7 GB | 3.03 hours |

Timings skipping nested admin polygons and the dump to .sql. These runs add --skip-dump --skip-nested to the docker exec command. The following table compares import times with these skips against the full times reported above.

| Sub-region | Import Time (full) | Import Time (skips) |
|---|---|---|
| District of Columbia | 15.3 sec | 15.0 sec |
| Colorado | 1 min 23 sec | 1 min 21 sec |
| Norway | 5 min 36 sec | 5 min 12 sec |
| North America | 3.03 hours | 1.25 hours |
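The skips matter mostly for the largest region. To quantify this, the sketch below computes the percent of import time saved using only the minimal layerset timings from the table. Each row's two values are converted so that they share a unit. `pct_saved` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: percent of import time saved by adding --skip-dump --skip-nested.
# Inputs are the minimal layerset timings from the table above; each row's
# two values only need to share a unit (seconds or hours).

# pct_saved FULL SKIPS -> (FULL - SKIPS) / FULL * 100, one decimal place
pct_saved() {
    awk -v full="$1" -v skips="$2" \
        'BEGIN { printf "%.1f\n", (full - skips) / full * 100 }'
}

echo "District of Columbia: $(pct_saved 15.3 15.0)%"   # seconds
echo "Colorado:             $(pct_saved 83 81)%"       # 1 min 23 sec vs 1 min 21 sec
echo "Norway:               $(pct_saved 336 312)%"     # 5 min 36 sec vs 5 min 12 sec
echo "North America:        $(pct_saved 3.03 1.25)%"   # hours
```

With these inputs, the skips save about 2.0% (District of Columbia), 2.4% (Colorado), 7.1% (Norway), and 58.7% (North America) of import time.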

Layerset: Default

The default layer set....

Timings with nested admin polygons enabled and the processed data dumped to a .sql file.

| Sub-region | PBF Size | PostGIS Size | .sql Size | Import Time |
|---|---|---|---|---|
| District of Columbia | 18 MB | 212 MB | 160 MB | 53 sec |
| Colorado | 226 MB | 2.1 GB | 1.9 GB | 8 min 20 sec |
| Norway | 1.1 GB | 7.2 GB | 6.5 GB | 33 min 44 sec |
| North America | 12 GB | 98 GB | 55 GB | 8.78 hours |
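Of all the runs, North America with the default layerset comes closest to filling the 200 GB SSD on the larger server. Adding up the sizes in the table gives a rough disk budget. This sketch assumes the PBF file, the PostGIS database, and the .sql dump all sit on the same disk at the same time. It ignores WAL, temporary files, and the Docker image.

```shell
#!/usr/bin/env bash
# Sketch: rough disk budget for North America with the default layerset.
# Assumption: PBF, PostGIS database, and .sql dump share one 200 GB disk;
# WAL, temporary files, and the Docker image are not counted.
PBF_GB=12
POSTGIS_GB=98
SQL_DUMP_GB=55
DISK_GB=200

used=$(( PBF_GB + POSTGIS_GB + SQL_DUMP_GB ))
echo "Estimated usage: ${used} GB of ${DISK_GB} GB ($(( DISK_GB - used )) GB free)"
```

By this estimate, the run fits with about 35 GB to spare. Peak usage during the import may exceed the final reported sizes, so treat that margin as optimistic.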

Timings skipping nested admin polygons and the dump to .sql. These runs add --skip-dump --skip-nested to the docker exec command. The following table compares import times with these skips against the full times reported above.

| Sub-region | Import Time (full) | Import Time (skips) |
|---|---|---|
| District of Columbia | 53 sec | 51 sec |
| Colorado | 8 min 20 sec | 7 min 55 sec |
| Norway | 33 min 44 sec | 32 min 18 sec |
| North America | 8.78 hours | 6.58 hours |