OpenStreetMap Data in Parquet Format
Efficiently managing and analyzing OpenStreetMap data
Working with OpenStreetMap (OSM) data can be challenging, especially when dealing with global datasets rather than regional exports. Traditional tools such as osmium or osm2pgsql simplify some aspects of data handling but often fall short in areas such as processing complex way geometries, which can be resource-intensive. Furthermore, these tools don't always integrate seamlessly with modern data science ecosystems such as Apache Spark, Polars, or DuckDB.
To address these challenges, we've transformed the native OSM XML-based data into a highly optimized Parquet format and made it available in S3-compatible object storage. This approach significantly improves compatibility with leading data lake technologies and popular data science software, streamlining your analytical workflows.
Accessing the Data
Option 1: Object Storage (S3-compatible)
Accessing our OSM Parquet datasets directly from S3-compatible object storage is straightforward. You can connect using the following details:
- Endpoint URL: https://object-store.geo-lake.com
- Bucket name: data-lakehouse
For those using the AWS CLI, you can list the available Parquet files with this command:
aws s3 ls --endpoint-url https://object-store.geo-lake.com --no-sign-request s3://data-lakehouse/bronze/osm/parquet/
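If you'd rather explore the bucket from Python, here is a minimal sketch of an equivalent anonymous listing with boto3 (the prefix mirrors the CLI example above):

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client pointed at the S3-compatible endpoint
s3 = boto3.client(
    's3',
    endpoint_url='https://object-store.geo-lake.com',
    config=Config(signature_version=UNSIGNED),
)

# List the Parquet files under the OSM prefix
response = s3.list_objects_v2(Bucket='data-lakehouse', Prefix='bronze/osm/parquet/')
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])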
Option 2: Direct File Downloads
Alternatively, if you prefer not to access the object storage directly, you can download individual Parquet files. Below you'll find a list of download links for OSM nodes, ways (including ways with geometries), and relations.
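As a rough sketch of that route, the snippet below downloads a single file over HTTPS and inspects it with pyarrow; the file name is hypothetical, so substitute one of the actual links listed below.

import urllib.request
import pyarrow.parquet as pq

# Hypothetical file URL for illustration -- replace it with a real link from the list below
url = ('https://object-store.geo-lake.com/data-lakehouse/'
       'bronze/osm/parquet/node/version=2025-05-17T14:31:37Z/part-00000.parquet')
local_path = 'osm_nodes_part-00000.parquet'

# Download the file and print its schema
urllib.request.urlretrieve(url, local_path)
print(pq.read_table(local_path).schema)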
Usage Examples
This section demonstrates how to query these OSM Parquet files with popular data science libraries and shows how easily they slot into typical analytical workflows.
DuckDB
Here's an example demonstrating how to query OSM node data using DuckDB:
-- Enable reading remote files over HTTP(S)/S3
INSTALL httpfs;
LOAD httpfs;

-- Point DuckDB at the S3-compatible endpoint (path-style URLs, no credentials)
SET s3_region='us-east-1';
SET s3_url_style='path';
SET s3_endpoint='object-store.geo-lake.com';

-- Look up a single node by its OSM id
SELECT *
FROM read_parquet('s3://data-lakehouse/bronze/osm/parquet/node/version=2025-05-17T14:31:37Z/*.parquet')
WHERE id = 2480654035;
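The same query can also be run from Python with the duckdb package; this sketch simply wraps the SQL shown above:

import duckdb

con = duckdb.connect()

# Same configuration as the SQL example above
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-east-1';")
con.execute("SET s3_url_style='path';")
con.execute("SET s3_endpoint='object-store.geo-lake.com';")

# Fetch a single node by its OSM id into a pandas DataFrame
df = con.execute("""
    SELECT *
    FROM read_parquet('s3://data-lakehouse/bronze/osm/parquet/node/version=2025-05-17T14:31:37Z/*.parquet')
    WHERE id = 2480654035
""").df()
print(df)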
Polars (Python)
This Polars (Python) snippet illustrates how to efficiently load and collect OSM relation data directly from the S3 storage into a DataFrame:
import polars as pl

# Connection settings for the S3-compatible endpoint (anonymous access)
storage_options = {
    'aws_region': 'us-east-1',
    'aws_endpoint_url': 'https://object-store.geo-lake.com',
    'skip_signature': 'true',
}

# Lazily scan the relation Parquet files, then collect into a DataFrame
df = pl.scan_parquet(
    's3://data-lakehouse/bronze/osm/parquet/relation/version=2025-05-17T14:31:37Z/*.parquet',
    storage_options=storage_options,
).collect()
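Because scan_parquet builds a lazy query, filters applied before collect() are pushed down so that only matching row groups are fetched from object storage. Here is a small sketch, assuming the relation files expose the same id column as the node example above (the id value is purely illustrative):

import polars as pl

storage_options = {
    'aws_region': 'us-east-1',
    'aws_endpoint_url': 'https://object-store.geo-lake.com',
    'skip_signature': 'true',
}

# Filter before collect(): the predicate is pushed down to the Parquet scan
relation = (
    pl.scan_parquet(
        's3://data-lakehouse/bronze/osm/parquet/relation/version=2025-05-17T14:31:37Z/*.parquet',
        storage_options=storage_options,
    )
    .filter(pl.col('id') == 62422)  # illustrative relation id
    .collect()
)
print(relation)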