Overview

RaQuet files are standard Apache Parquet — they work with any Parquet-compatible query engine. For full raster analytics (point queries, zonal statistics, spatial filtering), use engines with spatial raster functions.

Engine       Basic Parquet   Raster Functions        Notes
DuckDB       ✓               ✓ (via extension)       Best for local analytics (1.5+)
BigQuery     ✓               ✓ (Analytics Toolbox)   Fully managed, scalable
Snowflake    ✓               ✓ (Analytics Toolbox)   Fully managed, scalable
Databricks   ✓               ✓ (Analytics Toolbox)   Spark-based
PostgreSQL   ✓               ✓ (Analytics Toolbox)   Self-hosted
Spark        ✓               —                       Native Parquet support
Pandas       ✓               —                       Via pyarrow
Polars       ✓               —                       Fast DataFrames

DuckDB

DuckDB provides the best local experience via the DuckDB Raquet Extension. It requires DuckDB 1.5+, which includes native GEOMETRY type support — no separate spatial extension is needed.

Installation

INSTALL raquet FROM community;
LOAD raquet;

Key Functions

Function                                           Description
read_raquet(file)                                  Read all data rows (metadata propagated)
read_raquet(file, geometry)                        Spatial filter with auto resolution
read_raquet_at(file, lon, lat)                     Point query (reads only the needed tile)
read_raquet_metadata(file)                         Read metadata row only
ST_RasterValue(block, band, point, metadata)       Get pixel value at a point
ST_RasterSummaryStats(band, metadata)              Per-tile statistics (count, sum, mean, min, max, stddev)
ST_RegionStats(band, block, geometry, metadata)    Aggregate statistics within a polygon
ST_Intersects(block, geometry)                     Spatial filter (EPSG:4326)
ST_Clip(band, block, geometry, metadata)           Extract pixels within a geometry
ST_NormalizedDifference(band1, band2, metadata)    Band math (e.g., NDVI)
ST_Point(lon, lat)                                 Create POINT geometry

Examples

Point Query — Elevation at Madrid:

LOAD raquet;

SELECT
    ST_RasterValue(block, band_1, ST_Point(-3.7038, 40.4168), metadata) AS elevation_meters
FROM read_raquet_at('https://storage.googleapis.com/raquet_demo_data/world_elevation.parquet', -3.7038, 40.4168);

Region Statistics — Solar potential in a region:

LOAD raquet;

SELECT (ST_RegionStats(
    band_1, block,
    'POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))'::GEOMETRY,
    metadata
)).*
FROM read_raquet(
    'https://storage.googleapis.com/raquet_demo_data/world_solar_pvout.parquet',
    'POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))'::GEOMETRY
);

Time Series Analysis:

LOAD raquet;

SELECT
    YEAR(time_ts) AS year,
    AVG((ST_RasterSummaryStats(band_1, metadata)).mean) AS avg_sst
FROM read_raquet('https://storage.googleapis.com/raquet_demo_data/cfsr_sst.parquet')
GROUP BY YEAR(time_ts)
ORDER BY year;

Band Math — NDVI:

LOAD raquet;

SELECT
    block,
    ST_NormalizedDifference(band_4, band_3, metadata) AS ndvi
FROM read_raquet('satellite.parquet')
LIMIT 5;

Without Extension

RaQuet files work as standard Parquet even without the extension, but you need to manually filter the metadata row:

-- Basic query (no extension needed, manual metadata filtering)
SELECT *
FROM read_parquet('https://storage.googleapis.com/raquet_demo_data/world_elevation.parquet')
WHERE block != 0
LIMIT 10;

-- Read metadata
SELECT metadata
FROM read_parquet('file.parquet')
WHERE block = 0;

With the DuckDB Raquet Extension (DuckDB 1.5+), this becomes simpler:

INSTALL raquet FROM community;
LOAD raquet;

-- Cleaner API (metadata row excluded automatically)
SELECT * FROM read_raquet('file.parquet') LIMIT 10;

-- Spatial filter (only reads tiles that intersect)
SELECT * FROM read_raquet('file.parquet', 'POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))'::GEOMETRY);

-- Point query (reads only the needed tile, ~2KB from cloud)
SELECT * FROM read_raquet_at('file.parquet', -3.7038, 40.4168);

-- Read metadata
SELECT metadata FROM read_raquet_metadata('file.parquet');

BigQuery

Query RaQuet files in BigQuery using CARTO Analytics Toolbox.

Loading Data

-- Create external table from GCS
CREATE EXTERNAL TABLE `project.dataset.elevation`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://raquet_demo_data/world_elevation.parquet']
);

Raster Functions

With CARTO Analytics Toolbox installed:

-- Point query
SELECT
  `carto-un`.carto.RASTER_ST_VALUE(block, band_1, ST_GEOGPOINT(-3.7038, 40.4168)) AS elevation
FROM `project.dataset.elevation`
WHERE `carto-un`.carto.RASTER_ST_INTERSECTS(block, ST_GEOGPOINT(-3.7038, 40.4168));

-- Zonal statistics
SELECT
  SUM(`carto-un`.carto.RASTER_ST_SUMMARYSTATS(block, band_1).sum) AS total
FROM `project.dataset.elevation`
WHERE `carto-un`.carto.RASTER_ST_INTERSECTS(
  block,
  ST_GEOGFROMTEXT('POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))')
);

Snowflake

Query RaQuet files in Snowflake using CARTO Analytics Toolbox.

Loading Data

-- Create stage for external data
CREATE STAGE raquet_stage
  URL = 'gcs://raquet_demo_data/'
  FILE_FORMAT = (TYPE = PARQUET);

-- Query directly from stage
SELECT *
FROM @raquet_stage/world_elevation.parquet
LIMIT 10;

Raster Functions

With CARTO Analytics Toolbox:

SELECT
  carto.RASTER_ST_VALUE(block, band_1, ST_POINT(-3.7038, 40.4168)) AS elevation
FROM @raquet_stage/world_elevation.parquet
WHERE carto.RASTER_ST_INTERSECTS(block, ST_POINT(-3.7038, 40.4168));

Databricks

Query RaQuet files in Databricks using Spark’s native Parquet support plus CARTO Analytics Toolbox.

Loading Data

# Read from cloud storage
df = spark.read.parquet("gs://raquet_demo_data/world_elevation.parquet")
df.createOrReplaceTempView("elevation")

SQL Queries

-- Basic query
SELECT * FROM elevation WHERE block != 0 LIMIT 10;

-- With CARTO Analytics Toolbox
SELECT
  carto.RASTER_ST_VALUE(block, band_1, ST_POINT(-3.7038, 40.4168)) AS elevation
FROM elevation
WHERE carto.RASTER_ST_INTERSECTS(block, ST_POINT(-3.7038, 40.4168));

PostgreSQL

Query RaQuet files in PostgreSQL using CARTO Analytics Toolbox or load data directly.

Using parquet_fdw

-- Install parquet_fdw extension
CREATE EXTENSION parquet_fdw;

-- Create foreign table
CREATE FOREIGN TABLE elevation (
  block BIGINT,
  band_1 BYTEA,
  metadata TEXT
)
SERVER parquet_server
OPTIONS (filename '/path/to/world_elevation.parquet');

-- Query
SELECT * FROM elevation WHERE block != 0 LIMIT 10;

With CARTO Analytics Toolbox

SELECT
  carto.RASTER_ST_VALUE(block, band_1, ST_SetSRID(ST_MakePoint(-3.7038, 40.4168), 4326)) AS elevation
FROM elevation
WHERE carto.RASTER_ST_INTERSECTS(block, ST_SetSRID(ST_MakePoint(-3.7038, 40.4168), 4326));

Python (pandas/polars)

pandas

import json

import pandas as pd

# Read RaQuet file
df = pd.read_parquet('https://storage.googleapis.com/raquet_demo_data/spain_solar_ghi.parquet')

# Get metadata (stored as a JSON string on the block = 0 row)
metadata_row = df[df['block'] == 0].iloc[0]
metadata = json.loads(metadata_row['metadata'])

# Filter data rows
data = df[df['block'] != 0]

polars

import polars as pl

# Read RaQuet file
df = pl.read_parquet('https://storage.googleapis.com/raquet_demo_data/spain_solar_ghi.parquet')

# Get metadata
metadata = df.filter(pl.col('block') == 0).select('metadata').item()

# Filter data rows
data = df.filter(pl.col('block') != 0)
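Band columns hold raw pixel bytes, and decoding them depends on values recorded in the metadata (tile dimensions, pixel type, and any compression). A hedged sketch with numpy, assuming an uncompressed float32 tile — the metadata field names here are illustrative placeholders, not taken from the RaQuet spec:

```python
import json

import numpy as np

# Illustrative metadata; check your file's metadata for the actual field names.
metadata = json.loads('{"block_width": 4, "block_height": 4, "band_type": "float32"}')

# Stand-in tile payload: what one data row's band_1 bytes might contain, uncompressed.
tile_bytes = np.arange(16, dtype=np.float32).tobytes()

# Reinterpret the bytes as a 2-D pixel grid using the recorded shape and dtype.
h, w = metadata["block_height"], metadata["block_width"]
pixels = np.frombuffer(tile_bytes, dtype=metadata["band_type"]).reshape(h, w)
print(pixels.shape, pixels.mean())
```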

CARTO Analytics Toolbox

CARTO Analytics Toolbox provides consistent raster functions across BigQuery, Snowflake, Databricks, and PostgreSQL.

Raster Functions

Function                  Description
RASTER_ST_VALUE           Get pixel value at a point
RASTER_ST_INTERSECTS      Spatial filter
RASTER_ST_SUMMARYSTATS    Zonal statistics (sum, mean, min, max, etc.)
RASTER_ST_CLIP            Clip raster to geometry

Installation

See CARTO Analytics Toolbox documentation for installation instructions for your platform.


UDF-Free Analytics with Tile Statistics

RaQuet v0.5.0 files converted with --tile-stats include pre-computed per-tile statistics as plain Parquet columns. This means any Parquet-compatible SQL engine can run aggregate raster analytics without UDFs, extensions, or tile decompression:

-- Works on ANY Parquet-compatible engine — no extensions needed
-- Find average slope and steepest point across all tiles
SELECT
    SUM(band_1_sum) / SUM(band_1_count) AS avg_slope,  -- count-weighted mean
    MAX(band_1_max) AS steepest_slope,
    SUM(band_1_count) AS total_pixels
FROM 'slope.parquet'
WHERE block != 0;

-- Filter tiles by statistics (e.g., find flat areas for data centers)
SELECT block, band_1_mean, band_1_max
FROM 'slope.parquet'
WHERE block != 0 AND band_1_max < 5.0;

Available columns per band: {band}_count (int64), {band}_min, {band}_max, {band}_sum, {band}_mean, {band}_stddev (all float64).
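One subtlety when rolling tiles up: per-tile means cannot simply be averaged, because tiles may hold different numbers of valid pixels. Divide the summed sums by the summed counts instead (SUM(band_1_sum) / SUM(band_1_count), not AVG(band_1_mean)). A small Python sketch with made-up tile rows shows the difference:

```python
# Two stand-in data rows carrying per-tile statistics for band_1.
tiles = [
    {"band_1_count": 100, "band_1_sum": 500.0,  "band_1_min": 1.0, "band_1_max": 9.0},
    {"band_1_count": 300, "band_1_sum": 2400.0, "band_1_min": 0.0, "band_1_max": 12.0},
]

total_count = sum(t["band_1_count"] for t in tiles)
total_sum = sum(t["band_1_sum"] for t in tiles)

global_mean = total_sum / total_count            # count-weighted: 2900 / 400 = 7.25
global_min = min(t["band_1_min"] for t in tiles)
global_max = max(t["band_1_max"] for t in tiles)

# Averaging the per-tile means ignores the counts and gives a biased answer.
naive_mean = sum(t["band_1_sum"] / t["band_1_count"] for t in tiles) / len(tiles)  # 6.5
print(global_mean, naive_mean, global_min, global_max)
```

Min, max, and count roll up directly with MIN, MAX, and SUM; only the mean (and stddev) need care.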


Performance Tips

  1. Use tile statistics for aggregates — When you only need summary statistics (mean, min, max), query the {band}_mean and related columns directly instead of decompressing tile data

  2. Use spatial filtering — Always include ST_Intersects (or your engine's equivalent) to enable row group pruning

  3. Query remote files directly — Parquet’s columnar format enables efficient range requests; no need to download first

  4. Partition large files — Use raquet-io partition to spatially split files for optimal cloud storage parallelism

  5. Small row groups for remote access — Use --row-group-size 100-200 when converting files that will be queried remotely
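The reasoning behind tip 5: a remote point query must fetch at least one whole row group, so the row-group size bounds the bytes transferred per lookup. A back-of-the-envelope calculation, using the ~2 KB compressed-tile figure from the DuckDB point-query example as an illustrative number:

```python
# Approximate bytes fetched per remote point query: rows_per_group * bytes_per_row.
tile_bytes = 2 * 1024  # illustrative ~2 KB per compressed tile; varies with dtype/compression

for rows_per_group in (100, 1000, 10_000):
    fetched_kib = rows_per_group * tile_bytes / 1024
    print(f"row groups of {rows_per_group:>6}: ~{fetched_kib:,.0f} KiB per point query")
```

At 100–200 rows per group a lookup stays in the hundreds of kilobytes; at 10,000 rows it fetches tens of megabytes for a single pixel.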