Overview
RaQuet files are standard Apache Parquet — they work with any Parquet-compatible query engine. For full raster analytics (point queries, zonal statistics, spatial filtering), use engines with spatial raster functions.
| Engine | Basic Parquet | Raster Functions | Notes |
|---|---|---|---|
| DuckDB | ✓ | ✓ (via extension) | Best for local analytics (1.5+) |
| BigQuery | ✓ | ✓ (Analytics Toolbox) | Fully managed, scalable |
| Snowflake | ✓ | ✓ (Analytics Toolbox) | Fully managed, scalable |
| Databricks | ✓ | ✓ (Analytics Toolbox) | Spark-based |
| PostgreSQL | ✓ | ✓ (Analytics Toolbox) | Self-hosted |
| Spark | ✓ | — | Native Parquet support |
| Pandas | ✓ | — | Via pyarrow |
| Polars | ✓ | — | Fast DataFrames |
DuckDB
DuckDB provides the best local experience via the DuckDB Raquet Extension. The extension requires DuckDB 1.5+, which includes native GEOMETRY type support — no separate spatial extension is needed.
Installation
INSTALL raquet FROM community;
LOAD raquet;
Key Functions
| Function | Description |
|---|---|
| `read_raquet(file)` | Read all data rows (metadata propagated) |
| `read_raquet(file, geometry)` | Spatial filter with auto resolution |
| `read_raquet_at(file, lon, lat)` | Point query (reads only the needed tile) |
| `read_raquet_metadata(file)` | Read metadata row only |
| `ST_RasterValue(block, band, point, metadata)` | Get pixel value at a point |
| `ST_RasterSummaryStats(band, metadata)` | Per-tile statistics (count, sum, mean, min, max, stddev) |
| `ST_RegionStats(band, block, geometry, metadata)` | Aggregate statistics within a polygon |
| `ST_Intersects(block, geometry)` | Spatial filter (EPSG:4326) |
| `ST_Clip(band, block, geometry, metadata)` | Extract pixels within a geometry |
| `ST_NormalizedDifference(band1, band2, metadata)` | Band math (e.g., NDVI) |
| `ST_Point(lon, lat)` | Create POINT geometry |
Examples
Point Query — Elevation at Madrid:
LOAD raquet;
SELECT
ST_RasterValue(block, band_1, ST_Point(-3.7038, 40.4168), metadata) AS elevation_meters
FROM read_raquet_at('https://storage.googleapis.com/raquet_demo_data/world_elevation.parquet', -3.7038, 40.4168);
Region Statistics — Solar potential in a region:
LOAD raquet;
SELECT (ST_RegionStats(
band_1, block,
'POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))'::GEOMETRY,
metadata
)).*
FROM read_raquet(
'https://storage.googleapis.com/raquet_demo_data/world_solar_pvout.parquet',
'POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))'::GEOMETRY
);
Time Series Analysis:
LOAD raquet;
SELECT
YEAR(time_ts) AS year,
AVG((ST_RasterSummaryStats(band_1, metadata)).mean) AS avg_sst
FROM read_raquet('https://storage.googleapis.com/raquet_demo_data/cfsr_sst.parquet')
GROUP BY YEAR(time_ts)
ORDER BY year;
Band Math — NDVI:
LOAD raquet;
SELECT
block,
  (ST_NormalizedDifference(band_4, band_3, metadata)).*
FROM read_raquet('satellite.parquet')
LIMIT 5;
Without Extension
RaQuet files work as standard Parquet even without the extension, but you need to manually filter the metadata row:
-- Basic query (no extension needed, manual metadata filtering)
SELECT *
FROM read_parquet('https://storage.googleapis.com/raquet_demo_data/world_elevation.parquet')
WHERE block != 0
LIMIT 10;
-- Read metadata
SELECT metadata
FROM read_parquet('file.parquet')
WHERE block = 0;
With the DuckDB Raquet Extension (DuckDB 1.5+), this becomes simpler:
INSTALL raquet FROM community;
LOAD raquet;
-- Cleaner API (metadata row excluded automatically)
SELECT * FROM read_raquet('file.parquet') LIMIT 10;
-- Spatial filter (only reads tiles that intersect)
SELECT * FROM read_raquet('file.parquet', 'POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))'::GEOMETRY);
-- Point query (reads only the needed tile, ~2KB from cloud)
SELECT * FROM read_raquet_at('file.parquet', -3.7038, 40.4168);
-- Read metadata
SELECT metadata FROM read_raquet_metadata('file.parquet');
BigQuery
Query RaQuet files in BigQuery using CARTO Analytics Toolbox.
Loading Data
-- Create external table from GCS
CREATE EXTERNAL TABLE `project.dataset.elevation`
OPTIONS (
format = 'PARQUET',
uris = ['gs://raquet_demo_data/world_elevation.parquet']
);
Raster Functions
With CARTO Analytics Toolbox installed:
-- Point query
SELECT
`carto-un`.carto.RASTER_ST_VALUE(block, band_1, ST_GEOGPOINT(-3.7038, 40.4168)) AS elevation
FROM `project.dataset.elevation`
WHERE `carto-un`.carto.RASTER_ST_INTERSECTS(block, ST_GEOGPOINT(-3.7038, 40.4168));
-- Zonal statistics
SELECT
SUM(`carto-un`.carto.RASTER_ST_SUMMARYSTATS(block, band_1).sum) AS total
FROM `project.dataset.elevation`
WHERE `carto-un`.carto.RASTER_ST_INTERSECTS(
block,
ST_GEOGFROMTEXT('POLYGON((-4 40, -3 40, -3 41, -4 41, -4 40))')
);
Snowflake
Query RaQuet files in Snowflake using CARTO Analytics Toolbox.
Loading Data
-- Create stage for external data
CREATE STAGE raquet_stage
URL = 'gcs://raquet_demo_data/'
FILE_FORMAT = (TYPE = PARQUET);
-- Query directly from stage
SELECT *
FROM @raquet_stage/world_elevation.parquet
LIMIT 10;
Raster Functions
With CARTO Analytics Toolbox:
SELECT
carto.RASTER_ST_VALUE(block, band_1, ST_POINT(-3.7038, 40.4168)) AS elevation
FROM @raquet_stage/world_elevation.parquet
WHERE carto.RASTER_ST_INTERSECTS(block, ST_POINT(-3.7038, 40.4168));
Databricks
Query RaQuet files in Databricks using Spark’s native Parquet support plus CARTO Analytics Toolbox.
Loading Data
# Read from cloud storage
df = spark.read.parquet("gs://raquet_demo_data/world_elevation.parquet")
df.createOrReplaceTempView("elevation")
SQL Queries
-- Basic query
SELECT * FROM elevation WHERE block != 0 LIMIT 10;
-- With CARTO Analytics Toolbox
SELECT
carto.RASTER_ST_VALUE(block, band_1, ST_POINT(-3.7038, 40.4168)) AS elevation
FROM elevation
WHERE carto.RASTER_ST_INTERSECTS(block, ST_POINT(-3.7038, 40.4168));
PostgreSQL
Query RaQuet files in PostgreSQL using CARTO Analytics Toolbox or load data directly.
Using parquet_fdw
-- Install parquet_fdw extension
CREATE EXTENSION parquet_fdw;
-- Create foreign table
CREATE FOREIGN TABLE elevation (
block BIGINT,
band_1 BYTEA,
metadata JSONB
)
SERVER parquet_server
OPTIONS (filename '/path/to/world_elevation.parquet');
-- Query
SELECT * FROM elevation WHERE block != 0 LIMIT 10;
With CARTO Analytics Toolbox
SELECT
carto.RASTER_ST_VALUE(block, band_1, ST_SetSRID(ST_MakePoint(-3.7038, 40.4168), 4326)) AS elevation
FROM elevation
WHERE carto.RASTER_ST_INTERSECTS(block, ST_SetSRID(ST_MakePoint(-3.7038, 40.4168), 4326));
Python (pandas/polars)
pandas
import json
import pandas as pd

# Read RaQuet file
df = pd.read_parquet('https://storage.googleapis.com/raquet_demo_data/spain_solar_ghi.parquet')

# Get the metadata row (block == 0) and parse its JSON
metadata_row = df[df['block'] == 0].iloc[0]
metadata = json.loads(metadata_row['metadata'])

# Filter to data rows only
data = df[df['block'] != 0]
polars
import polars as pl
# Read RaQuet file
df = pl.read_parquet('https://storage.googleapis.com/raquet_demo_data/spain_solar_ghi.parquet')
# Get metadata
metadata = df.filter(pl.col('block') == 0).select('metadata').item()
# Filter data rows
data = df.filter(pl.col('block') != 0)
CARTO Analytics Toolbox
CARTO Analytics Toolbox provides consistent raster functions across BigQuery, Snowflake, Databricks, and PostgreSQL.
Raster Functions
| Function | Description |
|---|---|
| `RASTER_ST_VALUE` | Get pixel value at a point |
| `RASTER_ST_INTERSECTS` | Spatial filter |
| `RASTER_ST_SUMMARYSTATS` | Zonal statistics (sum, mean, min, max, etc.) |
| `RASTER_ST_CLIP` | Clip raster to geometry |
Installation
See CARTO Analytics Toolbox documentation for installation instructions for your platform.
UDF-Free Analytics with Tile Statistics
RaQuet v0.5.0 files converted with `--tile-stats` include pre-computed per-tile statistics as plain Parquet columns. This means any SQL engine can perform raster analytics without UDFs, extensions, or decompression:
-- Works on ANY Parquet-compatible engine — no extensions needed
-- Find average slope and steepest point across all tiles
SELECT
AVG(band_1_mean) AS avg_slope,
MAX(band_1_max) AS steepest_slope,
SUM(band_1_count) AS total_pixels
FROM 'slope.parquet'
WHERE block != 0;
-- Filter tiles by statistics (e.g., find flat areas for data centers)
SELECT block, band_1_mean, band_1_max
FROM 'slope.parquet'
WHERE block != 0 AND band_1_max < 5.0;
Available columns per band: `{band}_count` (int64), `{band}_min`, `{band}_max`, `{band}_sum`, `{band}_mean`, `{band}_stddev` (all float64).
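One subtlety when aggregating tile statistics: tiles can contain unequal pixel counts (edge tiles, nodata), so a dataset-wide mean must weight each tile by its count — `SUM(count * mean) / SUM(count)`, or equivalently `SUM(sum) / SUM(count)` — rather than averaging the per-tile means. A sketch in pandas with made-up stats values:

```python
import pandas as pd

# Hypothetical tile-stats columns for two tiles of band_1
tiles = pd.DataFrame({
    "block": [1, 2],
    "band_1_count": [100, 300],      # pixels per tile (int64)
    "band_1_mean": [10.0, 20.0],     # per-tile mean (float64)
    "band_1_sum": [1000.0, 6000.0],  # per-tile sum (float64)
})

# Naive average of per-tile means over-weights the small tile
naive = tiles["band_1_mean"].mean()  # 15.0

# Correct dataset-wide mean: weight each tile by its pixel count
weighted = (tiles["band_1_count"] * tiles["band_1_mean"]).sum() / tiles["band_1_count"].sum()
# Equivalently: tiles["band_1_sum"].sum() / tiles["band_1_count"].sum()
print(weighted)  # 17.5
```

The same weighting applies in SQL: `SUM(band_1_sum) / SUM(band_1_count)` instead of `AVG(band_1_mean)`.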
Performance Tips
- Use tile statistics for aggregates — When you only need summary stats (mean, min, max), query the `{band}_mean` etc. columns directly instead of decompressing tile data
- Use spatial filtering — Always include `ST_RasterIntersects` or equivalent to enable row group pruning
- Query remote files directly — Parquet's columnar format enables efficient range requests; no need to download first
- Partition large files — Use `raquet-io partition` to spatially split files for optimal cloud storage parallelism
- Small row groups for remote access — Use `--row-group-size 100-200` when converting files that will be queried remotely