An open source project by CARTO
RaQuet: Raster + Parquet
Bring raster data into the lakehouse — query rasters with SQL, powered by production-grade analytics from CARTO.
RaQuet is a specification for storing and querying raster data using Apache Parquet, enabling efficient cloud-native raster workflows. Developed by CARTO, RaQuet brings raster data into the modern data stack.
Why RaQuet?
- Cloud-Native: Query raster data directly from cloud storage using DuckDB, BigQuery, or any Parquet-compatible tool
- Efficient: QUADBIN spatial indexing enables fast tile lookups with row group pruning
- Simple: Standard Parquet format works with existing data warehouse infrastructure
- Interoperable: Convert from GeoTIFF, COG, or ArcGIS ImageServer
Why Parquet for Raster Data?
RaQuet exists to bring raster data into the same analytical ecosystem where vector data already lives. Over the last decade, vector geospatial data has successfully integrated into cloud data warehouses and lakehouses through open formats like Parquet, GeoParquet, and table formats like Iceberg. Raster data, however, remains locked in GIS- and HPC-oriented formats like GeoTIFF/COG and Zarr — powerful, but largely invisible to SQL engines and analytics platforms.
RaQuet bridges that gap by encoding rasters as Parquet. This makes raster tiles queryable with DuckDB, BigQuery, Snowflake, Spark, Trino/Presto — and makes rasters governable, versionable, and joinable inside the lakehouse.
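For example, with DuckDB you can query a RaQuet file straight from object storage. A minimal sketch (the URL below is hypothetical; DuckDB's httpfs extension handles remote reads):

```sql
-- Install and load DuckDB's httpfs extension for reading Parquet over HTTP(S)
INSTALL httpfs;
LOAD httpfs;

-- Count raster tiles directly from cloud storage (hypothetical URL);
-- block != 0 skips the metadata row
SELECT count(*) AS tiles
FROM read_parquet('https://example.com/landcover.parquet')
WHERE block != 0;
```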
Key idea: COG/Zarr optimize storage & access. RaQuet optimizes integration & computation in the modern data stack.
In most analytics stacks, you don't simply query a raster.
RaQuet vs COG vs Zarr
These formats serve different workflows and are complementary, not competing:
| | COG (GeoTIFF) | Zarr | RaQuet |
|---|---|---|---|
| Best for | GIS pipelines | Scientific / array computing | Analytics / lakehouse |
| Strengths | Optimized for GDAL-style window reads and visualization. Great for tiling + overviews. | Chunked multidimensional arrays (NumPy/Xarray/HPC). Parallel-friendly and cloud-native. | Parquet-native: works with warehouses and SQL engines. Joins with vector data in the same stack. |
| Limitation | Not natively queryable in SQL engines | Requires specialized runtimes/APIs (not warehouse-native) | Designed for tiles, not arbitrary window reads |
RaQuet is complementary to COG and Zarr — it’s the representation designed for SQL + lakehouse workflows.
Backed by Production Analytics Engines
RaQuet isn’t just a specification — it’s designed to plug directly into CARTO’s Analytics Toolbox, which already runs natively inside major data warehouses. This means you get production-grade spatial functions, not just file format support.
With CARTO’s toolboxes, you can perform spatial joins between raster tiles and vector geometries, run zonal statistics, and build ML pipelines — all in pure SQL, inside your warehouse.
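As a sketch of what that enables, here is a zonal tile count in BigQuery, assuming the Analytics Toolbox's QUADBIN_POLYFILL function (via CARTO's shared `carto-un` project) and hypothetical `zones` and `raster_table` tables:

```sql
-- Sketch: count raster tiles per polygon by joining on QUADBIN cell IDs.
-- Table names are hypothetical; the polyfill resolution (12 here) must
-- match the raster's zoom level.
SELECT
  z.zone_id,
  COUNT(r.block) AS tiles_in_zone
FROM zones AS z
CROSS JOIN UNNEST(`carto-un`.carto.QUADBIN_POLYFILL(z.geom, 12)) AS cell
JOIN raster_table AS r
  ON r.block = cell
GROUP BY z.zone_id;
```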
Roadmap: Apache Iceberg Integration
Status: Active development — not yet generally available.
We’re actively working on support for registering RaQuet datasets as Apache Iceberg tables. The goal: publish rasters directly into Iceberg catalogs so they can be discovered and queried like any other table in your lakehouse.
GeoParquet brought vector data into the lakehouse. RaQuet does the same for raster. Iceberg unifies them under a single governance and query layer — enabling true multimodal spatial analytics where vector and raster live side by side, versioned together, and queryable with the same SQL engine.
We’re collaborating with the community to define best practices for spatial data in Iceberg. Follow progress on the GitHub repository or reach out if you’re interested in early access.
Quick Start
```bash
# Install
pip install raquet-io

# Convert a GeoTIFF to RaQuet
raquet-io convert geotiff input.tif output.parquet

# Inspect a RaQuet file
raquet-io inspect output.parquet

# Query with DuckDB
duckdb -c "SELECT * FROM read_parquet('output.parquet') WHERE block != 0 LIMIT 5"
```
Try It Now
Open the RaQuet Viewer - A client-side viewer powered by DuckDB-WASM. Load any RaQuet file from a URL and explore it interactively, no server required!
How It Works
Each row in a RaQuet file represents a single rectangular tile of raster data:
| block | band_1 | band_2 | band_3 | metadata |
|---|---|---|---|---|
| 0 | NULL | NULL | NULL | {"version": "0.1.0", ...} |
| 5270201491262341119 | <binary> | <binary> | <binary> | NULL |
| 5270201491262406655 | <binary> | <binary> | <binary> | NULL |
- block: QUADBIN cell ID encoding tile location and zoom level
- band_N: Gzip-compressed binary pixel data (row-major order)
- metadata: JSON metadata stored in the special `block = 0` row (see the example below)
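For instance, reading that metadata row with DuckDB:

```sql
-- Fetch the dataset-level JSON metadata stored in the block = 0 row
SELECT metadata
FROM read_parquet('raster.parquet')
WHERE block = 0;
```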
Resources
Documentation
- Format Specification - Complete technical specification
- CLI Reference - Command-line tool documentation
- Python API - Programmatic usage
Tools
- RaQuet Viewer - Browser-based viewer (DuckDB-WASM)
- raquet CLI - Convert, inspect, and export RaQuet files
CLI Reference
Installation
```bash
# Basic installation
pip install raquet-io

# With all features
pip install "raquet-io[all]"
```
Note: GDAL must be installed separately. On macOS: brew install gdal
Commands
raquet-io inspect
Display metadata and statistics for a RaQuet file.
```bash
raquet-io inspect landcover.parquet
raquet-io inspect landcover.parquet -v  # verbose
```
raquet-io convert geotiff
Convert a GeoTIFF to RaQuet format.
```bash
raquet-io convert geotiff input.tif output.parquet

# With options
raquet-io convert geotiff input.tif output.parquet \
  --resampling bilinear \
  --block-size 512 \
  --row-group-size 200 \
  -v
```
| Option | Description |
|---|---|
| `--zoom-strategy` | `auto`, `lower`, `upper` (default: `auto`) |
| `--resampling` | `near`, `bilinear`, `cubic`, etc. |
| `--block-size` | Block size in pixels (default: 256) |
| `--row-group-size` | Rows per Parquet row group (default: 200) |
| `-v, --verbose` | Enable verbose output |
raquet-io convert imageserver
Convert an ArcGIS ImageServer to RaQuet format.
```bash
raquet-io convert imageserver https://server/.../ImageServer output.parquet \
  --bbox "-122.5,37.5,-122.0,38.0" \
  --resolution 12
```
raquet-io export geotiff
Export a RaQuet file back to GeoTIFF.
```bash
raquet-io export geotiff input.parquet output.tif
```
raquet-io split-zoom
Split a RaQuet file by zoom level for optimized remote access.
```bash
raquet-io split-zoom input.parquet output_dir/
```
FAQ
What’s the difference between RaQuet and COG (Cloud Optimized GeoTIFF)?
COG and RaQuet serve different layers of the data stack. COG is ideal for classic raster access patterns: window reads, visualization, GDAL pipelines, and serving tiles to web maps. RaQuet targets a different problem: making raster data computable and governable inside data warehouses and lakehouses using Parquet.
Think of it this way: COG is how you store and serve rasters; RaQuet is how you analyze and join them with the rest of your data.
| Feature | RaQuet | COG |
|---|---|---|
| Format | Parquet | GeoTIFF |
| Query Tool | SQL (DuckDB, BigQuery) | GDAL, rasterio |
| Index Type | QUADBIN (discrete tiles) | Internal overviews |
| Data Warehouse | Native support | Requires conversion |
| Best for | Analytics, joins, SQL workflows | Visualization, window reads, GIS pipelines |
RaQuet is ideal when you need SQL-based queries, want to join raster with vector data, or need lakehouse governance (versioning, lineage, access control).
Can I use RaQuet with BigQuery or Snowflake?
Yes! RaQuet files are standard Parquet files that can be loaded into any Parquet-compatible data warehouse. The QUADBIN indexing works natively with CARTO’s Analytics Toolbox.
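For example, loading a RaQuet file from Google Cloud Storage into BigQuery (dataset, table, and bucket names are hypothetical):

```sql
-- A RaQuet file is standard Parquet, so BigQuery's LOAD DATA works as-is
LOAD DATA INTO my_dataset.landcover
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://my-bucket/landcover.parquet']
);
```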
How do I query specific tiles?
```sql
-- Query a specific tile by QUADBIN ID
SELECT block, band_1, band_2, band_3
FROM read_parquet('raster.parquet')
WHERE block = 5270201491262341119;

-- Query a range of tiles (efficient with sorted data)
SELECT block, band_1, band_2, band_3
FROM read_parquet('raster.parquet')
WHERE block BETWEEN 5270201491262341119 AND 5270201491263324159;
```
What raster formats can I convert to RaQuet?
Currently supported:
- GeoTIFF / Cloud Optimized GeoTIFF (COG)
- ArcGIS ImageServer
Is there a size limit?
RaQuet can handle rasters of any size. For very large datasets, consider using raquet-io split-zoom to create separate files per zoom level for optimal query performance.
Performance Considerations
RaQuet is optimized for efficient remote access. Here are key recommendations:
Block Sorting (Critical)
RaQuet files must have blocks sorted by QUADBIN ID for optimal performance. This enables Parquet row group pruning, reducing data transfer by 90%+ for typical queries.
```bash
# The CLI automatically sorts blocks during conversion
raquet-io convert geotiff input.tif output.parquet
```
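You can confirm the pruning-friendly layout with DuckDB's parquet_metadata function, which exposes per-row-group statistics for the block column:

```sql
-- In a sorted file, each row group covers a narrow, non-overlapping
-- range of block values, so engines can skip irrelevant groups
SELECT row_group_id, stats_min_value, stats_max_value
FROM parquet_metadata('output.parquet')
WHERE path_in_schema = 'block'
ORDER BY row_group_id;
```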
Row Group Size
Smaller row groups (default: 200 rows) enable finer-grained filtering:
```bash
raquet-io convert geotiff input.tif output.parquet --row-group-size 200
```
| Row Group Size | Best For |
|---|---|
| 200 (default) | Remote access, cloud storage |
| 1000+ | Local queries, full scans |
Query Patterns
Fast (contiguous read):
```sql
-- Range query - uses row group pruning effectively
SELECT * FROM read_parquet('file.parquet')
WHERE block BETWEEN 5270201491262341119 AND 5270201491263324159;
```
Slower (scattered reads):
```sql
-- IN clause - may require multiple row group reads
SELECT * FROM read_parquet('file.parquet')
WHERE block IN (5270201491262341119, 5280000000000000000, ...);
```
Client-Side vs Server-Side
| Environment | Performance | Best For |
|---|---|---|
| DuckDB (server/native) | Fast (~200ms for 20 tiles) | Production APIs |
| DuckDB-WASM (browser) | Slower (~5s for 20 tiles) | Interactive demos |
The browser viewer uses batched BETWEEN queries to mitigate WASM limitations.
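In practice, batching means grouping nearby tile IDs into a few contiguous ranges per query rather than issuing one lookup per tile (the IDs below are illustrative):

```sql
-- One batched request: a handful of BETWEEN ranges instead of many point reads
SELECT block, band_1
FROM read_parquet('file.parquet')
WHERE block BETWEEN 5270201491262341119 AND 5270201491262406655
   OR block BETWEEN 5270201491263000000 AND 5270201491263324159;
```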
Zoom Level Splitting
For very large datasets, split by zoom level:
```bash
raquet-io split-zoom large_raster.parquet output_dir/
# Creates: zoom_11.parquet, zoom_12.parquet, etc.
```
This allows queries to target specific zoom levels without scanning the entire file.
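A query then touches only the file for the zoom level it needs, for example:

```sql
-- Only the zoom-12 file is read; other zoom levels are never fetched
SELECT count(*) AS tiles
FROM read_parquet('output_dir/zoom_12.parquet');
```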
About
RaQuet is an open source project created and maintained by CARTO, the leading Location Intelligence platform. CARTO helps organizations unlock the power of spatial data through cloud-native analytics.
Learn more about CARTO’s spatial data solutions at carto.com.
License
RaQuet is open source under the BSD-3-Clause License.
Contributing
Contributions are welcome! See the GitHub repository for issues and pull requests.