feat: Add OpenAPI Foreign Data Wrapper #566

Open
codybrom wants to merge 2 commits into supabase:main from codybrom:main
@codybrom

What kind of change does this PR introduce?

Two years ago, @kiwicopple opened #49 with a vision: what if you could query any REST API from Postgres, just by pointing at its OpenAPI spec? Then, 5 days ago, he closed it as completed. I was ecstatic when I saw the GitHub notification email, but unfortunately I couldn't find any PR or commit that implemented it so I decided to take a stab at it.

What this PR includes is a generic OpenAPI 3.0 FDW. Instead of writing a new wrapper for every REST API, you could point this at an OpenAPI spec and basically have it work out the tables for you.

This first version covers just the "read" part of what @kiwicopple suggested in #49 and implements GET requests as SELECTs using the OpenAPI spec to generate Postgres types for the responses. It also includes support for unauthenticated, bearer token or other header-based authentication (no support for OAuth flows - BYO-token).

What is the current behavior?

Every API needs its own FDW. Stripe has stripe_fdw, Shopify has shopify_fdw, etc. Each one has to handle pagination, auth, response parsing. You know... all the usual stuff.

I've been doing a lot of this work via plpgsql functions and pg_net/http extensions lately, and it gets old fast. Every new API means writing another function to handle its quirks.

What is the new behavior?

Point at a spec and query the API.

CREATE SERVER weather_api
FOREIGN DATA WRAPPER wasm_wrapper
OPTIONS (
    fdw_package_url 'file:///path/to/openapi_fdw.wasm',
    fdw_package_name 'supabase:openapi-fdw',
    fdw_package_version '0.1.0',
    base_url 'https://api.weather.gov',
    spec_url 'https://api.weather.gov/openapi.json'
);

-- One command generates tables for every endpoint
IMPORT FOREIGN SCHEMA openapi FROM SERVER weather_api INTO public;

-- Now query weather stations like they're local tables
SELECT * FROM stations LIMIT 5;

The NWS API returns GeoJSON, so setting up alerts requires digging into the nested structure:

CREATE FOREIGN TABLE zone_alerts (
    zone_id text,
    event text,
    headline text,
    severity text
)
SERVER weather_api
OPTIONS (
    endpoint '/alerts/active/zone/{zone_id}',
    response_path '/features',
    object_path '/properties'
);

SELECT event, severity, headline
FROM zone_alerts
WHERE zone_id = 'OKC143';

Result:

| event | severity | headline |
| --- | --- | --- |
| Extreme Cold Warning | Severe | Extreme Cold Warning issued January 25 at 10:15PM CST until January 26 at 12:00PM CST by NWS Tulsa OK |

The FDW handles the stuff I got tired of reimplementing. Path parameters just work. Define an endpoint like /users/{user_id}/posts and the FDW substitutes from your WHERE clause. If you forget a required param, it tells you what's missing instead of just failing.
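
To make the path parameter behavior concrete, here is a minimal Rust sketch of the idea, not the code in this PR. The function name `substitute_path_params` and the `quals` map (column name to value from WHERE-clause equality filters) are hypothetical, and URL encoding of the values is left out for brevity.

```rust
use std::collections::HashMap;

/// Substitute `{name}` placeholders in an endpoint template with values
/// taken from WHERE-clause equality filters (hypothetical `quals` map).
/// On failure, returns the names of every missing parameter.
fn substitute_path_params(
    template: &str,
    quals: &HashMap<String, String>,
) -> Result<String, Vec<String>> {
    let mut url = String::new();
    let mut missing = Vec::new();
    let mut rest = template;

    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        url.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let name = &after[..close];
                match quals.get(name) {
                    Some(value) => url.push_str(value),
                    None => missing.push(name.to_string()),
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unbalanced brace: keep the remainder verbatim.
                url.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    url.push_str(rest);

    if missing.is_empty() {
        Ok(url)
    } else {
        Err(missing)
    }
}
```

With `user_id = '42'` in the map, `/users/{user_id}/posts` becomes `/users/42/posts`. A template whose parameters are absent from the map yields an `Err` listing each missing name, which is the kind of information a "missing required parameter" message can be built from.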

The FDW also auto-detects cursor-based, URL-based, or offset pagination and handles it transparently. It also does limit pushdown, so SELECT * FROM big_table LIMIT 10 stops paginating once 10 rows are in hand instead of pulling everything first. Rate limiting respects Retry-After headers when APIs send them; otherwise it falls back to exponential backoff (1s, 2s, 4s, up to 3 retries).
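
As a rough illustration of limit pushdown during pagination, here is a minimal Rust sketch under stated assumptions. The `fetch_page` closure is a hypothetical stand-in for one HTTP request (it takes the cursor for the page to fetch and returns that page's rows plus the next cursor), and rows are plain strings to keep the example short. It is not the loop used in this PR.

```rust
/// Collect rows page by page, stopping as soon as `limit` rows are available.
/// `fetch_page` is a hypothetical stand-in for a single API request.
fn fetch_with_limit<F>(mut fetch_page: F, limit: Option<usize>) -> Vec<String>
where
    F: FnMut(Option<String>) -> (Vec<String>, Option<String>),
{
    let mut rows = Vec::new();
    let mut cursor: Option<String> = None;

    loop {
        let (page, next) = fetch_page(cursor.take());
        rows.extend(page);

        // Limit pushdown: stop paginating once enough rows have arrived.
        if let Some(n) = limit {
            if rows.len() >= n {
                rows.truncate(n);
                break;
            }
        }

        match next {
            Some(c) => cursor = Some(c),
            None => break, // the API reported no further pages
        }
    }
    rows
}
```

Against an API that serves 5 rows per page, a limit of 10 costs two requests; without a limit the loop runs until the API stops returning a next cursor.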

I also got tired of dealing with column name mismatches. Some APIs slip in camelCase, but Postgres folds unquoted identifiers to lowercase, so I made sure the FDW matches names automatically (created_at matches createdAt). The same goes for schema composition: real-world OpenAPI specs use allOf/oneOf/anyOf everywhere, and this handles them. For GeoJSON or wrapped responses, there's response_path and object_path to dig into nested structures. And, maybe my favorite feature, any table can have an attrs column (jsonb) if you need the raw response.
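
One way to get the created_at / createdAt matching described above is to compare names after stripping underscores and lowercasing. The Rust sketch below shows that approach. The helper names `normalize_key` and `match_field` are hypothetical, and the actual matching rules in the FDW may differ.

```rust
/// Reduce a name to a comparison key: drop underscores and lowercase the rest,
/// so `created_at`, `createdAt`, and `CreatedAt` all collapse to `createdat`.
fn normalize_key(name: &str) -> String {
    name.chars()
        .filter(|c| *c != '_')
        .flat_map(char::to_lowercase)
        .collect()
}

/// Find the JSON field (if any) that corresponds to a Postgres column name.
fn match_field<'a>(column: &str, fields: &[&'a str]) -> Option<&'a str> {
    let wanted = normalize_key(column);
    // Prefer an exact match, then fall back to the normalized comparison.
    fields
        .iter()
        .copied()
        .find(|f| *f == column)
        .or_else(|| fields.iter().copied().find(|f| normalize_key(f) == wanted))
}
```

Trying the exact match first keeps the lookup predictable when a response happens to contain both spellings of a field.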

For APIs that need auth, credentials can even be stored in the Vault (if you're on Supabase), or inline if you're not.

Scope

I kept it read-only for now (GET -> SELECT). Write ops are stubbed out and return a read-only error. I figured it's better to get reads working solidly first; someone with a better use case for writes can add them later if people want them.

Additional context

  • Implemented as a WebAssembly wrapper in wasm-wrappers/fdw/openapi_fdw/
  • Has 10 unit tests + integration tests all passing
  • Most thoroughly tested against the weather.gov API (my use case)
  • Includes docs at docs/catalog/openapi.md

This is my first PR on this project, so please let me know if anything needs to be changed or fixed.


Partially addresses #49

* feat: Implement OpenAPI FDW with schema mapping and foreign table generation

- Added schema.rs for mapping OpenAPI types to PostgreSQL types and generating CREATE FOREIGN TABLE statements.
- Introduced spec.rs for parsing OpenAPI specifications and extracting endpoint/schema information.
- Created world.wit file for OpenAPI FDW package definition.
- Enhanced server.py to mock OpenAPI endpoints for testing.
- Updated tests.rs to include tests for OpenAPI FDW functionality, including server creation and query execution.

* feat(openapi_fdw): add GeoJSON support, configurable pagination, and cleanup dead code

- Add object_path option for GeoJSON responses (e.g., /properties)
- Make page_size_param configurable (default: limit) with table-level override
- Add case-insensitive column matching for APIs with camelCase fields
- Use nullable field to generate NOT NULL constraints in CREATE TABLE
- Remove unused spec parsing structs: Parameter, RequestBody, SecurityScheme
- Simplify EndpointInfo to only path and response_schema
- Mark deserialize-only fields with #[allow(dead_code)]
- Make from_str() test-only with #[cfg(test)]

Tested with weather.gov API (stations, alerts endpoints)

* refactor(openapi_fdw): simplify code for clarity and maintainability

- Extract helper functions: json_to_rows, cell_to_string, extract_non_empty_string
- Flatten nested control flow with early returns in build_url and handle_pagination
- Simplify sanitize_column_name with cleaner string operations
- Clean up doc comments and move #[allow(dead_code)] to field level

* feat(vscode): add extensions.json for recommended Rust Analyzer

* feat(openapi_fdw): add custom HTTP headers support

- Add optional user_agent option for API identification (required by NWS)
- Add optional accept option for content negotiation (GeoJSON, JSON-LD)
- Add headers option for arbitrary custom headers as JSON object
- Apply clippy fixes: #[derive(Default)] for Schema, .next_back() for iterator

* feat(openapi_fdw): add path parameter substitution for complex endpoints

Support endpoint templates with path parameters like:
- /stations/{station_id}/observations
- /gridpoints/{wfo}/{x},{y}/forecast
- /zones/{type}/{zone_id}/alerts

Path parameters are extracted from WHERE clause quals and substituted
into the URL template. Values are also injected back into returned rows
so PostgreSQL's WHERE filter passes (since API responses often don't
include path param columns).

Also changed page_size default from 100 to 0 to avoid adding ?limit=100
to APIs that don't support the limit parameter.

* docs(openapi_fdw): add SAFETY comments to unsafe code

Document the safety invariants for:
- static mut INSTANCE: explains Wasm single-threaded execution model
- init_instance(): explains intentional Box::leak for FDW lifetime
- this_mut(): explains initialization order and aliasing guarantees

* docs(openapi_fdw): use generic API examples instead of NWS-specific

Replace NWS API-specific path parameter examples with generic REST API
patterns to avoid bias toward any specific API.

- /stations/{station_id}/observations -> /users/{user_id}/posts
- /gridpoints/{wfo}/{x},{y}/forecast -> /projects/{org}/{repo}/issues
- /zones/{type}/{zone_id} -> /resources/{type}/{id}

* test(openapi_fdw): expand mock server and tests for all FDW features

Add comprehensive test coverage for OpenAPI FDW:
- Path parameter substitution (/users/{user_id}/posts)
- Multiple path parameters (/projects/{org}/{repo}/issues)
- GeoJSON FeatureCollection with object_path
- Direct array responses
- Resource type/id patterns (/resources/{type}/{id})

Update mock server with matching test endpoints.

* docs(openapi_fdw): add documentation and make FDW read-only

- Add docs/catalog/openapi.md with full usage documentation
- Add OpenAPI FDW entry to root README.md
- Remove untested INSERT/UPDATE/DELETE implementation
- Replace write methods with read-only error stubs

* fix(openapi_fdw): address clippy warnings and improve safety

- Use try_from instead of unsafe `as` casts for integer conversions
  to prevent truncation on overflow (returns None instead)
- Add depth limiting (MAX_RESOLVE_DEPTH=32) to schema resolution
  to prevent stack overflow on circular OpenAPI references
- Fix SAFETY comments to correctly reference init() instead of
  host_version_requirement()
- Fix numerous clippy warnings: doc_markdown, format strings,
  redundant closures, match arm simplification, map_or_else usage
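
The depth limit on schema resolution can be pictured with a small Rust sketch. The `Schema` enum here is a hypothetical, heavily simplified stand-in for the real spec types, and only `MAX_RESOLVE_DEPTH` and its value come from the commit message above.

```rust
use std::collections::HashMap;

/// Depth cap named in the commit message above.
const MAX_RESOLVE_DEPTH: usize = 32;

/// Hypothetical, simplified schema node: either a `$ref` to another named
/// schema or a concrete type name.
enum Schema {
    Ref(String),
    Type(String),
}

/// Follow `$ref` links until a concrete type is found. Returns `None` for an
/// unknown reference or when the chain is longer than `MAX_RESOLVE_DEPTH`,
/// which is what stops a circular reference from recursing forever.
fn resolve<'a>(
    schemas: &'a HashMap<String, Schema>,
    schema: &'a Schema,
    depth: usize,
) -> Option<&'a str> {
    if depth > MAX_RESOLVE_DEPTH {
        return None;
    }
    match schema {
        Schema::Type(name) => Some(name.as_str()),
        Schema::Ref(target) => resolve(schemas, schemas.get(target)?, depth + 1),
    }
}
```

A schema that refers to itself bottoms out at the depth cap and resolves to `None` instead of overflowing the stack.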

* feat(openapi_fdw): add rate limiting, limit pushdown, and documentation

- Add README.md with comprehensive usage documentation, configuration
  options, query pushdown details, and type mapping reference
- Add HTTP 429 rate limiting with retry-after header parsing and
  exponential backoff (1s, 2s, 4s) up to 3 retries
- Add limit pushdown to stop pagination early when LIMIT is satisfied,
  reducing unnecessary API calls
- Add URL validation for base_url and spec_url in init()
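
The retry schedule described above (Retry-After when present, otherwise 1s, 2s, 4s, up to 3 retries) could be computed along these lines. This is a Rust sketch with a hypothetical function name, and it only handles the delay-in-seconds form of Retry-After, not the HTTP-date form.

```rust
use std::time::Duration;

/// Retry cap taken from the commit message above.
const MAX_RETRIES: u32 = 3;

/// Decide how long to wait before retrying after an HTTP 429.
/// `attempt` is zero-based; `retry_after` is the raw header value, if present.
/// Returns `None` once the retry budget is spent.
fn retry_delay(attempt: u32, retry_after: Option<&str>) -> Option<Duration> {
    if attempt >= MAX_RETRIES {
        return None;
    }
    // Prefer the server's hint when it parses as whole seconds.
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Some(Duration::from_secs(secs));
    }
    // Otherwise back off exponentially: 1s, 2s, 4s.
    Some(Duration::from_secs(1u64 << attempt))
}
```

An unparseable header value falls through to the exponential schedule rather than aborting the retry.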

* fix(openapi_fdw): improve path parameter validation and allOf schema merging

- Return clear error when required path parameters are missing from WHERE clause
  instead of silently constructing invalid URLs that return empty results
- Fix allOf property merging to use "later wins" semantics per OpenAPI spec,
  allowing child schemas to refine/override parent property definitions
- Add test for allOf override behavior
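
The "later wins" merge for allOf amounts to inserting properties in order and letting later entries overwrite earlier ones. A minimal Rust sketch follows, with property definitions reduced to plain type-name strings (an assumption made only to keep the example small).

```rust
use std::collections::BTreeMap;

/// Merge the property maps of an `allOf` list in order.
/// A property defined again by a later schema replaces the earlier definition.
fn merge_all_of(schemas: &[BTreeMap<String, String>]) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for schema in schemas {
        for (name, ty) in schema {
            // `insert` overwrites any existing entry, giving "later wins".
            merged.insert(name.clone(), ty.clone());
        }
    }
    merged
}
```

If a parent schema declares `id` as `string` and a child listed after it declares `id` as `integer`, the merged result keeps `integer` and leaves the parent's other properties untouched.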

---------

Co-authored-by: Cody Bromley <codybrom@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 26, 2026 05:34
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a generic OpenAPI 3.0+ Foreign Data Wrapper that allows querying any REST API with an OpenAPI specification directly from PostgreSQL. The implementation is read-only and supports GET operations with automatic pagination, rate limiting with retry logic, path parameter substitution, and IMPORT FOREIGN SCHEMA for automatic table generation.

Changes:

  • Added new openapi_fdw WASM wrapper with OpenAPI spec parsing, schema generation, and FDW implementation
  • Integrated the new FDW into the workspace build system and main README
  • Added comprehensive integration tests and mock API server endpoints
  • Provided user documentation and developer README with examples

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| wasm-wrappers/fdw/openapi_fdw/src/lib.rs | Main FDW implementation with request handling, pagination, and data mapping |
| wasm-wrappers/fdw/openapi_fdw/src/spec.rs | OpenAPI 3.0 specification parsing with schema resolution and composition support |
| wasm-wrappers/fdw/openapi_fdw/src/schema.rs | PostgreSQL schema generation and type mapping from OpenAPI schemas |
| wasm-wrappers/fdw/openapi_fdw/wit/world.wit | WASM component interface definition |
| wasm-wrappers/fdw/openapi_fdw/Cargo.toml | Package configuration and dependencies |
| wasm-wrappers/fdw/openapi_fdw/README.md | Developer documentation with build and test instructions |
| wasm-wrappers/fdw/Cargo.toml | Updated workspace to include openapi_fdw |
| wasm-wrappers/fdw/Cargo.lock | Dependency lock file updates |
| wrappers/src/fdw/wasm_fdw/tests.rs | Integration tests covering various endpoint patterns and features |
| wrappers/dockerfiles/wasm/server.py | Mock API server endpoints for testing |
| docs/catalog/openapi.md | Comprehensive user documentation with configuration and usage examples |
| README.md | Added OpenAPI FDW to the main wrapper list |
| .vscode/extensions.json | VS Code extension recommendations |

- Add URL encoding for path params, query params, and pagination cursors
  using urlencoding crate
- Quote SQL identifiers in generated foreign tables to handle special
  characters and prevent injection from external OpenAPI specs
- Handle relative pagination URLs by prepending base_url
- Validate OpenAPI version is 3.x with helpful error message
- Rename init_instance() to init() for consistency with other Wasm FDWs
- Report warning on invalid page_size option instead of silent fallback
- Fix docs: document actual rate limiting behavior (HTTP 429 retry with
  Retry-After support and exponential backoff)
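
Quoting identifiers that come from an external spec follows the standard PostgreSQL rule: wrap the name in double quotes and double any embedded double quote. A minimal Rust sketch, with a hypothetical helper name that is not taken from this PR:

```rust
/// Quote a name for use as a PostgreSQL identifier: wrap it in double quotes
/// and double any embedded double quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}
```

A property named `a"b` becomes `"a""b"`, so a crafted name cannot close the identifier early and smuggle extra SQL into the generated CREATE FOREIGN TABLE statement.
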
@codybrom
Author

I've addressed all the Copilot feedback in a follow-up commit. Happy to squash if you'd prefer a single commit.
