feat: Add OpenAPI Foreign Data Wrapper #566
Open
codybrom wants to merge 2 commits into supabase:main
* feat: Implement OpenAPI FDW with schema mapping and foreign table generation
- Added schema.rs for mapping OpenAPI types to PostgreSQL types and generating CREATE FOREIGN TABLE statements.
- Introduced spec.rs for parsing OpenAPI specifications and extracting endpoint/schema information.
- Created world.wit file for OpenAPI FDW package definition.
- Enhanced server.py to mock OpenAPI endpoints for testing.
- Updated tests.rs to include tests for OpenAPI FDW functionality, including server creation and query execution.
* feat(openapi_fdw): add GeoJSON support, configurable pagination, and cleanup dead code
- Add object_path option for GeoJSON responses (e.g., /properties)
- Make page_size_param configurable (default: limit) with table-level override
- Add case-insensitive column matching for APIs with camelCase fields
- Use nullable field to generate NOT NULL constraints in CREATE TABLE
- Remove unused spec parsing structs: Parameter, RequestBody, SecurityScheme
- Simplify EndpointInfo to only path and response_schema
- Mark deserialize-only fields with #[allow(dead_code)]
- Make from_str() test-only with #[cfg(test)]
Tested with weather.gov API (stations, alerts endpoints)
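The case-insensitive column matching in this commit can be pictured roughly as follows. This is a minimal sketch, not the code in `lib.rs`: `normalize` and `match_key` are hypothetical helpers, and the rule of dropping underscores before comparing is an assumption made so that `created_at` lines up with `createdAt`.

```rust
/// Normalize a name for comparison: drop underscores and lowercase the rest.
/// Hypothetical helper; the matching rule is an assumption for illustration.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| *c != '_')
        .flat_map(char::to_lowercase)
        .collect()
}

/// Find the JSON key that corresponds to a Postgres column name.
/// An exact match is preferred; otherwise fall back to the normalized form.
fn match_key<'a>(column: &str, keys: &[&'a str]) -> Option<&'a str> {
    if let Some(k) = keys.iter().copied().find(|k| *k == column) {
        return Some(k);
    }
    let wanted = normalize(column);
    keys.iter().copied().find(|k| normalize(k) == wanted)
}
```

Checking for an exact match first keeps the lookup predictable when a response happens to contain both spellings of a key.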
* refactor(openapi_fdw): simplify code for clarity and maintainability
- Extract helper functions: json_to_rows, cell_to_string, extract_non_empty_string
- Flatten nested control flow with early returns in build_url and handle_pagination
- Simplify sanitize_column_name with cleaner string operations
- Clean up doc comments and move #[allow(dead_code)] to field level
* feat(vscode): add extensions.json for recommended Rust Analyzer
* feat(openapi_fdw): add custom HTTP headers support
- Add optional user_agent option for API identification (required by NWS)
- Add optional accept option for content negotiation (GeoJSON, JSON-LD)
- Add headers option for arbitrary custom headers as JSON object
- Apply clippy fixes: #[derive(Default)] for Schema, .next_back() for iterator
* feat(openapi_fdw): add path parameter substitution for complex endpoints
Support endpoint templates with path parameters like:
- /stations/{station_id}/observations
- /gridpoints/{wfo}/{x},{y}/forecast
- /zones/{type}/{zone_id}/alerts
Path parameters are extracted from WHERE clause quals and substituted
into the URL template. Values are also injected back into returned rows
so PostgreSQL's WHERE filter passes (since API responses often don't
include path param columns).
Also changed page_size default from 100 to 0 to avoid adding ?limit=100
to APIs that don't support the limit parameter.
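For readers skimming the path parameter part of this commit, here is a small sketch of template substitution. It is an illustration under assumptions, not the PR's `build_url`: `substitute_path_params` is a hypothetical name, the quals are modelled as a plain map of column name to value, and values are inserted without URL encoding.

```rust
use std::collections::HashMap;

/// Substitute `{name}` placeholders in an endpoint template with values
/// taken from WHERE-clause equality quals. Returns the names of any
/// placeholders that have no value so the caller can report them.
/// Hypothetical helper; values are inserted as-is (no URL encoding here).
fn substitute_path_params(
    template: &str,
    quals: &HashMap<&str, &str>,
) -> Result<String, Vec<String>> {
    let mut url = String::new();
    let mut missing = Vec::new();
    let mut rest = template;

    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        url.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let Some(close) = after.find('}') else {
            // Unbalanced brace: treat the remainder as literal text.
            url.push_str(&rest[open..]);
            rest = "";
            break;
        };
        let name = &after[..close];
        match quals.get(name) {
            Some(value) => url.push_str(value),
            None => missing.push(name.to_string()),
        }
        rest = &after[close + 1..];
    }
    url.push_str(rest);

    if missing.is_empty() {
        Ok(url)
    } else {
        Err(missing)
    }
}
```

Returning the list of unmatched placeholder names gives the caller enough to build an error message that says which parameters the WHERE clause is missing.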
* docs(openapi_fdw): add SAFETY comments to unsafe code
Document the safety invariants for:
- static mut INSTANCE: explains Wasm single-threaded execution model
- init_instance(): explains intentional Box::leak for FDW lifetime
- this_mut(): explains initialization order and aliasing guarantees
* docs(openapi_fdw): use generic API examples instead of NWS-specific
Replace NWS API-specific path parameter examples with generic REST API
patterns to avoid bias toward any specific API.
- /stations/{station_id}/observations -> /users/{user_id}/posts
- /gridpoints/{wfo}/{x},{y}/forecast -> /projects/{org}/{repo}/issues
- /zones/{type}/{zone_id} -> /resources/{type}/{id}
* test(openapi_fdw): expand mock server and tests for all FDW features
Add comprehensive test coverage for OpenAPI FDW:
- Path parameter substitution (/users/{user_id}/posts)
- Multiple path parameters (/projects/{org}/{repo}/issues)
- GeoJSON FeatureCollection with object_path
- Direct array responses
- Resource type/id patterns (/resources/{type}/{id})
Update mock server with matching test endpoints.
* docs(openapi_fdw): add documentation and make FDW read-only
- Add docs/catalog/openapi.md with full usage documentation
- Add OpenAPI FDW entry to root README.md
- Remove untested INSERT/UPDATE/DELETE implementation
- Replace write methods with read-only error stubs
* fix(openapi_fdw): address clippy warnings and improve safety
- Use try_from instead of unsafe `as` casts for integer conversions
to prevent truncation on overflow (returns None instead)
- Add depth limiting (MAX_RESOLVE_DEPTH=32) to schema resolution
to prevent stack overflow on circular OpenAPI references
- Fix SAFETY comments to correctly reference init() instead of
host_version_requirement()
- Fix numerous clippy warnings: doc_markdown, format strings,
redundant closures, match arm simplification, map_or_else usage
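The depth limit on schema resolution could look something like the sketch below. Only the constant `MAX_RESOLVE_DEPTH = 32` comes from the commit message; the toy `Schema` enum, the `resolve` signature, and the error strings are assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Same limit the commit message names; everything else here is illustrative.
const MAX_RESOLVE_DEPTH: usize = 32;

/// A toy schema: either a reference to another named schema or a concrete type.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Ref(String),
    Type(String),
}

/// Follow `Ref` links until a concrete type is reached, giving up once
/// the depth limit is exceeded so a reference cycle cannot recurse forever.
fn resolve(
    schema: &Schema,
    components: &HashMap<String, Schema>,
    depth: usize,
) -> Result<Schema, String> {
    if depth > MAX_RESOLVE_DEPTH {
        return Err(format!(
            "schema nesting exceeds {} levels",
            MAX_RESOLVE_DEPTH
        ));
    }
    match schema {
        Schema::Type(_) => Ok(schema.clone()),
        Schema::Ref(name) => {
            let target = components
                .get(name)
                .ok_or_else(|| format!("unknown schema reference: {}", name))?;
            resolve(target, components, depth + 1)
        }
    }
}
```

A depth counter is cheaper than tracking visited names, and it also bounds legitimately deep (non-circular) specs, which is a trade-off worth keeping in mind.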
* feat(openapi_fdw): add rate limiting, limit pushdown, and documentation
- Add README.md with comprehensive usage documentation, configuration
options, query pushdown details, and type mapping reference
- Add HTTP 429 rate limiting with retry-after header parsing and
exponential backoff (1s, 2s, 4s) up to 3 retries
- Add limit pushdown to stop pagination early when LIMIT is satisfied,
reducing unnecessary API calls
- Add URL validation for base_url and spec_url in init()
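As a rough illustration of the HTTP 429 handling listed in this commit, the delay calculation might be sketched like this. The function name `retry_delay` and its signature are hypothetical, and this sketch only understands the delta-seconds form of `Retry-After`; an HTTP-date value falls through to the backoff schedule.

```rust
use std::time::Duration;

/// Retry cap taken from the commit message above.
const MAX_RETRIES: u32 = 3;

/// Decide how long to wait before retrying after an HTTP 429.
/// `attempt` is zero-based. Returns `None` once the retry budget is spent.
/// Only the delta-seconds form of Retry-After is handled in this sketch.
fn retry_delay(attempt: u32, retry_after: Option<&str>) -> Option<Duration> {
    if attempt >= MAX_RETRIES {
        return None;
    }
    // Prefer the server's hint when it parses as whole seconds.
    if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Some(Duration::from_secs(secs));
    }
    // Otherwise back off exponentially: 1s, 2s, 4s.
    Some(Duration::from_secs(1u64 << attempt))
}
```

A caller would loop on the request, sleep for the returned duration while it is `Some`, and surface the 429 as an error once it gets `None`.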
* fix(openapi_fdw): improve path parameter validation and allOf schema merging
- Return clear error when required path parameters are missing from WHERE clause
instead of silently constructing invalid URLs that return empty results
- Fix allOf property merging to use "later wins" semantics per OpenAPI spec,
allowing child schemas to refine/override parent property definitions
- Add test for allOf override behavior
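The "later wins" merge described in this commit can be sketched in a few lines. This is a simplified model rather than the code in `spec.rs`: `merge_all_of` is a hypothetical name, and property definitions are reduced to plain type strings.

```rust
use std::collections::BTreeMap;

/// Merge the property maps of `allOf` members in order. When two members
/// define the same property, the later definition replaces the earlier one.
/// Property types are plain strings here to keep the sketch small.
fn merge_all_of(members: &[BTreeMap<String, String>]) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    for member in members {
        for (name, ty) in member {
            // `insert` overwrites any existing entry, which gives "later wins".
            merged.insert(name.clone(), ty.clone());
        }
    }
    merged
}
```

With a parent that declares `id` as `string` and a child that redeclares it as `integer`, the merged schema ends up with `id: integer` while the parent's other properties are kept.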
---------
Co-authored-by: Cody Bromley <codybrom@users.noreply.github.com>
Pull request overview
This PR adds a generic OpenAPI 3.0+ Foreign Data Wrapper that allows querying any REST API with an OpenAPI specification directly from PostgreSQL. The implementation is read-only and supports GET operations with automatic pagination, rate limiting with retry logic, path parameter substitution, and IMPORT FOREIGN SCHEMA for automatic table generation.
Changes:
- Added new `openapi_fdw` WASM wrapper with OpenAPI spec parsing, schema generation, and FDW implementation
- Integrated the new FDW into the workspace build system and main README
- Added comprehensive integration tests and mock API server endpoints
- Provided user documentation and developer README with examples
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| wasm-wrappers/fdw/openapi_fdw/src/lib.rs | Main FDW implementation with request handling, pagination, and data mapping |
| wasm-wrappers/fdw/openapi_fdw/src/spec.rs | OpenAPI 3.0 specification parsing with schema resolution and composition support |
| wasm-wrappers/fdw/openapi_fdw/src/schema.rs | PostgreSQL schema generation and type mapping from OpenAPI schemas |
| wasm-wrappers/fdw/openapi_fdw/wit/world.wit | WASM component interface definition |
| wasm-wrappers/fdw/openapi_fdw/Cargo.toml | Package configuration and dependencies |
| wasm-wrappers/fdw/openapi_fdw/README.md | Developer documentation with build and test instructions |
| wasm-wrappers/fdw/Cargo.toml | Updated workspace to include openapi_fdw |
| wasm-wrappers/fdw/Cargo.lock | Dependency lock file updates |
| wrappers/src/fdw/wasm_fdw/tests.rs | Integration tests covering various endpoint patterns and features |
| wrappers/dockerfiles/wasm/server.py | Mock API server endpoints for testing |
| docs/catalog/openapi.md | Comprehensive user documentation with configuration and usage examples |
| README.md | Added OpenAPI FDW to the main wrapper list |
| .vscode/extensions.json | VS Code extension recommendations |
- Add URL encoding for path params, query params, and pagination cursors using urlencoding crate
- Quote SQL identifiers in generated foreign tables to handle special characters and prevent injection from external OpenAPI specs
- Handle relative pagination URLs by prepending base_url
- Validate OpenAPI version is 3.x with helpful error message
- Rename init_instance() to init() for consistency with other Wasm FDWs
- Report warning on invalid page_size option instead of silent fallback
- Fix docs: document actual rate limiting behavior (HTTP 429 retry with Retry-After support and exponential backoff)
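To make the identifier quoting point concrete, here is a minimal sketch of the standard PostgreSQL rule (wrap in double quotes, double any embedded double quote). `quote_ident` is a hypothetical helper written for this example and is not necessarily how `schema.rs` implements it.

```rust
/// Quote a name as a PostgreSQL identifier: wrap it in double quotes and
/// double any embedded double quote. Hypothetical helper for illustration.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}
```

Because every embedded quote is doubled, a property name taken from an untrusted spec stays inside the identifier and cannot close it early.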
I've addressed all the Copilot feedback in a follow-up commit. Happy to squash if you'd prefer a single commit.
What kind of change does this PR introduce?
Two years ago, @kiwicopple opened #49 with a vision: what if you could query any REST API from Postgres, just by pointing at its OpenAPI spec? Then, 5 days ago, he closed it as completed. I was ecstatic when I saw the GitHub notification email, but unfortunately I couldn't find any PR or commit that implemented it, so I decided to take a stab at it.
What this PR includes is a generic OpenAPI 3.0 FDW. Instead of writing a new wrapper for every REST API, you could point this at an OpenAPI spec and basically have it work out the tables for you.
This first version covers just the "read" part of what @kiwicopple suggested in #49 and implements GET requests as SELECTs using the OpenAPI spec to generate Postgres types for the responses. It also includes support for unauthenticated, bearer token or other header-based authentication (no support for OAuth flows - BYO-token).
What is the current behavior?
Every API needs its own FDW. Stripe has `stripe_fdw`, Shopify has `shopify_fdw`, etc. Each one has to handle pagination, auth, response parsing. You know... all the usual stuff.

I've been doing a lot of this work via plpgsql functions and `pg_net`/`http` extensions lately, and it gets old fast. Every new API means writing another function to handle its quirks.

What is the new behavior?
Point at a spec and query the API.

    CREATE SERVER weather_api
      FOREIGN DATA WRAPPER wasm_wrapper
      OPTIONS (
        fdw_package_url 'file:///path/to/openapi_fdw.wasm',
        fdw_package_name 'supabase:openapi-fdw',
        fdw_package_version '0.1.0',
        base_url 'https://api.weather.gov',
        spec_url 'https://api.weather.gov/openapi.json'
      );

    -- One command generates tables for every endpoint
    IMPORT FOREIGN SCHEMA openapi FROM SERVER weather_api INTO public;

    -- Now query weather stations like they're local tables
    SELECT * FROM stations LIMIT 5;

The NWS API returns GeoJSON, so setting up alerts requires digging into the nested structure:

    CREATE FOREIGN TABLE zone_alerts (
      zone_id text,
      event text,
      headline text,
      severity text
    )
      SERVER weather_api
      OPTIONS (
        endpoint '/alerts/active/zone/{zone_id}',
        response_path '/features',
        object_path '/properties'
      );

    SELECT event, severity, headline
    FROM zone_alerts
    WHERE zone_id = 'OKC143';

Result:
The FDW handles the stuff I got tired of reimplementing. Path parameters just work. Define an endpoint like `/users/{user_id}/posts` and the FDW substitutes the values from your WHERE clause. If you forget a required param, it tells you what's missing instead of just failing.

The FDW also auto-detects cursor-based, URL-based, or offset pagination and handles it transparently. It also does limit pushdown, so `SELECT * FROM big_table LIMIT 10` stops fetching after 10 rows instead of pulling everything first. Rate limiting respects `Retry-After` headers when APIs send them and falls back to exponential backoff otherwise.

I also got tired of dealing with column name mismatches. Some APIs occasionally slip in some camelCase, but since Postgres lowercases unquoted identifiers I made sure the FDW translates automatically (`created_at` matches `createdAt`). Same with schema composition: real-world OpenAPI specs might use `allOf`/`oneOf`/`anyOf` everywhere, and this handles them. For GeoJSON or wrapped responses, there's `response_path` and `object_path` to dig into nested structures. And, maybe my favorite feature, any table can have an `attrs` column (jsonb) if you need the raw response.

For APIs that need auth, credentials can even be stored in the Vault (if you're on Supabase), or inline if you're not.
Scope
I kept it read-only for now (GET -> SELECT). Write ops are stubbed. I figured it's better to get reads working solidly first, and someone with a better use case for writes can add them later if people want them.
Additional context
- wasm-wrappers/fdw/openapi_fdw/
- docs/catalog/openapi.md

This is my first PR on this project, so please let me know if anything needs to be changed or fixed.
Partially addresses #49