feat(examples): Modernize example data loading with Parquet and YAML configs #36538
betodealmeida
left a comment
Yes!
My suggestion for future improvements: let's put all virtual datasets in per-DB directories, and use them depending on the examples DB:
- datasets/virtual/postgres/
- datasets/virtual/mysql/
We can have only Postgres initially, and add support for more over time.
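A minimal sketch of how such a per-DB lookup might work, assuming the examples database is identified by its SQLAlchemy URI. The helper name `virtual_dataset_dir`, the alias table, and the set of supported dialects are all hypothetical and do not exist in Superset; only the `datasets/virtual/<db>/` layout comes from the suggestion above.

```python
from pathlib import Path
from typing import Optional

# Hypothetical configuration: only Postgres is supported at first.
SUPPORTED_DIALECTS = {"postgres"}
# Assumption: map SQLAlchemy scheme names onto the directory names proposed above.
DIALECT_ALIASES = {"postgresql": "postgres"}


def virtual_dataset_dir(
    examples_uri: str, root: Path = Path("datasets/virtual")
) -> Optional[Path]:
    """Return the per-DB directory for the examples DB, or None if unsupported."""
    # "postgresql+psycopg2://user@host/db" -> "postgresql+psycopg2"
    scheme = examples_uri.split("://", 1)[0]
    # Drop the driver suffix: "postgresql+psycopg2" -> "postgresql"
    dialect = scheme.split("+", 1)[0].lower()
    dialect = DIALECT_ALIASES.get(dialect, dialect)
    if dialect not in SUPPORTED_DIALECTS:
        return None  # no virtual datasets for this backend yet
    return root / dialect
```

Adding support for another backend would then only mean creating its directory and extending `SUPPORTED_DIALECTS`.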
I love the decision to go with Parquet instead of DuckDB for this use case. Nicely done!
The modernized example loading (apache#36538) routes through import_database() which checks PREVENT_UNSAFE_DB_CONNECTIONS. This blocks SQLite examples URIs in environments where this safety flag is enabled. Skip the check when ignore_permissions=True, since system imports (like examples) use URIs from server config, not untrusted user input. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
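The control flow described in this commit message can be sketched as below. This is a simplified stand-in, not Superset's actual code: the function `check_uri`, the exception class, and the SQLite-prefix test are assumptions made for illustration. Only the flag name `PREVENT_UNSAFE_DB_CONNECTIONS` and the `ignore_permissions` argument come from the message itself.

```python
class UnsafeConnectionError(Exception):
    """Hypothetical error raised when a database URI is considered unsafe."""


def check_uri(
    sqlalchemy_uri: str,
    prevent_unsafe_db_connections: bool,
    ignore_permissions: bool = False,
) -> None:
    """Reject unsafe URIs unless the import is a trusted system import."""
    # System imports (such as examples) take their URI from server config,
    # not from untrusted user input, so the safety check is skipped.
    if ignore_permissions:
        return
    # Assumption for this sketch: "unsafe" simply means a SQLite URI.
    if prevent_unsafe_db_connections and sqlalchemy_uri.startswith("sqlite"):
        raise UnsafeConnectionError(f"Blocked unsafe URI: {sqlalchemy_uri}")
```

With this shape, a user-initiated import still hits the check, while the examples loader passes `ignore_permissions=True` and proceeds.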
Summary
This PR modernizes the Superset example data loading system by migrating to a Parquet-based approach with YAML configuration files, organized by dashboard for better developer experience.
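The per-dashboard layout and auto-discovery listed under Key Changes can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: the function name `discover_examples` is hypothetical, and skipping underscore-prefixed directories such as `_shared/` is a guess. The `{name}/data.parquet` convention is taken from this description.

```python
from pathlib import Path
from typing import Dict


def discover_examples(examples_root: Path) -> Dict[str, Path]:
    """Map each example name to its data.parquet file under examples_root."""
    found: Dict[str, Path] = {}
    for child in sorted(examples_root.iterdir()):
        # Assumption: underscore-prefixed directories (e.g. _shared) are
        # support directories, not examples in their own right.
        if not child.is_dir() or child.name.startswith("_"):
            continue
        parquet = child / "data.parquet"
        # A directory counts as an example only if it contains data.parquet.
        if parquet.is_file():
            found[child.name] = parquet
    return found
```

Under this scheme, adding an example needs no code change: creating a new directory with a `data.parquet` file is enough for it to be picked up.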
Key Changes
New Directory Structure by Dashboard
- _shared/ directory
Migrated to Parquet Storage Format
Auto-Discovery System
- data.parquet file in a new directory to add an example
Generic Loading System
- load_parquet_table() for unified data loading
Export as Example Feature ✨ NEW
- superset export-example --dashboard-id <id> --name <name> --output-dir <dir>
- GET /api/v1/dashboard/<pk>/export_as_example/
- export permission as the regular YAML export
Showtime/Ephemeral Environment Support 🐤 NEW
- examples.duckdb file from external repo
- LOAD_EXAMPLES_DUCKDB build argument from Dockerfile
Export as Example UI
The "Export as Example" option appears in the Download submenu alongside "Export YAML":
Example Dashboards (10 total)
Why Parquet?
Benefits
Testing
Cypress Test Status
- box_plot.test.js
- bubble.test.js
- nativeFilters.test.ts
- filter.test.ts (chart_list)
- _skip.tabs.test.ts
- _skip.AdhocMetrics.test.ts
- _skip.advanced_analytics.test.ts
- _skip.link.test.ts
- _skip.annotations.test.ts
Breaking Changes
None for end users. The superset load-examples command works exactly as before.
For developers:
- superset.examples.birth_names are removed
- superset/examples/data/ to superset/examples/{name}/data.parquet
- LOAD_EXAMPLES_DUCKDB build argument removed - examples are loaded from Parquet at runtime
Next Steps (Follow-up PRs)
The following items are out of scope for this PR but should be addressed in follow-up work:
1. Add Missing Charts to YAML Configs
The skipped Cypress tests depend on specific charts that were created by the old Python code but aren't in the YAML configs yet:
- tabs.test.ts
- tabs.test.ts
- tabs.test.ts, link.test.ts
- AdhocMetrics.test.ts, advanced_analytics.test.ts
- tabs.test.ts
- tabs.test.ts
2. Convert Tabbed Dashboard to YAML
The tabbed_dashboard.py creates a special dashboard for testing tab navigation. This should be converted to YAML format with all required charts.
3. Apply Dynamic ID Lookup Pattern to More Tests
The pattern introduced in this PR (getDatasetId(), getChartId()) can be applied to other tests that may have hardcoded IDs, making them more resilient to changes in example data.
4. Remove Remaining Python Example Loaders
A few Python modules remain for backwards compatibility (birth_names.py, world_bank.py). Once the Cypress tests are fully migrated, these can be removed.
5. Deprecate Pre-built examples.duckdb
The pre-built examples.duckdb file in the apache-superset/examples-data repo is no longer used. It can be removed or marked as deprecated in a follow-up.