feat(ingestion): add MicroStrategy connector by brock-acryl · Pull Request #16992 · datahub-project/datahub

brock-acryl · 2026-04-11T03:32:48Z

Summary

- Adds a new DataHub ingestion connector for MicroStrategy, supporting metadata extraction via the MicroStrategy REST API
- Ingests projects, folders, dashboards (dossiers), reports, Intelligent Cubes, and Library datasets as DataHub containers, dashboards, charts, and datasets with full SDK V2 entity emission
- Extracts lineage from cubes to dashboards/reports and optionally from physical warehouse tables to cubes via SQL parsing, including column-level lineage
- Ships with stateful ingestion support for automatic stale entity removal when objects are deleted in MicroStrategy
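The idea behind stale entity removal can be sketched as a set difference between the URNs seen in the previous run and those seen in the current one. This is only a conceptual illustration, not the connector's code: the helper name `find_stale_urns` and the dashboard IDs are hypothetical, and the real handler works from checkpointed state rather than plain lists.

```python
from typing import Iterable, List, Set


def find_stale_urns(previous_run: Iterable[str], current_run: Iterable[str]) -> List[str]:
    """Return URNs present in the previous run but absent from the current run."""
    current: Set[str] = set(current_run)
    # Sorted so the result is deterministic regardless of input order.
    return sorted(urn for urn in set(previous_run) if urn not in current)


# Hypothetical dashboard URNs, for illustration only.
previous = [
    "urn:li:dashboard:(microstrategy,sales_overview)",
    "urn:li:dashboard:(microstrategy,retired_dossier)",
]
current = ["urn:li:dashboard:(microstrategy,sales_overview)"]

stale = find_stale_urns(previous, current)
```

Entities in `stale` are the candidates for removal once the object no longer exists in MicroStrategy.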

Key Features

Metadata coverage:

- Projects → DataHub containers; folders → nested sub-containers
- Dashboards/dossiers → Dashboard entities with embedded chart stubs and ownership
- Reports → Chart entities with report-type subtypes and cube/dataset lineage
- Intelligent Cubes → Dataset entities with schema (attributes + metrics), SQL view definition, and warehouse upstream lineage
- Library datasets → Dataset entities

Lineage:

- Report → Cube and Dashboard → Cube lineage via dataSource.id resolution
- Warehouse table → Cube lineage via GET /api/v2/cubes/{id}/sqlView SQL parsing (Snowflake, MySQL, Teradata, bare quoting styles)
- Column-level lineage via SqlParsingAggregator; warehouse platform auto-detected from /api/datasources with four-tier fallback
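To illustrate what handling several quoting styles involves, here is a minimal sketch that splits a table reference into its parts. It accepts double-quoted identifiers (which Snowflake and Teradata allow), backtick-quoted identifiers (MySQL), and bare names. The helper `split_table_reference` is hypothetical and regex-based; the connector itself is described as relying on SQL parsing, so treat this as an illustration of the problem rather than its implementation.

```python
import re
from typing import List

# One alternative per quoting style: "double quoted", `backtick quoted`, or bare.
_IDENTIFIER_PART = re.compile(r'"([^"]+)"|`([^`]+)`|([^.\s"`]+)')


def split_table_reference(raw: str) -> List[str]:
    """Split a possibly quoted, dot-separated table reference into its parts."""
    parts: List[str] = []
    for double_quoted, backticked, bare in _IDENTIFIER_PART.findall(raw.strip()):
        # Exactly one of the three groups is non-empty for each match.
        parts.append(double_quoted or backticked or bare)
    return parts


snowflake_style = split_table_reference('"SALES_DB"."PUBLIC"."ORDERS"')
mysql_style = split_table_reference("`sales`.`orders`")
bare_style = split_table_reference("sales.orders")
```

Dots inside a quoted identifier stay part of that identifier, because the quoted alternatives are tried before the bare one.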

Test plan

- Unit tests in tests/unit/test_microstrategy_source.py cover: config validation, project/folder/dashboard/report/cube/dataset pattern filtering, warehouse platform detection tiers, cube schema extraction, ownership extraction, cross-project lineage registry resolution, error handling continuity, and API call reduction flags (63 tests)
- Integration tests in tests/integration/microstrategy/test_microstrategy_mock.py cover full end-to-end ingest against a mocked REST API with golden file comparison for both standard and warehouse-lineage configurations (2 tests)
- All 65 tests pass: `pytest tests/unit/test_microstrategy_source.py tests/integration/microstrategy/test_microstrategy_mock.py`
- Lint clean: ruff check + ruff format pass with no errors
- Documentation added at docs/sources/microstrategy/microstrategy.md with all 35 config fields documented

…hance client functionality

- Introduced comprehensive documentation for the MicroStrategy connector, detailing capabilities, prerequisites, installation, and configuration options.
- Updated `MicroStrategyClient` to require `project_id` for dashboard definitions, ensuring accurate API requests.
- Enhanced validation in `MicroStrategyConnectionConfig` to enforce authentication rules for anonymous and credential modes.
- Improved metadata extraction logic in `MicroStrategySource` to include project context for dashboards and reports.
- Added integration test setup instructions for real data scenarios, including options for trial instances and mock servers.
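The authentication rule for anonymous versus credential modes might look roughly like the sketch below. Every name here (`ConnectionSettings`, `use_anonymous`, `username`, `password`) is hypothetical, and the exact rules are an assumption. The real `MicroStrategyConnectionConfig` is not shown in this PR text, and a plain dataclass is used here only to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:
    """Hypothetical stand-in for a connection config with two auth modes."""

    base_url: str
    use_anonymous: bool = False
    username: Optional[str] = None
    password: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule 1: anonymous mode must not carry credentials.
        if self.use_anonymous and (self.username or self.password):
            raise ValueError("anonymous mode cannot be combined with credentials")
        # Assumed rule 2: credential mode needs both username and password.
        if not self.use_anonymous and not (self.username and self.password):
            raise ValueError("username and password are required in credential mode")


anonymous = ConnectionSettings(base_url="https://example.invalid", use_anonymous=True)
with_credentials = ConnectionSettings(
    base_url="https://example.invalid", username="svc_datahub", password="secret"
)
```

Failing at construction time means a misconfigured recipe is rejected before any API request is made.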

…or improved metadata extraction

- Added detailed concept mapping and lineage extraction information to the MicroStrategy connector documentation.
- Introduced new configuration options in `MicroStrategyConfig` for filtering cubes, dashboards, reports, and datasets to optimize API calls.
- Implemented error handling for unavailable projects in `MicroStrategyClient`, raising specific exceptions for better debugging.
- Enhanced metadata extraction logic to support column-level lineage and warehouse lineage using SQL parsing.
- Updated unit tests to validate new configuration options and error handling mechanisms.

… client

- Implemented `get_attribute_expression` and `get_metric_expression` methods in `MicroStrategyClient` to fetch human-readable expressions for attributes and metrics.
- Introduced `include_field_formulas` configuration option in `MicroStrategyConfig` to control the fetching of attribute and metric expressions.
- Enhanced `MicroStrategySource` to utilize the new expression retrieval methods for input fields when `include_report_definitions` is enabled.
- Updated input field construction to include expressions as field descriptions, improving metadata visibility.

…eage processing

- Removed the synthetic __datasource entity for legacy documents, allowing each chart stub to directly reference its specific warehouse tables.
- Updated the logic for emitting embedded chart stubs to utilize per-dataset warehouse lineage, enhancing clarity and accuracy in lineage representation.
- Simplified the dashboard metadata extraction process by eliminating unnecessary dataset linking, improving overall performance and maintainability.

…tegy entity configuration

- Updated the version of the data-platforms configuration from v6 to v7.
- Added configuration for the MicroStrategy data platform, including entity URN, type, aspect name, and logo URL.

…s and improvements

- Updated documentation to clarify support for domains, now marked as not supported and requiring manual configuration post-ingestion.
- Added new configuration options for filtering cubes, dashboards, reports, and datasets to optimize API calls and reduce unnecessary processing.
- Introduced a context manager in `MicroStrategyClient` for managing project headers during API requests, improving code clarity and reliability.
- Enhanced error handling in the client to provide more informative logging for authentication and permission errors.
- Added integration tests to validate the functionality of the MicroStrategy connector, ensuring comprehensive coverage of entity types and lineage extraction.
- Removed deprecated subtype mappings and streamlined the configuration for better maintainability.

…re-fetching

- Introduced a new configuration option `max_workers` to control the maximum number of threads for pre-fetching cube metadata, improving ingestion performance.
- Updated the MicroStrategy client to utilize per-request headers for project IDs, allowing concurrent API calls without session header conflicts.
- Enhanced the cube pre-fetching logic to fetch SQL views and schema data in parallel, reducing overall ingestion time.
- Improved documentation to reflect the new configuration and its implications for API rate limits and debugging.
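A minimal sketch of why per-request headers matter for parallel pre-fetching: each worker builds its own header dict, so no shared session state is mutated between threads. The functions `fetch_cube_stub` and `prefetch_cubes` are hypothetical, the fetch is a stub that makes no network call, and the header name is assumed to be the MicroStrategy REST API's `X-MSTR-ProjectID`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# Assumed header name for selecting a project on each request.
PROJECT_HEADER = "X-MSTR-ProjectID"


def fetch_cube_stub(cube_id: str, project_id: str = "PROJECT_1") -> dict:
    """Stand-in for an API call; headers are built per request, never shared."""
    headers = {PROJECT_HEADER: project_id}
    return {"cube_id": cube_id, "headers": headers}


def prefetch_cubes(
    cube_ids: Iterable[str],
    fetch: Callable[[str], dict],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Fetch all cubes concurrently and return results keyed by cube ID."""
    ids = list(cube_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        results = list(pool.map(fetch, ids))
    return dict(zip(ids, results))


prefetched = prefetch_cubes(["cube_a", "cube_b", "cube_c"], fetch_cube_stub, max_workers=2)
```

Lowering `max_workers` trades ingestion speed for gentler load on the server, which is the rate-limit consideration the commit message mentions.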

…ility

- Changed the return type of the `_deepcopy_wrapper` function from `ExpressionCore` to `Expr` to align with the updated sqlglot library definitions.
- Ensured compatibility with cooperative timeout support in the deepcopy implementation.

codecov · 2026-04-11T12:28:24Z

Codecov Report

❌ Patch coverage is 61.42558% with 184 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...c/datahub/ingestion/source/microstrategy/client.py | 54.61% | 182 Missing ⚠️ |
| ...c/datahub/ingestion/source/microstrategy/config.py | 96.61% | 2 Missing ⚠️ |
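The report gives missing-line counts and percentages but not the number of lines in the patch. Assuming patch coverage is simply covered lines divided by total patch lines, the total can be estimated as below. The helper `estimate_patch_lines` is hypothetical, and the results are estimates derived from the rounded percentages, not figures reported by Codecov.

```python
def estimate_patch_lines(missing: int, coverage_pct: float) -> int:
    """Estimate total patch lines from missing lines and a coverage percentage."""
    uncovered_fraction = 1.0 - coverage_pct / 100.0
    # missing = total * uncovered_fraction, so total = missing / uncovered_fraction.
    return round(missing / uncovered_fraction)


overall_lines = estimate_patch_lines(184, 61.42558)
client_lines = estimate_patch_lines(182, 54.61)
config_lines = estimate_patch_lines(2, 96.61)
```

Under that assumption the overall patch works out to roughly 477 measured lines, with about 401 of them in client.py.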

alwaysmeticulous · 2026-04-11T12:30:59Z

🔴 Meticulous spotted visual differences in 566 of 1422 screens tested.

Meticulous evaluated ~8 hours of user flows against your PR.

Last updated for commit 4c49151 feat(microstrategy): case-aware upstream URN resolution for warehouse li.... This comment will update as new commits are pushed.

The microstrategy source was registered in setup.py but missing from the generated pyproject.toml, causing the checkLockFile CI task to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov · 2026-04-11T14:13:21Z

Bundle Report

Changes will increase total bundle size by 6.75kB (0.03%) ⬆️. This is within the configured threshold ✅

Detailed changes

| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 22.74MB | 6.75kB (0.03%) ⬆️ |

Affected Assets, Files, and Routes (bundle: datahub-react-web-esm)

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| `assets/index-*.js` | 6.75kB | 12.49MB | 0.05% |

…ting

- Add `microstrategy` extras to setup.py (usage_common | sqlglot_lib) so validate-plugin-deps can install and import the plugin correctly; sqlparse was missing from the install because the extras key was absent entirely
- Regenerate pyproject.toml and uv.lock via updateLockFile
- Rename docs/sources/microstrategy/microstrategy.md → microstrategy_pre.md to satisfy docGen naming convention (must be README.md or <plugin>_{pre,post}.md)
- Run ruff format on tests/unit/test_microstrategy_source.py (Would reformat)
- Run mdPrettierWrite to format README.md and microstrategy_pre.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… file

- Fix `_request()` return type to `Any` to resolve 13+ mypy return-value errors
- Replace `List[Aspect]` TypeVar usage with `List[Any]` (TypeVar unbound outside generics)
- Fix `add_observed_query` to use `ObservedQuery(...)` dataclass instead of kwargs
- Call `.as_workunit()` on `gen_metadata()` output (returns MCPW, not MetadataWorkUnit)
- Add `Dict[str, Any]` annotation to `_MSTR_TYPE_MAP` to fix arg-type error
- Add `# type: ignore[method-assign]` to 24 MagicMock assignments in tests
- Fix `client._base_url` → `client.base_url` (correct attribute name)
- Add `# type: ignore[union-attr]` to entity.urn and as_workunits() test calls
- Rewrite microstrategy_pre.md with H3 baseline (H2 is disallowed in _pre.md)
- Create README.md with required Overview + Concept Mapping sections
- Create microstrategy_post.md with Capabilities, Limitations, Troubleshooting
- Create microstrategy_recipe.yml with minimal working example config
- Regenerate microstrategy_mces_golden.json against live demo.microstrategy.com

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rror visibility, dead code

- Add `get_workunit_processors()` override to wire `StaleEntityRemovalHandler` into the pipeline (was created but never invoked, so deletion detection was broken)
- Add `MicroStrategySourceReport` with `report_dropped()` for pattern-filtered folders, dashboards, reports, and cubes
- Promote registry lookup failures and report definition fetch failures from DEBUG to WARNING and emit `report_warning()` for operator visibility
- Add `aggregator.close()` in `finally` block in `_emit_column_lineage_from_sql`
- Replace fragile `assert project_id is not None` guards with explicit `raise ValueError`
- Fix `_MSTR_TYPE_MAP` annotation from `Dict[str, Any]` to `Dict[str, Callable[[], Any]]`
- Delete five unused documentation-style classes from `constants.py` (~117 lines)
- Narrow `_response_json_dict` except to `(ValueError, JSONDecodeError)` in client
- Add explanatory comment to `_request() -> Any` return type in client
- Promote warehouse platform detection fallback messages from DEBUG to INFO

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
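The narrowed `except` in the list above can be illustrated with a small helper that returns an empty dict when a response body is not a JSON object. The function `response_json_dict` is a hypothetical stand-in, not the client's `_response_json_dict`, and it uses the standard library's `json.JSONDecodeError`. Which `JSONDecodeError` the client actually imports is not stated in this PR text.

```python
import json
from typing import Any, Dict


def response_json_dict(body: str) -> Dict[str, Any]:
    """Parse a response body, returning {} if it is not a JSON object."""
    try:
        parsed = json.loads(body)
    except (ValueError, json.JSONDecodeError):
        # Only decoding problems are swallowed; other exceptions propagate.
        return {}
    return parsed if isinstance(parsed, dict) else {}


ok = response_json_dict('{"status": "ready"}')
not_json = response_json_dict("<html>Bad Gateway</html>")
not_a_dict = response_json_dict("[1, 2, 3]")
```

In the standard library `json.JSONDecodeError` is a subclass of `ValueError`, so listing both is redundant there but makes the intent explicit. Catching only these types keeps unrelated bugs, such as an `AttributeError`, from being silently hidden.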

…s_ingestion

Mock was patching `get_cube_schema` but production code calls `get_cube()`. The mock never fired; the test was exercising the happy path in disguise and silently passing without testing the failure-recovery path it claimed to cover.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
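The failure mode described in this commit can be reproduced in a few lines with `unittest.mock`. `FakeClient` and `ingest_cube` are hypothetical stand-ins for the real client and source, and the method signatures are assumptions. Only the method names `get_cube` and `get_cube_schema` come from the commit message.

```python
from typing import Optional
from unittest.mock import MagicMock


class FakeClient:
    """Hypothetical client exposing both method names from the commit message."""

    def get_cube(self, cube_id: str) -> dict:
        return {"id": cube_id}

    def get_cube_schema(self, cube_id: str) -> dict:
        return {"id": cube_id, "schema": []}


def ingest_cube(client: FakeClient, cube_id: str) -> Optional[dict]:
    """Code under test: calls get_cube() and recovers from a failure."""
    try:
        return client.get_cube(cube_id)
    except RuntimeError:
        return None


# Wrong target: the code under test never calls get_cube_schema().
wrong_client = FakeClient()
wrong_client.get_cube_schema = MagicMock(side_effect=RuntimeError("boom"))
wrong_result = ingest_cube(wrong_client, "c1")
wrong_mock_called = wrong_client.get_cube_schema.called

# Right target: the failure now reaches the recovery path.
right_client = FakeClient()
right_client.get_cube = MagicMock(side_effect=RuntimeError("boom"))
right_result = ingest_cube(right_client, "c1")
```

Asserting that the mock was actually called, for example with `assert_called_once_with`, is a cheap guard against a test that passes without ever exercising its failure path.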

…solidate constants, remove dead client methods

- Extract `_AttrFormInfo` NamedTuple and `_iter_attr_forms()` static method to eliminate ~50 lines of parallel attribute-form iteration logic that existed identically in both `_build_input_fields` and `_build_cube_schema_metadata`
- Move all iServer error codes and dossier subtype constants from source.py local definitions into constants.py as the single source of truth; import them in source.py to remove the duplicate `ISERVER_PROJECT_UNAVAILABLE`
- Delete five unused dead methods from client.py: `get_dashboard_definition` (compatibility shim), `get_model_cube`, `get_model_tables`, `get_model_facts`, `get_lineage_for_object` (all superseded by the sqlView approach)
- Update test that tested the deleted `get_dashboard_definition` shim to directly test `get_dossier_definition` (the underlying implementation)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
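The refactoring pattern in the first bullet, one iterator shared by two callers, might look like the sketch below. The payload shape (`forms`, `name`, `dataType`) and the fields of `AttrFormInfo` are assumptions made for illustration, since the real `_AttrFormInfo` and the MicroStrategy response format are not shown in this PR text.

```python
from typing import Any, Dict, Iterator, List, NamedTuple


class AttrFormInfo(NamedTuple):
    """Hypothetical flattened view of one attribute form."""

    attribute_name: str
    form_name: str
    data_type: str


def iter_attr_forms(attributes: List[Dict[str, Any]]) -> Iterator[AttrFormInfo]:
    """Yield one record per (attribute, form) pair from an assumed payload shape."""
    for attribute in attributes:
        for form in attribute.get("forms", []):
            yield AttrFormInfo(
                attribute_name=attribute.get("name", ""),
                form_name=form.get("name", ""),
                data_type=form.get("dataType", "unknown"),
            )


# Assumed payload, for illustration only.
payload = [
    {"name": "Customer", "forms": [{"name": "ID", "dataType": "integer"}, {"name": "DESC"}]},
    {"name": "Region", "forms": []},
]

# Two callers share the same traversal instead of duplicating the nested loops.
field_names = [f"{info.attribute_name}.{info.form_name}" for info in iter_attr_forms(payload)]
type_by_field = {(i.attribute_name, i.form_name): i.data_type for i in iter_attr_forms(payload)}
```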

- Remove unused `DomainRegistry` import and instantiation: `domain_registry` was created in `__init__` but `get_domain_urn()` was never called anywhere; the DOMAINS capability is correctly annotated `supported=False`
- Add `MicroStrategyClient.close()` to release the underlying `requests.Session` connection pool after ingestion completes
- Override `MicroStrategySource.close()` to call `client.close()` then `super().close()`, ensuring the HTTP connection pool is always released

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
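The close ordering described in the last two bullets can be sketched with stand-in classes. None of these classes are the real DataHub or `requests` types. `FakeSession` only records that it was closed, and wrapping the call in `try`/`finally` is a design choice of this sketch, not something the commit message states.

```python
class FakeSession:
    """Stand-in for an HTTP session; records whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SketchClient:
    def __init__(self, session: FakeSession) -> None:
        self._session = session

    def close(self) -> None:
        # Releases the underlying connection pool.
        self._session.close()


class SketchBaseSource:
    def __init__(self) -> None:
        self.base_closed = False

    def close(self) -> None:
        self.base_closed = True


class SketchSource(SketchBaseSource):
    def __init__(self, client: SketchClient) -> None:
        super().__init__()
        self.client = client

    def close(self) -> None:
        try:
            self.client.close()
        finally:
            # The base class cleanup runs even if closing the client fails.
            super().close()


session = FakeSession()
source = SketchSource(SketchClient(session))
source.close()
```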

…rations catalog

- Regenerate datahub.json with microstrategy connector entry (56 lines) containing capabilities, platform name, and support status
- Remove api_connector=true flag from microstrategy in integrations_catalog.json; the flag is reserved for third-party API connectors, not native DataHub source plugins, and its presence caused docgen.py to crash with KeyError: 'microstrategy'

Fixes CI failures:

- ci (3.10/3.11/3.12, testQuick): "Check autogenerated JSON files are up-to-date"
- gh-pages: "Build Docs" (KeyError: 'microstrategy' in docgen.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

datahub-connector-tests · 2026-04-12T03:31:34Z

Connector Tests Results

All connector tests passed for commit 4c49151

To skip connector tests, add the skip-connector-tests label (org members only).

Autogenerated by the connector-tests CI pipeline.

…se, fix container hierarchy

**Thread-safe client**

- Replace _project_context/session.headers mutation with per-request extra_headers
- Add threading.Lock for token refresh serialization
- Route DELETE methods through _request() for token refresh safety
- Remove 500 from retry status_forcelist (MSTR 500s are permanent app errors)

**Parallel prefetch for dashboards, reports, and expression cache**

- Parallel dashboard definition + warehouse SQL fetch via ThreadPoolExecutor
- Parallel report definition + warehouse lineage fetch
- Pre-warm expression cache for field formulas before entity processing

**Auto-detect warehouse database and schema from connection strings**

- Fetch connection strings via GET /api/datasources/connections/{id}
- Parse DATABASE/db/schema params from JDBC/ODBC connection strings
- warehouse_lineage_database and warehouse_lineage_schema now optional overrides
- Fix _qualify_table_name to handle 2-part names (prepend database for Snowflake)

**Fix container hierarchy for SDK V2 entities**

- Pass ContainerKey directly to parent_container instead of .as_urn() string
- Fixes missing container aspect and empty browsePathsV2 on dashboards/charts/datasets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
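As a rough illustration of the connection-string auto-detection, the sketch below pulls `DATABASE`/`db`/`schema` parameters out of ODBC-style and JDBC-style strings and then qualifies a table name. Both helpers are hypothetical, the example connection strings are made up, and the qualification rules are assumptions. The connector's own `_qualify_table_name` may behave differently.

```python
import re
from typing import Dict, Optional, Tuple

# Matches key=value pairs introduced by start-of-string, ';', '?' or '&'.
_PARAM = re.compile(r"(?:^|[;?&])\s*(database|db|schema)\s*=\s*([^;&]+)", re.IGNORECASE)


def parse_database_and_schema(conn: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (database, schema) found in a connection string, or None for each."""
    found: Dict[str, str] = {}
    for key, value in _PARAM.findall(conn):
        name = "schema" if key.lower() == "schema" else "database"
        # The first occurrence wins if a parameter is repeated.
        found.setdefault(name, value.strip())
    return found.get("database"), found.get("schema")


def qualify_table_name(name: str, database: Optional[str], schema: Optional[str]) -> str:
    """Prefix a 1- or 2-part table name so it becomes fully qualified when possible."""
    parts = name.split(".")
    if len(parts) == 1 and database and schema:
        return f"{database}.{schema}.{name}"
    if len(parts) == 2 and database:
        # A 2-part name is assumed to be schema.table, so only the database is added.
        return f"{database}.{name}"
    return name


odbc = parse_database_and_schema("DRIVER={Snowflake};SERVER=acct;DATABASE=SALES;SCHEMA=PUBLIC")
jdbc = parse_database_and_schema("jdbc:snowflake://acct.example.invalid/?db=SALES&schema=PUBLIC")
qualified = qualify_table_name("PUBLIC.ORDERS", database="SALES", schema=None)
```

This sketch does not handle a database carried in the URL path, as some JDBC URLs do, which is one reason explicit overrides such as `warehouse_lineage_database` remain useful.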

Fix type annotation in TestQualifyTableName._make_source to satisfy mypy. Regenerate live integration golden file against demo instance to reflect container hierarchy and browse path changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… lineage

When a DataHub graph is available, try original-case and lowercase URN variants against the catalog and return whichever actually exists (same strategy as the SQL schema resolver). When the graph is unavailable, fall back to the new convert_lineage_urns_to_lowercase config flag (default True) so URNs match warehouse-ingested assets like Snowflake.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
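The resolution strategy in this commit message can be sketched as follows. The helpers `make_dataset_urn` and `resolve_upstream_urn` are hypothetical, and the `exists` callable stands in for a catalog lookup against the DataHub graph. What happens when the graph is available but neither variant exists is not stated in the commit message, so this sketch assumes it falls back to the flag.

```python
from typing import Callable, Optional


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string in DataHub's standard format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def resolve_upstream_urn(
    platform: str,
    table_name: str,
    exists: Optional[Callable[[str], bool]] = None,
    convert_to_lowercase: bool = True,
) -> str:
    """Pick the URN variant that exists in the catalog, else follow the flag."""
    original = make_dataset_urn(platform, table_name)
    lowered = make_dataset_urn(platform, table_name.lower())
    if exists is not None:
        for candidate in (original, lowered):
            if exists(candidate):
                return candidate
    # No graph available, or (assumption) neither variant was found.
    return lowered if convert_to_lowercase else original


catalog = {make_dataset_urn("snowflake", "sales.public.orders")}
resolved = resolve_upstream_urn("snowflake", "SALES.PUBLIC.ORDERS", exists=catalog.__contains__)
offline = resolve_upstream_urn("snowflake", "SALES.PUBLIC.ORDERS", convert_to_lowercase=False)
```

Only the table name is lowercased, never the whole URN, because components such as `dataPlatform` and the environment are case-sensitive.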

brock-acryl added 9 commits March 16, 2026 12:58

(feat) Microstrategy connector

c9ac74b

proper source mappings

40ba9d2

github-actions bot added labels on Apr 11, 2026:

- ingestion: PR or Issue related to the ingestion of metadata
- product: PR or Issue related to the DataHub UI/UX
- devops: PR or Issue related to DataHub backend & deployment

brock-acryl added 2 commits April 11, 2026 08:24

Merge branch 'master' into feat-ingestion-microstrategy

b39ec47

fix(microstrategy): register MicroStrategy entry point in pyproject.toml

7d3b868

brock-acryl wants to merge 22 commits into master from feat-ingestion-microstrategy
