feat(ingestion): add MicroStrategy connector #16992
brock-acryl wants to merge 22 commits into `master`
…hance client functionality
- Introduced comprehensive documentation for the MicroStrategy connector, detailing capabilities, prerequisites, installation, and configuration options.
- Updated `MicroStrategyClient` to require `project_id` for dashboard definitions, ensuring accurate API requests.
- Enhanced validation in `MicroStrategyConnectionConfig` to enforce authentication rules for anonymous and credential modes.
- Improved metadata extraction logic in `MicroStrategySource` to include project context for dashboards and reports.
- Added integration test setup instructions for real-data scenarios, including options for trial instances and mock servers.
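The anonymous-versus-credential rule can be sketched as a post-init check. This is an illustration under assumptions: the field names `use_anonymous`, `username`, and `password` are hypothetical, and the real `MicroStrategyConnectionConfig` is likely a pydantic model, so a stdlib dataclass stands in here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionConfigSketch:
    """Hypothetical stand-in for MicroStrategyConnectionConfig."""

    base_url: str
    use_anonymous: bool = False
    username: Optional[str] = None
    password: Optional[str] = None

    def __post_init__(self) -> None:
        has_credentials = self.username is not None or self.password is not None
        if self.use_anonymous and has_credentials:
            # Anonymous mode must not be mixed with credentials.
            raise ValueError("username/password must not be set when use_anonymous is true")
        if not self.use_anonymous and (self.username is None or self.password is None):
            # Credential mode needs both values.
            raise ValueError("username and password are required unless use_anonymous is true")
```

Failing fast at config load means a misconfigured recipe is rejected before any REST call is made.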
…or improved metadata extraction
- Added detailed concept mapping and lineage extraction information to the MicroStrategy connector documentation.
- Introduced new configuration options in `MicroStrategyConfig` for filtering cubes, dashboards, reports, and datasets to reduce API calls.
- Implemented error handling for unavailable projects in `MicroStrategyClient`, raising specific exceptions for easier debugging.
- Enhanced metadata extraction logic to support column-level lineage and warehouse lineage using SQL parsing.
- Updated unit tests to validate the new configuration options and error handling.
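One way to turn an "unavailable project" error response into a specific exception, sketched under assumptions: the error body is assumed to carry an `iServerCode` field, the code value below is a placeholder (the real `ISERVER_PROJECT_UNAVAILABLE` constant is not shown in this PR), and the exception class names are hypothetical.

```python
from typing import Any, Dict

# Placeholder value; the connector's real constant lives in constants.py.
ISERVER_PROJECT_UNAVAILABLE = -1


class MicroStrategyError(Exception):
    """Generic REST API failure (hypothetical name)."""


class ProjectUnavailableError(MicroStrategyError):
    """Raised when the target project is offline or not loaded (hypothetical name)."""

    def __init__(self, project_id: str, message: str) -> None:
        super().__init__(f"project {project_id} unavailable: {message}")
        self.project_id = project_id


def raise_for_error(project_id: str, status: int, body: Dict[str, Any]) -> None:
    """Translate an error response into the most specific exception available."""
    if status < 400:
        return
    message = str(body.get("message", "unknown error"))
    if body.get("iServerCode") == ISERVER_PROJECT_UNAVAILABLE:
        raise ProjectUnavailableError(project_id, message)
    raise MicroStrategyError(f"HTTP {status}: {message}")
```

Callers can catch `ProjectUnavailableError`, record a warning, and skip that project, while any other error still propagates.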
… client
- Implemented `get_attribute_expression` and `get_metric_expression` methods in `MicroStrategyClient` to fetch human-readable expressions for attributes and metrics.
- Introduced an `include_field_formulas` configuration option in `MicroStrategyConfig` to control whether attribute and metric expressions are fetched.
- Enhanced `MicroStrategySource` to use the new expression retrieval methods for input fields when `include_report_definitions` is enabled.
- Updated input field construction to include expressions as field descriptions, improving metadata visibility.
…eage processing
- Removed the synthetic `__datasource` entity for legacy documents, so each chart stub references its specific warehouse tables directly.
- Updated the logic for emitting embedded chart stubs to use per-dataset warehouse lineage, making the lineage representation more accurate.
- Simplified dashboard metadata extraction by eliminating unnecessary dataset linking, improving performance and maintainability.
…tegy entity configuration
- Updated the version of the data-platforms configuration from v6 to v7.
- Added configuration for the MicroStrategy data platform, including entity URN, type, aspect name, and logo URL.
…s and improvements
- Updated documentation to clarify that domains are not supported and must be configured manually after ingestion.
- Added new configuration options for filtering cubes, dashboards, reports, and datasets to reduce API calls and unnecessary processing.
- Introduced a context manager in `MicroStrategyClient` for managing project headers during API requests, improving code clarity and reliability.
- Enhanced error handling in the client to log more informative messages for authentication and permission errors.
- Added integration tests to validate the MicroStrategy connector, covering entity types and lineage extraction.
- Removed deprecated subtype mappings and streamlined the configuration for better maintainability.
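The filtering options can be pictured as allow/deny regex lists applied to entity names before any per-entity API call. This is a stdlib sketch: DataHub configs usually express this with `AllowDenyPattern`, and `PatternSketch` / `filter_names` are illustrative names, not the connector's code.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class PatternSketch:
    """Illustrative allow/deny filter: deny wins, and a name must match some allow regex."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


def filter_names(names: Iterable[str], pattern: PatternSketch, dropped: List[str]) -> List[str]:
    """Keep allowed names and record the rest so the ingestion report can list them."""
    kept = []
    for name in names:
        if pattern.allowed(name):
            kept.append(name)
        else:
            dropped.append(name)
    return kept
```

For example, `filter_names(["Sales Cube", "tmp_cube"], PatternSketch(deny=["^tmp_"]), dropped)` keeps `["Sales Cube"]` and appends `"tmp_cube"` to `dropped`; because the check runs on names from the listing call, filtered entities cost no detail request.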
…re-fetching
- Introduced a new `max_workers` configuration option to control the maximum number of threads used to pre-fetch cube metadata, improving ingestion performance.
- Updated the MicroStrategy client to use per-request headers for project IDs, allowing concurrent API calls without session-header conflicts.
- Enhanced the cube pre-fetching logic to fetch SQL views and schema data in parallel, reducing overall ingestion time.
- Updated documentation to describe the new option and its implications for API rate limits and debugging.
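A minimal sketch of bounded parallel pre-fetching with `concurrent.futures`, assuming hypothetical `fetch_sql_view` / `fetch_schema` callables that wrap the client methods; the default of 4 workers is illustrative, not the connector's default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

CubeMeta = Tuple[Any, Any]  # (sql_view, schema)


def prefetch_cubes(
    cube_ids: List[str],
    fetch_sql_view: Callable[[str], Any],
    fetch_schema: Callable[[str], Any],
    max_workers: int = 4,
) -> Dict[str, CubeMeta]:
    """Fetch the SQL view and schema of every cube concurrently.

    Both fetchers must be thread-safe, which is why project IDs travel as
    per-request headers instead of mutated shared session headers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit both calls for every cube so they overlap across the pool.
        sql_futures = {cid: pool.submit(fetch_sql_view, cid) for cid in cube_ids}
        schema_futures = {cid: pool.submit(fetch_schema, cid) for cid in cube_ids}
        # .result() re-raises any exception from the worker thread.
        return {
            cid: (sql_futures[cid].result(), schema_futures[cid].result())
            for cid in cube_ids
        }
```

Setting `max_workers=1` serializes the calls, which is one way to stay under API rate limits or make logs easier to follow while debugging.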
…ility
- Changed the return type of the `_deepcopy_wrapper` function from `ExpressionCore` to `Expr` to align with the updated sqlglot library definitions.
- Ensured compatibility with cooperative timeout support in the deepcopy implementation.
🔴 Meticulous spotted visual differences in 566 of 1422 screens tested. Meticulous evaluated ~8 hours of user flows against your PR.
The microstrategy source was registered in setup.py but missing from the generated pyproject.toml, causing the checkLockFile CI task to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Bundle Report: Changes will increase total bundle size by 6.75kB (0.03%) ⬆️. This is within the configured threshold ✅
…ting
- Add `microstrategy` extras to setup.py (usage_common | sqlglot_lib) so
validate-plugin-deps can install and import the plugin correctly; sqlparse
was missing from the install because the extras key was absent entirely
- Regenerate pyproject.toml and uv.lock via updateLockFile
- Rename docs/sources/microstrategy/microstrategy.md →
microstrategy_pre.md to satisfy docGen naming convention (must be
README.md or <plugin>_{pre,post}.md)
- Run ruff format on tests/unit/test_microstrategy_source.py (CI reported "Would reformat")
- Run mdPrettierWrite to format README.md and microstrategy_pre.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
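The docGen naming rule quoted in this commit (README.md or `<plugin>_{pre,post}.md`) can be restated as a small check. This is a hypothetical helper that only encodes the rule as written here; docGen's actual implementation is not part of this PR.

```python
import re

# Accepts README.md or <plugin>_pre.md / <plugin>_post.md.
_DOC_NAME = re.compile(r"^(?:README|(?P<plugin>[a-z0-9_-]+)_(?:pre|post))\.md$")


def is_valid_doc_name(filename: str, plugin: str) -> bool:
    """Return True if filename follows the naming rule for the given plugin."""
    m = _DOC_NAME.match(filename)
    if m is None:
        return False
    # README.md has no plugin prefix; otherwise the prefix must be this plugin.
    return m.group("plugin") is None or m.group("plugin") == plugin
```

Under this rule the old `microstrategy.md` is rejected, while `microstrategy_pre.md` and `microstrategy_post.md` pass.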
… file
- Fix `_request()` return type to `Any` to resolve 13+ mypy return-value errors
- Replace `List[Aspect]` TypeVar usage with `List[Any]` (TypeVar unbound outside generics)
- Fix `add_observed_query` to pass an `ObservedQuery(...)` dataclass instead of kwargs
- Call `.as_workunit()` on `gen_metadata()` output (returns MCPW, not MetadataWorkUnit)
- Add `Dict[str, Any]` annotation to `_MSTR_TYPE_MAP` to fix arg-type error
- Add `# type: ignore[method-assign]` to 24 MagicMock assignments in tests
- Fix `client._base_url` → `client.base_url` (correct attribute name)
- Add `# type: ignore[union-attr]` to entity.urn and as_workunits() test calls
- Rewrite microstrategy_pre.md with an H3 baseline (H2 is disallowed in _pre.md)
- Create README.md with required Overview + Concept Mapping sections
- Create microstrategy_post.md with Capabilities, Limitations, Troubleshooting
- Create microstrategy_recipe.yml with minimal working example config
- Regenerate microstrategy_mces_golden.json against live demo.microstrategy.com
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rror visibility, dead code
- Add `get_workunit_processors()` override to wire `StaleEntityRemovalHandler` into the pipeline (the handler was created but never invoked, so deletion detection was broken)
- Add `MicroStrategySourceReport` with `report_dropped()` for pattern-filtered folders, dashboards, reports, and cubes
- Promote registry lookup failures and report definition fetch failures from DEBUG to WARNING and emit `report_warning()` for operator visibility
- Add `aggregator.close()` in a `finally` block in `_emit_column_lineage_from_sql`
- Replace fragile `assert project_id is not None` guards with explicit `raise ValueError`
- Fix `_MSTR_TYPE_MAP` annotation from `Dict[str, Any]` to `Dict[str, Callable[[], Any]]`
- Delete five unused documentation-style classes from `constants.py` (~117 lines)
- Narrow `_response_json_dict` except to `(ValueError, JSONDecodeError)` in client
- Add explanatory comment to `_request() -> Any` return type in client
- Promote warehouse platform detection fallback messages from DEBUG to INFO
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
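Two of these fixes are easier to see in code. `Dict[str, Callable[[], Any]]` fits a map whose values are zero-argument factories (such as classes) that build a fresh type object per field, and an explicit `ValueError` guard keeps working under `python -O`, which strips `assert` statements. In this sketch the MicroStrategy type names and the stand-in type classes are hypothetical (in DataHub they would be schema type classes), as is the `StringType` fallback.

```python
from typing import Any, Callable, Dict, Optional


class NumberType:  # stand-in for a DataHub schema type class
    pass


class StringType:
    pass


class DateType:
    pass


# Values are factories, not instances, so every field gets its own object.
_MSTR_TYPE_MAP: Dict[str, Callable[[], Any]] = {
    "integer": NumberType,
    "double": NumberType,
    "varChar": StringType,
    "date": DateType,
}


def field_type(mstr_type: str) -> Any:
    """Build a type object, falling back to StringType for unmapped names."""
    return _MSTR_TYPE_MAP.get(mstr_type, StringType)()


def require_project_id(project_id: Optional[str]) -> str:
    """Explicit guard: unlike `assert`, this is not stripped under `python -O`."""
    if project_id is None:
        raise ValueError("project_id is required for this request")
    return project_id
```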
…s_ingestion
Mock was patching `get_cube_schema`, but production code calls `get_cube()`. The mock never fired: the test was exercising the happy path in disguise and silently passing without testing the failure-recovery path it claimed to cover.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
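The failure mode is easy to reproduce: patching a method the code under test never calls leaves the real path running. This self-contained sketch uses hypothetical `Client` and `ingest` names, patches the method production code actually calls, and asserts on `call_count` so the test cannot pass silently.

```python
from typing import Dict, List
from unittest import mock


class Client:
    """Hypothetical client; only get_cube is used by ingest()."""

    def get_cube(self, cube_id: str) -> Dict[str, str]:
        return {"id": cube_id}

    def get_cube_schema(self, cube_id: str) -> Dict[str, str]:
        return {"id": cube_id}


def ingest(client: Client, cube_ids: List[str], failures: List[str]) -> List[str]:
    """Emit one entry per cube, recording failures instead of aborting."""
    emitted = []
    for cube_id in cube_ids:
        try:
            emitted.append(client.get_cube(cube_id)["id"])
        except RuntimeError:
            failures.append(cube_id)
    return emitted


def test_cube_failure_does_not_abort_ingestion() -> None:
    client = Client()
    # Patch the method production code calls; patching get_cube_schema would never fire.
    with mock.patch.object(
        client, "get_cube", side_effect=[RuntimeError("boom"), {"id": "c2"}]
    ) as get_cube:
        failures: List[str] = []
        emitted = ingest(client, ["c1", "c2"], failures)
    assert get_cube.call_count == 2  # proves the failure path was exercised
    assert failures == ["c1"]
    assert emitted == ["c2"]
```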
…solidate constants, remove dead client methods
- Extract `_AttrFormInfo` NamedTuple and `_iter_attr_forms()` static method to eliminate ~50 lines of parallel attribute-form iteration logic that existed identically in both `_build_input_fields` and `_build_cube_schema_metadata`
- Move all iServer error codes and dossier subtype constants from source.py local definitions into constants.py as the single source of truth; import them in source.py to remove the duplicate `ISERVER_PROJECT_UNAVAILABLE`
- Delete five unused dead methods from client.py: `get_dashboard_definition` (compatibility shim), `get_model_cube`, `get_model_tables`, `get_model_facts`, `get_lineage_for_object` (all superseded by the sqlView approach)
- Update the test for the deleted `get_dashboard_definition` shim to test `get_dossier_definition` (the underlying implementation) directly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove unused `DomainRegistry` import and instantiation: `domain_registry` was created in `__init__` but `get_domain_urn()` was never called anywhere; the DOMAINS capability is correctly annotated `supported=False`
- Add `MicroStrategyClient.close()` to release the underlying `requests.Session` connection pool after ingestion completes
- Override `MicroStrategySource.close()` to call `client.close()` then `super().close()`, ensuring the HTTP connection pool is always released
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
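The close ordering can be sketched with stand-in classes (all names hypothetical). The PR calls `client.close()` and then `super().close()`; the `try/finally` below is a defensive variant added for illustration so that base-class cleanup still runs if closing the client raises.

```python
from typing import List


class BaseSource:
    """Stand-in for the DataHub source base class."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def close(self) -> None:
        self.log.append("base closed")


class ClientSketch:
    """Hypothetical client owning a pooled HTTP session."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def close(self) -> None:
        # Real code would call self.session.close() on a requests.Session.
        self.log.append("session closed")


class SourceSketch(BaseSource):
    def __init__(self, log: List[str]) -> None:
        super().__init__(log)
        self.client = ClientSketch(log)

    def close(self) -> None:
        # Release the connection pool first, then run the base cleanup regardless.
        try:
            self.client.close()
        finally:
            super().close()
```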
…rations catalog
- Regenerate datahub.json with microstrategy connector entry (56 lines) containing capabilities, platform name, and support status
- Remove api_connector=true flag from microstrategy in integrations_catalog.json; the flag is reserved for third-party API connectors, not native DataHub source plugins, and its presence caused docgen.py to crash with KeyError: 'microstrategy'
Fixes CI failures:
- ci (3.10/3.11/3.12, testQuick): "Check autogenerated JSON files are up-to-date"
- gh-pages: "Build Docs" (KeyError: 'microstrategy' in docgen.py)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Connector Tests Results: All connector tests passed. Autogenerated by the connector-tests CI pipeline.
…se, fix container hierarchy
**Thread-safe client**
- Replace _project_context/session.headers mutation with per-request extra_headers
- Add threading.Lock for token refresh serialization
- Route DELETE methods through _request() for token refresh safety
- Remove 500 from retry status_forcelist (MSTR 500s are permanent app errors)
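A stdlib sketch of the two thread-safety ideas: headers are built per call instead of being written onto shared session state, and a `threading.Lock` ensures only one thread refreshes an expired token. `TokenHeaders` and the `login` callable are hypothetical, and the header names `X-MSTR-AuthToken` / `X-MSTR-ProjectID` are assumed here to be the ones the client sends.

```python
import threading
from typing import Callable, Dict, Optional


class TokenHeaders:
    """Thread-safe auth token holder plus per-request header builder."""

    def __init__(self, login: Callable[[], str]) -> None:
        self._login = login  # hypothetical callable returning a fresh token
        self._lock = threading.Lock()
        self._token: Optional[str] = None

    def refresh(self, stale_token: Optional[str]) -> str:
        """Refresh once even if many threads hit the same expired token."""
        with self._lock:
            # Another thread may already have refreshed while we waited.
            if self._token is None or self._token == stale_token:
                self._token = self._login()
            return self._token

    def headers_for(
        self, project_id: Optional[str] = None, extra: Optional[Dict[str, str]] = None
    ) -> Dict[str, str]:
        """Build a fresh dict per request; shared session headers are never mutated."""
        token = self._token or self.refresh(None)
        headers = {"X-MSTR-AuthToken": token}
        if project_id is not None:
            headers["X-MSTR-ProjectID"] = project_id
        if extra:
            headers.update(extra)
        return headers
```

On a 401, a worker would call `refresh(token_it_used)`; only the first caller logs in again, and the rest pick up the new token.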
**Parallel prefetch for dashboards, reports, and expression cache**
- Parallel dashboard definition + warehouse SQL fetch via ThreadPoolExecutor
- Parallel report definition + warehouse lineage fetch
- Pre-warm expression cache for field formulas before entity processing
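Pre-warming can be sketched as: collect the distinct attribute and metric IDs first, fetch their expressions in parallel, and keep the results in a dict consulted later during entity processing. `fetch_expression` stands in for `get_attribute_expression` / `get_metric_expression`, skipping all calls when formulas are disabled mirrors the `include_field_formulas` flag, and the function shape is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def prewarm_expressions(
    object_ids: Iterable[str],
    fetch_expression: Callable[[str], Optional[str]],
    include_field_formulas: bool = True,
    max_workers: int = 4,
) -> Dict[str, Optional[str]]:
    """Return a cache of object ID -> expression, fetched once per distinct ID."""
    if not include_field_formulas:
        return {}  # formulas disabled: make no expression API calls at all
    unique_ids = sorted(set(object_ids))  # dedupe so shared fields are fetched once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        expressions = list(pool.map(fetch_expression, unique_ids))
    # The dict is built after all fetches finish, so no locking is needed.
    return dict(zip(unique_ids, expressions))
```

Entity processing can then look up `cache.get(field_id)` and use the result as the input field's description without further API calls.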
**Auto-detect warehouse database and schema from connection strings**
- Fetch connection strings via GET /api/datasources/connections/{id}
- Parse DATABASE/db/schema params from JDBC/ODBC connection strings
- warehouse_lineage_database and warehouse_lineage_schema now optional overrides
- Fix _qualify_table_name to handle 2-part names (prepend database for Snowflake)
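A sketch of both steps, assuming `key=value` pairs separated by `;` (ODBC style) or by `?`/`&` (JDBC URL query parameters) with case-insensitive keys. Real MicroStrategy connection strings may differ, and `parse_db_schema` / `qualify_table_name` are illustrative names, not the connector's `_qualify_table_name`.

```python
import re
from typing import Dict, Optional, Tuple

# Split on ';' (ODBC) or '?' / '&' (JDBC URL query string).
_PAIR_SEP = re.compile(r"[;?&]")


def parse_db_schema(conn: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (database, schema) from a JDBC/ODBC connection string."""
    params: Dict[str, str] = {}
    for chunk in _PAIR_SEP.split(conn):
        key, sep, value = chunk.partition("=")
        if sep and value.strip():
            params[key.strip().lower()] = value.strip()
    database = params.get("database") or params.get("db")
    return database, params.get("schema")


def qualify_table_name(name: str, database: Optional[str]) -> str:
    """Prepend the database to a 2-part schema.table name (e.g. for Snowflake)."""
    parts = name.split(".")
    if len(parts) == 2 and database:
        parts = [database] + parts
    return ".".join(parts)
```

Because the commit describes `warehouse_lineage_database` and `warehouse_lineage_schema` as optional overrides, explicitly configured values would take precedence over the parsed ones.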
**Fix container hierarchy for SDK V2 entities**
- Pass ContainerKey directly to parent_container instead of .as_urn() string
- Fixes missing container aspect and empty browsePathsV2 on dashboards/charts/datasets
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix type annotation in TestQualifyTableName._make_source to satisfy mypy. Regenerate live integration golden file against demo instance to reflect container hierarchy and browse path changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… lineage
When a DataHub graph is available, try original-case and lowercase URN variants against the catalog and return whichever actually exists (the same strategy as the SQL schema resolver). When the graph is unavailable, fall back to the new `convert_lineage_urns_to_lowercase` config flag (default `True`) so URNs match warehouse-ingested assets such as Snowflake.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
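A sketch of the resolution order, where `exists` is a hypothetical callable standing in for a catalog lookup against the DataHub graph and `None` models the graph being unavailable. URNs follow DataHub's dataset format `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. Falling back to the flag when the graph is available but neither variant exists is an assumption of this sketch.

```python
from typing import Callable, Optional


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def resolve_lineage_urn(
    platform: str,
    name: str,
    exists: Optional[Callable[[str], bool]],
    convert_lineage_urns_to_lowercase: bool = True,
    env: str = "PROD",
) -> str:
    """Pick the URN variant most likely to match an already-ingested dataset."""
    original = dataset_urn(platform, name, env)
    lowered = dataset_urn(platform, name.lower(), env)
    if exists is not None:
        # Graph available: prefer whichever variant is actually in the catalog.
        for candidate in (original, lowered):
            if exists(candidate):
                return candidate
    # Graph unavailable (or no match): fall back to the config flag.
    return lowered if convert_lineage_urns_to_lowercase else original
```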
Summary
Key Features
Metadata coverage:
- `Dashboard` entities with embedded chart stubs and ownership
- `Chart` entities with report-type subtypes and cube/dataset lineage
- `Dataset` entities with schema (attributes + metrics), SQL view definition, and warehouse upstream lineage
- `Dataset` entities
Lineage:
- `dataSource.id` resolution
- `GET /api/v2/cubes/{id}/sqlView` SQL parsing (Snowflake, MySQL, Teradata, bare quoting styles)
- `SqlParsingAggregator`; warehouse platform auto-detected from `/api/datasources` with four-tier fallback
Test plan
- `tests/unit/test_microstrategy_source.py` covers: config validation, project/folder/dashboard/report/cube/dataset pattern filtering, warehouse platform detection tiers, cube schema extraction, ownership extraction, cross-project lineage registry resolution, error handling continuity, and API call reduction flags (63 tests)
- `tests/integration/microstrategy/test_microstrategy_mock.py` covers full end-to-end ingest against a mocked REST API with golden-file comparison for both standard and warehouse-lineage configurations (2 tests)
- `pytest tests/unit/test_microstrategy_source.py tests/integration/microstrategy/test_microstrategy_mock.py`
- `ruff check` + `ruff format` pass with no errors
- `docs/sources/microstrategy/microstrategy.md` with all 35 config fields documented