Releases
v0.4.0
New MCP tools and other improvements
Response Token Budget Management
New TokenCountEstimator class for fast token counting using character-based heuristics
Automatic result truncation via _select_results_within_budget() to prevent context window issues
Configurable token limits:
TOOL_RESPONSE_TOKEN_LIMIT environment variable (default: 80,000 tokens)
ENTITY_SCHEMA_TOKEN_BUDGET environment variable (default: 16,000 tokens per entity)
90% safety buffer to account for token estimation inaccuracies
Ensures at least one result is always returned
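The budget behavior described above can be sketched roughly as follows. This is an illustration only, not the project's code: the names `estimate_tokens` and `select_within_budget`, the ratio of 4 characters per token, and the reading of the 90% buffer as "use 90% of the configured limit" are all assumptions.

```python
import os
from typing import Iterable, Iterator, Mapping

CHARS_PER_TOKEN = 4   # assumption: a common rule-of-thumb ratio
SAFETY_FACTOR = 0.9   # assumption: the 90% buffer scales the limit down


def estimate_tokens(text: str) -> int:
    """Cheap character-based estimate; not a real tokenizer."""
    return len(text) // CHARS_PER_TOKEN + 1


def select_within_budget(results: Iterable[str], token_limit: int) -> Iterator[str]:
    """Yield results until the buffered budget is used up.

    The first result is always yielded, even if it exceeds the budget.
    """
    budget = int(token_limit * SAFETY_FACTOR)
    used = 0
    for index, result in enumerate(results):
        cost = estimate_tokens(result)
        if index > 0 and used + cost > budget:
            break
        used += cost
        yield result


def token_limit_from_env(env: Mapping[str, str] = os.environ) -> int:
    """Read TOOL_RESPONSE_TOKEN_LIMIT, falling back to the documented default."""
    return int(env.get("TOOL_RESPONSE_TOKEN_LIMIT", "80000"))
```

Because the selector is a generator, a caller can stop consuming it early without the remaining results ever being measured.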
Enhanced Search Capabilities
Enhanced Keyword Search:
Supports pagination with start parameter
Added viewUrn for view-based filtering
Added sortInput for custom sorting
Query Entity Support
Native QueryEntity type support (SQL queries as first-class entities)
New query_entity.gql GraphQL query
Optimized entity retrieval with specialized query for QueryEntity types
Includes query statement, subjects (datasets/fields), and platform information
GraphQL Compatibility
Adaptive field detection for newer GMS versions
Caching mechanism for GMS version detection
Graceful fallback when newer fields aren't available
Support for #[CLOUD] and #[NEWER_GMS] conditional field markers
DISABLE_NEWER_GMS_FIELD_DETECTION environment variable override
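One plausible reading of the conditional field markers is that a line in a GraphQL document tagged with `#[CLOUD]` or `#[NEWER_GMS]` is dropped when the connected server does not support it. The sketch below illustrates that reading only; the function name `filter_query`, the `enabled` parameter, and the per-line semantics are assumptions and are not confirmed by these notes.

```python
import re

# Matches the two markers named in the release notes.
MARKER_RE = re.compile(r"#\[(CLOUD|NEWER_GMS)\]")


def filter_query(query: str, enabled: set) -> str:
    """Drop lines whose marker is not in `enabled` (hypothetical semantics)."""
    kept = []
    for line in query.splitlines():
        match = MARKER_RE.search(line)
        if match and match.group(1) not in enabled:
            continue  # the marked field is unavailable, so skip the line
        kept.append(line)
    return "\n".join(kept)
```

Since `#` starts a comment in GraphQL, a marked line that is kept remains valid query text.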
Schema Field Optimization
Smart field prioritization to stay within token budgets:
Primary key fields (isPartOfKey=true)
Partitioning key fields (isPartitioningKey=true)
Fields with descriptions
Fields with tags or glossary terms
Alphabetically by field path
Generator-based approach for memory efficiency
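The priority order above could be expressed as a tuple sort key, with a generator that stops once the budget is spent. This is a minimal sketch under assumptions: the `SchemaField` dataclass, the cost formula, and the function names are hypothetical, and the real implementation may weigh fields differently.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class SchemaField:
    # Hypothetical, simplified stand-in for a schema field record.
    field_path: str
    is_part_of_key: bool = False
    is_partitioning_key: bool = False
    description: str = ""
    tags: List[str] = field(default_factory=list)


def priority_key(f: SchemaField):
    """False sorts before True, so each `not` puts matching fields first."""
    return (
        not f.is_part_of_key,
        not f.is_partitioning_key,
        not f.description,
        not f.tags,
        f.field_path,
    )


def select_fields(fields: Iterable[SchemaField], token_budget: int,
                  chars_per_token: int = 4) -> Iterator[SchemaField]:
    """Yield fields in priority order until the estimated budget is exhausted."""
    used = 0
    for f in sorted(fields, key=priority_key):
        cost = len(f.field_path + f.description) // chars_per_token + 1
        if used + cost > token_budget:
            break
        used += cost
        yield f
```

With this ordering, the fields dropped first are those with no key role, no description, and no tags.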
Error Handling & Security
Enhanced error logging with full stack traces in async_background wrapper
Logs function name, args, and kwargs on failures
ReDoS protection in HTML sanitization with bounded regex patterns
Query truncation function (configurable via QUERY_LENGTH_HARD_LIMIT, default: 5,000 chars)
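As an illustration of the two safeguards above, the sketch below shows a length-capped query helper and a tag-stripping pattern with a bounded quantifier. The function names, the truncation suffix, and the 500-character tag bound are assumptions made for the example, not details taken from the release.

```python
import os
import re
from typing import Optional

# A bounded quantifier ({0,500}) caps the work done per match attempt.
TAG_RE = re.compile(r"<[^<>]{0,500}>")


def strip_tags(text: str) -> str:
    """Remove short HTML-like tags; longer runs are left untouched."""
    return TAG_RE.sub("", text)


def truncate_query(query: str, limit: Optional[int] = None) -> str:
    """Cap a query at `limit` characters (default from QUERY_LENGTH_HARD_LIMIT)."""
    if limit is None:
        limit = int(os.environ.get("QUERY_LENGTH_HARD_LIMIT", "5000"))
    if len(query) <= limit:
        return query
    return query[:limit] + "... [truncated]"  # hypothetical suffix
```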
Default Views Support
Automatic default view application for all search operations
Fetches organization's default global view from DataHub
5-minute caching (configurable via VIEW_CACHE_TTL_SECONDS)
Can be disabled via DATAHUB_MCP_DISABLE_DEFAULT_VIEW environment variable
Ensures search results respect organization's data governance policies
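The caching behavior can be pictured as a single value that is reloaded once its time-to-live expires. The sketch below uses only the standard library and is an assumption-laden illustration: the class `TTLValue`, the `loader` callback, and the truthy values accepted by `default_view_disabled` are hypothetical, and the release itself lists `cachetools` as the caching dependency.

```python
import time
from typing import Callable, Mapping, Optional


class TTLValue:
    """Cache one value and reload it after `ttl_seconds` have passed."""

    def __init__(self, loader: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()      # e.g. fetch the default view
            self._expires_at = now + self._ttl
        return self._value


def default_view_disabled(env: Mapping[str, str]) -> bool:
    """Assumption: 'true' or '1' in DATAHUB_MCP_DISABLE_DEFAULT_VIEW disables it."""
    return env.get("DATAHUB_MCP_DISABLE_DEFAULT_VIEW", "").lower() in ("true", "1")
```

The default of 300 seconds mirrors the 5-minute window described above; passing a custom `clock` makes the expiry easy to exercise in tests.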
Dependencies
Added cachetools>=5.0.0: For GMS field detection caching
Added types-cachetools (dev): Type stubs for mypy
Performance
Memory efficiency: Generator-based result selection avoids loading all results into memory
Caching: GMS version detection cached per graph instance
Fast token estimation: Character-based heuristic (no tokenizer overhead)
Smart truncation: Truncates less important schema fields first