readl1 commented Jun 26, 2025

  • Add 10 new write operation tools for complete metadata management
  • Create mutations.gql with DataHub GraphQL mutations
  • Implement tag management (create, add, remove, batch operations)
  • Add description update capabilities for entities and fields
  • Implement domain management (create, assign, unassign)
  • Add glossary term management (create, link, unlink)
  • Add owner management functionality
  • Support field-level operations with subResource parameters
  • Update README with comprehensive feature documentation
  • Transform from read-only to full read/write DataHub MCP server

Tools added:

  • create_tag, add_tags, remove_tag, batch_add_tags, batch_remove_tags
  • update_description
  • create_domain, set_domain, unset_domain
  • create_glossary_term, add_terms, remove_terms
  • add_owners

hsheth2 (Contributor) commented Jul 1, 2025

@readl1 thanks for the PR!

My initial reaction is that 10 tools is a lot. In general, I've found that the tool-calling performance of LLMs can degrade once you pass ~10-20 tools total across all tool providers / MCP servers.

I wonder if we can simplify this down to a smaller set of more capable interfaces. For example, maybe all we need is the following (with a corresponding remove tool, or perhaps remove as a bool parameter), plus a couple of examples of reasonable tool-call setups:

def add_tag_or_term(tag_or_term: TagUrn | GlossaryTermUrn, entities: List[Urn])

We don't need a 1:1 mapping between GraphQL mutations and tools. An important role of the MCP server is to simplify the interfaces exposed to LLMs and, with some taste, prune down to exactly the set of tools we need.
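
To make the suggestion above concrete, here is a minimal sketch of a single combined tool with `remove` as a bool. It is only an illustration: `InMemoryGraph` is a hypothetical stand-in for whatever client the server uses, URNs are plain strings rather than typed `TagUrn` / `GlossaryTermUrn` objects, and the return message format is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class InMemoryGraph:
    """Hypothetical stand-in for the real client: entity URN -> attached tag/term URNs."""
    associations: Dict[str, Set[str]] = field(default_factory=dict)


def _kind(urn: str) -> str:
    # Dispatch on the URN prefix so one tool covers both tags and glossary terms.
    if urn.startswith("urn:li:tag:"):
        return "tag"
    if urn.startswith("urn:li:glossaryTerm:"):
        return "term"
    raise ValueError(f"expected a tag or glossary term URN, got {urn!r}")


def add_tag_or_term(
    graph: InMemoryGraph,
    tag_or_term: str,
    entities: List[str],
    remove: bool = False,
) -> str:
    """Attach (or, with remove=True, detach) one tag or term on many entities."""
    kind = _kind(tag_or_term)
    for entity in entities:
        attached = graph.associations.setdefault(entity, set())
        if remove:
            attached.discard(tag_or_term)  # no error if it was never attached
        else:
            attached.add(tag_or_term)
    verb = "Removed" if remove else "Added"
    return f"{verb} {kind} {tag_or_term} on {len(entities)} entities"
```

With this shape, one tool would stand in for add_tags, remove_tag, batch_add_tags, batch_remove_tags, add_terms and remove_terms; whether that trade-off is right depends on how well models handle the `remove` flag in practice.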

I'm less confident about this, but imo creating a term (and probably creating a tag too) is an important/complex enough operation that we probably want people to do it from the DataHub UI.

The code looks pretty repetitive. Is there any way we can refactor it to be a bit more streamlined?
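
One possible shape for such a refactor, as a sketch only: if most of the mutations take the same entity-or-field reference, a shared helper could build that piece once instead of repeating it in every tool. The field names used here (`resourceUrn`, `subResourceType`, `subResource`, `DATASET_FIELD`, `tagUrns`, `resources`) are assumptions about the mutation inputs and have not been checked against mutations.gql; the helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def resource_ref(resource_urn: str, field_path: Optional[str] = None) -> Dict[str, Any]:
    """Build one resource reference, at entity level or field level."""
    ref: Dict[str, Any] = {"resourceUrn": resource_urn}
    if field_path is not None:
        # Assumed field-level addressing; verify the names against the schema.
        ref["subResourceType"] = "DATASET_FIELD"
        ref["subResource"] = field_path
    return ref


def batch_variables(
    urn_key: str,
    urns: Iterable[str],
    resources: Iterable[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build GraphQL variables shared by the batch add/remove style mutations."""
    return {"input": {urn_key: list(urns), "resources": list(resources)}}
```

Each tool would then reduce to choosing a mutation name and calling these helpers, which keeps the field-level handling in one place.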

Finally, we need some mechanism for people to choose whether or not mutations are allowed. I've been considering the idea of "toolsets": e.g. you call mcp-server-datahub --toolset discovery and it enables the right subset of tools (e.g. mostly the read-only ones). What the toolsets should be and which tools go in each one is not fully clear to me yet, but it feels like the right direction.
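
A rough sketch of how a `--toolset` flag could gate which tools get registered, using only the standard library. The toolset names other than `discovery`, the read-only tool names, and the default of falling back to `discovery` are all placeholders for illustration; nothing here reflects an existing CLI.

```python
import argparse
from typing import Dict, List, Optional, Set

# Hypothetical grouping. The read-side tool names are placeholders; the
# mutation names are taken from the tools listed in this PR.
TOOLSETS: Dict[str, Set[str]] = {
    "discovery": {"search", "get_entity", "get_lineage"},
    "mutation": {"add_tags", "remove_tag", "update_description", "set_domain", "add_owners"},
}


def parse_toolsets(argv: Optional[List[str]] = None) -> List[str]:
    """Parse repeated --toolset flags; default to read-only discovery."""
    parser = argparse.ArgumentParser(prog="mcp-server-datahub")
    parser.add_argument(
        "--toolset",
        action="append",
        choices=sorted(TOOLSETS),
        default=None,  # None rather than a list, so appended values don't extend a default
        help="toolset to enable (may be given more than once)",
    )
    args = parser.parse_args(argv)
    return args.toolset or ["discovery"]


def resolve_tools(toolsets: List[str]) -> Set[str]:
    """Union of tool names across the selected toolsets."""
    enabled: Set[str] = set()
    for name in toolsets:
        enabled |= TOOLSETS[name]
    return enabled
```

Under these assumptions, running with no flag would expose only the read-only tools, and mutations would have to be opted into explicitly with `--toolset mutation`.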
