@aravindsriraj commented Sep 12, 2025

This pull request introduces a unified tool for accessing Atlan documentation, adds domain validation to secure documentation retrieval, and updates dependencies to support these features. The changes cover both the backend implementation and the README updates that describe the new tool and its usage.

Key changes:

Documentation Tool Addition and Security

  • Added a new async documentation_tool to the MCP server, providing unified access to Atlan documentation with actions for listing sources and fetching content, including domain validation for security.
  • Implemented modelcontextprotocol/tools/docs.py with a DocumentationManager that manages documentation sources, enforces allowed domains, and handles secure fetching and error reporting.
  • Exposed list_doc_sources and fetch_documentation in the tools module and server imports for use by the MCP server.

Documentation and Tooling Updates

  • Updated README.md to document the new "Documentation Tools" section, including usage instructions, available actions, and security features. Also updated the tool access control section to include the new documentation tool.

Dependency and Build Updates

  • Updated dependencies in pyproject.toml to require fastmcp==2.12.3 and add httpx for async HTTP requests. Added a dev dependency group for pre-commit.

… guidance

- Add new llms_txt.py module with domain-secured documentation fetching
- Implement list_doc_sources, fetch_llms_txt_content, fetch_documentation, and add_doc_source functions
- Add MCP tool definitions in server.py with clear 'WHEN TO USE' guidance for clients
- Update tool descriptions to guide MCP clients (Claude, Cursor) on proper usage for Atlan product questions
- Add httpx dependency for async HTTP requests
- Configure default Atlan documentation source with domain security
- All examples updated to reflect Atlan-specific use cases
- Fix trailing whitespace in llms_txt.py and server.py
- Apply Ruff formatting and linting fixes
- All pre-commit checks now passing
- Increase HTTP timeout from 10s to 30s for documentation fetching
- Enable redirects for documentation URLs
- Improve HTTP exception handling with specific error types
- Fix trailing comma syntax in HTTP client configuration
- Remove unsupported request_timeout parameter from FastMCP initialization
- Add Documentation Tools section to Available Tools table
- Include list_doc_sources, fetch_llms_txt, fetch_docs, and add_doc_source tools
- Update Tool Access Control section with documentation tool restrictions
- Organize tools into Asset Management and Documentation categories
- Add usage guidance for documentation tools with domain security notes
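
The domain-secured fetching described in the commits above could look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's actual code: `ALLOWED_DOMAINS`, `is_allowed_url`, and `fetch_documentation_text` are hypothetical names, and `docs.example.com` is a placeholder for the real documentation host. It assumes httpx is installed and uses the 30-second timeout and redirect handling mentioned in the commit messages.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the real host names are not quoted in this thread.
ALLOWED_DOMAINS = {"docs.example.com"}


def is_allowed_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allow-list."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host in ALLOWED_DOMAINS


async def fetch_documentation_text(url: str) -> dict:
    """Fetch a documentation page and report failures as structured errors."""
    if not is_allowed_url(url):
        return {"ok": False, "error": f"Domain not allowed: {url}"}

    # Imported lazily so the validation helper has no third-party dependency.
    import httpx

    try:
        async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
            response = await client.get(url)
            response.raise_for_status()
            # A redirect may leave the allow-list, so re-check the final URL.
            if not is_allowed_url(str(response.url)):
                return {"ok": False, "error": "Redirected to a disallowed domain"}
            return {"ok": True, "content": response.text}
    except httpx.TimeoutException:
        return {"ok": False, "error": "Request timed out"}
    except httpx.HTTPStatusError as exc:
        return {"ok": False, "error": f"HTTP {exc.response.status_code}"}
    except httpx.RequestError as exc:
        return {"ok": False, "error": f"Request failed: {exc}"}
```

Matching the host exactly, rather than by substring, is what keeps look-alike hosts such as `docs.example.com.evil.org` out. Re-checking the final URL is an extra precaution this sketch adds because redirects are enabled.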
@rahul-madaan
Contributor

I believe we should try merging all the docs-related functions into a single tool rather than having 4 tools here.

- Combine list_doc_sources, fetch_llms_txt, fetch_docs, and add_doc_source into one tool
- Add action-based interface with list_sources, fetch_index, fetch_content, add_source actions
- Simplify API from 4 tools to 1 unified tool while maintaining all functionality
- Update README documentation to reflect single documentation tool
- Streamline tool access control configuration
- Tested successfully with streamable HTTP transport
- Remove add_source action to restrict tool to Atlan documentation only
- Eliminate llms_txt_url and allowed_domains parameters
- Update documentation to reflect Atlan-only focus
- Simplify available actions to list_sources, fetch_index, fetch_content
- Prevent customers from adding external documentation sources
@aravindsriraj
Author

@rahul-madaan I've combined all tools into a single doc tool. Thanks

- Remove add_source() method from LLMSTxtManager class
- Remove add_doc_source() function from docs.py
- Update tools/__init__.py imports to exclude add_doc_source
- Remove add_doc_source from __all__ exports
- Ensure Atlan-only documentation access with no external source addition
- Successfully tested with streamable HTTP transport
APIs, integrations, or need any documentation-related assistance.
This powerful unified tool handles all Atlan documentation operations through different actions:
- list_sources: Discover available Atlan documentation sources
Member

Can't we just add list sources? Considering our llms.txt is at the default path, we can return this path and the client will automatically fetch it and use it. Do we necessarily need to fetch it in our source code and return it?

Author

@aravindsriraj Sep 18, 2025

But some MCP clients can't fetch content from URLs the way Claude does.

…action

Remove complex llms.txt parsing, update FastMCP to 2.12.3, reduce timeouts for better MCP compatibility
Comment on lines 74 to 76
    async def _fetch_content(self, url: str) -> str:
        """Fetch content from URL with proper error handling."""
        try:
Collaborator

I agree with @firecast: we can just give out the URL, and Cursor or Claude should be able to fetch the relevant context from it.

There is only a single source, and we give out all the links in llms.txt anyway. Instead, we can give the agent the llms.txt endpoint itself and let it do the job.

Author

I know Claude can fetch content from URLs natively. Can Cursor do that too? @Hk669

Author

I'm not sure if it's possible in other MCP clients.

Author

@Hk669 @firecast Just checked. Most MCP clients, such as Claude, Cursor, Copilot, and Windsurf, can fetch content from URLs natively. But if a user wants to use the server in a custom MCP client, fetch_content() would be useful in that case. Let me know the next steps.

Member

Let's let clients fetch the URL. I don't think we should do the fetching ourselves.

Author

@firecast I've removed fetch_content. Claude is not fetching the llms.txt endpoint and does a web search instead. It happens every time.

@aravindsriraj requested a review from Hk669 September 19, 2025 03:02
Clients should fetch content directly via provided llms.txt URLs
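
The URL-only approach agreed on at this point in the review could be sketched as follows. This is a minimal illustration under assumptions, not the committed code: `LLMS_TXT_URL` is a placeholder (the real endpoint is not quoted in this thread) and `list_sources_only` is a hypothetical name.

```python
# Hypothetical placeholder; the real llms.txt URL is not quoted in this thread.
LLMS_TXT_URL = "https://docs.example.com/llms.txt"


def list_sources_only() -> dict:
    """Return the llms.txt location and leave fetching to the MCP client."""
    return {
        "sources": [
            {"name": "Atlan documentation", "llms_txt_url": LLMS_TXT_URL},
        ],
        # Explicit guidance, since a client may otherwise fall back to web search.
        "instructions": (
            "Fetch the llms.txt URL directly, then follow the links it lists "
            "to read the relevant pages."
        ),
    }
```

The server makes no HTTP requests in this design, so whether a given client actually fetches the URL depends entirely on the client.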
Align with supported version for Atlan MCP server
- Remove beautifulsoup4 dependency, use markdownify directly
- Add comprehensive JavaScript and navigation cleanup
- Achieve 88% token reduction while preserving content links
- Improve security with better domain validation
- Clean up unnecessary return fields for API response
- Update README.md with correct documentation tool information
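
The JavaScript and navigation cleanup mentioned in this commit could be approximated with a pre-cleaning pass like the one below, applied before HTML is converted to Markdown. This is a rough standard-library sketch, not the PR's code: `strip_noise` and `reduction_percent` are hypothetical names, the regular expression is a simplification that will not handle every HTML edge case, and character count is used only as a crude proxy for tokens. The Markdown conversion itself (done with markdownify in the PR) is not shown.

```python
import re

# Whole elements to drop, including their contents.
_DROP_BLOCKS = re.compile(
    r"<(script|style|nav|noscript)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_noise(html: str) -> str:
    """Remove script, style, nav, and noscript elements from an HTML string."""
    return _DROP_BLOCKS.sub("", html)


def reduction_percent(before: str, after: str) -> float:
    """Character-count reduction as a percentage of the original length."""
    if not before:
        return 0.0
    return 100.0 * (len(before) - len(after)) / len(before)


if __name__ == "__main__":
    page = (
        '<nav><a href="/">Home</a></nav>'
        "<script>var x = 1;</script>"
        '<p>See <a href="/docs">docs</a>.</p>'
    )
    cleaned = strip_noise(page)
    print(cleaned)
    print(round(reduction_percent(page, cleaned), 1))
```

Links inside the main content survive because only the listed container elements are removed.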

4 participants