@jmchilton
Empower Developers to Build Rich, Automated Tutorials

This PR introduces a "Stories" feature that automatically generates visual documentation from Selenium & Playwright tests and standalone scripts built on galaxy-selenium. Stories interleave screenshots with markdown narrative to create tutorial-quality documentation in markdown, HTML, PDF, and zip formats. The same automation code now serves dual purpose: validating Galaxy functionality through tests while generating tutorials. This approach maintains a single source of truth for both testing and documentation.

Key Features

Dual-Purpose Automation

  • Test documentation: Generate visual test reports showing what tests verify
  • User tutorials: Create user-facing guides from the same automation code
  • Developer docs: Document workflows and features with actual screenshots

Story API

Three new methods added to browser context:

  • screenshot(label, caption=None) - Take screenshot with optional caption for story
  • document(markdown_content) - Add markdown narrative between actions
  • document_file(file_path, caption=None) - Include file contents in story with syntax highlighting

Output Formats

Stories automatically generate four artifact types:

  • story.md - Markdown source with embedded screenshots
  • story.html - Self-contained HTML with styling
  • story.pdf - Publication-ready PDF (requires weasyprint)
  • {story_name}.zip - Complete archive of all artifacts
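The collect-then-render flow behind these artifacts can be sketched with a toy `Story` class. This is an illustration only, not Galaxy's actual implementation (which lives in `galaxy.selenium.stories` and also emits HTML, PDF, and zip); method names follow the API described above.

```python
from typing import List, Optional

class Story:
    """Toy story collector: interleaves markdown narrative and screenshots."""

    def __init__(self, title: str):
        self.title = title
        self.elements: List[str] = []
        self._shot = 0

    def document(self, markdown_content: str) -> None:
        # Narrative goes straight into the element stream.
        self.elements.append(markdown_content)

    def screenshot(self, label: str, caption: Optional[str] = None) -> None:
        # Screenshots are numbered sequentially so story.md keeps them ordered.
        self._shot += 1
        md = f"![{label}]({self._shot:03d}_{label}.png)"
        if caption:
            md += f"\n\n*{caption}*"
        self.elements.append(md)

    def to_markdown(self) -> str:
        # Equivalent to the story.md artifact: title plus interleaved elements.
        return "\n\n".join([f"# {self.title}", *self.elements])

story = Story("Workbook Import")
story.document("Open the **Upload** dialog and select the workbook.")
story.screenshot("upload_dialog", caption="The upload dialog")
print(story.to_markdown())
```

Because narrative and screenshots share one ordered element stream, the rendered markdown reads in the same order the automation executed.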

Clean Architecture

Split across two packages for maximum reusability:

  • packages/selenium: General-purpose story infrastructure (no test dependencies)
    • Reusable in tests, Jupyter notebooks, standalone tutorial scripts
    • Story class, NoopStory pattern, StoryProtocol typing
  • packages/test_selenium: Test framework integration
    • Wires stories into test lifecycle via selenium_test decorator
    • Enabled via GALAXY_TEST_STORIES_DIRECTORY environment variable

Examples Included

Test Suite: Workbook Import Stories

Added test_workbook_import.py with four test cases demonstrating the new workbook functionality from #20288:

  1. test_dataset_import_from_workbook - Import datasets with automatic column mapping
  2. test_simple_list_collection_import - Create list collections from workbooks
  3. test_list_of_pairs_collection_import - Import paired data with automatic pairing
  4. test_nested_list_of_pairs_import - Build complex nested collection structures

These tests serve dual purpose: validating the workbook import feature while generating tutorial documentation.

Run tests with stories:

export GALAXY_TEST_STORIES_DIRECTORY=/tmp/test_stories
pytest lib/galaxy_test/selenium/test_workbook_import.py -v
# Check /tmp/test_stories for generated markdown, HTML, PDF, zip

Tutorial Generator: Rule Builder Guide

Added generate_rule_builder_tutorial.py demonstrating standalone tutorial generation at scale. It generates a story documenting 10 different examples (the 4 new workbook examples plus 6 older rules examples that power training material tutorials) by reusing test helper methods with added narrative context.

Generate tutorials:

cd packages/selenium
# Outline mode (fast, displays rule structure and builder but doesn't wait for uploads)
uv run python galaxy/selenium/scripts/generate_rule_builder_tutorial.py --galaxy_url http://localhost:8081/ --story-output ./rules_tutorial

# Full mode (complete with uploads and screenshots)
uv run python galaxy/selenium/scripts/generate_rule_builder_tutorial.py --galaxy_url http://localhost:8081/ --story-output ./rules_tutorial --perform-uploads --timeout-multiplier 6
[Screenshots: two pages of the generated rule builder tutorial, captured 2025-10-29]

Stories vs. Training Material

Despite Claude's flowery language and excitement, I think the Training Material remains the repository for much richer, more usable, better supported tutorials, and this work will never supplant it. It does, however, make it really easy to build tutorial outlines that can be readily translated into training materials (a task that can be delegated to an agent once the Markdown and screenshots are in place), and to regenerate all the screenshots for older tutorials generated from these stubs (again, a task that could be delegated to an agent).

I plan to create tutorials for these test cases. Here is a comparison of what I have done here versus what still needs to be done for the training materials:

Selenium-generated documentation:

  • Purpose: Test validation + reference documentation
  • Audience: Developers and power users
  • Content: Technical explanations of what Galaxy detects and how
  • Screenshots: Focus on rule builder interface states

Final GTN tutorial (to be written):

  • Purpose: Teaching beginners how to use auto-detection
  • Audience: Scientists and researchers new to Galaxy
  • Content: Step-by-step instructions with biological context
  • Additional elements needed:
    • Clearer learning objectives and prerequisites
    • More biological context for the examples
    • Troubleshooting tips
    • Export template workbook demonstration
    • Links to intermediate/advanced tutorials
    • Exercises or challenges for practice

The selenium docs provide the foundation and validation, but the GTN tutorial needs additional pedagogy and narrative.

Implementation Highlights

NoopStory Pattern

Uses null object pattern instead of None checks for cleaner code and type safety:

if GALAXY_TEST_STORIES_DIRECTORY:
    self.story = Story(title, description, output_dir)
else:
    self.story = NoopStory()  # All methods are no-ops
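A minimal, self-contained sketch of this null object pattern with `Protocol` typing follows. The class shapes are assumptions for illustration (`RecordingStory` and `make_story` are hypothetical names); Galaxy's real `Story`, `NoopStory`, and `StoryProtocol` live in `galaxy.selenium.stories`.

```python
from typing import List, Optional, Protocol

class StoryProtocol(Protocol):
    def document(self, markdown_content: str) -> None: ...
    def reset(self) -> None: ...

class RecordingStory:
    """Stand-in for the real Story: collects markdown elements."""

    def __init__(self) -> None:
        self.elements: List[str] = []

    def document(self, markdown_content: str) -> None:
        self.elements.append(markdown_content)

    def reset(self) -> None:
        self.elements.clear()

class NoopStory:
    """Satisfies StoryProtocol while doing nothing at all."""

    def document(self, markdown_content: str) -> None:
        pass

    def reset(self) -> None:
        pass

def make_story(stories_directory: Optional[str]) -> StoryProtocol:
    # Callers always receive a StoryProtocol, so no `if story is not None`
    # branches are needed anywhere downstream.
    return RecordingStory() if stories_directory else NoopStory()

story = make_story(None)
story.document("silently ignored when stories are disabled")
```

The payoff is that every call site type-checks against `StoryProtocol` and reads identically whether stories are enabled or not.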

Test Retry Support

Stories reset on test retry to avoid accumulating content from failed attempts:

def reset_driver_and_session(self):
    self.story.reset()  # Clear elements and counters
    # ... reset driver

Dual-Save Screenshots

When both GALAXY_TEST_STORIES_DIRECTORY and GALAXY_TEST_SCREENSHOTS_DIRECTORY are set, screenshots save to both locations for backward compatibility.

Data Helpers

Centralized example data access via galaxy.selenium.stories.data:

from galaxy.selenium.stories.data import WORKBOOK_EXAMPLE_1
self.workbook_upload(WORKBOOK_EXAMPLE_1)  # No fragile relative paths

Dependencies

Required (already in Galaxy)

  • markdown - Already used for markdown to HTML
  • zipfile - Python stdlib

Optional (graceful degradation)

  • weasyprint - PDF generation (warns if unavailable)

Refactored for Cleaner Dependencies

Extracted markdown/PDF utilities from galaxy.managers.markdown_util to galaxy.util.markdown so selenium package doesn't depend on managers layer.

Design Rationale

Why Not Existing Test Reporting Solutions?

We evaluated several existing Python testing libraries before implementing a custom solution:

pytest-html (already in Galaxy)

  • Designed for test pass/fail reports, not narrative documentation
  • No built-in story/step structure for sequential narratives
  • Limited formatting options for tutorial-style content
  • Cannot generate markdown or PDF outputs
  • Inherently test-focused - cannot be reused outside test context for user documentation

Allure Framework

  • Industry-standard with rich interactive UI and screenshot support
  • Requires Java-based report generator (additional infrastructure)
  • Focused on test analytics and reports rather than narrative documentation
  • Less control over output format
  • Not suitable for generating user manuals or tutorials

ReportPortal

  • Enterprise test management platform with ML-powered analytics
  • Requires separate server infrastructure
  • Aimed at test analytics dashboards, not documentation generation
  • Excessive complexity for generating standalone tutorial content

pytest-bdd / Robot Framework

  • Behavior-driven development frameworks with built-in reporting
  • Would require rewriting all existing tests in different format/syntax
  • Framework migration too invasive for this feature
  • Focused on specifications rather than visual documentation

Seleniumbase

  • Complete Selenium framework with built-in reporting and dashboards
  • Would conflict with Galaxy's existing test abstractions
  • Requires significant refactoring of test infrastructure
  • Too opinionated about test structure

Why Not Extend pytest-html?

Galaxy's test architecture uses unittest.TestCase subclasses with pytest as a test runner, not pure pytest tests. Test lifecycle is already managed by the selenium_test decorator. A pytest plugin approach would:

  • Create redundant lifecycle management (pytest hooks + existing decorator)
  • Add coupling to pytest internals when Galaxy uses unittest
  • Be permanently coupled to test framework, preventing reuse for user documentation
  • Introduce debugging complexity with two layers managing the same lifecycle
  • Still require custom code for markdown/PDF generation and narrative structure

What pytest-html provides: Test pass/fail reports with screenshot attachments
What we need: Narrative documentation with sequential screenshots, markdown/HTML/PDF output, story structure, and reusability for user tutorials and manuals

Why Custom Implementation?

Reusability Beyond Testing
The Story class in packages/selenium has zero test framework dependencies, enabling:

  • User manual generation from the same automation code
  • Interactive tutorial creation in Jupyter notebooks
  • Standalone documentation scripts
  • Developer workflow documentation with actual screenshots

Example: Standalone Tutorial Generation

See generate_rule_builder_tutorial.py for an example. The script is noticeably cleaner without pytest plumbing or extra test-focused infrastructure such as Allure.
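A hypothetical argparse skeleton for such a standalone script, matching the flags shown in the commands above, might look like this. The real script wires these flags via `add_story_arguments()`, which is not reproduced here.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the generate_rule_builder_tutorial.py invocations above.
    parser = argparse.ArgumentParser(
        description="Generate a rule builder tutorial story"
    )
    parser.add_argument("--galaxy_url", required=True,
                        help="URL of the target Galaxy instance")
    parser.add_argument("--story-output", dest="story_output", required=True,
                        help="Directory for generated story artifacts")
    parser.add_argument("--perform-uploads", action="store_true",
                        help="Full mode: wait for uploads to complete")
    parser.add_argument("--timeout-multiplier", type=int, default=1,
                        help="Scale waits for slower full-mode runs")
    return parser

args = build_parser().parse_args(
    ["--galaxy_url", "http://localhost:8081/", "--story-output", "./rules_tutorial"]
)
print(args.galaxy_url, args.story_output, args.perform_uploads)
```

In the real script these parsed options feed a driver wrapper that initializes a Story when --story-output is provided; only the flag surface is sketched here.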

Clean Architecture

  • packages/selenium: General-purpose story infrastructure (no test dependencies)
  • packages/test_selenium: Test framework integration via selenium_test decorator
  • Clear separation allows independent testing and evolution of Story class

Fits Galaxy's Existing Architecture

  • Works seamlessly with Galaxy's unittest-based tests
  • Simple lifecycle management in one place (decorator)
  • No conflicts with existing test infrastructure

Leverages Existing Galaxy Utilities

  • Markdown to HTML: galaxy.util.markdown.to_html()
  • HTML to PDF: galaxy.util.markdown.to_pdf_raw()
  • weasyprint: Already in Galaxy dependencies
  • Screenshot infrastructure: Existing methods and paths

Full Control Over Output

  • Generate exactly the formats needed (markdown, HTML, PDF, zip)
  • Narrative focus: Optimized for sequential documentation, not pass/fail reports
  • Interleaved documentation and screenshots for tutorial quality

Testing

All functionality manually verified:

  • ✅ Story generation in test context
  • ✅ Standalone tutorial script execution
  • ✅ Markdown, HTML, PDF, zip artifact generation
  • ✅ Screenshot embedding and sequential numbering
  • ✅ Test retry handling (story.reset())
  • ✅ Type safety with StoryProtocol
  • ✅ NoopStory pattern (zero impact when disabled)

Future Enhancements

Potential additions (deferred for future PRs):

  • Video recording of test execution
  • Gallery view for browsing test stories
  • CI integration to publish stories as artifacts
  • Visual diff highlighting between test runs

Related Work

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

jmchilton and others added 17 commits October 29, 2025 14:14
Add a new "Test Stories" feature that automatically generates visual narrative
documentation of test execution, creating markdown, HTML, and PDF artifacts
with interleaved screenshots and documentation.

Core Infrastructure (galaxy.selenium module):
- Add StoryProtocol abstract base class for story implementations
- Add Story class for collecting screenshots and documentation
- Add NoopStory class implementing null object pattern
- Story.finalize() generates markdown, HTML, PDF (via weasyprint), and zip
- Story.reset() clears state for test retries
- General-purpose design allows reuse in tests, tutorials, and documentation

Browser Context API Extensions (galaxy.selenium module):
- Update screenshot() method to accept optional caption parameter
- Add document() method for adding markdown to stories
- Add add_story_arguments() CLI function for opt-in story generation
- add_story_arguments() includes selenium arguments automatically
- Update DriverWrapper to initialize Story when --story-output provided

Test Integration (galaxy_test.selenium module):
- Add GALAXY_TEST_STORIES_DIRECTORY environment variable configuration
- Modify selenium_test decorator to initialize/finalize stories automatically
- Add _create_story_directory() helper for timestamped story directories
- Update _screenshot_path() to save to story directory with sequential numbering
- Override screenshot() to save to BOTH story and screenshots directories when both configured
- Call story.reset() in reset_driver_and_session() to clear state on retry
- Stories finalize on both test success and failure for debugging

Tutorial Example:
- Add generate_upload_rules_tutorial.py demonstrating standalone story usage
- Shows how to create user documentation outside test framework
- Provides template for generating publication-ready tutorials

Design:
- Off by default, enabled via GALAXY_TEST_STORIES_DIRECTORY env var
- Backward compatible - existing tests work unchanged
- Opt-in CLI design - scripts choose to support story functionality
- Degrades gracefully if weasyprint unavailable
- Dual-save: screenshots saved to both locations when both configured
- NoopStory pattern enables strong typing without conditional checks
- Type annotations with StoryProtocol throughout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Automated Test Cases - Bad

AI-generated test cases are noisy and generally just bad - this is not that.

A semi-automated approach I looked into was recording a script and having an AI convert it into Selenium commands - it wasn't very promising though.

The selectors it chose didn't seem great, and most test cases can be bootstrapped with the huge mountain of existing test helpers we already have. Reducing all of that to bare sequences of selectors would result in a ton of duplication, unreadable code, and less robustness (our helpers have a lot of good retry logic, adaptive waiting, rich debug messages, etc.).

AI Assistance in Building Test Cases - Good!

The semi-automated approach that I think is more promising is to have the AI agent set up a rich environment for manually testing the UI and then provide a mechanism for turning that exploration directly into a test case.

This PR adds a Claude slash command, "/setup-selenium-test-notebook <feature description OR GitHub PR>", which takes either a description of the feature to test or a PR.

It will set up a Jupyter notebook with cells filled out for setting up the Selenium environment and talking with Galaxy. It tells the user about the config file they need to set up if it isn't present and tells the user how to run Jupyter. All of this part is based on my prior work in galaxyproject#11177.

The agent will pull down the PR description and try to come up with an idea for how to test it. The manual testing instructions we already provide are great for this. It will also "research" the code base and find related tests and will provide potentially relevant code from existing tests as Markdown comments right in the notebook - so you have a good idea of what helpers and components are already implemented that might help with the task of testing the PR.

The agent seems smart enough to reason about when a managed history annotation is needed and how to deal with user login, etc...

Developing in Jupyter is nice because it can sustain a persistent connection to the browser automation application. You don't have to re-run the whole test - you can work a line or two at a time with cells and preserve progress and just re-run what is needed as components are annotated, etc...

I think the screenshots are a cool part of the framework we have - and these will appear right inside the notebook.

After the notebook test case is ready to go, Claude seems pretty good at converting it directly into a test case. This can be done with '/extract-selenium-test <notebook path or description>'.
- Add Screenshots and Documentation section to CLAUDE.md showing screenshot() with captions and document() helper
- Add Test Stories section explaining how to enable and use the feature
- Update setup-selenium-test-notebook.md to mention document() helper and screenshot captions
- Update .claude/README.md with complete Test Stories documentation and examples
- Add screenshot(label, caption=None) and document(markdown_content) to helper methods list

These updates document the test story abstractions added in bc86e22a279eb5b43edddda5f29e03a63eb33c16.
Move markdown/PDF conversion utilities from galaxy.managers.markdown_util
to galaxy.util.markdown to eliminate dependency of selenium package on
managers layer. This improves separation of concerns and makes these
utilities more broadly reusable.

Changes:
- Extended galaxy.util.markdown with to_html(), to_pdf_raw(), and
  weasyprint_available() functions
- Made markdown and weasyprint imports optional with availability checks
- Added markdown-convert optional dependency to packages/util/setup.cfg
  (includes Markdown and weasyprint packages)
- Updated galaxy.selenium.story to import from galaxy.util.markdown
- Kept backward compatibility wrappers in galaxy.managers.markdown_util

Benefits:
- Proper separation of concerns - selenium doesn't depend on managers
- Static imports possible (no lazy imports needed)
- Utilities reusable across Galaxy without managers dependency
- Optional dependencies properly declared in setup.cfg

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Refactor galaxy.selenium.story into galaxy.selenium.stories package to
support better organization with data files and helper functions.

Changes:
- Created lib/galaxy/selenium/stories/ package structure:
  - stories/__init__.py: Re-exports Story, NoopStory, StoryProtocol
  - stories/story.py: Moved from story.py
  - stories/data/__init__.py: Helper functions for accessing example files
  - stories/data/examples/: Workbook example files (moved from ../examples/)

- Added data helper functions in stories/data/:
  - get_data_directory(): Returns path to examples directory
  - get_example_path(filename): Returns absolute path to example file
  - WORKBOOK_EXAMPLE_1-4: Convenience constants for workbook examples

- Updated all imports from galaxy.selenium.story to galaxy.selenium.stories:
  - lib/galaxy/selenium/cli.py
  - lib/galaxy_test/selenium/framework.py

- Updated test_workbook_import.py to use data helpers:
  - Import WORKBOOK_EXAMPLE_* constants from stories.data
  - Replaced self.get_filename() calls with direct constants
  - Removed fragile relative path constants

Benefits:
- Centralized, discoverable location for example data files
- No more fragile relative paths in tests
- Package structure better supports multiple files and data
- Easy to discover available example files via constants
- Type-safe with absolute paths from helper functions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This completes Phase 7.4 of the Test Stories implementation plan.

Changes:
- Added document_file() method to GalaxySeleniumContext in lib/galaxy/selenium/context.py
  - Reads file contents and documents them as markdown code blocks
  - Shows only filename (not full path) for cleaner documentation
  - Supports optional caption for contextual explanation
  - Gracefully handles file read errors
  - Uses document() internally to add to story

Benefits:
- Self-contained tutorials that show actual data file formats
- Better understanding of test/tutorial data requirements
- No need for users to find/download example files separately
- Consistent API across tests and standalone tutorial scripts
- Flexible with optional captions for contextual explanations

Updated TEST_STORIES_PLAN.md to mark Phase 7 as complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>