@jmchilton
Empower Developers to Build Rich, Automated Tutorials

This PR introduces a "Stories" feature that automatically generates visual documentation from Selenium & Playwright tests and standalone scripts built on galaxy-selenium. Stories interleave screenshots with markdown narrative to create tutorial-quality documentation in markdown, HTML, PDF, and zip formats. The same automation code now serves dual purpose: validating Galaxy functionality through tests while generating tutorials. This approach maintains a single source of truth for both testing and documentation.

Key Features

Dual-Purpose Automation

  • Test documentation: Generate visual test reports showing what tests verify
  • User tutorials: Create user-facing guides from the same automation code
  • Developer docs: Document workflows and features with actual screenshots

Story API

Three new methods added to browser context:

  • screenshot(label, caption=None) - Take screenshot with optional caption for story
  • document(markdown_content) - Add markdown narrative between actions
  • document_file(file_path, caption=None) - Include file contents in story with syntax highlighting

Output Formats

Stories automatically generate four artifact types:

  • story.md - Markdown source with embedded screenshots
  • story.html - Self-contained HTML with styling
  • story.pdf - Publication-ready PDF (requires weasyprint)
  • {story_name}.zip - Complete archive of all artifacts
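The collect-then-render flow behind these artifacts can be sketched with a toy `Story` class. This is an illustration only, not Galaxy's actual implementation (which lives in `galaxy.selenium.stories` and also emits HTML, PDF, and zip); method names follow the API described above.

```python
from typing import List, Optional

class Story:
    """Toy story collector: interleaves markdown narrative and screenshots."""

    def __init__(self, title: str):
        self.title = title
        self.elements: List[str] = []
        self._shot = 0

    def document(self, markdown_content: str) -> None:
        # Narrative goes straight into the element stream.
        self.elements.append(markdown_content)

    def screenshot(self, label: str, caption: Optional[str] = None) -> None:
        # Screenshots are numbered sequentially so story.md keeps them ordered.
        self._shot += 1
        md = f"![{label}]({self._shot:03d}_{label}.png)"
        if caption:
            md += f"\n\n*{caption}*"
        self.elements.append(md)

    def to_markdown(self) -> str:
        # Equivalent to the story.md artifact: title plus interleaved elements.
        return "\n\n".join([f"# {self.title}", *self.elements])

story = Story("Workbook Import")
story.document("Open the **Upload** dialog and select the workbook.")
story.screenshot("upload_dialog", caption="The upload dialog")
print(story.to_markdown())
```

Because narrative and screenshots share one ordered element stream, the rendered markdown reads in the same order the automation executed.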

Clean Architecture

Split across two packages for maximum reusability:

  • packages/selenium: General-purpose story infrastructure (no test dependencies)
    • Reusable in tests, Jupyter notebooks, standalone tutorial scripts
    • Story class, NoopStory pattern, StoryProtocol typing
  • packages/test_selenium: Test framework integration
    • Wires stories into test lifecycle via selenium_test decorator
    • Enabled via GALAXY_TEST_STORIES_DIRECTORY environment variable

Examples Included

Test Suite: Workbook Import Stories

Added test_workbook_import.py with four test cases demonstrating the new workbook functionality from #20288:

  1. test_dataset_import_from_workbook - Import datasets with automatic column mapping
  2. test_simple_list_collection_import - Create list collections from workbooks
  3. test_list_of_pairs_collection_import - Import paired data with automatic pairing
  4. test_nested_list_of_pairs_import - Build complex nested collection structures

These tests serve dual purpose: validating the workbook import feature while generating tutorial documentation.

Run tests with stories:

export GALAXY_TEST_STORIES_DIRECTORY=/tmp/test_stories
pytest lib/galaxy_test/selenium/test_workbook_import.py -v
# Check /tmp/test_stories for generated markdown, HTML, PDF, zip

Tutorial Generator: Rule Builder Guide

Added generate_rule_builder_tutorial.py demonstrating standalone tutorial generation at scale. It generates a story documenting 10 different examples (the 4 new workbook examples plus 6 older rules examples that power training material tutorials) by reusing test helper methods with added narrative context.

Generate tutorials:

cd packages/selenium
# Outline mode (fast, displays rule structure and builder but doesn't wait for uploads)
uv run python galaxy/selenium/scripts/generate_rule_builder_tutorial.py --galaxy_url http://localhost:8081/ --story-output ./rules_tutorial

# Full mode (complete with uploads and screenshots)
uv run python galaxy/selenium/scripts/generate_rule_builder_tutorial.py --galaxy_url http://localhost:8081/ --story-output ./rules_tutorial --perform-uploads --timeout-multiplier 6
[Screenshots: two pages of the generated rule builder tutorial, captured 2025-10-29]

Stories vs. Training Material

Despite Claude's flowery language and excitement, I think the Training Material remains the repository for much richer, more usable, better supported tutorials, and this work will never supplant it. It does, however, make it really easy to build tutorial outlines that can be readily translated into training materials (a task that can be delegated to an agent once the Markdown and screenshots are in place), and to regenerate all the screenshots for older tutorials generated from these stubs (again, a task that could be delegated to an agent).

I plan to create tutorials for these test cases. Here is a comparison of what I have done here versus what still needs to be done for the training materials:

Selenium-generated documentation:

  • Purpose: Test validation + reference documentation
  • Audience: Developers and power users
  • Content: Technical explanations of what Galaxy detects and how
  • Screenshots: Focus on rule builder interface states

Final GTN tutorial (to be written):

  • Purpose: Teaching beginners how to use auto-detection
  • Audience: Scientists and researchers new to Galaxy
  • Content: Step-by-step instructions with biological context
  • Additional elements needed:
    • Clearer learning objectives and prerequisites
    • More biological context for the examples
    • Troubleshooting tips
    • Export template workbook demonstration
    • Links to intermediate/advanced tutorials
    • Exercises or challenges for practice

The selenium docs provide the foundation and validation, but the GTN tutorial needs additional pedagogy and narrative.

Implementation Highlights

NoopStory Pattern

Uses null object pattern instead of None checks for cleaner code and type safety:

if GALAXY_TEST_STORIES_DIRECTORY:
    self.story = Story(title, description, output_dir)
else:
    self.story = NoopStory()  # All methods are no-ops
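A minimal, self-contained sketch of this null object pattern with `Protocol` typing follows. The class shapes are assumptions for illustration (`RecordingStory` and `make_story` are hypothetical names); Galaxy's real `Story`, `NoopStory`, and `StoryProtocol` live in `galaxy.selenium.stories`.

```python
from typing import List, Optional, Protocol

class StoryProtocol(Protocol):
    def document(self, markdown_content: str) -> None: ...
    def reset(self) -> None: ...

class RecordingStory:
    """Stand-in for the real Story: collects markdown elements."""

    def __init__(self) -> None:
        self.elements: List[str] = []

    def document(self, markdown_content: str) -> None:
        self.elements.append(markdown_content)

    def reset(self) -> None:
        self.elements.clear()

class NoopStory:
    """Satisfies StoryProtocol while doing nothing at all."""

    def document(self, markdown_content: str) -> None:
        pass

    def reset(self) -> None:
        pass

def make_story(stories_directory: Optional[str]) -> StoryProtocol:
    # Callers always receive a StoryProtocol, so no `if story is not None`
    # branches are needed anywhere downstream.
    return RecordingStory() if stories_directory else NoopStory()

story = make_story(None)
story.document("silently ignored when stories are disabled")
```

The payoff is that every call site type-checks against `StoryProtocol` and reads identically whether stories are enabled or not.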

Test Retry Support

Stories reset on test retry to avoid accumulating content from failed attempts:

def reset_driver_and_session(self):
    self.story.reset()  # Clear elements and counters
    # ... reset driver

Dual-Save Screenshots

When both GALAXY_TEST_STORIES_DIRECTORY and GALAXY_TEST_SCREENSHOTS_DIRECTORY are set, screenshots save to both locations for backward compatibility.

Data Helpers

Centralized example data access via galaxy.selenium.stories.data:

from galaxy.selenium.stories.data import WORKBOOK_EXAMPLE_1
self.workbook_upload(WORKBOOK_EXAMPLE_1)  # No fragile relative paths

Dependencies

Required (already in Galaxy)

  • markdown - Already used for markdown to HTML
  • zipfile - Python stdlib

Optional (graceful degradation)

  • weasyprint - PDF generation (warns if unavailable)

Refactored for Cleaner Dependencies

Extracted markdown/PDF utilities from galaxy.managers.markdown_util to galaxy.util.markdown so selenium package doesn't depend on managers layer.

Design Rationale

Why Not Existing Test Reporting Solutions?

We evaluated several existing Python testing libraries before implementing a custom solution:

pytest-html (already in Galaxy)

  • Designed for test pass/fail reports, not narrative documentation
  • No built-in story/step structure for sequential narratives
  • Limited formatting options for tutorial-style content
  • Cannot generate markdown or PDF outputs
  • Inherently test-focused - cannot be reused outside test context for user documentation

Allure Framework

  • Industry-standard with rich interactive UI and screenshot support
  • Requires Java-based report generator (additional infrastructure)
  • Focused on test analytics and reports rather than narrative documentation
  • Less control over output format
  • Not suitable for generating user manuals or tutorials

ReportPortal

  • Enterprise test management platform with ML-powered analytics
  • Requires separate server infrastructure
  • Aimed at test analytics dashboards, not documentation generation
  • Excessive complexity for generating standalone tutorial content

pytest-bdd / Robot Framework

  • Behavior-driven development frameworks with built-in reporting
  • Would require rewriting all existing tests in different format/syntax
  • Framework migration too invasive for this feature
  • Focused on specifications rather than visual documentation

Seleniumbase

  • Complete Selenium framework with built-in reporting and dashboards
  • Would conflict with Galaxy's existing test abstractions
  • Requires significant refactoring of test infrastructure
  • Too opinionated about test structure

Why Not Extend pytest-html?

Galaxy's test architecture uses unittest.TestCase subclasses with pytest as a test runner, not pure pytest tests. Test lifecycle is already managed by the selenium_test decorator. A pytest plugin approach would:

  • Create redundant lifecycle management (pytest hooks + existing decorator)
  • Add coupling to pytest internals when Galaxy uses unittest
  • Be permanently coupled to test framework, preventing reuse for user documentation
  • Introduce debugging complexity with two layers managing the same lifecycle
  • Still require custom code for markdown/PDF generation and narrative structure

What pytest-html provides: Test pass/fail reports with screenshot attachments
What we need: Narrative documentation with sequential screenshots, markdown/HTML/PDF output, story structure, and reusability for user tutorials and manuals

Why Custom Implementation?

Reusability Beyond Testing
The Story class in packages/selenium has zero test framework dependencies, enabling:

  • User manual generation from the same automation code
  • Interactive tutorial creation in Jupyter notebooks
  • Standalone documentation scripts
  • Developer workflow documentation with actual screenshots

Example: Standalone Tutorial Generation

See generate_rule_builder_tutorial.py for an example. The script is noticeably cleaner without pytest plumbing or extra test-focused infrastructure such as Allure.
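A hypothetical argparse skeleton for such a standalone script, matching the flags shown in the commands above, might look like this. The real script wires these flags via `add_story_arguments()`, which is not reproduced here.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the generate_rule_builder_tutorial.py invocations above.
    parser = argparse.ArgumentParser(
        description="Generate a rule builder tutorial story"
    )
    parser.add_argument("--galaxy_url", required=True,
                        help="URL of the target Galaxy instance")
    parser.add_argument("--story-output", dest="story_output", required=True,
                        help="Directory for generated story artifacts")
    parser.add_argument("--perform-uploads", action="store_true",
                        help="Full mode: wait for uploads to complete")
    parser.add_argument("--timeout-multiplier", type=int, default=1,
                        help="Scale waits for slower full-mode runs")
    return parser

args = build_parser().parse_args(
    ["--galaxy_url", "http://localhost:8081/", "--story-output", "./rules_tutorial"]
)
print(args.galaxy_url, args.story_output, args.perform_uploads)
```

In the real script these parsed options feed a driver wrapper that initializes a Story when --story-output is provided; only the flag surface is sketched here.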

Clean Architecture

  • packages/selenium: General-purpose story infrastructure (no test dependencies)
  • packages/test_selenium: Test framework integration via selenium_test decorator
  • Clear separation allows independent testing and evolution of Story class

Fits Galaxy's Existing Architecture

  • Works seamlessly with Galaxy's unittest-based tests
  • Simple lifecycle management in one place (decorator)
  • No conflicts with existing test infrastructure

Leverages Existing Galaxy Utilities

  • Markdown to HTML: galaxy.util.markdown.to_html()
  • HTML to PDF: galaxy.util.markdown.to_pdf_raw()
  • weasyprint: Already in Galaxy dependencies
  • Screenshot infrastructure: Existing methods and paths

Full Control Over Output

  • Generate exactly the formats needed (markdown, HTML, PDF, zip)
  • Narrative focus: Optimized for sequential documentation, not pass/fail reports
  • Interleaved documentation and screenshots for tutorial quality

Testing

All functionality manually verified:

  • ✅ Story generation in test context
  • ✅ Standalone tutorial script execution
  • ✅ Markdown, HTML, PDF, zip artifact generation
  • ✅ Screenshot embedding and sequential numbering
  • ✅ Test retry handling (story.reset())
  • ✅ Type safety with StoryProtocol
  • ✅ NoopStory pattern (zero impact when disabled)

Future Enhancements

Potential additions (deferred for future PRs):

  • Video recording of test execution
  • Gallery view for browsing test stories
  • CI integration to publish stories as artifacts
  • Visual diff highlighting between test runs

Related Work

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

jmchilton and others added 17 commits October 29, 2025 14:14
Add a new "Test Stories" feature that automatically generates visual narrative
documentation of test execution, creating markdown, HTML, and PDF artifacts
with interleaved screenshots and documentation.

Core Infrastructure (galaxy.selenium module):
- Add StoryProtocol abstract base class for story implementations
- Add Story class for collecting screenshots and documentation
- Add NoopStory class implementing null object pattern
- Story.finalize() generates markdown, HTML, PDF (via weasyprint), and zip
- Story.reset() clears state for test retries
- General-purpose design allows reuse in tests, tutorials, and documentation

Browser Context API Extensions (galaxy.selenium module):
- Update screenshot() method to accept optional caption parameter
- Add document() method for adding markdown to stories
- Add add_story_arguments() CLI function for opt-in story generation
- add_story_arguments() includes selenium arguments automatically
- Update DriverWrapper to initialize Story when --story-output provided

Test Integration (galaxy_test.selenium module):
- Add GALAXY_TEST_STORIES_DIRECTORY environment variable configuration
- Modify selenium_test decorator to initialize/finalize stories automatically
- Add _create_story_directory() helper for timestamped story directories
- Update _screenshot_path() to save to story directory with sequential numbering
- Override screenshot() to save to BOTH story and screenshots directories when both configured
- Call story.reset() in reset_driver_and_session() to clear state on retry
- Stories finalize on both test success and failure for debugging

Tutorial Example:
- Add generate_upload_rules_tutorial.py demonstrating standalone story usage
- Shows how to create user documentation outside test framework
- Provides template for generating publication-ready tutorials

Design:
- Off by default, enabled via GALAXY_TEST_STORIES_DIRECTORY env var
- Backward compatible - existing tests work unchanged
- Opt-in CLI design - scripts choose to support story functionality
- Degrades gracefully if weasyprint unavailable
- Dual-save: screenshots saved to both locations when both configured
- NoopStory pattern enables strong typing without conditional checks
- Type annotations with StoryProtocol throughout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Automated Test Cases - Bad

AI-generated test cases are noisy and generally just bad - this is not that.

A semi-automated approach I looked into was recording a script and having an AI convert it into Selenium commands - it wasn't very promising though.

The selectors it chose didn't seem great, and most test cases can be bootstrapped with the huge mountain of existing test helpers we already have. Reducing all of that to bare sequences of selectors would result in a ton of duplication, unreadable code, and less robustness (our helpers have a lot of good retry logic, adaptive waiting, rich debug messages, etc.).

AI Assistance in Building Test Cases - Good!

The semi-automated approach that I think is more promising is to have the AI agent set up a rich environment for manually testing the UI and then provide a mechanism for turning that exploration directly into a test case.

This PR adds a Claude slash command, "/setup-selenium-test-notebook <feature description OR GitHub PR>", which takes either a description of the feature to test or a PR.

It will set up a Jupyter notebook with cells filled out for setting up the Selenium environment and talking with Galaxy. It tells the user about the config file they need to set up if it isn't present and tells the user how to run Jupyter. All of this part is based on my prior work in galaxyproject#11177.

The agent will pull down the PR description and try to come up with an idea for how to test it. The manual testing instructions we already provide are great for this. It will also "research" the code base and find related tests and will provide potentially relevant code from existing tests as Markdown comments right in the notebook - so you have a good idea of what helpers and components are already implemented that might help with the task of testing the PR.

The agent seems smart enough to reason about when a managed history annotation is needed and how to deal with user login, etc...

Developing in Jupyter is nice because it can sustain a persistent connection to the browser automation application. You don't have to re-run the whole test - you can work a line or two at a time with cells and preserve progress and just re-run what is needed as components are annotated, etc...

I think the screenshots are a cool part of the framework we have - and these will appear right inside the notebook.

After the notebook test case is ready to go, Claude seems pretty good at converting it directly into a test case. This can be done with '/extract-selenium-test <notebook path or description>'.
- Add Screenshots and Documentation section to CLAUDE.md showing screenshot() with captions and document() helper
- Add Test Stories section explaining how to enable and use the feature
- Update setup-selenium-test-notebook.md to mention document() helper and screenshot captions
- Update .claude/README.md with complete Test Stories documentation and examples
- Add screenshot(label, caption=None) and document(markdown_content) to helper methods list

These updates document the test story abstractions added in bc86e22a279eb5b43edddda5f29e03a63eb33c16.
Move markdown/PDF conversion utilities from galaxy.managers.markdown_util
to galaxy.util.markdown to eliminate dependency of selenium package on
managers layer. This improves separation of concerns and makes these
utilities more broadly reusable.

Changes:
- Extended galaxy.util.markdown with to_html(), to_pdf_raw(), and
  weasyprint_available() functions
- Made markdown and weasyprint imports optional with availability checks
- Added markdown-convert optional dependency to packages/util/setup.cfg
  (includes Markdown and weasyprint packages)
- Updated galaxy.selenium.story to import from galaxy.util.markdown
- Kept backward compatibility wrappers in galaxy.managers.markdown_util

Benefits:
- Proper separation of concerns - selenium doesn't depend on managers
- Static imports possible (no lazy imports needed)
- Utilities reusable across Galaxy without managers dependency
- Optional dependencies properly declared in setup.cfg

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Refactor galaxy.selenium.story into galaxy.selenium.stories package to
support better organization with data files and helper functions.

Changes:
- Created lib/galaxy/selenium/stories/ package structure:
  - stories/__init__.py: Re-exports Story, NoopStory, StoryProtocol
  - stories/story.py: Moved from story.py
  - stories/data/__init__.py: Helper functions for accessing example files
  - stories/data/examples/: Workbook example files (moved from ../examples/)

- Added data helper functions in stories/data/:
  - get_data_directory(): Returns path to examples directory
  - get_example_path(filename): Returns absolute path to example file
  - WORKBOOK_EXAMPLE_1-4: Convenience constants for workbook examples

- Updated all imports from galaxy.selenium.story to galaxy.selenium.stories:
  - lib/galaxy/selenium/cli.py
  - lib/galaxy_test/selenium/framework.py

- Updated test_workbook_import.py to use data helpers:
  - Import WORKBOOK_EXAMPLE_* constants from stories.data
  - Replaced self.get_filename() calls with direct constants
  - Removed fragile relative path constants

Benefits:
- Centralized, discoverable location for example data files
- No more fragile relative paths in tests
- Package structure better supports multiple files and data
- Easy to discover available example files via constants
- Type-safe with absolute paths from helper functions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This completes Phase 7.4 of the Test Stories implementation plan.

Changes:
- Added document_file() method to GalaxySeleniumContext in lib/galaxy/selenium/context.py
  - Reads file contents and documents them as markdown code blocks
  - Shows only filename (not full path) for cleaner documentation
  - Supports optional caption for contextual explanation
  - Gracefully handles file read errors
  - Uses document() internally to add to story

Benefits:
- Self-contained tutorials that show actual data file formats
- Better understanding of test/tutorial data requirements
- No need for users to find/download example files separately
- Consistent API across tests and standalone tutorial scripts
- Flexible with optional captions for contextual explanations

Updated TEST_STORIES_PLAN.md to mark Phase 7 as complete.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>