PDD-CLI Bug: Uses Text Matching Instead of Semantic Selectors #573

@jiaminc-cmu

PDD-CLI generates E2E tests that use getByText() for elements with a semantic role (headings, buttons, links), where a role-based selector such as getByRole() would be the better choice. The resulting tests are less robust and do not check accessibility structure.

Why this matters: Text-based tests pass even when the markup is not semantic, and they are more fragile than tests built on semantic selectors.

Concrete Example

For a dashboard page with headings:

// PDD generated test (WRONG):
import { test, expect } from '@playwright/test';

test('displays dashboard sections', async ({ page }) => {
  await page.goto('/dashboard');

  // Text matching: passes for any element whose text contains the string
  await expect(page.getByText('Analytics')).toBeVisible();
  await expect(page.getByText('Recent Activity')).toBeVisible();
});

Better approach using semantic selectors:

// Should generate (CORRECT):
import { test, expect } from '@playwright/test';

test('displays dashboard sections', async ({ page }) => {
  await page.goto('/dashboard');

  // Role-based selectors: the element must be a heading with this accessible name
  await expect(page.getByRole('heading', { name: 'Analytics' })).toBeVisible();
  await expect(page.getByRole('heading', { name: 'Recent Activity' })).toBeVisible();
});

What went wrong: PDD used getByText(), which matches any element containing the given text regardless of its role. Using getByRole('heading') would:

  1. Verify the element is actually a heading (better accessibility validation)
  2. Be more specific (avoid matching "Analytics" in paragraph text)
  3. Follow Playwright best practices

Impact: Tests work but are less robust and don't validate semantic HTML structure.
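
To make point 2 concrete, here is a minimal sketch that simulates the two matching strategies over an in-memory page. It is a simplified model, not Playwright's matching engine: the `El` type, the `dashboard` data, and both helper functions are hypothetical, and the paragraph text is invented for illustration.

```typescript
// Simplified in-memory model of a page. NOT Playwright's real matching engine.
interface El {
  tag: string;          // HTML tag name
  role: string | null;  // ARIA role, or null when the element has none
  text: string;         // rendered text content
}

// Hypothetical dashboard: two headings plus a paragraph that mentions "Analytics".
const dashboard: El[] = [
  { tag: 'h2', role: 'heading', text: 'Analytics' },
  { tag: 'p', role: null, text: 'Open Analytics to see weekly trends.' },
  { tag: 'h2', role: 'heading', text: 'Recent Activity' },
];

// Models text matching as a case-insensitive substring test on the text.
function matchByText(page: El[], text: string): El[] {
  const needle = text.toLowerCase();
  return page.filter((el) => el.text.toLowerCase().includes(needle));
}

// Models role matching: the role must match first, then the name is compared.
function matchByRole(page: El[], role: string, name: string): El[] {
  const needle = name.toLowerCase();
  return page.filter(
    (el) => el.role === role && el.text.toLowerCase().includes(needle),
  );
}

const byText = matchByText(dashboard, 'Analytics');
const byRole = matchByRole(dashboard, 'heading', 'Analytics');

console.log(byText.length, byRole.length); // prints "2 1"
```

In this model the text match hits both the heading and the paragraph, while the role match hits only the heading. In a real Playwright run, a locator that resolves to more than one element typically makes an assertion such as toBeVisible() fail with a strict mode violation, so the text-based test can break as soon as the word appears elsewhere on the page.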

Why PDD Makes This Mistake

PDD-CLI currently:

  • Defaults to text matching as the simplest approach
  • Doesn't analyze DOM structure to find semantic roles
  • Doesn't prioritize accessibility testing

But it should:

  1. Use semantic selectors first (role, label, placeholder)
  2. Reserve text matching for non-semantic content
  3. Validate accessibility structure through selectors

How to Prevent This in PDD-CLI

What PDD should do differently:

  1. Prefer a semantic selector hierarchy, in this order:

    • getByRole() - for headings, buttons, links, forms
    • getByLabel() - for form inputs
    • getByPlaceholder() - for inputs with placeholders
    • getByTestId() - for non-semantic elements
    • getByText() - last resort only
  2. Parse component DOM structure: Identify heading levels, button roles, form structure.

  3. Generate accessibility-validating tests: Tests should enforce semantic HTML.
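
The hierarchy in step 1 could be sketched as a simple priority chain. This is an illustration only, not PDD-CLI's actual implementation: `ElementInfo`, `quote`, and `chooseLocator` are hypothetical names, and the sketch assumes the generator already knows the role, label, placeholder, test id, and text of the target element. The emitted strings use only real Playwright locator methods.

```typescript
// Hypothetical description of an element the generator has already analysed.
interface ElementInfo {
  role?: string;            // ARIA role, e.g. 'heading' or 'button'
  accessibleName?: string;  // accessible name used with the role
  label?: string;           // text of an associated form label
  placeholder?: string;     // placeholder attribute of an input
  testId?: string;          // data-testid attribute
  text?: string;            // visible text content
}

// Quote a value as a single-quoted JavaScript string literal.
function quote(value: string): string {
  return "'" + value.replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
}

// Walk the hierarchy from most to least semantic and emit the first match.
function chooseLocator(el: ElementInfo): string {
  if (el.role && el.accessibleName) {
    return `getByRole(${quote(el.role)}, { name: ${quote(el.accessibleName)} })`;
  }
  if (el.label) return `getByLabel(${quote(el.label)})`;
  if (el.placeholder) return `getByPlaceholder(${quote(el.placeholder)})`;
  if (el.testId) return `getByTestId(${quote(el.testId)})`;
  if (el.text) return `getByText(${quote(el.text)})`; // last resort only
  throw new Error('element has no usable selector information');
}

console.log(chooseLocator({ role: 'heading', accessibleName: 'Analytics', text: 'Analytics' }));
console.log(chooseLocator({ placeholder: 'Search' }));
```

Because the checks run in order, an element that has both a role and visible text always gets the role-based locator, and getByText() is reached only when nothing more semantic is available.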

Example improvement:

Current: See "Analytics" text → generate getByText('Analytics')

Improved: See "Analytics" text → parse DOM structure:
        → <h2>Analytics</h2> found
        → Generate: getByRole('heading', { name: 'Analytics' })
        → Validates both text AND semantic structure
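
The "parse DOM structure" step above could start from a tag-to-role lookup. The following is a rough sketch under stated assumptions: `implicitRole` and `locatorFor` are hypothetical helpers, the role table is a deliberately incomplete subset of HTML's implicit ARIA roles, and explicit `role` attributes are ignored. The `level` option is a real getByRole() option for headings.

```typescript
interface RoleInfo {
  role: string;
  level?: number; // heading level, only set for h1..h6
}

// Small, deliberately incomplete subset of HTML's implicit ARIA roles.
function implicitRole(tag: string): RoleInfo | null {
  const t = tag.toLowerCase();
  const heading = /^h([1-6])$/.exec(t);
  if (heading) return { role: 'heading', level: Number(heading[1]) };
  if (t === 'button') return { role: 'button' };
  if (t === 'nav') return { role: 'navigation' };
  return null; // no semantic role known for this tag
}

// Emit a role-based locator when the tag has a role, else fall back to text.
// JSON.stringify is used for quoting, so the output uses double quotes.
function locatorFor(tag: string, text: string): string {
  const info = implicitRole(tag);
  const name = JSON.stringify(text);
  if (info === null) return `getByText(${name})`;
  const level = info.level !== undefined ? `, level: ${info.level}` : '';
  return `getByRole(${JSON.stringify(info.role)}, { name: ${name}${level} })`;
}

console.log(locatorFor('h2', 'Analytics'));
console.log(locatorFor('span', 'Analytics'));
```

Including `level` is a design choice rather than a requirement: it makes the test fail if the heading is later changed from h2 to h3, which is stricter accessibility validation but also more brittle than matching on role and name alone.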

Severity

P3 - Low Priority

  • Frequency: High - common across generated E2E tests
  • Impact: Tests less robust, accessibility not validated
  • Detectability: Low - tests pass but could be better
  • Prevention cost: Low - selector hierarchy is well-defined

Category

test-generation

Related Issues


For Contributors: Discovered throughout frontend/e2e/crm.spec.ts, improved in commit 34a651d5 to use semantic selectors.
