
Accessibility Automation Guidelines: What to Automate vs. What Requires Human Testing

A practical priority matrix for accessibility testing automation. Learn which WCAG checks to automate first, which require manual testing, and how to balance both in your CI/CD pipeline.

One of the most common questions I get after my “Inclusive by Default” talks is: “Which accessibility checks should we automate first?”

The answer isn’t “automate everything”—and it’s definitely not “accessibility can’t be automated.” The truth is strategic: some checks are perfect for automation, others require human judgment, and knowing the difference will save you time while improving your coverage.

This guide provides a practical priority matrix for accessibility testing automation, helping you decide what to automate in your CI/CD pipeline versus what needs manual testing or user research.

The Automation Reality Check

Let’s start with an honest assessment:

Automated tools can catch approximately 30-40% of accessibility issues.

That sounds low, but here’s the key insight: that 30-40% includes the most common, most easily fixable issues—the low-hanging fruit that creates the foundation for accessibility.

The remaining 60-70% requires human judgment: Is this alt text meaningful? Is this navigation order logical? Does this interaction pattern make sense?

The Priority Matrix

I’ve organized accessibility checks into four quadrants based on two factors:

  1. Automation reliability (Can a tool reliably detect this?)
  2. Impact (How critical is this for users?)

Quadrant 1: Automate First (High Reliability + High Impact)

These checks are reliable, impactful, and should run on every build.

| Check | WCAG Criterion | Why Automate |
| --- | --- | --- |
| Missing accessible names | 4.1.2 | 100% detectable—element either has a name or doesn’t |
| Missing form labels | 1.3.1, 3.3.2 | Programmatically determinable |
| Color contrast ratios | 1.4.3 | Mathematical calculation |
| Invalid ARIA attributes | 4.1.2 | Validatable against spec |
| Duplicate IDs | 4.1.1 | Simple DOM check |
| Missing language attribute | 3.1.1 | Presence check on `<html>` |
| Missing page titles | 2.4.2 | Presence check on `<title>` |
| Broken ARIA references | 4.1.2 | ID existence validation |

Implementation:

# conftest.py - Run these on EVERY test
import pytest
from axe_selenium_python import Axe

@pytest.fixture(autouse=True)
def check_critical_accessibility(driver, request):
    """Automatically check critical accessibility on every test."""
    yield  # Run the test first

    # Only check page-level tests; others can opt out with @pytest.mark.skip_a11y
    if request.node.get_closest_marker('skip_a11y'):
        return

    axe = Axe(driver)
    axe.inject()

    # Check only the most reliable, high-impact rules
    results = axe.run(options={
        'runOnly': {
            'type': 'rule',
            'values': [
                'label',           # Form labels
                'button-name',     # Button accessible names
                'link-name',       # Link accessible names
                'image-alt',       # Image alt text presence
                'color-contrast',  # Color contrast
                'aria-valid-attr', # Valid ARIA attributes
                'duplicate-id',    # Duplicate IDs
                'html-has-lang',   # Language attribute
                'document-title',  # Page title
            ]
        }
    })

    violations = results['violations']
    if violations:
        # Fail the test with clear details
        violation_summary = "\n".join([
            f"- {v['id']}: {v['description']} ({len(v['nodes'])} instances)"
            for v in violations
        ])
        pytest.fail(
            f"Critical accessibility violations found:\n{violation_summary}"
        )

Quadrant 2: Automate with Caution (Medium Reliability + High Impact)

These can be automated but produce false positives. Use them for flagging, not failing builds.

| Check | WCAG Criterion | Challenge |
| --- | --- | --- |
| Heading hierarchy | 1.3.1 | Tool detects skipped levels, but some designs legitimately skip |
| Link purpose | 2.4.4 | Tool detects “click here,” but context matters |
| Focus order | 2.4.3 | Tool can trace order, but logic requires judgment |
| Keyboard traps | 2.1.2 | Detection is possible, but false positives in modals |
| Autocomplete attributes | 1.3.5 | Tool can check presence, but appropriateness varies |

Implementation:

# helpers/cautious_checks.py

def check_heading_hierarchy(driver) -> dict:
    """
    Check heading hierarchy - flags issues but explains context.

    Returns warnings, not failures, because skipped headings
    may be intentional in some designs.
    """
    script = """
    const headings = document.querySelectorAll('h1, h2, h3, h4, h5, h6');
    const levels = Array.from(headings).map(h => parseInt(h.tagName[1]));

    const issues = [];
    let prevLevel = 0;

    levels.forEach((level, index) => {
        // Check for skipped levels (h1 -> h3 without h2)
        if (level > prevLevel + 1 && prevLevel !== 0) {
            issues.push({
                type: 'skipped_level',
                from: prevLevel,
                to: level,
                element: headings[index].outerHTML.substring(0, 100),
                message: `Heading level skipped from h${prevLevel} to h${level}`
            });
        }
        prevLevel = level;
    });

    // Check for multiple h1s (usually wrong, but not always)
    const h1Count = levels.filter(l => l === 1).length;
    if (h1Count > 1) {
        issues.push({
            type: 'multiple_h1',
            count: h1Count,
            message: `Found ${h1Count} h1 elements (typically should be 1)`
        });
    }

    // Check for no h1 at all
    if (h1Count === 0) {
        issues.push({
            type: 'no_h1',
            message: 'No h1 element found on page'
        });
    }

    return {
        headings: levels,
        issues: issues,
        recommendation: issues.length > 0
            ? 'Review heading structure manually'
            : 'Heading structure appears correct'
    };
    """
    return driver.execute_script(script)


def warn_on_heading_issues(driver):
    """
    Issue warnings (not failures) for heading structure issues.

    Use in CI to flag for review without blocking deployment.
    """
    results = check_heading_hierarchy(driver)

    if results['issues']:
        import warnings
        for issue in results['issues']:
            warnings.warn(
                f"Accessibility Review Needed: {issue['message']}",
                UserWarning
            )

    return results
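
The same warn-but-don’t-fail pattern works for the link purpose check from the table above. Here is a minimal sketch along those lines; the list of generic phrases is illustrative, not exhaustive, and every hit still needs a human to judge the surrounding context:

# helpers/cautious_checks.py (continued)
import warnings

GENERIC_LINK_TEXT = {'click here', 'here', 'read more', 'more', 'learn more', 'link'}


def check_link_text(driver) -> list:
    """
    Flag links whose visible text is generic ("click here", "read more").

    Returns candidates for review, not failures: surrounding context may
    make the purpose clear, so each hit requires human judgment.
    """
    links = driver.execute_script("""
        return Array.from(document.querySelectorAll('a[href]')).map(a => ({
            text: a.textContent.trim(),
            href: a.getAttribute('href')
        }));
    """)
    return [link for link in links if link['text'].lower() in GENERIC_LINK_TEXT]


def warn_on_generic_links(driver):
    """Issue warnings (not failures) for generic link text."""
    for link in check_link_text(driver):
        warnings.warn(
            f"Accessibility Review Needed: generic link text "
            f"'{link['text']}' -> {link['href']}",
            UserWarning
        )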

Quadrant 3: Manual Testing Required (Low Reliability + High Impact)

These are critical but cannot be reliably automated. Schedule regular manual reviews.

| Check | WCAG Criterion | Why Manual |
| --- | --- | --- |
| Meaningful alt text | 1.1.1 | Tool checks presence, not meaning |
| Logical reading order | 1.3.2 | Requires understanding content |
| Consistent navigation | 3.2.3 | Requires cross-page comparison |
| Error identification | 3.3.1 | Requires understanding context |
| Input purpose | 1.3.5 | Requires understanding field purpose |
| Resize/reflow | 1.4.4, 1.4.10 | Requires visual inspection |
| Motion/animation safety | 2.3.1, 2.3.3 | Requires content understanding |

Implementation Strategy:

## Manual Accessibility Testing Checklist

### Before Each Release

Run through these checks on key user journeys:

#### Alt Text Quality (WCAG 1.1.1)

- [ ] Do images have alt text that describes their purpose?
- [ ] Are decorative images marked with `alt=""`?
- [ ] Do complex images have extended descriptions?

#### Reading Order (WCAG 1.3.2)

- [ ] Does content make sense when CSS is disabled?
- [ ] Is the DOM order logical for screen readers?
- [ ] Do modals/overlays announce in correct sequence?

#### Error Messages (WCAG 3.3.1)

- [ ] Are errors clearly described?
- [ ] Is it clear how to fix each error?
- [ ] Are errors associated with their fields?

#### Cognitive Load

- [ ] Is navigation consistent across pages?
- [ ] Are instructions clear and not overwhelming?
- [ ] Can users easily recover from mistakes?
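
Automation can still support these manual reviews by gathering the evidence a reviewer needs. As one example, here is a small sketch (assuming the same Selenium driver fixture used earlier) that exports every image's alt text so the quality review in the checklist above goes faster:

# helpers/manual_review_support.py
import json


def export_alt_text_inventory(driver, output_path='reports/alt-text-inventory.json'):
    """
    Collect every image's src and alt text so a human can judge the meaning.

    Automated tools only check that alt text exists; this export puts all
    of it in one place for the pre-release review.
    """
    images = driver.execute_script("""
        return Array.from(document.querySelectorAll('img')).map(img => ({
            src: img.getAttribute('src'),
            alt: img.getAttribute('alt'),  // null = missing, '' = decorative
            decorative: img.getAttribute('alt') === ''
        }));
    """)

    with open(output_path, 'w') as f:
        json.dump(images, f, indent=2)

    return images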

Quadrant 4: User Research Required (Low Reliability + Variable Impact)

These require testing with actual users who have disabilities.

| Check | WCAG Criterion | Why User Testing |
| --- | --- | --- |
| Screen reader experience | Multiple | Real-world usage patterns |
| Cognitive accessibility | 3.1.x, 3.3.x | User understanding varies |
| Motor accessibility | 2.1.x | Individual needs vary |
| Assistive technology compatibility | 4.1.x | AT behavior varies |

Implementation Strategy:

## User Testing Program

### Quarterly User Testing Sessions

Partner with users who rely on:

- Screen readers (JAWS, NVDA, VoiceOver)
- Voice control (Dragon, Voice Control)
- Switch devices
- Screen magnification

### Key Questions

1. Can you complete the core user journey?
2. Where did you encounter friction?
3. What was confusing or unclear?
4. What worked well?

### Metrics to Track

- Task completion rate
- Time to complete tasks
- Error rate
- Satisfaction score
- Qualitative feedback themes
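
To keep those numbers comparable from one quarter to the next, it helps to record every session in a consistent shape. A minimal sketch; the field names are illustrative, not a prescribed schema:

# helpers/user_testing_metrics.py
from dataclasses import dataclass, field


@dataclass
class UserTestingSession:
    """One participant's results from a quarterly testing session."""
    assistive_technology: str              # e.g. "NVDA + Firefox", "VoiceOver + Safari"
    tasks_attempted: int
    tasks_completed: int
    errors: int
    satisfaction_score: float              # e.g. on a 1-5 scale
    qualitative_notes: list[str] = field(default_factory=list)

    @property
    def completion_rate(self) -> float:
        """Task completion rate for this session (0.0 to 1.0)."""
        return self.tasks_completed / self.tasks_attempted if self.tasks_attempted else 0.0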

Building Your Automation Pipeline

Here’s how to structure your CI/CD pipeline with these quadrants in mind:

Stage 1: Build-Time Checks (Blocking)

# .github/workflows/accessibility.yml

name: Accessibility CI

on: [push, pull_request]

jobs:
  accessibility-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Run critical accessibility tests
        run: npm run test:a11y:critical
        # These tests FAIL the build:
        # - Missing form labels
        # - Missing button/link names
        # - Color contrast below 4.5:1
        # - Invalid ARIA
        # - Duplicate IDs

      - name: Run cautious accessibility checks
        run: npm run test:a11y:review
        continue-on-error: true # Don't fail, but report
        # These tests WARN but don't fail:
        # - Heading hierarchy
        # - Link text quality
        # - Focus order suggestions
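
If your suite is pytest-based rather than driven by npm scripts, the same blocking/warning split can be expressed with markers. A minimal sketch, using hypothetical marker names `a11y_critical` and `a11y_review`:

# conftest.py - register markers for the blocking / warning split

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "a11y_critical: blocking accessibility checks (fail the build)"
    )
    config.addinivalue_line(
        "markers", "a11y_review: cautious accessibility checks (report, never block)"
    )

# The two CI steps then become:
#   pytest -m a11y_critical            # blocking step
#   pytest -m a11y_review || true      # warning step, never fails the job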

Stage 2: Pre-Release Checks (Warning)

accessibility-review:
  runs-on: ubuntu-latest
  if: github.event_name == 'pull_request' && github.base_ref == 'main'
  steps:
    - name: Generate accessibility report
      run: npm run test:a11y:full

    - name: Upload report
      uses: actions/upload-artifact@v3
      with:
        name: accessibility-report
        path: reports/accessibility/

    - name: Comment on PR with findings
      uses: actions/github-script@v6
      with:
        script: |
          const report = require('./reports/accessibility/summary.json');
          const comment = `## Accessibility Review Required

          **Automated findings:** ${report.violations} issues
          **Manual review items:** ${report.reviewItems} items

          Please review the [full report](${report.reportUrl}) before release.`;

          github.rest.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: comment
          });

Stage 3: Scheduled Comprehensive Audits

weekly-audit:
  runs-on: ubuntu-latest
  if: github.event_name == 'schedule'
  steps:
    - name: Run full Lighthouse audit
      run: npx lhci autorun

    - name: Run axe comprehensive scan
      run: npm run test:a11y:comprehensive

    - name: Run Pa11y on all pages
      run: npm run test:pa11y:sitemap

    - name: Create tracking issue
      if: failure()
      uses: actions/github-script@v6
      with:
        script: |
          github.rest.issues.create({
            owner: context.repo.owner,
            repo: context.repo.repo,
            title: 'Weekly Accessibility Audit - Issues Found',
            body: 'See attached report for details.',
            labels: ['accessibility', 'automated-audit']
          });

The Decision Framework

When deciding what to automate, ask these questions:

1. Can the check be expressed as a binary pass/fail?

  • ✅ “Does this button have an accessible name?” → Automate
  • ❌ “Is this alt text meaningful?” → Manual

2. Is the check consistent across contexts?

  • ✅ “Is color contrast at least 4.5:1?” → Automate
  • ❌ “Is this navigation order logical?” → Manual (context-dependent)

3. What’s the false positive rate? (see the routing sketch after this framework)

  • < 5% false positives → Automate as blocking
  • 5-20% false positives → Automate as warning
  • > 20% false positives → Manual review only

4. What’s the cost of missing it?

  • Legal/compliance risk → Automate what you can, manual review the rest
  • User experience impact → Prioritize based on user journey importance
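
One way to encode the answers to questions 1-3 is to keep two rule lists and route each violation to a hard failure or a warning depending on how reliable the rule is. A minimal sketch along those lines; the rule IDs mirror the quadrants above, and the warning set is illustrative:

# helpers/violation_routing.py
import warnings

import pytest

# Quadrant 1: reliable, high impact -> fail the build
BLOCKING_RULES = {
    'label', 'button-name', 'link-name', 'image-alt',
    'color-contrast', 'aria-valid-attr', 'duplicate-id',
    'html-has-lang', 'document-title',
}

# Quadrant 2: useful but prone to false positives -> warn only
WARNING_RULES = {
    'heading-order', 'link-in-text-block', 'tabindex',
}


def route_violations(violations):
    """Split axe-core violations into blocking failures and review warnings."""
    blocking = [v for v in violations if v['id'] in BLOCKING_RULES]
    review = [v for v in violations if v['id'] in WARNING_RULES]

    for v in review:
        warnings.warn(
            f"Accessibility Review Needed: {v['id']} ({len(v['nodes'])} instances)",
            UserWarning
        )

    if blocking:
        summary = "\n".join(
            f"- {v['id']}: {v['description']}" for v in blocking
        )
        pytest.fail(f"Critical accessibility violations found:\n{summary}")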

Based on my experience, here’s the tool combination that gives the best coverage:

| Tool | Best For | Use In |
| --- | --- | --- |
| axe-core | Comprehensive rule set, low false positives | CI/CD blocking |
| Lighthouse | Performance + accessibility combined | Weekly audits |
| Pa11y | Page-level scanning, sitemap crawling | Scheduled audits |
| WAVE | Visual feedback during development | Developer tooling |
| IBM Equal Access | Enterprise compliance reporting | Quarterly audits |

Integration Example

# test_comprehensive_accessibility.py

import json
import subprocess
import warnings

import pytest
from axe_selenium_python import Axe
# Pa11y and Lighthouse are invoked through their CLIs via subprocess below


class TestComprehensiveAccessibility:
    """
    Comprehensive accessibility test suite.

    Run weekly or before major releases.
    """

    def test_axe_full_scan(self, driver):
        """Run full axe-core scan with all rules."""
        driver.get("https://yoursite.com")

        axe = Axe(driver)
        axe.inject()
        results = axe.run()

        # Separate critical from moderate issues
        critical = [v for v in results['violations']
                   if v['impact'] in ['critical', 'serious']]
        moderate = [v for v in results['violations']
                   if v['impact'] in ['moderate', 'minor']]

        # Fail on critical, warn on moderate
        if critical:
            pytest.fail(f"Critical accessibility issues: {len(critical)}")

        if moderate:
            warnings.warn(
                f"Moderate accessibility issues: {len(moderate)}", UserWarning
            )

    def test_lighthouse_accessibility_score(self, driver):
        """Check Lighthouse accessibility score meets threshold."""
        result = subprocess.run([
            'lighthouse',
            'https://yoursite.com',
            '--output=json',
            '--output-path=stdout',  # write the JSON report to stdout instead of a file
            '--only-categories=accessibility',
            '--chrome-flags=--headless'
        ], capture_output=True, text=True)

        report = json.loads(result.stdout)
        score = report['categories']['accessibility']['score'] * 100

        assert score >= 90, f"Lighthouse accessibility score {score} below 90"

    def test_pa11y_pages(self):
        """Run Pa11y on key pages."""
        pages = [
            'https://yoursite.com/',
            'https://yoursite.com/login',
            'https://yoursite.com/checkout',
            'https://yoursite.com/contact'
        ]

        for page in pages:
            result = subprocess.run([
                'pa11y', page, '--reporter', 'json'
            ], capture_output=True, text=True)

            issues = json.loads(result.stdout)
            errors = [i for i in issues if i['type'] == 'error']

            assert len(errors) == 0, f"Pa11y errors on {page}: {len(errors)}"

Summary: Your Accessibility Automation Strategy

  1. Automate the fundamentals (Quadrant 1) - Run on every build
  2. Flag the uncertain (Quadrant 2) - Warn but don’t block
  3. Schedule manual reviews (Quadrant 3) - Before each release
  4. Invest in user testing (Quadrant 4) - Quarterly minimum

The goal isn’t 100% automation—it’s efficient coverage that catches the detectable issues automatically while reserving human attention for what truly requires human judgment.


This article is part of my “Inclusive by Default” series on building accessibility into test automation. For the technical implementation of accessibility helpers, see Building Accessibility into Your Selenium Test Automation.

Ruby Jane Cabagnot

Accessibility Cloud Engineer

Building inclusive digital experiences through automated testing and AI-powered accessibility tools. Passionate about making the web accessible for everyone.

Related Topics:

#accessibility testing #test automation #WCAG #manual testing #CI/CD #quality assurance #testing strategy