Steven's Knowledge

Accessibility Testing

A layered discipline — what automated tools can catch, what they can't, and where humans still have to look


Most accessibility bugs are invisible to sighted developers using a mouse. A missing form label, a button with no accessible name, a focus trap that locks keyboard users out — these are obvious to anyone using a screen reader or keyboard, and invisible to anyone who isn't. The role of automated accessibility testing is to make that invisible layer continuously visible.

The catch: commonly cited estimates put automated coverage at roughly 30–40% of accessibility issues, and the figure shifts with how issues are counted. The rest require manual testing: keyboard navigation, screen reader interaction, cognitive load assessment, real-user feedback. A team that runs axe and ships is doing better than a team that runs nothing, and worse than a team that treats accessibility as a layered practice.

This page is about that layered practice: where automation belongs, what it misses, and what the manual layers look like.

What Automated Tools Catch

The well-documented set:

  • Missing or empty alt attributes on images.
  • Buttons and links with no accessible name (icon-only with no label).
  • Form inputs with no associated label.
  • Insufficient color contrast against the background.
  • Invalid ARIA usage — attributes on the wrong elements, undefined roles, broken references.
  • Improper heading hierarchy (skipping from h1 to h4).
  • Duplicate IDs.
  • Missing language attribute on the <html> element.
  • Tab order issues detectable from DOM (positive tabindex values).
  • Invalid or unsuitable autocomplete values on form fields.

These are the structural, deterministic issues. A tool can scan the DOM and tell you with high confidence whether they exist.
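Color contrast shows why these checks are deterministic: WCAG 2.x defines contrast as a formula over the relative luminance of the two colors, so once a tool knows the computed foreground and background it can answer mechanically. A minimal sketch, assuming opaque 6-digit hex colors; real scanners also have to resolve transparency, background images, and text size, which is where most of their complexity lives. The function names are illustrative, not taken from any library:

```typescript
// Sketch of the WCAG 2.x contrast-ratio formula for two opaque sRGB colors.
// Assumes 6-digit hex input such as "#767676" (no alpha, no shorthand).

// Convert one 8-bit sRGB channel to linear light.
function channelToLinear(value8bit: number): number {
  const c = value8bit / 255;
  // WCAG 2.x writes the threshold as 0.03928; no 8-bit value falls between
  // 0.03928 and 0.04045, so both thresholds give identical results here.
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized R, G, B channels.
function relativeLuminance(hex: string): number {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Expected #rrggbb, got ${hex}`);
  const r = channelToLinear(parseInt(match[1], 16));
  const g = channelToLinear(parseInt(match[2], 16));
  const b = channelToLinear(parseInt(match[3], 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), with the lighter luminance as L1; ranges 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG 2.x SC 1.4.3 (AA): at least 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

meetsAA('#767676', '#ffffff'); // true  (about 4.54:1)
meetsAA('#777777', '#ffffff'); // false (about 4.48:1)
```

One shade of gray is the difference between passing and failing, which is why this is a rule a machine should own.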

What Automated Tools Miss

The categories that require a human:

  • Whether the alt text is meaningful. alt="image" passes the rule; it's useless to a screen reader user.
  • Whether the label makes sense. <button>Click here</button> passes; out of context it tells the user nothing.
  • Whether the focus order is logical. The DOM order may be valid while CSS (flexbox order, grid placement, absolute positioning) makes focus jump around the visual layout.
  • Whether interactive widgets behave accessibly. A custom dropdown might have the right ARIA roles but not handle keyboard navigation.
  • Whether dynamic content is announced. A form error appears; does a screen reader user know about it?
  • Whether the experience is usable with only a keyboard.
  • Whether timing is reasonable (toasts that disappear in 2 seconds; sessions that time out without warning).
  • Whether cognitive load is manageable (jargon, multi-step processes, error recovery).

A site can pass every automated check and still be unusable. The score is a starting line, not a finish line.

The Layered Approach

A serious accessibility practice has several layers, each catching what the others can't.

Layer 1: Lint-time

Linters that flag accessibility issues in source code, before the test suite runs.

  • eslint-plugin-jsx-a11y (React)
  • eslint-plugin-vuejs-accessibility (Vue)
  • angular-eslint template accessibility rules (Angular)
  • axe-linter (cross-framework)

Catches static issues at the IDE / commit level: missing alt, button with no children, anchor missing href. Cheapest layer — runs in seconds.

Layer 2: Unit / component tests

Run axe (or equivalent) against rendered component output in unit tests.

import { render } from '@testing-library/react';
import { axe, toHaveNoViolations } from 'jest-axe';
import { LoginForm } from './LoginForm'; // hypothetical path to the component under test

// Registers the toHaveNoViolations matcher with Jest
expect.extend(toHaveNoViolations);

test('LoginForm has no accessibility violations', async () => {
  const { container } = render(<LoginForm />);
  const results = await axe(container);
  expect(results).toHaveNoViolations();
});

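Catches DOM-level issues per component, before they integrate with anything else. Each component carries its own accessibility verification. One gap: jsdom doesn't lay out or paint pages, so jest-axe turns off the color-contrast rule; contrast has to be checked at the E2E layer, in a real browser.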

Layer 3: Integration / E2E

Run axe scans against rendered pages in E2E tests. Tools: @axe-core/playwright, cypress-axe, @axe-core/puppeteer.

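import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';
import { fillCart } from './helpers'; // hypothetical helper that adds items to the cart

test('checkout page is accessible', async ({ page }) => {
  await page.goto('/checkout');
  await fillCart(page);
  // analyze() resolves to a results object; the violations live in results.violations
  const results = await new AxeBuilder({ page }).analyze();
  // Comparing to [] (rather than checking length) prints each violation on failure
  expect(results.violations).toEqual([]);
});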

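Catches issues that only appear once components are assembled: duplicate IDs across components, missing or duplicated landmarks, contrast against the real page background, an unlabeled control inside a modal that only renders after the autocomplete is used. axe sees the DOM only as it is when analyze() runs, so drive the page into the state you care about first. Behavioral problems, such as a focus trap that misbehaves, are beyond a DOM scan; they belong to the next layer.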

Layer 4: Keyboard navigation tests

Explicit E2E tests that exercise the page using only the keyboard:

import { test, expect } from '@playwright/test';

test('checkout flow is keyboard accessible', async ({ page }) => {
  await page.goto('/checkout');
  await page.keyboard.press('Tab');
  // The first Tab stop should be the "Add to cart" control
  await expect(page.locator(':focus')).toHaveText('Add to cart');
  await page.keyboard.press('Enter');
  // ... walk the entire flow with Tab, Enter, Escape, arrow keys
});

Catches issues a DOM scan can't see: a button that's reachable but doesn't activate on Enter, a modal that traps focus when it shouldn't (or lets focus escape when it should hold it), a custom widget that doesn't respond to arrow keys.
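The Tab sequence a keyboard test walks follows the HTML sequential focus navigation rules, which is also why positive tabindex values appear in the automated list earlier on this page: elements with a positive tabindex are visited first, in ascending order, before everything else in DOM order. A simplified sketch, assuming a flat list of focusable elements already in DOM order; it ignores shadow DOM, iframes, and hidden or disabled elements, and the FocusableElement type and tabOrder function are hypothetical:

```typescript
// Simplified model of HTML sequential focus navigation (the order Tab follows).
// Input: focusable candidates in DOM order with their effective tabIndex.
interface FocusableElement {
  id: string;
  tabIndex: number; // native controls default to 0; tabindex="-1" means "not in Tab order"
}

function tabOrder(elementsInDomOrder: FocusableElement[]): string[] {
  // Positive tabindex values go first, lowest value first.
  // Array.prototype.sort is stable (ES2019+), so equal values keep DOM order.
  const positive = elementsInDomOrder
    .filter((el) => el.tabIndex > 0)
    .sort((a, b) => a.tabIndex - b.tabIndex);
  // Then every tabIndex 0 element, in DOM order.
  const natural = elementsInDomOrder.filter((el) => el.tabIndex === 0);
  // Negative tabIndex: focusable by script or click, but skipped by Tab.
  return positive.concat(natural).map((el) => el.id);
}

const samplePage: FocusableElement[] = [
  { id: 'skip-link', tabIndex: 0 },
  { id: 'nav-home', tabIndex: 0 },
  { id: 'search', tabIndex: 1 }, // someone "fixed" focus order with tabindex="1"
  { id: 'modal-close', tabIndex: -1 },
  { id: 'submit', tabIndex: 0 },
];

tabOrder(samplePage); // ['search', 'skip-link', 'nav-home', 'submit']
```

The skip link, meant to be the first stop, is now second. A scan can flag the tabindex="1"; only walking the page shows what it does to the experience.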

Layer 5: Screen reader testing

Manual testing with VoiceOver (macOS and iOS), NVDA (Windows), JAWS (Windows), or TalkBack (Android). No automated check substitutes for hearing what the page actually announces.

  • Frequency: every major flow, every release.
  • Owner: someone trained, with the tool installed and configured.
  • Recording: capture audio of important flows; share with the team.

This is the layer most teams skip. It's also where the most user-impactful issues hide.

Layer 6: User testing with disabled users

People who actually use assistive technology daily. Provides feedback no automated or manual tool can: usability, friction, cognitive overhead, real-world device behavior.

  • Cost: real budget, real recruiting.
  • Frequency: at least once per significant feature; ongoing for accessibility-critical products.
  • Output: qualitative feedback that drives design changes, not bug tickets.

Tooling

| Tool | Layer | Notes |
|---|---|---|
| axe-core | Engine | The de facto standard scanner; powers most other tools |
| eslint-plugin-jsx-a11y | Lint | React-specific; catches issues at IDE level |
| axe DevTools | Browser extension | Manual scanning in dev tools |
| jest-axe | Unit/component | Wraps axe for use in Jest/Vitest tests |
| @axe-core/playwright | E2E | Playwright integration |
| cypress-axe | E2E | Cypress integration |
| Lighthouse | E2E / audit | Runs a subset of axe-core's rules; less thorough than a full axe scan |
| Pa11y | CLI / CI | Tests a URL; pa11y-ci runs it across a list of URLs or a sitemap; uses axe or HTML_CodeSniffer |
| Accessibility Insights | Browser extension | Microsoft's detailed audit tool; includes manual test guidance |
| NVDA / VoiceOver / JAWS | Manual | Screen readers themselves |
| Tota11y / Siteimprove | Browser extension | Visualizations of accessibility properties |

Axe is the engine to standardize on; most other tools either wrap it or compete with it less effectively. Lighthouse's accessibility score is a floor, not a target — passing it doesn't mean the site is accessible.

Standards and Conformance

The standards a team typically conforms to:

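  • WCAG 2.1 AA: the de facto baseline for most products. Regulations generally require WCAG AA, though the version varies: Section 508 in the US and AODA in Ontario cite WCAG 2.0, while the EAA in Europe works through EN 301 549, whose current edition maps to WCAG 2.1.
  • WCAG 2.1 AAA: higher bar; rarely required, often not feasible for all content.
  • WCAG 2.2: the current version (published October 2023); adds nine success criteria, covering target size, focus visibility and appearance, dragging movements, accessible authentication, and more, and removes 4.1.1 Parsing.
  • Section 508 (US federal): incorporates WCAG 2.0 Level A and AA by reference.
  • EN 301 549 (EU): the European standard that incorporates WCAG; the usual reference for demonstrating conformance with EU accessibility law, including the EAA.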

Automated tools test against subsets of these. Passing axe is roughly "WCAG 2.1 AA on the issues axe can detect," which is meaningful but not complete conformance.
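Which rules run also shapes what "the issues axe can detect" means. axe-core tags each rule with the WCAG version and level it maps to (wcag2a, wcag2aa, wcag21a, wcag21aa, wcag22aa, plus best-practice for rules outside WCAG), and integrations such as @axe-core/playwright accept those tags through withTags(). A small sketch of deriving the tag list for a conformance target; the helper is hypothetical, and the tag strings are the ones in the axe-core documentation, so check them against the version you run:

```typescript
// Hypothetical helper: the axe-core rule tags to enable for a WCAG AA target.
type WcagVersion = '2.0' | '2.1' | '2.2';

// Tags introduced at each version (A and AA only).
const AA_TAGS_ADDED_IN: Record<WcagVersion, string[]> = {
  '2.0': ['wcag2a', 'wcag2aa'],
  '2.1': ['wcag21a', 'wcag21aa'],
  '2.2': ['wcag22aa'], // assumption: also list a wcag22a tag if your axe-core version defines one
};
const VERSIONS_IN_ORDER: WcagVersion[] = ['2.0', '2.1', '2.2'];

// Conformance is cumulative: 2.1 AA includes 2.0 A and AA, and so on.
function aaTagsFor(target: WcagVersion): string[] {
  const tags: string[] = [];
  for (const version of VERSIONS_IN_ORDER) {
    tags.push(...AA_TAGS_ADDED_IN[version]);
    if (version === target) break;
  }
  return tags;
}

aaTagsFor('2.1'); // ['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa']

// Inside a Playwright test with @axe-core/playwright, this would be used as:
//   const results = await new AxeBuilder({ page }).withTags(aaTagsFor('2.1')).analyze();
```

Filtering by these tags also drops axe's best-practice rules, such as heading-order, which aren't WCAG failures but are often worth keeping in the scan.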

If conformance matters legally, document the testing approach (tool versions, manual procedures, scope). The standard reporting format is the VPAT (Voluntary Product Accessibility Template); a completed VPAT is usually called an Accessibility Conformance Report (ACR).

Where to Place the Checks in CI

A layered structure that doesn't bog down PR checks:

  • Lint: every save, every commit. Fast.
  • Unit/component axe: every PR. Per-component scope keeps it fast.
  • E2E axe: every PR for critical flows; nightly for full coverage.
  • Keyboard navigation E2E: nightly or per-release. Slower, more brittle.
  • Manual screen reader: scheduled — per release at minimum, per feature for accessibility-sensitive features.

What kills accessibility CI: putting everything in PR checks. The suite gets slow, flakes start (a focus race condition; a transition that wasn't there before), and the team starts approving with "small a11y violations, we'll fix later" — which never happens.

Common Failure Modes

"Axe says 0 violations, we're accessible"

Axe catches the structural subset. Sites with zero axe violations are routinely unusable with a screen reader. Treat the score as table stakes, not the destination.

Accessibility tested by sighted developers only

Without screen reader testing, an entire category of bugs is invisible to the team. The cheapest improvement is for one person on the team to learn VoiceOver and use it weekly.

Lint rules disabled because "they're annoying"

jsx-a11y/click-events-have-key-events flags clickable divs. Annoying because the rule is right — the div isn't keyboard accessible. Disabling the rule doesn't make the bug go away.
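The core logic of that rule fits in a few lines, a hint at how cheap the check is compared with the bug it prevents. A simplified model, not eslint-plugin-jsx-a11y's actual implementation: the ElementInfo shape, the tag list, and the function names are illustrative assumptions, and the real rule also skips cases such as custom components and elements hidden from assistive technology:

```typescript
// Simplified model of a "clickable element needs a keyboard handler" lint check.
interface ElementInfo {
  tag: string;     // lowercase DOM tag name, e.g. "div"
  props: string[]; // names of the props present, e.g. ["onClick", "className"]
}

// Elements the browser already makes focusable and keyboard-operable (illustrative subset).
const NATIVE_INTERACTIVE_TAGS = ['button', 'input', 'select', 'textarea', 'summary'];
const KEY_HANDLER_PROPS = ['onKeyDown', 'onKeyUp', 'onKeyPress'];

function hasProp(el: ElementInfo, prop: string): boolean {
  return el.props.indexOf(prop) !== -1;
}

function isNativelyInteractive(el: ElementInfo): boolean {
  // An <a> is only a focusable, Enter-activated link when it has an href.
  if (el.tag === 'a') return hasProp(el, 'href');
  return NATIVE_INTERACTIVE_TAGS.indexOf(el.tag) !== -1;
}

// True when an element is clickable but offers keyboard users no equivalent handler.
function missingKeyHandler(el: ElementInfo): boolean {
  if (!hasProp(el, 'onClick')) return false;             // not clickable: nothing to check
  if (isNativelyInteractive(el)) return false;           // browser provides keyboard interaction
  return !KEY_HANDLER_PROPS.some((h) => hasProp(el, h)); // clickable, but no keyboard path
}

missingKeyHandler({ tag: 'div', props: ['onClick'] });    // true
missingKeyHandler({ tag: 'button', props: ['onClick'] }); // false
```

Note what the sketch, like the real rule, lets through: a <div> with onClick and onKeyDown passes, yet it still isn't focusable. jsx-a11y covers that with companion rules such as interactive-supports-focus and no-static-element-interactions; the simplest fix is usually a real <button>.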

Manual remediation, no prevention

A bug report comes in; the team fixes that one issue; the pattern that caused it stays in the codebase. Two months later, the same bug appears in a new component. Fix the pattern (custom button → real button; aria-hack → semantic HTML) and add a rule that catches it.

"Accessibility" as a checkbox at the end

Accessibility brought in just before launch finds dozens of issues that should have been caught at component design time. Each is a small fix; together they delay launch by weeks. Build it in: lint at IDE, test per component, design with accessibility from the start.

Divs where landmarks belong

The <nav>, <main>, and <aside> elements exist, but the team reaches for <div> out of CSS habit. Screen reader users lose the ability to jump between landmarks. The fix is semantic HTML, plus lint rules that flag the alternative.

One-shot audit, no ongoing testing

A consultant audits the site and delivers a report; the team fixes the items. Six months later, regressions are everywhere. Accessibility is continuous, not a one-time event.

Pre-merge Checklist

Before a feature is "done" from an accessibility perspective:

  • Did automated tools (lint + axe) run, with zero violations?
  • Did someone tab through the entire flow with no mouse? Does focus order make sense? Can every interactive element be activated?
  • Did someone hear it with a screen reader? Does each control announce a meaningful name and state?
  • Do form errors get announced? Do dynamic updates (loading, success, error) reach assistive technology?
  • Does the page have a meaningful heading structure that a screen reader user could navigate?
  • Are color and contrast adequate, including in error states and disabled states?
  • Are interactive targets large enough for touch? (WCAG 2.2 SC 2.5.8 asks for at least 24×24 CSS pixels, with exceptions such as sufficient spacing around smaller targets.)
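The touch-target question is partly scriptable. A minimal sketch, assuming each control's rendered size in CSS pixels has already been collected (for example via the browser's getBoundingClientRect() or Playwright's locator.boundingBox()); the Target shape and undersizedTargets name are hypothetical, and it checks only the 24×24 minimum, not the criterion's exceptions:

```typescript
// Pre-screen for WCAG 2.2 SC 2.5.8 (Target Size, Minimum): flag targets under 24x24 CSS px.
// Deliberately ignores the SC's exceptions (spacing, inline, user-agent, essential),
// so a flagged target needs a human look rather than an automatic fail.
interface Target {
  name: string;   // accessible name or selector, used only for reporting
  width: number;  // CSS pixels
  height: number; // CSS pixels
}

const MIN_TARGET_CSS_PX = 24;

function undersizedTargets(targets: Target[]): string[] {
  return targets
    .filter((t) => t.width < MIN_TARGET_CSS_PX || t.height < MIN_TARGET_CSS_PX)
    .map((t) => `${t.name} (${t.width}x${t.height})`);
}

const flagged = undersizedTargets([
  { name: 'Close dialog', width: 16, height: 16 },
  { name: 'Submit', width: 96, height: 40 },
  { name: 'Help icon', width: 24, height: 20 },
]);
// flagged: ['Close dialog (16x16)', 'Help icon (24x20)']
```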

If the answer to "did anyone use this with a screen reader" is no, accessibility testing isn't done — it's automated. Accessibility is what users with disabilities experience, not what your test suite reports.
