Testing Guide

Automated vs manual testing: what each catches

Automated tools catch 30-50% of accessibility issues. Here's what that means for your testing strategy — and why both approaches are necessary.

The Data

The ~40% reality

Industry research consistently shows that automated accessibility testing has significant limitations.

  • 30-50% of accessibility issues are found by automated testing (industry research consensus)
  • ~70% of WCAG criteria require human judgment to properly evaluate (UsableNet analysis)
  • 86 total success criteria in WCAG 2.2, Levels A, AA, and AAA combined (W3C specification)

Why the gap exists

Automated tools are excellent at checking objective, technical criteria: Is there alt text? Does color contrast meet ratios? Is there a form label? But accessibility is ultimately about whether real people can use your content — and that often requires human judgment.

Questions like "Is this alt text actually meaningful?" or "Can a screen reader user understand this interaction flow?" can't be answered by checking code alone. They require context, interpretation, and testing with actual assistive technologies.

Automated Testing

What automated tools reliably catch

These are the categories where automated scanning excels — issues with clear, objective pass/fail criteria.

Missing text alternatives

Automated tools reliably detect when required text is missing entirely.

  • Images without alt attributes
  • Form inputs without associated labels
  • Buttons and links without accessible names
  • ARIA labels that reference missing IDs
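The checks above boil down to "is there any accessible name at all?" Below is a much-simplified sketch of that idea; the `ElementInfo` shape and `hasAccessibleName` helper are invented for illustration, and real engines such as axe-core implement the full W3C accessible-name computation against the live DOM.

```typescript
// Simplified accessible-name presence check, loosely modeled on the
// W3C accname algorithm. ElementInfo is a made-up shape for illustration.
interface ElementInfo {
  tag: string;
  alt?: string;             // images: alt="" deliberately marks decorative
  ariaLabel?: string;
  labelledByIds?: string[]; // aria-labelledby references
  text?: string;            // visible text content
}

function hasAccessibleName(el: ElementInfo, existingIds: Set<string>): boolean {
  // aria-labelledby wins, but only if every referenced ID actually exists
  if (el.labelledByIds?.length) {
    return el.labelledByIds.every((id) => existingIds.has(id));
  }
  if (el.ariaLabel?.trim()) return true;
  if (el.tag === "img") return el.alt !== undefined; // missing alt entirely fails
  return Boolean(el.text?.trim()); // buttons/links: visible text names them
}
```

Note that `hasAccessibleName({ tag: "img" }, new Set())` fails while an explicit `alt: ""` passes: automated tools can tell "missing" from "intentionally empty", but not whether non-empty alt text is any good.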

Color contrast failures

Programmatic color analysis can measure precise contrast ratios against WCAG requirements.

  • Text that fails WCAG contrast ratios (4.5:1 for normal text, 3:1 for large text)
  • Link text indistinguishable from surrounding text
  • Focus indicators with insufficient contrast
  • Graphics and UI components below 3:1 ratio
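These checks are fully mechanical because WCAG defines the math. A sketch of the spec's relative-luminance and contrast-ratio formulas, with colors as `[r, g, b]` values in 0-255:

```typescript
// WCAG 2.x relative luminance: sRGB channels are linearized, then
// weighted. Formula is taken directly from the WCAG definition.
function relativeLuminance([r, g, b]: number[]): number {
  const lin = (c: number) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// Contrast ratio = (lighter + 0.05) / (darker + 0.05), from 1:1 to 21:1.
function contrastRatio(fg: number[], bg: number[]): number {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}
```

Black on white yields exactly 21:1, and #767676 on white comes out around 4.54:1, just over the 4.5:1 threshold for normal text.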

Structural HTML issues

DOM analysis reveals structural problems that affect assistive technology parsing.

  • Heading hierarchy problems (skipped levels)
  • Missing language declarations
  • Duplicate IDs causing ARIA reference failures
  • Tables without proper headers
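The skipped-level rule, for instance, is a purely structural check. A minimal sketch, with heading levels as numbers (1 for h1, and so on):

```typescript
// Flag skipped heading levels: each heading may go at most one level
// deeper than the previous one (h1 -> h2 is fine; h1 -> h3 skips h2).
// Going back up by any amount is allowed. Returns offending indices.
function skippedHeadings(levels: number[]): number[] {
  const skips: number[] = [];
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) skips.push(i);
  }
  return skips;
}
```

So `[1, 2, 4]` flags the h4 at index 2, while `[1, 2, 3, 2, 3]` is clean. Many checkers additionally warn when a page doesn't start at h1; that variant is omitted here.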

ARIA implementation errors

Automated tools can validate ARIA syntax and structure against the specification.

  • Invalid ARIA attribute values
  • Required ARIA properties that are missing
  • Conflicting roles and properties
  • ARIA references pointing to non-existent elements
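Value validation in particular is a straightforward table lookup. The sketch below covers just three attributes; the WAI-ARIA spec defines allowed tokens for every attribute, and `invalidAriaValues` is a made-up helper, not a real tool's API:

```typescript
// Allowed token values for a tiny subset of ARIA attributes.
// (The spec also permits omitting these attributes entirely; only
// present-but-invalid values are flagged here.)
const ALLOWED: Record<string, string[]> = {
  "aria-checked": ["true", "false", "mixed"],
  "aria-expanded": ["true", "false"],
  "aria-haspopup": ["false", "true", "menu", "listbox", "tree", "grid", "dialog"],
};

// Return the names of attributes whose value isn't in the allowed set.
function invalidAriaValues(attrs: Record<string, string>): string[] {
  return Object.entries(attrs)
    .filter(([name, value]) => ALLOWED[name] && !ALLOWED[name].includes(value))
    .map(([name]) => name);
}
```

For example, `aria-checked="yes"` is flagged as invalid, while `aria-expanded="true"` passes and attributes outside the table are ignored.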

Automated scanning is valuable precisely because it catches these issues quickly, consistently, and at scale.

Manual Testing

What requires human judgment

These areas can't be fully evaluated by automated tools — they require testing with real assistive technologies and human assessment.

Keyboard navigation flow

Can users navigate logically through the page using only a keyboard?

  • Tab order follows visual reading order
  • Focus doesn't get trapped in modals or widgets
  • Custom components are fully keyboard operable
  • Skip links work and target correct content
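One common reason tab order drifts from reading order is HTML's sequential-focus rule: elements with a positive tabindex jump the queue ahead of everything else. A simplified sketch of that ordering (it ignores tabindex="-1" and which elements are naturally focusable):

```typescript
// Effective keyboard tab order per HTML's sequential-focus rules:
// positive tabindex values come first in ascending order (ties keep
// DOM order; Array.sort is stable in modern engines), then tabindex=0
// elements in DOM order.
interface Focusable { id: string; tabindex: number; } // 0 = natural order

function tabOrder(domOrder: Focusable[]): string[] {
  const positive = domOrder
    .filter((e) => e.tabindex > 0)
    .sort((a, b) => a.tabindex - b.tabindex);
  const natural = domOrder.filter((e) => e.tabindex === 0);
  return [...positive, ...natural].map((e) => e.id);
}
```

A tool can compute this order mechanically; whether the resulting sequence matches the visual reading order on a rendered page is exactly the part that still needs a human.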

Why manual: Automated tools can detect if elements are technically focusable, but can't assess whether the navigation experience makes sense to a real user.

Screen reader coherence

Does the content make sense when read aloud sequentially?

  • Content flows logically without visual context
  • Interactive elements announce their purpose clearly
  • Dynamic content changes are communicated appropriately
  • Complex layouts maintain meaning when linearized

Why manual: Screen readers interpret pages differently than visual rendering. Only testing with actual screen readers reveals how content is experienced.

Focus management

Is focus handled correctly during dynamic interactions?

  • Focus moves to modal when opened
  • Focus returns appropriately when modal closes
  • Focus moves to error messages after form validation
  • Single-page app navigation manages focus on route changes
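The bookkeeping behind "focus returns appropriately" is often just a stack. A toy sketch using element IDs as stand-ins for real DOM elements; `FocusStack` is an invented name for the pattern, not a standard API:

```typescript
// Minimal focus-restore bookkeeping for layered dialogs: remember what
// had focus before each modal opened, and hand it back on close.
class FocusStack {
  private stack: string[] = [];

  open(previouslyFocusedId: string): void {
    // record before moving focus into the modal
    this.stack.push(previouslyFocusedId);
  }

  close(): string | undefined {
    // the element to re-focus once the modal closes
    return this.stack.pop();
  }
}
```

Nested dialogs unwind in reverse order: the inner modal restores focus first, the outer one last. Whether that restored focus actually makes sense in context is the manual part.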

Why manual: Proper focus management requires understanding user intent and interaction context — something automated tools cannot infer.

Dynamic content accessibility

Are changes to the page communicated to assistive technology users?

  • Loading states announced via live regions
  • Form validation errors read aloud
  • Notifications don't interrupt user tasks inappropriately
  • Infinite scroll or lazy loading handled accessibly

Why manual: The timing, frequency, and appropriateness of announcements requires human judgment about user experience.

Cognitive load assessment

Is the content understandable and the interface predictable?

  • Instructions are clear before complex interactions
  • Error messages help users understand how to fix issues
  • Navigation is consistent across pages
  • Time limits are appropriate or can be extended

Why manual: Cognitive accessibility is about comprehension and mental load — concepts that require human evaluation.

Alt text quality

Does alternative text actually convey the image's meaning?

  • Alt text describes the image's purpose in context
  • Decorative images are marked as such
  • Complex images have adequate descriptions
  • Alt text isn't redundant with surrounding content

Why manual: Automated tools can detect presence of alt text, but can't evaluate whether it's meaningful, accurate, or appropriately concise.

WCAG 2.2

New criteria: automatable or not?

WCAG 2.2 added 9 new success criteria: six at Levels A and AA, plus three at Level AAA. Here's how the Level A and AA criteria break down for automated testing.

2.4.11 Focus Not Obscured (Minimum) · Level AA · Automatable: Partial

Can detect some cases where sticky elements might obscure focus, but can't reliably test all scroll/focus combinations.

2.5.7 Dragging Movements · Level AA · Automatable: No

Requires testing whether single-pointer alternatives exist and work correctly; this needs manual interaction testing.

2.5.8 Target Size (Minimum) · Level AA · Automatable: Yes

CSS dimensions can be measured programmatically. One of the more automatable new criteria.

3.2.6 Consistent Help · Level A · Automatable: Partial

Can detect the presence of help mechanisms, but determining "same relative order" across pages requires human verification.

3.3.7 Redundant Entry · Level A · Automatable: No

Requires understanding a form's purpose and whether data should be auto-populated; this is a context-dependent assessment.

3.3.8 Accessible Authentication (Minimum) · Level AA · Automatable: No

Evaluating whether authentication requires "cognitive function tests" needs human judgment about the task's nature.
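The size check itself is simple, which is part of why 2.5.8 automates well. The sketch below checks only the 24x24 CSS-pixel floor and omits the criterion's spacing, inline, and equivalent-control exceptions; `meetsMinimumTargetSize` is a hypothetical helper, not taken from any tool:

```typescript
// WCAG 2.2 SC 2.5.8 Target Size (Minimum): pointer targets should be
// at least 24x24 CSS pixels. Real tools read these dimensions from
// computed layout; the exceptions (spacing, inline text links, etc.)
// are what push the criterion toward "automatable with caveats".
function meetsMinimumTargetSize(widthPx: number, heightPx: number): boolean {
  return widthPx >= 24 && heightPx >= 24;
}
```

A 44x44 button passes comfortably, 24x24 sits exactly on the floor, and a 20px-wide icon button fails.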

Key takeaway: Of the 6 new Level A and AA criteria in WCAG 2.2, only one (Target Size) is fully automatable. The rest require partial or full manual testing. This pattern — where newer criteria often address more nuanced accessibility concerns — suggests that manual testing will remain essential even as automated tools improve.

Strategy

A practical approach

How to combine automated and manual testing for effective accessibility coverage.

Use automated scanning for continuous monitoring

Run automated scans regularly — ideally as part of your CI/CD pipeline. This catches regressions quickly and ensures new code doesn't introduce obvious accessibility issues. It's efficient, consistent, and scalable.

  • Catches ~40% of issues automatically
  • Provides consistent baseline across pages
  • Identifies issues to prioritize for manual review

Target manual testing at high-impact areas

You can't manually test everything with limited resources. Focus manual testing on critical user paths, complex interactive components, and pages that automated scanning flags as problematic.

  • Test key user journeys (signup, checkout, core features)
  • Review complex widgets (modals, carousels, custom forms)
  • Validate with actual screen readers (NVDA, VoiceOver, JAWS)

Document and track everything

Maintain records of both automated scan results and manual testing findings. This creates an audit trail that demonstrates ongoing compliance efforts — important for legal teams and regulators.

  • Track issues from discovery through remediation
  • Document known limitations and planned fixes
  • Show progress over time with historical data

Our Approach

Why we built inclly this way

Our tool philosophy is shaped by what automated testing can and can't do.

Automated scanning to catch the ~40%

We use axe-core, the industry-standard accessibility testing engine, to scan for issues that can be reliably detected programmatically. This catches missing alt text, contrast failures, structural issues, and ARIA errors.

AI-powered prioritization for what needs attention

Not all issues are equally important. We help prioritize based on severity, impact, and frequency — so you know where to focus manual testing efforts and remediation work.

Honest flagging of what requires human judgment

We don't pretend automated tools can catch everything. Our reports clearly indicate which issues are confirmed violations versus which areas need manual review. We tell you what we can't test.

Automated scanning is the essential first step — you can't fix what you don't know about. But it's not the complete solution. inclly is built to fit into a broader accessibility strategy, not to replace one.

Frequently asked questions

Common questions about accessibility testing approaches.

If automated tools only catch 30-50%, are they worth using?

Catching 30-50% of issues automatically, at scale, with no manual effort is valuable. Automated scanning catches the low-hanging fruit quickly and consistently, freeing your team to focus manual testing on the areas that actually need human judgment. The tools complement each other.

Which automated accessibility testing tools are best?

axe-core (by Deque) is the industry standard and powers most commercial accessibility scanners including inclly. It has the lowest false positive rate and most comprehensive rule coverage. WAVE, Lighthouse, and Pa11y are also reputable tools with different strengths.

How often should I run automated scans?

Ideally, as part of every deployment through CI/CD integration. At minimum, scan weekly or before major releases. Continuous scanning catches regressions early when they're cheapest to fix. Point-in-time audits miss issues introduced between scans.

What screen readers should I test with manually?

For comprehensive coverage: NVDA (free, Windows) with Firefox or Chrome, VoiceOver (built into macOS/iOS) with Safari, and JAWS (paid, Windows) if your audience includes enterprise users. At minimum, test with one screen reader on desktop and one on mobile.

Can AI fully automate accessibility testing?

Not yet. AI can help with tasks like suggesting alt text or identifying potential issues, but accessibility ultimately requires understanding human experience and context. AI tools can augment testing but can't replace the judgment calls that manual testing provides.

Should I aim for zero automated test failures before manual testing?

Fixing automated test failures first is efficient — these are often the easiest issues to address. But don't wait for perfection. Manual testing can uncover more serious issues that automated tools miss. A balanced approach addresses both in parallel.

Powered by axe-core

Start with automated scanning

Catch the issues that automated tools can find. Get clear reports that tell you what's broken, what's flagged for manual review, and where to focus your efforts.