A potential client recently asked us: “Why does Product X find so many things that Tenon doesn’t find?” It was a good question to ask, because all testing tools are different and have their own unique approaches to testing. A cursory glance at the results the potential client showed us highlighted a handful of differences. Product X did find a handful of legitimate things that Tenon did not. However, twice as many of the things Product X found were false positives. To understand the differences between Tenon and Product X, I reviewed the list of tests Product X documents on its website and went through each one to see how Tenon fared.
Summary of results
| Category | Tests | Share of total |
|---|---|---|
| Need research – Likely good | 10 | 9.26% |
| Need research – Likely bad | 26 | 24.07% |
- We test
  - This means we have a test aimed at checking the same thing.
- Need test
  - This means we do not currently check for that thing, but we agree we should and that we can do so with a high degree of certainty.
- Need research – likely good
  - This means we do not currently check for that thing. If doing so is possible with a high degree of certainty, we’ll probably add it.
- Need research – likely bad
  - This means we do not currently check for that thing and consider it highly unlikely that it is possible to do so with a high degree of certainty.
- Highly questionable
  - This means that we think performing such a test with any worthwhile certainty is extremely unlikely. This may also mean we think the test is just without merit, outdated, or misguided.
- Duplicate
  - This means that they used the same exact wording for a test more than once. Note: in the above table, the percentages add up to more than 100% because we don’t count duplicates as part of the whole.
- Event handling
  - These are tests aimed at event handling.
A note about "questionable" tests: One example of such a test is checking for alternative content within the object tag. This sort of test misunderstands how browsers (and therefore assistive technologies) handle object. The only time this content is exposed to users is if the user’s browser doesn’t support the embedded content, which is incredibly unlikely in modern browsers. Tests like this are likely to confuse testers and developers.
Based on the assessment above, 43% of Product X’s tests risk accuracy issues that can lead to the types of “false positives” that the accessibility community has been highly critical of since the days of Bobby.
Product X’s 12 tests relating to event handling only check for old-school HTML event handler attributes and not for events bound by DOM scripting. That said, those tests are valuable because we still find sites that use this method for events. Tenon does not yet have any tests for event handling, and we should.
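To make the distinction concrete, here is a minimal sketch of an attribute-based check, written in Python with the standard library's `html.parser`. It is not Product X's or Tenon's implementation: the class name `InlineHandlerFinder`, the helper `find_inline_handlers`, the sample markup, and the rule that any attribute starting with `on` counts as an event handler are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class InlineHandlerFinder(HTMLParser):
    """Collect (tag, attribute) pairs for inline event handler attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            # Simplifying assumption: any attribute whose name starts
            # with "on" is treated as an event handler attribute.
            if name.startswith("on"):
                self.found.append((tag, name))


def find_inline_handlers(html):
    """Return every inline event handler attribute found in the markup."""
    finder = InlineHandlerFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical markup: two inline handlers and one handler bound by script.
SAMPLE = """
<a href="#" onclick="openMenu()">Menu</a>
<div id="panel" onmouseover="highlight(this)">Panel</div>
<button id="save">Save</button>
<script>
  document.getElementById("save").addEventListener("click", save);
</script>
"""

print(find_inline_handlers(SAMPLE))
# [('a', 'onclick'), ('div', 'onmouseover')]
```

The click handler on the button never shows up, because it is attached at runtime by `addEventListener` rather than declared in the markup. A check of this kind is therefore useful but partial: finding script-bound events requires inspecting the page after its scripts have run.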
As of this writing, Tenon has 62 distinct tests and Product X has 108. Of these, 24 tests correspond directly. We’re OK not having as many tests, as our focus is on quality over quantity. However, Product X has 16 tests that we agree we should add, plus the 12 event-handling tests we should add as an interim solution while we finalize more robust events-based testing. Finally, they have another 10 tests that are a good idea, if they can be done reliably.
The correlation of only 24 tests means that we also have 38 tests that they don’t have. When you count the total potential tests we don’t have (16 + 12 + 10), that also adds up to 38. All in all, that is a legitimate delta of 76 tests. This is probably less significant than it seems, because one or more of the unaccounted-for tests (on either side) may be covered by another test. For instance, we check for blink and marquee in a single test, whereas they list them separately.
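The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the variable names and the `share` helper are illustrative, and the numbers are the ones quoted in the text.

```python
# Counts quoted in the text.
tenon_tests = 62
product_x_tests = 108
shared = 24              # tests that correspond directly

should_add = 16          # tests we agree we should add
event_handling = 12      # interim event-handling tests
worth_researching = 10   # good ideas, if they can be done reliably

# Tests Tenon has that Product X does not.
only_tenon = tenon_tests - shared

# Tests Product X has that Tenon might add.
potential_additions = should_add + event_handling + worth_researching

# Total difference between the two tools.
delta = only_tenon + potential_additions


def share(count, total=product_x_tests):
    """Percentage of Product X's tests, rounded to two decimal places."""
    return round(100 * count / total, 2)


print(only_tenon, potential_additions, delta)  # 38 38 76
print(share(10), share(26))                    # 9.26 24.07
```

The last line reproduces the two percentages that appear in the summary table, which confirms that the table's shares are calculated against Product X's 108 tests.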
If you’re planning on spending money on an accessibility testing tool, you owe it to yourself, your organization, and your end users to spend the time necessary to determine which tool is right for you. Analyze each tool for its accuracy, relevance, and ability to fit in with your organization. There are many things that Tenon does that no other tool can do, and we’re committed to widening that gap even further. But ultimately you’re the one who must decide which tool gives you the best value and best meets your expectations. Part of that decision involves comparing the test results and, as you can see, the differences can be very big.