One of the more significant challenges in accessibility testing with an automated tool is getting a sense of how the page actually performs for users. The truth is, the only truly reliable thing a tool can claim is that it found X number of instances where the code failed the tool’s specified tests. Assuming all of those tests are reliable, the end user is still left with little more than a list of issues. Recently one of our customers asked, “How do we know if we’re compliant or not?” Although we’ve already discussed compliance in a previous post, we recognize that this is something users are often concerned with. Our gut reaction is to say that you’re not “compliant” if you have any errors, but that’s admittedly not very practical. Determining compliance is a lot trickier than that, and in some cases there may even be exemptions that apply which no tool can be aware of.
In the very early days, Tenon did generate a “Grade” of sorts. We stripped the feature, and for good reason. In our opinion, grading is not something an automated tool should do, because getting a good grade is likely to mislead the user into believing that their system is accessible. In reality, a good grade from a tool merely means you’ve passed the tests that the automated tool can perform. It doesn’t mean that you’ve passed all of the other testing, such as manual testing, that is also required. An automated tool’s job is to find errors. That’s it. Anything beyond that is overreaching.
That being said, determining a grade can be valuable for prioritization: it makes sense to focus your remediation efforts on the pages with the lowest grades. In that context, let’s discuss how you can use Tenon’s API response data to “Grade” a page.
Every response from Tenon includes a globalStats node. (See: Overview of the Tenon API Response.) This node includes two important values: allDensity, which is the global average percentage of errors per KB of document source, and stdDev, which is the standard deviation of allDensity across all tested pages. These data points exist to inform the grade calculation which, ultimately, scores how your page performs against all other pages tested on the web.

Here’s what the relevant globalStats information looks like:
"globalStats": {
"errorDensity": "152",
"warningDensity": "12",
"allDensity": "164",
"stdDev": "396"
},
Using the information above, here’s the math behind generating a percentage grade, where score is the page’s own allDensity value:
// Grade a page by comparing its own density ("score") against the global stats.
function gradePage(score, globalStats) {
    // globalStats values arrive as strings, so convert them to numbers first
    var max = Number(globalStats.allDensity) + (3 * Number(globalStats.stdDev));
    var min = 0;

    if (score >= max) {
        return 0;
    } else if (score <= min) {
        return 100;
    } else {
        return 100 - ((score / max) * 100);
    }
}
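For example, here is what that calculation produces when we plug in the sample globalStats shown above for a hypothetical page whose own allDensity is 82 (a made-up value, purely for illustration):

var globalStats = { "allDensity": "164", "stdDev": "396" };

// max = 164 + (3 * 396) = 1352
// grade = 100 - ((82 / 1352) * 100) ≈ 93.9
gradePage(82, globalStats);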
These percentages can then be used to provide a letter grade. This table is based upon common letter grades in the United States; a small sketch that maps a percentage to its letter grade follows the table.
Percent | Letter Grade |
---|---|
98 – 100 | A+ |
94 – 97 | A |
90 – 93 | A- |
87 – 89 | B+ |
83 – 86 | B |
80 – 82 | B- |
77 – 79 | C+ |
73 – 76 | C |
70 – 72 | C- |
67 – 69 | D+ |
63 – 66 | D |
60 – 62 | D- |
Below 60 | F |
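One way to apply the table in code is a simple lookup. The helper below is our own illustration, not part of the Tenon API; it rounds the percentage so values that fall in the gaps between rows (such as 93.5) still land in a bucket:

// Map a 0-100 percentage grade to a letter grade using the cutoffs in the table above.
function letterGrade(percent) {
    var cutoffs = [
        [98, 'A+'], [94, 'A'], [90, 'A-'],
        [87, 'B+'], [83, 'B'], [80, 'B-'],
        [77, 'C+'], [73, 'C'], [70, 'C-'],
        [67, 'D+'], [63, 'D'], [60, 'D-']
    ];
    var rounded = Math.round(percent);
    for (var i = 0; i < cutoffs.length; i++) {
        if (rounded >= cutoffs[i][0]) {
            return cutoffs[i][1];
        }
    }
    return 'F';
}

letterGrade(85); // "B"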
What about a gut-check?
At this point, Tenon has tested almost 600,000 distinct URLs, a large enough sample for the figures below to be meaningful. The table below shows the distribution of issue density among tested pages. As a pure gut check, if your page’s density falls toward the higher end of this distribution, it is performing significantly worse than most other pages on the web; and if most of your pages have a high density, you are probably facing higher-than-normal risk. A small sketch for finding where your own pages fall follows the table.
Error Density | # of Pages | Pct. of Pages |
---|---|---|
Pages with 0% error Density | 33417 | 8% |
Pages with 1-10% error Density | 5928 | 1% |
Pages with 11-20% error Density | 96077 | 23% |
Pages with 21-30% error Density | 61665 | 15% |
Pages with 31-40% error Density | 53234 | 13% |
Pages with 41-50% error Density | 38011 | 9% |
Pages with 51-60% error Density | 34042 | 8% |
Pages with 61-70% error Density | 24763 | 6% |
Pages with 71-80% error Density | 20379 | 5% |
Pages with 81-90% error Density | 19336 | 5% |
Pages with 91-100% error Density | 0 | 0% |
Pages with 100%+ error Density | 0 | 0% |
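To run that gut check against your own results, a helper like the one below (again, our own illustration rather than part of the Tenon API) maps a page’s error density to the same buckets used in the table:

// Bucket a page's error density to match the ranges in the table above.
function densityBucket(density) {
    if (density === 0) {
        return '0%';
    }
    if (density > 100) {
        return '100%+';
    }
    var upper = Math.ceil(density / 10) * 10; // 10, 20, 30, ...
    var lower = upper - 9;                    // 1, 11, 21, ...
    return lower + '-' + upper + '%';
}

densityBucket(0);   // "0%"
densityBucket(37);  // "31-40%"
densityBucket(152); // "100%+"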
Why Density?
Some may wonder why we’ve used density as our measurement. Errors-per-kilobyte is more reliable than a raw issue count because a raw count can be deceiving: 10 errors on a small page represent a much bigger problem than 10 errors on a much larger page, yet a raw count would rate both pages the same. Density captures that difference.
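To make the metric concrete, here is a minimal sketch of how an errors-per-KB density could be computed, assuming density is simply the issue count divided by the size of the document source in kilobytes, expressed as a percentage. The function name and formula are ours for illustration; Tenon’s internal calculation may differ.

// Rough density calculation: issues per KB of document source, as a percentage.
// Illustrative only; Tenon's internal formula may differ.
function issueDensity(issueCount, documentSource) {
    var kilobytes = documentSource.length / 1024;
    return (issueCount / kilobytes) * 100;
}

issueDensity(3, 'a'.repeat(2048)); // a 2 KB page with 3 errors has a density of 150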