
PDF and other non-HTML files - accessibility

"PDF documents can be difficult to navigate on screen and should not be used as alternatives for HTML or content written for the Web. PDF documents should be used sparingly, mostly as printable versions of Web material."

 - From the UK DTI "Writing for the Web" Checklist

The status of PDFs, especially whether they can be used as an accessible replacement for HTML Web pages, is an issue that most authorities seem to avoid, perhaps because they don't want to upset Adobe.

The WAI started a guide to accessibility techniques for PDFs but appears to have abandoned it, unfinished, possibly because the latest generation of PDFs is so much better than the earlier ones.

The UK Disability Rights Commission (DRC) has accidentally given some guidance on the issue. Its report, The Web: Access and Inclusion for Disabled People, was first published as a PDF, with no HTML version - even six weeks later - though an RTF was soon added. That tells us something about the acceptability of the PDF file type, at least in accessibility terms. If the DRC can tolerate publishing material exclusively as PDF, even for a limited amount of time, everybody else has a reasonable excuse for doing so.

The RNIB has issued guidance on PDFs, effectively saying - use all PDF accessibility features but also provide alternative formats. It's worth reading for the suggested PDF enhancements and linking techniques.

Run-of-the-mill PDFs certainly do not conform to WAI guidelines. To be fair, Adobe has put a lot of effort into making the PDF format more accessible (otherwise it would have been barred from US government use). Yet the amount of work involved in manually converting a standard PDF into an "accessible" PDF appears to be very similar to the amount of work required to turn a PDF into HTML. Since HTML is clearly the superior format, it's questionable whether it would ever be worthwhile to improve the accessibility of a standard PDF.

Simple PDFs are easy to convert to accessible HTML, but more complicated PDFs containing lots of important graphs and tables are not. Most of the automatic conversion software is inadequate, producing messy HTML code that mimics the original PDF layout.

One useful technique with PDFs is to use bridging pages - coded in HTML - that summarise the PDF content and offer a link to the full item. This assists users in assessing the information without committing themselves to a big download.
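As a rough illustration of the bridging page idea, the Python sketch below builds a minimal HTML page that gives a title, a short summary, and a link that states the file type and size up front. The function name, its parameters, and the example URL are all hypothetical, and the markup is only one plausible layout, not a prescribed one.

```python
from html import escape


def bridging_page(title, summary, file_url, file_type="PDF", size_kb=None, lang="en"):
    """Return a small HTML bridging page for a downloadable file.

    All names here are illustrative assumptions, not part of any guideline.
    """
    # State the file type (and size, if known) so the user can decide
    # before committing to the download.
    details = file_type if size_kb is None else f"{file_type}, {size_kb} KB"
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)} - summary</title>\n"
        "</head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<p>{escape(summary)}</p>\n"
        f'<p><a href="{escape(file_url)}">Download the full document '
        f"({escape(details)})</a></p>\n"
        "</body>\n"
        "</html>\n"
    )


# Example use with made-up values.
page = bridging_page(
    "Annual report",
    "Summary of the main findings and recommendations.",
    "/docs/report.pdf",
    size_kb=850,
)
```

Putting the format and size inside the link text means that a screen reader user who jumps from link to link still hears what the download involves.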

Excel files

Excel files are not accessible, but are commonplace on the Web for presenting large amounts of data, mainly because nobody can come up with a decent alternative format.

It's relatively easy to convert Excel files to PDF and RTF, so if you want to improve the accessibility of your data, you might like to supply it in these alternative formats as well as Excel.

Converting to HTML is trickier, and for large files might not even be worthwhile. The conversion available within Excel produces Microsoft proprietary HTML that isn't accessible and is hard to clean up (though it's worth investigating the "Compact HTML" option). A better solution is to import material from Excel into Dreamweaver MX, which produces decent HTML.
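For small sheets, another route is to save the data from Excel as CSV and generate the table markup yourself. The Python sketch below is one way to do that, not the method recommended above. It assumes that the first CSV row holds the column headings and that the first column can serve as a row heading. The function name and the sample data are made up for illustration.

```python
import csv
import io
from html import escape


def csv_to_html_table(csv_text, caption):
    """Convert CSV text (first row = column headings) to an HTML table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no rows found in CSV input")
    header, body = rows[0], rows[1:]

    lines = ["<table>", f"<caption>{escape(caption)}</caption>", "<thead>", "<tr>"]
    # scope="col" ties each heading to the cells beneath it.
    lines += [f'<th scope="col">{escape(cell)}</th>' for cell in header]
    lines += ["</tr>", "</thead>", "<tbody>"]
    for row in body:
        lines.append("<tr>")
        for i, cell in enumerate(row):
            if i == 0:
                # Assumption: the first column labels the row.
                lines.append(f'<th scope="row">{escape(cell)}</th>')
            else:
                lines.append(f"<td>{escape(cell)}</td>")
        lines.append("</tr>")
    lines += ["</tbody>", "</table>"]
    return "\n".join(lines)


# Example use with made-up data.
sample = "Region,Sales\nNorth,10\nSouth & East,20\n"
table_html = csv_to_html_table(sample, "Sales by region")
```

The output contains only a caption, header cells with scope attributes, and data cells, so there is no proprietary markup to clean up afterwards. Formulas, merged cells, and charts are lost in the CSV step, which is one reason this may not be worthwhile for large or complicated workbooks.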

PowerPoint files

PowerPoint files are not at all accessible. The Microsoft conversion to HTML produces truly abysmal code. PowerPoint files are often full of style and little substance, so it may not be necessary (or cost-effective) to fully replicate them as HTML. Extracting the text might be sufficient. And in some cases, they might be so meaningless that maybe they don't need an alternative format of any kind.

Word files

This is a proprietary format and in that sense it is inaccessible, though Word does offer accessibility features and if these are used the accessibility is good. Most HTML pages start off as Word pages, so there isn't much excuse for mounting Word documents on the Web instead of HTML - the conversion is commonplace.

One decent excuse is when a Word document is a form that the user needs to fill in and return. There's no way round this, and HTML is a clumsy format for forms. As long as the user can get the same form by regular post, there's no reason to avoid mounting it on the Web as a Word document.

RTFs

The format is not proprietary, and in that sense it is relatively accessible, as long as the files don't contain graphics - especially the PowerPoint object graphics that Microsoft Word likes to put into them. The main limitations are that RTFs can't contain language information or table header markup - and both of these are needed for good accessibility. It's relatively easy to turn RTFs into HTML, so that has to be the preferred option.

On the other hand, the Disability Rights Commission published its own report on Web site accessibility in RTF format before it published it in HTML format, so we can gather from this that RTF is considered to be a reasonably accessible format.

The bridging page technique described in the PDF section above can also be useful for Word and RTF files.

 

This Tinhat page is valid XHTML to WAI Triple-A standard