Welcome to the May 2017 issue of the CommonLook Accessibility Newsletter!
This month we’re examining the basic requirements of an accessible PDF document.
Not sure about your level of document compliance? Download our free CommonLook PDF Validator.
As always, we welcome your suggestions for upcoming topics and would like to know what you think of CommonLook’s Accessibility Newsletter.
The CommonLook Newsletter Team
We often talk about an organization or documents being “compliant” with legislation and/or specific standards. Most of us “know” what that means. This article is a refresher for those of us with years of experience and a starting point for those new to the field of PDF and/or document accessibility.
For PDF documents, being compliant with any standard or guideline, whether it be WCAG or PDF/UA, means that someone who cannot decode or read the visual representation of the document can still determine the document content and the structure of that content. This is done through “tagging” the document.
No matter which standard or guidelines you use, there are some basic elements to an accessible tagged conforming PDF document.
The document itself must have a language attribute.
Although we would like to identify specific dialects and regional pronunciations when assigning the language attribute, this can create a barrier for the person using a screen reader or Text-to-Speech tool.
The choice of which synthesized voice to use is a personal one. After all, someone using a screen reader or Text-to-Speech tool listens to the computer for many hours a day. If a person uses a British voice with British pronunciations and the language attribute for a document says it is American English, that person will be forced to use an American-sounding voice with American pronunciations of words. This is often disorienting because text is not pronounced and heard in a familiar way. It can take quite a while to adapt to listening to content using pronunciations that someone is not used to hearing.
When assigning the language attribute, use plain language assignments like English, French, Spanish, Portuguese and German. Avoid using language attributes like EN-CA (Canadian English), EN-US (American English) or EN-AU (Australian English)…see how complex it can get in pronouncing words?!
If a paragraph or section is in a different language, the language attribute is changed for that specific content. This can be done for an individual tag or a <Part> or <Sect> tag.
If a word or phrase is in a different language, the <Span> tag is used to isolate the word or phrase and the language attribute is changed for the <Span> tag.
What this means to the person reading the document using either a screen reader or Text-to-Speech tool is that the content/text will be pronounced using the appropriate speech synthesizer. For example, if the document is in English and a paragraph is in Spanish, the English voice will be used until the screen reader or Text-to-Speech tool comes across the tag with the Spanish language attribute, at which point a Spanish voice/pronunciation is used. Once the adaptive technology finishes that paragraph and leaves that tag, the voice/language reverts to English, the core language for the document.
It is the same for a word or phrase that has the <Span> tag to allow the identification of a different language. The part of the sentence that is in English will be read using an English synthesized voice, the Spanish voice will read/pronounce the Spanish word or phrase, and then the English voice takes over again to continue reading the sentence.
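The way a language attribute applies to a tag and everything inside it, and then falls back to the document language afterwards, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tag` class and the `spoken_runs` function are hypothetical stand-ins for a tag tree, not part of Acrobat, CommonLook or any PDF library, and the sample sentences are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple, Union


@dataclass
class Tag:
    """Hypothetical stand-in for a PDF structure tag (not a real library type)."""
    name: str
    lang: Optional[str] = None  # None means "inherit the language of the parent"
    children: List[Union["Tag", str]] = field(default_factory=list)


def spoken_runs(tag: Tag, inherited: str) -> Iterator[Tuple[str, str]]:
    """Yield (language, text) pairs in the order the tags tree is read."""
    current = tag.lang or inherited
    for child in tag.children:
        if isinstance(child, str):
            yield current, child
        else:
            # A child tag keeps the current language unless it sets its own.
            yield from spoken_runs(child, current)


# Illustrative document: English overall, one Spanish <Span>, one Spanish <P>.
doc = Tag("Document", lang="English", children=[
    Tag("P", children=[
        "The word ",
        Tag("Span", lang="Spanish", children=["gracias"]),
        " means thank you.",
    ]),
    Tag("P", lang="Spanish", children=["Este párrafo está en español."]),
    Tag("P", children=["Back to English."]),
])

runs = list(spoken_runs(doc, inherited="English"))
for language, text in runs:
    print(f"{language}: {text}")
```

In this sketch the language changes only for the text inside the tag that carries the attribute, which mirrors the voice switching described above.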
The document must be in a logical reading order.
Adaptive technology such as screen readers or Text-to-Speech tools reads the Tags tree. As you move down the Tags tree, the content should flow in the order in which it is meant to be read. This can be verified using the Highlight Content option in the Tags tree, which should be turned on by default. In some cases, like a brochure, the logical reading order may not be the same as the order in which the content appears visually in the document. Someone reading through the document has to be able to make sense of what they are reading.
The tags must be correct for the type of content they are assigned to.
Content must have the correct tags. For example, headings must use the <Hx> tag where “x” represents a heading level. The heading levels must be sequential and cannot skip levels. Headings cannot be used for paragraphs of text. Headings are navigational elements in a PDF document.
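The rule that heading levels cannot skip can be checked mechanically once the tag names are listed in reading order. The sketch below assumes the tags have already been extracted into a plain list of names; the function name `skipped_heading_levels` is hypothetical and no PDF library is involved.

```python
import re
from typing import List, Tuple


def skipped_heading_levels(tags: List[str]) -> List[Tuple[int, int, int]]:
    """Return (position, previous_level, level) for every heading that skips a level.

    Moving back up (for example from H3 to H1) is allowed; only jumps of more
    than one level downwards are reported. A previous_level of 0 means the
    heading appeared before any other heading.
    """
    problems = []
    previous = 0
    for position, tag in enumerate(tags):
        match = re.fullmatch(r"H([1-6])", tag)
        if not match:
            continue  # not a heading tag, e.g. P or L
        level = int(match.group(1))
        if level > previous + 1:
            problems.append((position, previous, level))
        previous = level
    return problems


print(skipped_heading_levels(["H1", "P", "H2", "H3", "H2", "H4"]))
```

Here the final H4 is reported because it follows an H2 directly.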
Bookmarks in a PDF document should mirror the headings and heading levels in the document.
Some adaptive technologies are able to get a list of headings in a document to allow for quick navigation to a specific topic. For those using adaptive technology that is not able to provide a list of headings or move through headings in a PDF document, Bookmarks are an essential navigational tool. Bookmarks can be used by everyone and let the person reading the document quickly find a topic without having to go back to a table of contents (if there is one in the document).
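One way to picture Bookmarks that mirror the headings is to build a nested outline from the heading levels. The following is a minimal sketch, assuming the headings are available as (level, title) pairs in reading order; `build_bookmarks` and the sample titles are hypothetical, and the result is a plain Python structure rather than real PDF bookmarks.

```python
from typing import Dict, List, Tuple


def build_bookmarks(headings: List[Tuple[int, str]]) -> List[Dict]:
    """Nest headings into an outline where each H2 sits under the preceding H1, and so on."""
    root: Dict = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node); the root acts as level 0
    for level, title in headings:
        node = {"title": title, "children": []}
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


outline = build_bookmarks([
    (1, "Introduction"),
    (2, "Scope"),
    (2, "Audience"),
    (1, "Tables"),
])
print([bookmark["title"] for bookmark in outline])
```

Each top-level bookmark corresponds to an H1, and its children correspond to the H2 headings that follow it.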
Each paragraph must have a <P> or Paragraph tag. While a single <P> tag can include the individual lines for a paragraph, one paragraph has one <P> tag. Multiple paragraphs cannot “share” the same <P> tag.
Lists must be tagged correctly.
Lists must have a parent <L> tag with the correct <LI>, <Lbl> and <LBody> tags under it.
The correct use of the list tag is important for anyone using a screen reader or Text-to-Speech tool because a list structure identifies a relationship between pieces of content. Adaptive technology can announce that a person is entering a list, how many items are in the list, each listed item and when the person is leaving the list.
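The list structure described above can be sketched as a simple check. This is a simplified illustration, not a validator: tags are represented as hypothetical (name, children) tuples, the check assumes each <LI> holds one <LBody> and at most one <Lbl>, and it ignores nested lists and captions.

```python
from typing import List, Tuple

# Hypothetical tag representation: (tag name, list of child tags).
Tag = Tuple[str, list]


def list_problems(l_tag: Tag) -> List[str]:
    """Return human-readable problems found in one list structure."""
    name, children = l_tag
    if name != "L":
        return [f"expected <L> but found <{name}>"]
    problems = []
    for position, (child_name, grandchildren) in enumerate(children, start=1):
        if child_name != "LI":
            problems.append(f"child {position}: <{child_name}> sits directly under <L>")
            continue
        names = [grandchild_name for grandchild_name, _ in grandchildren]
        if names.count("LBody") != 1:
            problems.append(f"item {position}: expected exactly one <LBody>")
        if names.count("Lbl") > 1:
            problems.append(f"item {position}: more than one <Lbl>")
    return problems


def announce(l_tag: Tag) -> str:
    """Mimic the kind of announcement adaptive technology can make on entering a list."""
    items = [child for child in l_tag[1] if child[0] == "LI"]
    return f"List with {len(items)} items"


good = ("L", [
    ("LI", [("Lbl", []), ("LBody", [])]),
    ("LI", [("Lbl", []), ("LBody", [])]),
    ("LI", [("Lbl", []), ("LBody", [])]),
])
bad = ("L", [
    ("LI", [("Lbl", []), ("LBody", [])]),
    ("P", []),               # a paragraph placed directly under <L>
    ("LI", [("Lbl", [])]),   # a list item with no <LBody>
])

print(announce(good), list_problems(good))
print(list_problems(bad))
```

The item count can only be announced reliably when every item is an <LI> under the same parent <L>.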
Tables must be tagged correctly.
Speaking of relationships, tables must also be correctly tagged in order for adaptive technology to be able to provide information about column and row titles (also known as table headers) and how a number in a data cell is related to those column and row titles.
Consider what you would discern and decode if all you heard was “$525, cell B6.”
By correctly tagging a table and identifying table header cells, someone using adaptive technology would be able to get the following information: “2014, Jesse Doe, $525, cell B6.” By adding the year of sales and the salesperson, the number has a relationship to something other than the cell coordinates. The table header information can be supported with a caption/summary and hopefully surrounding content.
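The difference between the two announcements can be sketched in Python. This assumes a simple table where the first row holds the column titles and the first column holds the row titles; the function `describe_cell` is hypothetical, and every value in the sample table other than 2014, Jesse Doe and $525 is a made-up placeholder.

```python
from typing import List


def describe_cell(table: List[List[str]], row: int, col: int,
                  use_headers: bool = True) -> str:
    """Return roughly what a listener might hear for one data cell.

    Assumes row 0 holds the column headers and column 0 holds the row headers.
    """
    value = table[row][col]
    coordinate = f"{chr(ord('A') + col)}{row + 1}"  # works for columns A to Z
    if not use_headers:
        return f"{value}, cell {coordinate}"
    column_header = table[0][col]
    row_header = table[row][0]
    return f"{column_header}, {row_header}, {value}, cell {coordinate}"


# Placeholder sales figures; only the Jesse Doe / 2014 / $525 cell comes from the text.
sales = [
    ["Salesperson", "2014", "2015"],
    ["Person A", "$100", "$110"],
    ["Person B", "$200", "$210"],
    ["Person C", "$300", "$310"],
    ["Person D", "$400", "$410"],
    ["Jesse Doe", "$525", "$560"],
]

print(describe_cell(sales, 5, 1, use_headers=False))
print(describe_cell(sales, 5, 1))
```

Without identified header cells there is nothing for the lookup to return, so only the value and the coordinates remain.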
Alt Text for Images and Links
The last “basic” item is Alt Text for images and links. Images that are decorative can be put in the background so that they are ignored by adaptive technology. Alt Text is given to images that support the content of the document. Links are provided with Alt Text so that when someone using a screen reader or Text-to-Speech tool navigates by link, they hear something like “CommonLook Global Access” rather than https://commonlook.com/accessibility-software/commonlook-pdf/.
Consider this barrier when it comes to links: if all links begin with http, then a person using a screen reader or Text-to-Speech tool who can get a list of links will have to listen to all of the characters in all of the links in order to find the one that they want. By adding Alt Text to links, someone can press C for CommonLook Global Access and quickly move to the first item that begins with C. By repeatedly pressing the letter C, they can cycle through the links that begin with C until they find the CommonLook one. This is a much faster and less frustrating way of locating the link you want.
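First-letter navigation in a list of links can be sketched as follows. This is an illustration of the idea only, not how any particular screen reader is implemented; the function `next_link` and all link labels other than “CommonLook Global Access” are hypothetical.

```python
from typing import List, Optional


def next_link(labels: List[str], letter: str, current: int = -1) -> Optional[int]:
    """Return the index of the next label starting with `letter`, wrapping around.

    `current` is the index of the link that has focus now (-1 for none).
    Returns None when no label starts with that letter.
    """
    count = len(labels)
    for step in range(1, count + 1):
        index = (current + step) % count
        if labels[index].lower().startswith(letter.lower()):
            return index
    return None


with_alt_text = [
    "About Us",
    "CommonLook Global Access",
    "Contact",
    "Webinars",
]
raw_urls = ["https://example.com/a", "https://example.com/b"]

first = next_link(with_alt_text, "c")           # first press of C
second = next_link(with_alt_text, "c", first)   # second press of C
print(with_alt_text[first], "|", with_alt_text[second])
print(next_link(raw_urls, "c"))
```

When every label is a raw address, the only letter that matches anything is H, and it matches every link, so the shortcut no longer narrows the list.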
These are the very basic elements of a tagged accessible conforming PDF document. These are the must-haves on a list of PDF remediation techniques and elements. As we can see from the benefits to the person who is reading the PDF document, all of the basics listed here provide navigation, context and content in a way that is easy to read through.
Most screen readers and some Text-to-Speech tools have a tool that lets someone get a list of links or a list of graphics/images. This allows for more granular access to specific types of content.
As a last word, when starting with an untagged PDF document, Adobe Systems has an established hierarchy of tasks to perform, a process to go through before you add tags to a document:
- If the document is a scanned image of a document, perform OCR (Optical Character Recognition), which is found under Enhanced Scan in the Tools Task Pane in Adobe Acrobat DC.
- If the document is a fillable form, add the form fields which are found under the Prepare Form tools in the Tools Task Pane in Acrobat Pro DC.
- If the document has links, add the links to the document using the Link tools which are found in the Edit PDF tools in the Tools Task Pane in Acrobat DC. You can start with the Create web addresses from URLs option and then manually add any links the automated tool may have missed (such as links beginning with www instead of http).
AFTER all of these items have been asked, answered and dealt with, THEN you tag the PDF document.
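The order of these questions can be written down as a short checklist function. This is only a sketch of the sequence described above: `preparation_steps` is a hypothetical helper that returns step descriptions as text and does not call Acrobat or any other tool.

```python
from typing import List


def preparation_steps(is_scanned: bool, is_form: bool, has_links: bool) -> List[str]:
    """List the tasks for an untagged PDF in the order given above; tagging always comes last."""
    steps = []
    if is_scanned:
        steps.append("Perform OCR")
    if is_form:
        steps.append("Add form fields")
    if has_links:
        steps.append("Add links")
    steps.append("Tag the document")
    return steps


print(preparation_steps(is_scanned=True, is_form=False, has_links=True))
```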
Join our weekly webinars that explore leading tools and approaches for managing PDF and Section 508 or WCAG 2.0 AA accessibility at the author, quality assurance and enterprise levels.