
PDF files aren't all made alike. The PDF format is very flexible and gives PDF creation software a great deal of latitude in how it builds a PDF file. Some software (without mentioning any names) makes genuinely poor-quality PDF files; other software may do a good job overall but make some unfortunate choices. Some of these problems may not be apparent to users who simply view the page onscreen, yet they turn the PDF into an incomprehensible mush when it is processed by assistive technology.

While we can't account for every clever way there is to screw up a PDF file, CommonLook PDF does include fixes for several of the most common problems we encounter. These corrections are available through the Page > Fix Common Problems menu item in both the Logical Structure Editor and Verify and Remediate modes.

Common Problems

Run this tool if you've noticed any of the following problems in your PDF file. Note that the Fix Common Problems tool works on a page-by-page basis: it does not run against the whole document, but must be invoked as part of page-based verification.

Text Running Together

We've noticed that many PDF files omit the space character at the end of the text run that sits at the rightmost position of a line. Without that space, assistive technology reads the last word of one line and the first word of the next as a single word.

CommonLook PDF detects and corrects this condition across all pages of the currently open document.
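The idea behind this fix can be illustrated with a short Python sketch. It is not CommonLook's implementation. It assumes that each visual line's extracted text is already available as a plain string, supplied as a hypothetical `lines` list in reading order. It then appends a space to any line that doesn't already end in whitespace, so a screen reader hears a word boundary.

```python
def add_missing_line_end_spaces(lines):
    """Return a copy of `lines` in which every line except the last ends with whitespace.

    Hypothetical helper: `lines` is a list of strings, one per visual line,
    in reading order. Real PDF content streams are considerably more complex.
    """
    fixed = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # A line ending in a letter, digit or punctuation mark would otherwise
        # run straight into the first word of the next line.
        if not is_last and line and not line[-1].isspace():
            line += " "
        fixed.append(line)
    return fixed


lines = ["The quick brown", "fox jumps over", "the lazy dog."]
print("".join(lines))                               # what AT reads before the fix
print("".join(add_missing_line_end_spaces(lines)))  # what AT reads after the fix
```

The first `print` gives "The quick brownfox jumps overthe lazy dog." and the second gives "The quick brown fox jumps over the lazy dog." A real tool would also need to skip lines that end in a hyphenated word fragment, which is the soft hyphen case covered next.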

Soft Hyphens

Printers commonly use a soft hyphen to split a word across two lines, but not all files include the encoding that removes the hyphen from the text passed to assistive technology. As a result, some software reads such hyphens as real characters, i.e., as part of the word.

CommonLook PDF 4.2.4 detects and corrects soft hyphens across all pages of the currently open document by creating new tags with appropriate alt text attributes.
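Conceptually, the corrected tag has to give a screen reader the whole word with the layout hyphen dropped. The Python sketch below shows only that string logic. How the new tag and its attribute are created is out of scope, and the helper name and its `hyphen_is_soft` parameter are hypothetical. It treats the Unicode SOFT HYPHEN character (U+00AD) as always removable. A plain ASCII hyphen is removed only when the caller has decided it is a layout hyphen, because a real hyphen, as in "well-known", must be kept.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN


def actual_text_for_split_word(end_of_line, start_of_next, hyphen_is_soft=False):
    """Return the text a screen reader should hear for a word split across two lines.

    Hypothetical helper. `end_of_line` is the word fragment at the end of one line,
    `start_of_next` the fragment that begins the next line.
    """
    if end_of_line.endswith(SOFT_HYPHEN):
        # An explicitly encoded soft hyphen is never part of the word.
        end_of_line = end_of_line[:-1]
    elif hyphen_is_soft and end_of_line.endswith("-"):
        # An ASCII hyphen is dropped only when it is known to be a layout hyphen.
        end_of_line = end_of_line[:-1]
    return end_of_line + start_of_next


print(actual_text_for_split_word("accessi" + SOFT_HYPHEN, "bility"))
print(actual_text_for_split_word("accessi-", "bility", hyphen_is_soft=True))
print(actual_text_for_split_word("well-", "known"))
```

The three calls print "accessibility", "accessibility" and "well-known". Deciding whether an ASCII hyphen is a layout hyphen is the hard part in practice, so this sketch leaves that decision to the caller.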

Repeated Characters Used for Presentation

Repeated characters are sequences of dots, underscores or dashes used for layout purposes (i.e., to provide a border, or to fill the space between an entry name and its page number in a table of contents). Some screen readers read these characters aloud one by one, which is tedious for users.

CommonLook PDF detects sequences of a repeated non-alphanumeric character that exceed a set length (3 by default) and replaces each one with an empty string in Actual Text at the appropriate parent tag level.
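The detection rule above can be approximated with a regular expression. This Python sketch is not CommonLook's implementation, and the helper names are hypothetical. It assumes that "non-alphanumeric" means punctuation and symbols (including the underscore) but not whitespace. It also reads "exceeding 3" as "longer than 3 characters", so an ordinary ellipsis ("...") is left alone. A tool following the rule would then set an empty Actual Text on the tag holding each detected run. Here `spoken_text` just removes the runs to show the effect.

```python
import re


def find_presentation_runs(text, threshold=3):
    """Return (start, end) spans where one non-alphanumeric character repeats
    more than `threshold` times in a row. Hypothetical helper."""
    # ([^\w\s]|_) captures a single punctuation/symbol character (or "_");
    # \1{threshold,} requires at least `threshold` further copies of it,
    # so every match is longer than `threshold` characters.
    pattern = re.compile(r"([^\w\s]|_)\1{%d,}" % threshold)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def spoken_text(text, threshold=3):
    """Text left for a screen reader once every detected run is silenced."""
    pieces, last = [], 0
    for start, end in find_presentation_runs(text, threshold):
        pieces.append(text[last:start])
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


toc_entry = "Introduction ........ 5"
print(find_presentation_runs(toc_entry))   # the dot leader is detected
print(repr(spoken_text(toc_entry)))        # leader silenced, page number kept
print(find_presentation_runs("Wait..."))   # three dots do not exceed the default
```

Making `threshold` a parameter mirrors the configurable default described above. A lower value catches shorter leaders but also starts flagging legitimate punctuation such as ellipses.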

Video Demonstration: Common Problems tool

Fixing Common Problems with CommonLook

Video Demonstration: Cleanup Tags tool

Cleaning up empty tags during remediation
