PDF files aren’t all made alike. The PDF format is very flexible, allowing PDF creation software a great deal of latitude in how it builds a PDF file. Some software (without mentioning any names) makes genuinely poor-quality PDF files; other software might do a good job but make some unfortunate choices. Some of these problems may not be apparent to users who simply look at the page onscreen, but they turn the PDF into an incomprehensible mush when processed by assistive technology.
While we can’t account for every clever way there is to screw up a PDF file, CommonLook PDF does include fixes for some of the most common problems we encounter. These corrections are available through the Page > Fix Common Problems menu item in both the Logical Structure Editor and Verify and Remediate modes.
Run this tool if you’ve noticed any of the following problems in your PDF file. Note that the Fix Common Problems tool works on a page-by-page basis. It does not run against the whole document, but must be invoked as part of page-based verification.
We’ve noticed that many PDF files fail to include a space character at the end of the last text run on a line. When the space character isn’t present, assistive technology will read the word at the end of one line and the word at the beginning of the next as a single word.
CommonLook PDF detects and corrects this condition across all pages of the currently open document.
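To make the problem concrete, here is a minimal Python sketch of the idea. It is not CommonLook PDF's implementation; the function name `join_text_runs` and the representation of a page as a list of line strings are assumptions made purely for illustration.

```python
def join_text_runs(lines):
    """Join the text of consecutive lines, inserting a space where a
    line ends without one (hypothetical helper, for illustration only)."""
    parts = []
    for i, line in enumerate(lines):
        parts.append(line)
        is_last = i == len(lines) - 1
        # Only add a space when the line is non-empty and does not
        # already end in whitespace.
        if not is_last and line and not line[-1].isspace():
            parts.append(" ")
    return "".join(parts)


# Without the fix, a naive join runs the two words together.
naive = "".join(["assistive", "technology"])
fixed = join_text_runs(["assistive", "technology"])
```

Here `naive` is `"assistivetechnology"`, which is roughly what a screen reader encounters when the trailing space is missing, while `fixed` is `"assistive technology"`.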
While it’s common practice in printing to use a soft hyphen to split a word between two lines, not all files include the correct encoding to eliminate the hyphen from the text processed by assistive technology. As a result, some software may read such hyphens as real characters, i.e., as part of the word.
CommonLook PDF 4.2.4 detects and corrects soft hyphens across all pages of the currently open document by creating new tags with appropriate alt text attributes.
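The sketch below shows, in Python, how replacement text for a word split by a soft hyphen might be computed. It is an illustration only, not the product's code: the function name `spoken_text` is hypothetical, and it assumes the soft hyphen is encoded as the Unicode character U+00AD, which is not the case in every PDF.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN; assumed encoding for this sketch


def spoken_text(line_end, next_line_start):
    """Return the text a screen reader should speak for a fragment that
    ends one line and the fragment that starts the next (hypothetical helper)."""
    if line_end.endswith(SOFT_HYPHEN):
        # The word was split for layout: drop the hyphen and rejoin the halves.
        return line_end[:-1] + next_line_start
    # Otherwise the fragments are separate words.
    return line_end.rstrip() + " " + next_line_start


rejoined = spoken_text("acces" + SOFT_HYPHEN, "sible")
separate = spoken_text("screen", "reader")
```

With these inputs, `rejoined` is `"accessible"` and `separate` is `"screen reader"`. Text of this kind is what a remediation tool would place in the alternate text of the new tag.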
Repeated characters are a sequence of dots, underscores or dashes used for a layout purpose (e.g., to provide a border, or to fill the space between an entry and its page number in a table of contents). Some screen readers read each repeated character aloud, which is tedious for users.
CommonLook PDF detects any repeated sequence of non-alphanumeric characters exceeding a set length (3 by default) and replaces it with an empty string in Actual Text at the appropriate parent tag level.
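As a rough illustration of this kind of detection, the following Python sketch uses a regular expression to find runs of one repeated non-alphanumeric character and replace them with an empty string. It is not CommonLook PDF's algorithm. The function name `strip_repeated` is hypothetical, "exceeding 3" is interpreted here as four or more identical characters in a row, and only ASCII letters and digits are treated as alphanumeric.

```python
import re


def strip_repeated(text, threshold=3):
    """Replace runs of more than `threshold` identical non-alphanumeric,
    non-space characters with an empty string (hypothetical helper)."""
    # One such character followed by at least `threshold` copies of itself,
    # i.e. a run of threshold + 1 or more.
    pattern = re.compile(r"([^0-9A-Za-z\s])\1{%d,}" % threshold)
    return pattern.sub("", text)


toc_entry = strip_repeated("Chapter 1 ..... 12")
ellipsis = strip_repeated("wait...")
```

The leader dots are removed from `toc_entry`, leaving `"Chapter 1  12"`, while the three-dot ellipsis in `ellipsis` is left untouched because it does not exceed the threshold. Leaders made of spaced characters (such as ". . . .") are not matched by this simple pattern.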