Why Web Managers Hate (and thus ignore) PDF

A bar chart. See following table for data.

Values given are millions of files. Source: Google “filetype:X” search.

2011-04 288 156 48 12
2012-01 746 511 89 5.2
2012-08 1300 225 236 16

Many websites are repositories of PDF files, and the volume of PDF is growing.

From product manuals to textbooks, from account statements to application forms to annual reports, PDF is everywhere.

If you only listened to web content managers you might imagine that PDF doesn’t exist at all. PDF is the WCM blind-spot.

Why do so many content professionals ignore PDF even though the format occupies such a large volume of content on so many websites?

The vast majority just don’t think about it. Heavy though their PDF collections may be, their perspective is blinkered by the (substantial) limitations of today’s content management systems, which tend to focus on HTML, CSS and JavaScript.

PDF is treated as a dead-end resource similar to an image. Never mind that a PDF might contain 6, 64, 643 or 643,000 pages of vital text and graphics, a form loaded with JavaScript, or whatever; the CMS doesn’t know or care.

Authors of PDF files are rarely held to account for the quality, usability or accessibility of the PDFs they produce. That’s why we see long documents without bookmarks, huge files posted without Fast Web View enabled, files with oversize pages still showing the printer’s crop-marks, missing fonts, files with no metadata, flat-out broken files that won’t even open, and on and on.
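To show how little it takes to catch a few of these problems before a file goes online, here is a minimal Python sketch. It is not a validator. It assumes the whole file fits in memory and only looks for tell-tale markers in the raw bytes: the `%PDF-` header, the `%%EOF` trailer marker, the `/Linearized` dictionary that Fast Web View files carry near the start, and the `/Outlines`, `/Info` and `/Metadata` keys. The function name `quick_pdf_checks` and the report keys are my own inventions for illustration. The bookmark and metadata checks are heuristics that can miss entries stored in compressed object streams, so treat a "missing" result as a prompt to open the file in a proper PDF tool.

```python
def quick_pdf_checks(data: bytes) -> dict:
    """Run crude byte-level checks on the raw bytes of a PDF file.

    Hypothetical helper for illustration only; it does not parse the PDF.
    """
    head = data[:1024]   # header and, if present, linearization dictionary
    tail = data[-1024:]  # trailer and end-of-file marker
    return {
        # A PDF file should begin with a "%PDF-x.y" header...
        "has_header": head.startswith(b"%PDF-"),
        # ...and carry an "%%EOF" marker near the end.
        "has_eof": b"%%EOF" in tail,
        # Fast Web View (linearized) files declare /Linearized near the start.
        "fast_web_view": b"/Linearized" in head,
        # Heuristics: these keys may be hidden inside compressed object streams.
        "has_bookmarks": b"/Outlines" in data,
        "has_metadata": b"/Info" in data or b"/Metadata" in data,
    }
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `quick_pdf_checks` returns a small report that a content manager could review, or send back to the document's author, before the PDF is posted.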

These are not isolated instances; these are typical errors encountered every day on the websites of Fortune 500 companies.

Asking web content managers to take responsibility for the PDFs they post means asking them to consider the whole website, not merely view it through the soda-straw of their CMS. What’s it going to take?

The Web is Much More than HTML/CSS & JavaScript

Clearly, the web is far more than HTML & CSS. Even so, HTML-centric developers have thus far been allowed to largely define “content” to suit their technical expertise, not their customers’ needs.

I’m going to get into the question of why people use PDF in general in another piece. Here I want to focus on three key factors about PDF that contribute to the extraordinary invisibility of the People’s Document Format in the web content world.

So, why are things this way? I’ve chosen somewhat provocative headings to inspire a little soul-searching. Or flames. Or both! We’ll have to see.

PDF is Democratic, Content Management Systems are Authoritarian

PDF is fundamentally and profoundly democratic technology because it puts document creation and appearance control firmly in the hands of the author (and whoever controls the templates they use).

By contrast, HTML renders differently based on OS, browser and (sometimes) the day of the week. Web folks consider “appearance” to be their domain; something to be managed (and changed) via CSS. For many reasons, this just “isn’t ok” for lots and lots of content. Thus we have PDF, and it’s not going anywhere.

PDF is Accountable, HTML/CSS/JavaScript can always be Fixed

Post a PDF and others will download it locally or email it or otherwise abstract it from the website. The document just exists without servers, CSS – anything. You can’t tweak the style-sheet and you can’t redirect. A PDF is a record; it’s a fact. Some think this is a limitation; actually it’s one of several quiet but nonetheless killer features of the PDF format.

HTML deployed on the web lasts until someone updates the server (or converts the HTML to PDF in order to “record” it).

PDF is PDF, our CMS Understands HTML

While authors always come up with PDF files including pages from any old source (InDesign, CAD, old documents, scanned pages, PowerPoint and more), each and every one of those oddly-sourced PDF files is manageable by a single piece of software: your favorite PDF editing application.

Website CMS software tends to work well with HTML, allows management of CSS and doesn’t really differentiate between other types of objects, much less take advantage of the richness within.

What’s a Web Content Manager To Do?

From the web content manager’s standpoint, the PDFs they post are generally Someone Else’s Problem. That’s the flip-side of democracy. I get that.

Even so, while any CMS is (of course) capable of posting any bucket of bytes online, your software and your approach almost certainly don’t encourage you to consider PDF files as “web content”, no matter how critical the material or how large a proportion of the site’s actual content PDFs represent.

PDF can’t stay in the black box. See the graph above – PDF files are accumulating on websites, and will continue to accumulate, for all sorts of good reasons.

Instead of ignoring the pile of PDFs accumulating on their servers, content managers must take an active role in ensuring that the PDF files they post meet relevant standards for appearance, quality and accessibility. Most of the time, this means helping document authors understand how to produce better PDF files.

PDF is a huge part of the web, even if web content managers usually don’t create PDF files directly. Expertise in managing PDF content should be part of the core skill-set for web content managers.
