Use these libraries to create accessible PDFs

Introduction

Businesses and organizations can no longer avoid creating accessible PDFs. Read on to find out which library your IT department needs to deploy so that your organization can start creating accessible PDFs right away.

Accessible documents and the law

The legal situation regarding documents, including PDFs, is country-specific. But the trend in Western countries is obvious. Almost all countries already have laws or guidelines that cover the topic of accessibility.

In the U.S., Section 508 of the Rehabilitation Act regulates accessibility for federal agencies. In the EU, the deadline has already passed for every member state to include accessibility in its laws.

Of course, the laws are general and do not apply only to files such as PDFs. However, it is clear that the importance and urgency of accessibility have increased in the digital world as well.

While each country may have its own wording and requirements for accessibility, creators of digital products generally follow one common set of guidelines: the “Web Content Accessibility Guidelines” (WCAG).

Who needs accessible PDFs?

Anyone who has read the guidelines or laws will see that it is primarily public institutions such as public authorities that are required to make their websites and digital services accessible. 

However, companies such as banks, transport companies, electricity providers, etc., whose products are used by a wide range of people, should also make their online services accessible now or in the near future.

Even if your company is not directly addressed by the legal requirements, the topic will not pass you by. On the one hand, it is certainly in your interest to reach as many customers as possible. On the other hand, once accessible services become common, customers quickly come to expect them. If you later attract attention because your website or documents are inaccessible, this can quickly generate negative publicity on social media.

In other words, in the foreseeable future the issue of accessibility will affect all businesses that have online offerings.

While web pages can be made accessible fairly easily through their underlying HTML structure, PDFs pose some difficulties. Originally, PDFs were nothing more than a collection of instructions that described where to place which colors and shapes, much like an image.

What makes a PDF accessible?

Before we look at the technical implementation of accessible PDFs, we should recall the thinking behind accessibility. Accessible documents are about making the content available to as many people as possible, and people have a wide variety of limitations. The following examples only hint at the range.

  • Limited field of view. For these people, it makes sense to always place related content close together.
  • Low contrast sensitivity. These people have difficulty distinguishing some colors and shades of gray. Therefore, avoid colored text on colored backgrounds.
  • Blurred vision. These people have to zoom in heavily on everything they view.
  • Blindness. These individuals rely on a screen reader. Screen readers can only interpret structured content.
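
For the contrast point, WCAG defines a contrast ratio between text color and background color, and its AA level asks for at least 4.5:1 for normal-sized text. The following is a minimal sketch of that calculation using the WCAG 2.x formula. The function names are my own, and the colors are assumed to be 8-bit sRGB triples.

```python
def _linearize(channel):
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with channels in 0..255."""
    r, g, b = (_linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical colors) up to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

Black text on a white background, `contrast_ratio((0, 0, 0), (255, 255, 255))`, reaches the maximum of 21, while pure red on medium green, `contrast_ratio((255, 0, 0), (0, 128, 0))`, stays well below 4.5.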

While some barriers in a PDF can be removed by a clever choice of design and layout, other barriers need to be addressed by adding specific meta-information.

As mentioned earlier, PDFs mainly contain instructions on where and what to draw. Screen readers have difficulty reading content from a PDF in a structured form because they simply cannot know how the pieces of content relate to each other or what they are supposed to depict. That’s why tags were introduced into PDFs. These tags are modeled on the structure of HTML. For example, a heading in the PDF gets one of the tags H1 to H6, and a paragraph of text gets a P tag. Analogous to tables in HTML, the individual parts of a table are tagged with TBody, TR, TD, etc.

With this information, screen readers immediately know what type of content is in the document. This makes it much easier for users to navigate between headings, lists, or tables.
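
To make the relationship between HTML elements and PDF tags more concrete, here is a small sketch that prints the tag outline you would roughly expect for a piece of HTML. The mapping table and the names `HTML_TO_PDF_TAG`, `TagOutline`, and `pdf_tag_outline` are my own illustration. A real converter decides for itself which tags it writes, so treat this as an approximation and not as the behavior of any particular library.

```python
from html.parser import HTMLParser

# Illustrative mapping from HTML elements to standard PDF structure tags.
HTML_TO_PDF_TAG = {
    "h1": "H1", "h2": "H2", "h3": "H3", "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P", "table": "Table", "thead": "THead", "tbody": "TBody",
    "tr": "TR", "th": "TH", "td": "TD", "ul": "L", "ol": "L", "li": "LI",
}


class TagOutline(HTMLParser):
    """Records (nesting depth, PDF tag) for every mapped HTML element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in HTML_TO_PDF_TAG:
            self.outline.append((self.depth, HTML_TO_PDF_TAG[tag]))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in HTML_TO_PDF_TAG:
            self.depth -= 1


def pdf_tag_outline(html):
    """Return the expected tag outline for well-formed HTML."""
    parser = TagOutline()
    parser.feed(html)
    parser.close()
    return parser.outline
```

Feeding it a heading, a paragraph, and a one-cell table yields H1 and P at the top level, followed by Table with TBody, TR, and TD nested inside it. This is the kind of structure a screen reader uses to navigate.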

How can accessible PDFs be created automatically?

Most PDFs that a company or government agency sends out are machine-generated. Companies maintain templates and replace placeholders with customer data. The result is a PDF personalized to the recipient. In this case, the PDFs are created by custom-built programs.
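
The placeholder step can be sketched in a few lines. The template text, the placeholder names `$name` and `$balance`, and the function `render_letter` are hypothetical and only serve as an illustration. The sketch covers only the filling of the HTML template, not the conversion to PDF.

```python
from html import escape
from string import Template

# Hypothetical letter template; $name and $balance are the placeholders.
LETTER_TEMPLATE = Template(
    '<html lang="en"><head><title>Account statement</title></head>'
    "<body><h1>Account statement</h1>"
    "<p>Dear $name, your current balance is $balance.</p></body></html>"
)


def render_letter(customer):
    """Fill the template with customer data.

    Values are HTML-escaped so that customer data cannot break the markup.
    A missing placeholder value raises KeyError instead of silently
    producing an incomplete letter.
    """
    safe = {key: escape(str(value)) for key, value in customer.items()}
    return LETTER_TEMPLATE.substitute(safe)
```

Calling `render_letter({"name": "Ada & Co", "balance": "42.00 EUR"})` returns the complete HTML with the ampersand escaped as `&amp;`. Keeping the headings and paragraphs in the template itself means that every generated letter carries the same structure.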

Most files that companies send out are created this way or in a similar way. It is all the more surprising that only a handful of products exist for creating accessible PDFs on the server side.

The following table lists some libraries that can be used to create accessible PDFs.

| Library | Supported programming languages | Price | Converts HTML to accessible PDFs |
|---|---|---|---|
| .NET PDF Library by Syncfusion | .NET | from $995 per developer per year | yes |
| PDFlib | C, C++, Java, .NET, Objective-C, Perl, PHP, Python, RPG, Ruby | from 1390 € | no |
| PDFix | C#, C++, Java, Python, or all via CLI | 1320 € per year | no |
| PD4ML | .NET, Java, or all via CLI | from 220 € | yes |
| iText 7 | .NET, Java | from $0 (Open Source Community Edition) | yes |
| openhtmltopdf | Java, or all via CLI* | $0 (Open Source) | yes |

Libraries for creating accessible PDFs
As of: September 2022

* For the tool openhtmltopdf, which is written in Java, I created the wrapper Html2PdfUa. With the wrapper, you can convert HTML to accessible PDFs directly from the CLI.

As we can see, the choice of libraries is very limited. Furthermore, working with libraries like PDFlib is cumbersome. The library lets you control everything, but even the simplest things take a lot of effort to implement. The main difficulty arises when you want to insert dynamic content. By its nature, you do not always know in advance how much space dynamic content will need, which in turn affects when and where a page breaks. This is where PDFlib becomes extremely unwieldy: in the worst case, you have to calculate yourself how much space your content will take up and place the page breaks manually.
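
To give a feeling for the bookkeeping this implies, here is a minimal sketch of manual page-break placement. It does not use the PDFlib API. The function `paginate`, the block heights, and the page height are hypothetical, and real code would additionally have to measure the text and split blocks that are taller than a page.

```python
def paginate(block_heights, page_height):
    """Greedily assign content blocks to pages.

    block_heights: heights of the content blocks in layout order.
    page_height: usable height of one page, in the same unit.
    Returns a list of pages, each a list of block indices.
    """
    pages, current, used = [], [], 0.0
    for index, height in enumerate(block_heights):
        if height > page_height:
            raise ValueError(f"block {index} is taller than a page")
        # Start a new page when the next block no longer fits.
        if current and used + height > page_height:
            pages.append(current)
            current, used = [], 0.0
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages
```

With blocks of height 300, 300, 300, and 100 on a page of height 700, `paginate([300, 300, 300, 100], 700)` places the first two blocks on page one and the remaining two on page two.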

You can reach your goal much more comfortably and quickly if you create an HTML template and convert it into a PDF. The library tags the PDF as specified in the HTML. With some CSS “page-break” rules, you can also easily control where the pages should break. 
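
As a rough sketch of such a template, the following builds an HTML document in which every section after the first starts on a new page. The stylesheet uses the standard CSS 2.1 page-break properties. I am assuming here that your converter honors them, so check its documentation. The names `PRINT_CSS` and `build_document` are my own.

```python
from html import escape

# Standard CSS 2.1 page-break properties (assumed to be honored by the converter).
PRINT_CSS = """
.new-page { page-break-before: always; }  /* section starts on a fresh page */
tr        { page-break-inside: avoid; }   /* do not split a table row */
h2        { page-break-after: avoid; }    /* keep a heading with its text */
"""


def build_document(sections):
    """Build the HTML for a list of (heading, body_text) pairs."""
    parts = []
    for index, (heading, body) in enumerate(sections):
        # The first section stays on page one; later ones force a break.
        css_class = ' class="new-page"' if index > 0 else ""
        parts.append(
            f"<section{css_class}><h2>{escape(heading)}</h2>"
            f"<p>{escape(body)}</p></section>"
        )
    body_html = "".join(parts)
    return (
        f'<html lang="en"><head><style>{PRINT_CSS}</style></head>'
        f"<body>{body_html}</body></html>"
    )
```

The page breaks are declared in the stylesheet, so no space calculation is needed in your own code. The converter works out how much fits on each page.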

Of course, this convenience comes at a cost: you have to rely on the library and cannot influence everything in the PDF. Things that do not exist in ordinary HTML can be tricky, for example footnotes or tables of contents with page references. It is therefore worth checking the documentation beforehand. With openhtmltopdf and Html2PdfUa, it was possible to implement these things well.

If you just want to create simple documents with some text, you are well served with HTML-based PDF libraries. Otherwise, you should take a close look at your requirements and check the documentation for possible solutions.

Accessible PDF checker

As mentioned above, accessibility is extremely multifaceted, and there are plenty of things to consider. Two things help a great deal: the W3C guidelines and tools that check documents for accessibility.

The most established tool so far for checking PDFs for accessibility is PAC3. It is free of charge, so you should definitely use it when creating your accessible PDFs.

An example of an accessible PDF created with openhtmltopdf can be downloaded from my GitHub repository and tested with PAC3.

Conclusion

Perhaps you have already seen the small accessibility symbol on various websites. Behind this icon are settings for accessibility and user assistance.

We have seen that the importance of this subject will increase. As a result, you will see this symbol more and more often. It is better to deal with unavoidable issues sooner rather than later.

In this article, I have dealt exclusively with accessible PDFs because there are not many solutions to this issue yet. However, development is moving. For example, a project group is working on creating accessible PDFs from LaTeX, which is already partially possible. Chromium can also create accessible PDFs during export. Unfortunately, these solutions are not yet fully developed and can only be used productively to a limited extent.

If you want to convert your PDF templates to accessible versions now, I recommend using one of the libraries above.

If you need help with the technical implementation of HTML templates for openhtmltopdf, I am at your disposal.
