From XML to Accessible PDF: Full Automation with XSL-FO and PDF/UA

Creating accessible PDFs sounds simple, but WCAG, PDF/A-1a, and PDF/UA are strict about structure, semantics, and metadata.
A single error in tag order or table structure, or one missing metadata field, and the PDF no longer qualifies as accessible or archival-grade.

In complex publication chains, with documents from many sources, manual fixes simply don’t scale.

At Elk Solutions we automate the entire process. Our pipeline starts with XML as the uniform source — no matter where that XML comes from: databases, STOP/TPOD, legal systems, CMS, custom files, or specialized schemas.

The Problem: The Illusion of “Complete” PDF Export

Many organizations rely on manual or semi-automatic PDF export. Even with decent tools, structural issues persist:

Accessibility becomes a “nice to have”, while WCAG and archiving standards make it mandatory.

Reality:

The Solution: XML as Source + XSL-FO as Layout

Where traditional pipelines “cut and paste” PDFs, we take a different route.
We use XML as the foundation — structured, predictable, semantically rich.

Then we transform the XML into XSL-FO, from which a formatter renders PDF output that is both precisely laid out and semantically correct.

XSL-FO gives full control over:

  1. Document structure (headings, paragraphs, sections)
  2. Lists (bullets, nesting, numbering)
  3. Tables (headers, body, cell relations)
  4. Alt text and descriptions
  5. Metadata for PDF/A-1a and PDF/UA
  6. Reading order and logical structure

Because XML is the source, it doesn’t matter how it was created.
Every input passes through the same transformation, so the output is uniform, repeatable, and fully controllable.
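
To make the control points above concrete, here is a minimal Python sketch of the XML-to-XSL-FO step. It is an illustration under stated assumptions, not the production transform: the source element names (doc, title, heading, para, figure) are hypothetical, and the fox:alt-text attribute assumes Apache FOP as the formatter, which this article does not specify. The role attribute is the standard XSL-FO property that an accessibility-aware formatter can map to PDF structure tags.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
# Assumption: Apache FOP is the formatter; fox:alt-text is a FOP extension.
FOX = "http://xmlgraphics.apache.org/fop/extensions"
ET.register_namespace("fo", FO)
ET.register_namespace("fox", FOX)

# Hypothetical source vocabulary mapped to PDF structure roles.
ROLES = {"title": "H1", "heading": "H2", "para": "P"}

SAMPLE = (
    '<doc lang="en"><title>Annual Report</title><section>'
    "<heading>Results</heading><para>All files passed.</para>"
    '<figure src="chart.png" alt="Bar chart of validation results"/>'
    "</section></doc>"
)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO}}}{tag}"


def to_xsl_fo(source_xml):
    """Turn the hypothetical source XML into an XSL-FO document string."""
    src = ET.fromstring(source_xml)
    root = ET.Element(fo("root"), {"language": src.get("lang", "en")})

    # One A4 page master with a single body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm",
    })
    ET.SubElement(page, fo("region-body"))

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    # Walk the source in document order so reading order is preserved.
    for node in src.iter():
        if node.tag in ROLES:
            block = ET.SubElement(flow, fo("block"), {"role": ROLES[node.tag]})
            block.text = node.text
        elif node.tag == "figure":
            block = ET.SubElement(flow, fo("block"))
            ET.SubElement(block, fo("external-graphic"), {
                "src": f"url({node.get('src')})",
                f"{{{FOX}}}alt-text": node.get("alt", ""),
            })
    return ET.tostring(root, encoding="unicode")


print(to_xsl_fo(SAMPLE))
```

The resulting FO document would then go to a formatter with accessibility output enabled; how that is switched on depends on the formatter in use.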

The Engine: Smart Structure Recognition & Validation

Our transformation engine does more than “convert”.
It understands structure and fixes what’s missing.

Examples:

These steps are built around international standards:

By baking this knowledge into the engine, accessibility is not a bolt-on step but integral to the publication flow.
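
As a rough illustration of what a structure check in such an engine could look like, the Python sketch below scans source XML for two problems that accessibility validators commonly flag: skipped heading levels and figures without alt text. The element and attribute names (heading, level, figure, alt, src) are hypothetical, and the sketch only reports issues; it does not show how the engine described here repairs them.

```python
import xml.etree.ElementTree as ET

CHECK_SAMPLE = (
    "<doc>"
    '<heading level="1">Introduction</heading>'
    '<heading level="3">Details</heading>'
    '<figure src="diagram.png"/>'
    "</doc>"
)


def find_structure_issues(source_xml):
    """Return a list of human-readable structure problems in the source XML."""
    issues = []
    previous_level = 0
    for node in ET.fromstring(source_xml).iter():
        if node.tag == "heading":
            level = int(node.get("level", "1"))
            # Going deeper by more than one level leaves a gap in the outline.
            if level > previous_level + 1:
                issues.append(
                    f"heading level jumps from {previous_level} to {level}: "
                    f"{node.text!r}"
                )
            previous_level = level
        elif node.tag == "figure" and not (node.get("alt") or "").strip():
            issues.append(f"figure without alt text: {node.get('src')!r}")
    return issues


for issue in find_structure_issues(CHECK_SAMPLE):
    print(issue)
```

Running checks like these on the XML, before any layout happens, is what lets problems be caught at the source rather than in the finished PDF.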

The Result: Automatic, Valid PDFs

1. Perfectly Tagged PDF/A-1a + PDF/UA

Each PDF is delivered with:

2. Scalable, Fast, Repeatable

Applies to:

— always identical output. No dependency on individual users or Office setups.

3. Automatic Validation Chain

We validate by default with:

  1. veraPDF: open-source, industry-supported validator for PDF/A
    https://verapdf.org
  2. PAC (PDF Accessibility Checker): widely used checker for PDF/UA conformance
    https://pac.pdf-accessibility.org/en
  3. Matterhorn Protocol: the PDF Association's list of PDF/UA failure conditions, used as the checklist

This is not optional: every PDF passes this chain before it leaves the pipeline.
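
A gate of this kind can be sketched in a few lines of Python. This is a minimal sketch, assuming each validator is available as a command-line program that signals failure through a non-zero exit status; that assumption should be checked against each tool's documentation, and tools without a command-line interface would need a different integration. The function name and the "{pdf}" placeholder are hypothetical. The demo uses stand-in commands so it runs without any validator installed.

```python
import subprocess
import sys


def run_validation_chain(pdf_path, validators):
    """Run every validator command and return the names of those that failed.

    validators maps a display name to an argument list in which the
    placeholder "{pdf}" is replaced by pdf_path.
    """
    failures = []
    for name, argv in validators.items():
        cmd = [arg.replace("{pdf}", pdf_path) for arg in argv]
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Assumption: a non-zero exit status means the check did not pass.
        if result.returncode != 0:
            failures.append(name)
    return failures


# A real entry might look like the line below; the flag and the exit-code
# behaviour are assumptions to verify against the veraPDF CLI documentation.
#   "veraPDF": ["verapdf", "--flavour", "1a", "{pdf}"]
DEMO_VALIDATORS = {
    "extension-check": [
        sys.executable, "-c",
        "import sys; sys.exit(0 if sys.argv[1].endswith('.pdf') else 1)",
        "{pdf}",
    ],
    "always-fail": [sys.executable, "-c", "import sys; sys.exit(1)"],
}

failed = run_validation_chain("report.pdf", DEMO_VALIDATORS)
print("release blocked by:" if failed else "all checks passed", failed)
```

The pipeline would only release a PDF when the returned list is empty.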

Conclusion: Why This Works

Starting from XML and using XSL-FO for consistent layout yields a publication chain that:

Accessible PDFs don’t have to be a last-minute fire drill.
They can be guaranteed output of a smart, structural pipeline.

Don’t start at the PDF.
Start at the structure — and everything else follows.