|
This function scans the document, performs a semantic analysis
on it and tags it for accessibility purposes using techniques
broadly based around the PDF/UA standard and Section 508
compliance.
User accessibility standards like PDF/UA rely on Tagged PDF.
PDF/UA was initially put forward by the Association for
Information and Image Management (AIIM) and was later adopted by
the ISO as ISO 14289-1, "Document management applications
-- Electronic document file format enhancement for
accessibility".
Strictly speaking, Section 508 applies to the application rather
than the document. However, the producers of applications which
consume documents are in a good position to mandate that documents
conform to certain standards, so that the application can provide
appropriate information. This is what people generally mean when
they talk about PDFs as being Section 508 compliant. Appropriate
tagging achieves that aim.
Tagging allows people who have disabilities to have the content
in a PDF presented to them via different mechanisms. For example, an
accessible PDF provides information on page structure that allows
a PDF reader to speak the content of the document. Screen readers
are only one case, however: a variety of assistive technologies is
available, ranging from readers to magnifiers to navigational aids.
Tagged PDFs are the same as normal PDFs but they have been
annotated with metadata in the form of PDF tags. This metadata is
required because PDF documents contain good layout information but
little semantic structure. The tags that are required supply this
semantic structure. The way they are inserted and operate is
defined in the Adobe PDF Specification. The types of tags that are
used and the way they are used are defined by accessibility
standards such as PDF/UA.
The semantic analysis provided by ABCpdf is based around reading
order and results in the logical structure of the PDF being
determined. Content that is regarded as irrelevant is tagged as
being an artifact, in line with the PDF/UA standard.
Images present a particular challenge, as automated processes do
not find it easy to generate descriptions from bitmaps. However, if
you know what the different images represent, you can tag each
PixMap object dictionary with an "XXAlt" entry referring to a
StringAtom. When the MakeAccessible function is called, these
entries will be picked up, used and then deleted.
The MakeAccessible function will result in any existing tagged
content being discarded. This is necessary because, while it is
possible to determine if a document is already tagged, it is not
possible to determine if it has been correctly tagged. Many PDF
consumers do not understand tags and will not update the metadata
appropriately if they operate on such documents.
To determine whether a document has been tagged you can find the
"MarkInfo" entry in the Doc.Catalog as a DictAtom and then get the
"Marked" entry of that as a BoolAtom. The default is false.
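
The lookup described above amounts to two nested dictionary reads
with a fallback of false. The following is a minimal,
library-neutral sketch of that logic in Python. It does not use the
ABCpdf API: plain dicts stand in for the catalog DictAtom, a Python
bool stands in for the BoolAtom, and the function name is_marked is
hypothetical.

```python
def is_marked(catalog: dict) -> bool:
    """Return True only if the catalog's MarkInfo dictionary sets Marked to true."""
    mark_info = catalog.get("MarkInfo")
    # A missing or malformed MarkInfo entry means the document is not tagged.
    if not isinstance(mark_info, dict):
        return False
    # Marked defaults to false when the entry is absent.
    return mark_info.get("Marked", False) is True


# Stand-in catalogs (assumed shapes); a real document would supply these
# through whatever PDF library is in use.
tagged = {"Type": "Catalog", "MarkInfo": {"Marked": True}}
untagged = {"Type": "Catalog"}

print(is_marked(tagged))
print(is_marked(untagged))
```

A true result only indicates that the document declares itself as
tagged. It says nothing about the quality of the tagging.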
You may wish to set the SaveOptions.CompressObjects
property to true to reduce output size on save.
|