Assembly-Level Approach: The Atoms Namespace
In the context of ABCpdf, an Atom is a class that represents one of
the eight basic immutable data types defined by the PDF specification.
Think of atoms as the building blocks, or primitive values, from which
a PDF document is constructed.
Every complex structure in a PDF (a dictionary of properties, an array
of coordinates, a stream of drawing commands) is ultimately built from
atoms of these eight types:
- Booleans
- Numbers (integer and real)
- Strings
- Names
- Arrays
- Dictionaries
- Streams
- Null
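The eight types map very directly onto PDF's textual object syntax. The plain-Python sketch below is not ABCpdf's API: `Name`, `Stream` and `to_pdf` are hypothetical helpers, and string and name escaping is simplified (strings are treated as text rather than raw bytes). It shows how each atom type is written and how a dictionary atom, here a sticky-note annotation, is composed from the others.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """Hypothetical helper: a PDF name such as /Type."""
    value: str


@dataclass
class Stream:
    """Hypothetical helper: a stream is a dictionary followed by raw bytes."""
    data: bytes
    extra: dict = field(default_factory=dict)


def to_pdf(obj) -> str:
    """Write one atom in PDF object syntax (escaping is simplified)."""
    if obj is None:                        # null
        return "null"
    if isinstance(obj, bool):              # boolean; checked before int, since bool subclasses int
        return "true" if obj else "false"
    if isinstance(obj, int):               # integer number
        return str(obj)
    if isinstance(obj, float):             # real number, rounded; PDF forbids exponent notation
        return f"{obj:.6f}".rstrip("0").rstrip(".")
    if isinstance(obj, Name):              # name: a slash then the name characters
        return "/" + obj.value
    if isinstance(obj, str):               # literal string: escape \ ( and )
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(obj, list):              # array of any atoms
        return "[" + " ".join(to_pdf(item) for item in obj) + "]"
    if isinstance(obj, Stream):            # stream: dictionary, keywords, then the data
        header = to_pdf({**obj.extra, "Length": len(obj.data)})
        return header + "\nstream\n" + obj.data.decode("latin-1") + "\nendstream"
    if isinstance(obj, dict):              # dictionary: keys are always names
        body = " ".join(f"/{key} {to_pdf(val)}" for key, val in obj.items())
        return "<< " + body + " >>"
    raise TypeError(f"not a PDF atom: {obj!r}")


# A text (sticky-note) annotation built purely from atoms.
annot = {
    "Type": Name("Annot"),
    "Subtype": Name("Text"),
    "Rect": [100, 600, 120, 620],
    "Contents": "Hello (world)",
    "Open": False,
}
print(to_pdf(annot))
```

Running it prints `<< /Type /Annot /Subtype /Text /Rect [100 600 120 620] /Contents (Hello \(world\)) /Open false >>`. The dictionary contains names, an array of numbers, a string and a boolean, which is exactly the kind of nesting described above.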
Each IndirectObject and each Element contains an Atom, most
commonly a Dictionary atom.
The IndirectObject specialization provides useful operations
you can call. It adds features to the items defined in the PDF
specification.
The Element structure ensures you only insert valid combinations
of atoms and provides you with a menu of choices. It constrains you
to the PDF specification.
However, you do not strictly need that help. If you want to work
outside these constraints, you can operate on the raw atoms directly.
You can do things like:
- Creating Custom Annotations: Adding sticky notes, links, or form fields with specific properties not directly supported by the high-level API.
- Setting Rare PDF Attributes: Manipulating esoteric or custom entries in the document catalog or page dictionaries.
- Debugging: Inspecting the precise internal structure of a PDF object to understand why a document is not behaving as expected.
- Advanced Manipulation: Programmatically altering the structure of a PDF after it has been created.
You can also use OpAtoms, one of the most powerful and low-level
features in ABCpdf. They allow you to directly manipulate the content
streams that define what is drawn on a PDF page.
So what is a content stream? A PDF page does not contain images and
text in the way a Word document does. Instead, it contains a series
of instructions written in a compact, postfix-notation language. This
program is called a content stream. Because the notation is postfix,
each graphics operator in this language is a keyword (like m for move
or BT for Begin Text) preceded by its required operands (like
coordinates). For example, the instruction "100 200 m" means "move
the current point to (100, 200)".
An OpAtom in ABCpdf is an object that
represents a single one of these instructions or operators within a
content stream. You can take your page content stream and turn it
into an array of atoms. Then you can read through these atoms
looking for particular drawing instructions.
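A rough analogue of turning a content stream into atoms and searching them can be sketched in plain Python. This is not ABCpdf code: `parse_ops` and `find_ops` are hypothetical names, and instead of a flat array this sketch groups each operator with its operands. The tokenizer is deliberately simplified; it does not handle hex strings, dictionaries or inline images, and it does not decode string escape sequences such as `\n` (a backslash just keeps the next byte).

```python
import re

# One alternative per token kind. Simplified: no hex strings (<...>),
# dictionaries (<<...>>) or inline images (BI ... EI).
TOKEN = re.compile(
    rb"""
      (?P<ws>\s+|%[^\r\n]*)               # whitespace or a comment
    | (?P<num>[+-]?(?:\d+\.?\d*|\.\d+))   # integer or real number
    | (?P<name>/[^\s/\[\]()<>{}%]*)       # name, kept with its slash
    | (?P<str>\()                         # start of a literal string
    | (?P<open>\[)
    | (?P<close>\])
    | (?P<op>[^\s/\[\]()<>{}%]+)          # anything else is an operator
    """,
    re.VERBOSE,
)


def read_string(data: bytes, pos: int) -> tuple[bytes, int]:
    """Read a literal string starting just after '('; return (bytes, new pos)."""
    out, depth = bytearray(), 1
    while pos < len(data):
        ch = data[pos]
        pos += 1
        if ch == 0x5C:                    # backslash: keep the next byte as-is
            if pos < len(data):
                out.append(data[pos])
                pos += 1
        elif ch == 0x28:                  # nested '(' must be balanced
            depth += 1
            out.append(ch)
        elif ch == 0x29:                  # ')' closes one nesting level
            depth -= 1
            if depth == 0:
                return bytes(out), pos
            out.append(ch)
        else:
            out.append(ch)
    raise ValueError("unterminated string")


def parse_ops(data: bytes) -> list[tuple[list, str]]:
    """Split a content stream into (operands, operator) pairs."""
    ops, operands, stack = [], [], []
    pos = 0
    while pos < len(data):
        m = TOKEN.match(data, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at byte {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "ws":
            continue
        if kind == "num":
            text = m.group().decode()
            value = float(text) if "." in text else int(text)
        elif kind == "name":
            value = m.group().decode()
        elif kind == "str":
            value, pos = read_string(data, pos)
        elif kind == "open":              # start collecting array items
            stack.append(operands)
            operands = []
            continue
        elif kind == "close":             # the array becomes one operand
            value, operands = operands, stack.pop()
        else:                             # operator: ends the instruction
            ops.append((operands, m.group().decode()))
            operands = []
            continue
        operands.append(value)
    return ops


def find_ops(ops, names):
    """Return the indices of instructions whose operator is in `names`."""
    return [i for i, (_, op) in enumerate(ops) if op in names]


stream = b"BT /F1 12 Tf 72 720 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET"
ops = parse_ops(stream)
for i in find_ops(ops, {"Tj", "TJ"}):
    print(ops[i])
```

Searching for the text-showing operators `Tj` and `TJ` finds two instructions: one with the string operand `b'Hello'`, and one whose single operand is the array `[b'Wor', -20, b'ld']`.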
Because of the way this approach is designed, it is very fast and
efficient. With it you can, for example, determine the colors and
color spaces used in a document, or redact all text matching a
particular pattern.
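Once a stream is split into instructions, finding colors is mostly a matter of watching the color operators. This sketch assumes the instructions are already available as `(operands, operator)` pairs, a hypothetical shape rather than ABCpdf's OpAtom type. It relies on the PDF convention that lowercase color operators set the fill (nonstroking) color and uppercase ones the stroking color, and that both start in DeviceGray. It ignores `q`/`Q` state saving, images and shadings, and reports named color spaces such as `/CS0` by resource name rather than resolving them.

```python
from collections import Counter

# Operators that set a device color, and the color space each one implies.
DEVICE_OPS = {
    "g": "DeviceGray", "G": "DeviceGray",
    "rg": "DeviceRGB", "RG": "DeviceRGB",
    "k": "DeviceCMYK", "K": "DeviceCMYK",
}


def color_usage(ops):
    """Count how often each (color space, color values) pair is set."""
    usage = Counter()
    # Both color spaces start as DeviceGray in the initial graphics state.
    current = {"fill": "DeviceGray", "stroke": "DeviceGray"}
    for operands, op in ops:
        # Lowercase operators set the fill color, uppercase the stroke color.
        target = "fill" if op.islower() else "stroke"
        if op in DEVICE_OPS:
            current[target] = DEVICE_OPS[op]
            usage[(DEVICE_OPS[op], tuple(operands))] += 1
        elif op in ("cs", "CS"):
            # Operand is a name such as "/DeviceRGB" or a resource like "/CS0".
            current[target] = operands[0].lstrip("/")
        elif op in ("sc", "SC", "scn", "SCN"):
            usage[(current[target], tuple(operands))] += 1
    return usage


ops = [
    ([1, 0, 0], "rg"),        # red fill in DeviceRGB
    ([0, 0, 0, 1], "K"),      # black stroke in DeviceCMYK
    (["/DeviceRGB"], "cs"),   # select the fill color space explicitly
    ([0, 0, 1], "sc"),        # blue fill in that space
    ([1, 0, 0], "rg"),        # red fill again
]
for (space, values), count in color_usage(ops).items():
    print(space, values, count)
```

The same scan-and-match loop is the starting point for pattern-based work such as redaction, though removing text safely involves more than editing the operators shown here.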
In summary, Atoms are the raw assembly language of PDFs. While you
can build entire documents with the high-level Doc methods (AddText,
AddImage), Atoms give you direct access to the instruction set,
letting you write, read, and modify PDF objects and drawing commands
one operation at a time.