Welcome back to our AFP Series, where we help you understand the richness of AFP, the terminology and techniques required to index and transform it, and its importance in archive migration.
AFP, developed by IBM in the 1980s, originated as a proprietary page description language that used the all-points-addressable concept to print text and images on mainframe-attached printers. It has since evolved into a coordinated set of document creation, viewing, archiving and printing standards. AFP is now maintained and enhanced by the AFP Consortium, and ISO 18565:2015 has been published to document and define the use of AFP as an archive data stream.
AFP is an alphabet soup of acronyms. The main AFP standard, the Mixed Object Document Content Architecture (MO:DCA) reference, describes the AFP objects and how they interact and operate together. This standard is supported by seven other object content standards that provide detail on the more complex sets of operators. There are numerous options, so two compliant products are not guaranteed to process the same input data stream in the same manner.
MO:DCA Mixed Object Document Content Architecture
AFP GOCA Graphics Object Content Architecture for AFP
BCOCA Bar Code Object Content Architecture
CMOCA Color Management Object Content Architecture
FOCA Font Object Content Architecture
IOCA Image Object Content Architecture
MOCA Metadata Object Content Architecture
PTOCA Presentation Text Object Content Architecture
IPDS Intelligent Printer Data Stream
The majority of AFP files follow the object structure described in the standards. A print file comprises a resource group and a number of documents, each comprising a number of pages. These pages correspond to the natural reading order, with the layout on the physical printed page determined by the Form Definition resource. Each page comprises a number of text and graphic objects, which adhere to the appropriate standard and may in turn reference resources such as fonts or images.
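The hierarchy described above can be sketched as a small data model. This is a minimal illustration under stated assumptions, not an AFP parser: the class and field names (PrintFile, Document, Page, PageObject, resource_refs) and the "kind" labels are hypothetical, chosen for this example, and do not come from the MO:DCA reference.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageObject:
    """A text or graphic object on a page (hypothetical model, not a MO:DCA type)."""
    kind: str  # illustrative label, e.g. "text", "image", "graphics", "barcode"
    resource_refs: List[str] = field(default_factory=list)  # names of fonts, images, ...


@dataclass
class Page:
    objects: List[PageObject] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)


@dataclass
class PrintFile:
    # Resource group modelled as resource name -> resource type (assumption).
    resource_group: Dict[str, str] = field(default_factory=dict)
    documents: List[Document] = field(default_factory=list)

    def page_count(self) -> int:
        """Total number of pages across all documents in the print file."""
        return sum(len(doc.pages) for doc in self.documents)

    def unresolved_references(self) -> List[str]:
        """Resource names used by page objects but absent from the resource group."""
        missing = set()
        for doc in self.documents:
            for page in doc.pages:
                for obj in page.objects:
                    for name in obj.resource_refs:
                        if name not in self.resource_group:
                            missing.add(name)
        return sorted(missing)


# Example: one document, two pages, one font missing from the resource group.
sample = PrintFile(
    resource_group={"FONT_A": "font", "LOGO": "image"},
    documents=[
        Document(pages=[
            Page(objects=[PageObject("text", ["FONT_A"]), PageObject("image", ["LOGO"])]),
            Page(objects=[PageObject("text", ["FONT_B"])]),
        ])
    ],
)
```

In this sketch, sample.page_count() counts pages in reading order across documents, and sample.unresolved_references() lists the resource names that page objects use but the resource group does not supply. Resources may also be held outside the print file, so an unresolved name here is a prompt to look elsewhere rather than proof of an error.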
Not all valid, printable AFP is easy to process and repurpose. Consider the following possible examples:
Each page of the document is an image, a facsimile, with no actual characters, words or fonts referenced in the AFP file. Optical character recognition (OCR) is therefore required to extract data.
The AFP file comprises PDF files embedded within an AFP object container.
Font metadata is missing or does not match the font characters.
Custom fonts are created dynamically from characters drawn from multiple fonts.
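The cases above suggest a triage step before indexing or transformation. The sketch below is hypothetical: it assumes an earlier step has already summarised each page as a list of object kinds plus the font names it uses, and that a separate list of fonts with usable metadata is available. The function name triage_page, its parameters and the kind labels ("text", "image", "pdf_container") are illustrative assumptions, not part of any AFP standard or product.

```python
from typing import Iterable, List


def triage_page(
    object_kinds: Iterable[str],
    fonts_used: Iterable[str],
    fonts_with_metadata: Iterable[str],
) -> List[str]:
    """Return human-readable warnings for a page that may be hard to repurpose.

    All inputs are assumed to come from a prior parsing step (not shown).
    """
    kinds = list(object_kinds)
    issues: List[str] = []

    # Facsimile page: images only, no text objects to extract characters from.
    if "image" in kinds and "text" not in kinds:
        issues.append("image-only page: OCR needed to extract data")

    # PDF carried inside an AFP object container.
    if "pdf_container" in kinds:
        issues.append("embedded PDF: content must be read from the PDF itself")

    # Fonts whose metadata is missing, so text cannot be decoded reliably.
    missing = sorted(set(fonts_used) - set(fonts_with_metadata))
    if missing:
        issues.append("fonts lacking usable metadata: " + ", ".join(missing))

    return issues
```

A page that yields an empty list is a candidate for straightforward text extraction; anything else can be routed to OCR, PDF tooling or manual font mapping. Dynamically created custom fonts are not detected by this sketch, since spotting them requires inspecting the font resources themselves.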
With a deeper understanding of the AFP objects and their relationships, and knowledge of their use within an archive, you can populate your current archives without changing your document creation platforms. You can provide the ePresentment and eDelivery services required by internal and external customers, and you will have the knowledge to migrate legacy archives onto next-generation platforms, bringing numerous benefits to your business and driving new value from old data.
Next week we dive deeper into AFP, releasing some technical demos to help you further. Sign up here to receive notifications for these and other posts in this series.