Archival standards fall into two distinct areas. The first covers the working practices and processes of archiving: effective classification schemes, subject taxonomies and metadata indexes that ensure what we store can also be found. The second covers the media, or more accurately the formats, used to preserve content.
Maintaining Utility of Paper
When examining the connection between media and format, there is a strong desire to maintain the utility and characteristics of paper. Culturally and psychologically we still have a strong bond with the size and feel of paper. Indeed, we have found digital ways of preserving this form in technologies like the Amazon Kindle and the PDF, which deliver content as paged material.
Standards and Formats
The tension between paper-like presentation and machine-friendly formats can be seen in the many printing and archiving standards that endure today. For example, early printing processes produced reports as ASCII line data, which offered only very limited presentation options. IBM introduced Advanced Function Printing (AFP) to allow high-end printers to produce higher-quality output with more advanced features such as graphics, fonts and barcodes. These techniques even allowed forms to be customized, although programming was often required. Soon the ability to address every part of the page (known as All Points Addressable) arrived, and other formats such as Xerox's Metacode, Hewlett-Packard's PCL and Adobe's PostScript were introduced. These formats made it possible to add rich graphics and presentation qualities to printed documents that had never before been possible.
Supporting the Need for Archiving
The challenge with these formats is that they do little to support the needs of long-term archiving. Page description languages are unsuitable for archiving: they often use verbose syntax specialized for processing by a printer rather than optimized for preservation. This situation changed in 1993, when Adobe published the Portable Document Format (PDF) specification. PDF proved popular because it provided a standard way for page-based documents to be distributed digitally. Its rise coincided with the growth of the internet, and PDF took shape as a ubiquitous standard. Its popularity and utility were eventually recognized by the International Organization for Standardization, which ratified PDF/A as an international standard for archiving documents (ISO 19005-1) in 2005.
PDF and PDF/A provide a de facto standard for digital document preservation with full fidelity to the original. PDF is universally available through web browsers and mobile devices, and supports extensions that enable accessibility for users with disabilities. It can guarantee authenticity through digital signatures and uses advanced compression to keep files as small as possible. Where previous generations of archival technology were hampered by a lack of universal standards, the ISO-standardized PDF/A is a next-generation standard.
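One practical consequence of PDF being a published standard is that even basic integrity checks on an archive are straightforward. As a minimal sketch, every valid PDF file begins with the magic bytes "%PDF-" followed by a version number, so a missing header flags a corrupt or mislabelled file; the function name below is illustrative, not part of any standard tool.

```python
# Minimal sketch: quick integrity check for archived PDF files.
# A valid PDF starts with the magic header "%PDF-" (e.g. "%PDF-1.7").
# This catches corrupt or mislabelled files; it is not a full validator.

def looks_like_pdf(path: str) -> bool:
    """Return True if the file begins with the PDF magic header."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

if __name__ == "__main__":
    # Demo with a stand-in file (header bytes only, not a real PDF).
    with open("sample.bin", "wb") as f:
        f.write(b"%PDF-1.7\n")
    print(looks_like_pdf("sample.bin"))  # True
```

A full conformance check against PDF/A requires a dedicated validator, but a header scan like this is a cheap first pass over a large archive.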
XML is also an important archival standard. Where PDF lends itself readily to the archival of page-based documents, XML does the same for data. Before the advent of XML there were many methods for transferring data, including CSV files, custom markup languages and binary files, but no guaranteed way of moving data between applications and archives. XML defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Archiving in XML ensures that data is represented and understood in well-defined ways, which makes it ideal for archiving.
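To illustrate those encoding rules, here is a minimal sketch using Python's standard library to archive a record as self-describing XML rather than positional CSV; the element names and sample values are illustrative and not taken from any particular archival schema.

```python
# Minimal sketch: encoding a data record as XML for archival, using
# only Python's standard library. Unlike a CSV row, each value is
# labelled by its element name, so any XML parser can recover the
# fields without out-of-band knowledge of column order.
import xml.etree.ElementTree as ET

record = {"id": "INV-0042", "date": "2013-06-01", "amount": "199.95"}

root = ET.Element("invoice")  # illustrative element names
for field, value in record.items():
    ET.SubElement(root, field).text = value

xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))

# Round trip: the archived bytes parse back to the same fields,
# which is the guarantee ad hoc binary formats lacked.
parsed = ET.fromstring(xml_bytes)
print(parsed.find("id").text)  # INV-0042
```

The same record written as `INV-0042,2013-06-01,199.95` is only recoverable if the column order survives alongside the file; the XML form carries that information in the data itself.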
XML and PDF/A are the current standards for archival. Recent research suggests that 75% of organizations already use PDF and a further 32% use PDF/A. Looking ahead, companies must address the challenge that many legacy archives rely on older file formats and applications that are no longer available; migration to PDF/A and XML is a must. The clear trend is towards investment in archiving practices that leverage these standards.
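A migration project of this kind usually starts by finding out what formats the legacy archive actually contains. As a minimal sketch, the following inventories an archive by file extension; the directory name is a placeholder, and extensions are only a rough proxy for format.

```python
# Minimal sketch: inventory a legacy archive by file extension as a
# first step before migrating content to PDF/A and XML. Extensions
# are a rough proxy for format; a real migration would also inspect
# file contents. "legacy_archive" is a placeholder path.
from collections import Counter
from pathlib import Path

def format_inventory(archive_root: str) -> Counter:
    """Count files under archive_root, grouped by lowercased extension."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in Path(archive_root).rglob("*")
        if p.is_file()
    )

if __name__ == "__main__":
    for ext, count in format_inventory("legacy_archive").most_common():
        print(f"{ext}: {count}")
```

The resulting counts show at a glance which formats dominate the archive, and therefore which conversion paths to PDF/A or XML deserve attention first.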