November 16, 2016

Understanding Your Archive: Storage Requirements, Pt. 1

Welcome back to our Technical Series on Advanced Function Presentation (AFP). In recent weeks we have looked at the early days of archive technology and the legacy challenges that businesses face, traced the evolution of AFP from its origins as an IBM standard, and explored the building blocks of AFP and how its standards interact and operate together. We have released a set of recordings to accompany this series, which can be viewed here.

In this week’s article we will build on your understanding of AFP and start to take a closer look at the storage requirements for a Customer Communications Archive. We believe that by understanding your requirements and the storage options available to you, you will be able to optimize archive performance and reduce overall costs.

What is a Customer Communications Archive?

Customer communications are typically transactional documents, generated by line-of-business systems and retained in an archive to provide an accurate record of a customer at a point in time. They include bills, statements and policy documents.
A percentage of these documents will be used by customer services to answer queries, others will be required for audit or compliance purposes, some will be accessed through e-presentment tools by internal and external customers, and others will never be looked at again. In each of these cases, however, the documents share the same characteristics: they must be referenceable, but they are static and never updated.

Archive Data Types

By considering the storage requirements for these documents and the types of data they require, you will begin to understand how efficiencies and cost savings can be achieved. The three data types required within an archive are:

Metadata: This describes the content of the document and is typically stored in a series of database tables for indexing and fast searching when finding documents. It frequently comprises a number of fields extracted from the document. With the increasing use of full-text indexes for data analytics, these databases can be large. This data must support multiple queries and search patterns, each resulting in one or more document identifiers that are then used to retrieve the documents and present them to the user. The management and optimization of this data is delegated to the database.
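As a minimal sketch of the metadata layer described above, the following uses Python's built-in sqlite3 module. The table name, field names and sample values are hypothetical and chosen only for illustration; a real archive would index whichever fields it extracts from its own documents. The point it shows is that a search returns document identifiers, not the documents themselves.

```python
import sqlite3

# Hypothetical sample metadata: (doc_id, account_no, doc_type, doc_date)
SAMPLE_ROWS = [
    ("DOC-001", "ACC-1", "statement", "2016-09-30"),
    ("DOC-002", "ACC-1", "bill", "2016-10-31"),
    ("DOC-003", "ACC-2", "statement", "2016-10-31"),
    ("DOC-004", "ACC-1", "statement", "2015-12-31"),
]


def build_index(rows):
    """Create an in-memory metadata table and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_metadata ("
        " doc_id TEXT PRIMARY KEY,"
        " account_no TEXT NOT NULL,"
        " doc_type TEXT NOT NULL,"
        " doc_date TEXT NOT NULL)"  # ISO 8601 dates compare correctly as text
    )
    # Index the fields used by the most common search pattern.
    conn.execute(
        "CREATE INDEX idx_account_date ON doc_metadata (account_no, doc_date)"
    )
    conn.executemany("INSERT INTO doc_metadata VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def find_documents(conn, account_no, date_from, date_to):
    """Return the identifiers of one account's documents within a date range."""
    cursor = conn.execute(
        "SELECT doc_id FROM doc_metadata"
        " WHERE account_no = ? AND doc_date BETWEEN ? AND ?"
        " ORDER BY doc_date",
        (account_no, date_from, date_to),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    conn = build_index(SAMPLE_ROWS)
    print(find_documents(conn, "ACC-1", "2016-01-01", "2016-12-31"))
```

The identifiers returned by find_documents would then be passed to the document store for retrieval; query planning and index maintenance are left to the database, as noted above.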

Document Data: This contains the personalized data relevant to the customer. Depending on the generating system, documents are stored either individually, each in its own file (full burst), or as part of a larger file (non burst). Each approach handles the resource data in a different way.
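To illustrate the difference between the two layouts, here is a simplified sketch, not a description of any particular archive product. It treats documents as opaque byte strings and ignores AFP structure entirely. The function names and the (offset, length) index are assumptions made for the example: in a non burst layout, some form of position index is needed to pull one document out of the larger file, whereas a full burst layout can address each file directly.

```python
from typing import Dict, Tuple


def pack_non_burst(
    documents: Dict[str, bytes]
) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate documents into one container and record where each one sits."""
    container = bytearray()
    positions: Dict[str, Tuple[int, int]] = {}
    for doc_id, payload in documents.items():
        # Remember the starting offset and length before appending the payload.
        positions[doc_id] = (len(container), len(payload))
        container.extend(payload)
    return bytes(container), positions


def read_non_burst(
    container: bytes, positions: Dict[str, Tuple[int, int]], doc_id: str
) -> bytes:
    """Extract a single document from the container using its recorded position."""
    offset, length = positions[doc_id]
    return container[offset:offset + length]


if __name__ == "__main__":
    # Full burst: one entry (standing in for one file) per document.
    full_burst = {"DOC-001": b"statement for ACC-1", "DOC-002": b"bill for ACC-1"}

    # Non burst: the same documents held in a single larger container.
    container, positions = pack_non_burst(full_burst)
    assert read_non_burst(container, positions, "DOC-002") == full_burst["DOC-002"]
```

In this sketch the position index would naturally live alongside the metadata, so that a search can return both the container and the location of the document within it.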

Resource Data: This is source data shared by many documents within a daily load and across the archive. It comprises fonts, images and other artifacts used by multiple documents.
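Because the same resources are used by many documents, one way to avoid storing them repeatedly is to key each resource by a hash of its content. The sketch below is an assumption for illustration only: the article does not say how any given archive shares resources, and the ResourceStore class and its methods are hypothetical.

```python
import hashlib
from typing import Dict


class ResourceStore:
    """Hypothetical store that keeps one copy of each distinct resource."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store a resource and return its key; identical content gets the same key."""
        key = hashlib.sha256(data).hexdigest()
        # Only the first copy is kept; later identical resources reuse it.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the resource stored under the given key."""
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ResourceStore()
    logo = b"...company logo bytes..."
    font = b"...font bytes..."

    # Two documents from different daily loads reference the same logo.
    doc_a_resources = [store.put(logo), store.put(font)]
    doc_b_resources = [store.put(logo)]

    print(len(store))  # number of distinct resources actually stored
```

Each document then needs to record only the keys of the resources it uses, which is what allows the resource data to be stored once and shared across the archive.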

The next blog in this series will explore in detail the different options that are available when loading documents into an archive and explain the benefits and limitations of each.