Automated data input technologies have a relatively long history, dating back to the days when the first optical reading systems were developed to recognize stylized symbols drawn according to templates. Since that time, they have evolved to support a vast industry that uses a large set of very different technologies.
Today's traditional technologies for processing machine-readable forms are well established, and a large choice of systems capable of processing many types of machine-readable forms is now available. Advanced systems can accurately capture machine-printed and handwritten characters and process thousands of documents per day. ABBYY FormReader is one of the leading products in the field, capable of handling both printed and hand-printed forms (see http://www.formreader.com or contact ABBYY for a whitepaper and additional information on ABBYY form processing technology).
Yet while today's form processing systems are very advanced, they are still limited in functionality. For example, processing semi-structured documents, that is, forms and documents on which the sizes and locations of the fields holding key pieces of data vary from document to document, remains the most challenging task in data capture. While demand for solutions in this area is extremely high, form processing programs have not been flexible and intelligent enough to process these types of documents without extensive customization and system training. Until now, an easy-to-deploy, cost-effective solution for processing such documents as invoices, order forms, legacy forms, and template-based contracts has been virtually inaccessible to a large audience.
In these cases, even when full-text documents are being handled, the ultimate aim is to extract a particular set of fields, or key pieces of information, from a given page. We will refer to such documents as flexible forms.
Flexible Forms
Forms normally contain data that is required or requested by the organization using the form. This variable data can include such things as names, addresses, and monetary amounts. On traditional forms, key pieces of data are found in exactly the same fields, located in exactly the same position on the page and with exactly the same field size, from document to document. Processing this type of document is relatively simple if the form and the form template are designed carefully. The system simply needs to match the scanned form with the template to know what information to extract and how to extract it.
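The template matching described above can be illustrated with a minimal Python sketch. It is not ABBYY FormReader's implementation or API. It assumes that character recognition has already produced a list of words with page coordinates, and that a template is simply a mapping from field names to fixed rectangles. The names Word, Region, TEMPLATE, and extract_fields are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word and the page coordinates of its center (assumed input)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Region:
    """A fixed rectangle on the page where a template expects a field."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return (self.left <= word.x <= self.right
                and self.top <= word.y <= self.bottom)


def extract_fields(words, template):
    """Collect, for each template field, the words that fall inside its region."""
    result = {}
    for name, region in template.items():
        inside = [w for w in words if region.contains(w)]
        # Restore reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        result[name] = " ".join(w.text for w in inside)
    return result


# Hypothetical template for a traditional form: every field has a fixed position.
TEMPLATE = {
    "name": Region(left=100, top=50, right=400, bottom=80),
    "amount": Region(left=450, top=50, right=600, bottom=80),
}

# Hypothetical recognition output for one scanned page.
words = [
    Word("Smith", 180, 65),
    Word("John", 120, 65),
    Word("125.00", 500, 65),
    Word("Invoice", 300, 20),  # outside every region, so it is ignored
]

fields = extract_fields(words, TEMPLATE)
print(fields)
```

Because each region is fixed, the same template fails as soon as a field moves: if the name were printed lower on the page, the "name" region would come back empty. That is the limitation that makes semi-structured documents harder to process than traditional forms.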