Customer’s motivation for launching the project: scaling the business exposed the limits of the existing solution. Document flow relied on human operators, which made the process expensive and hard to scale.
Description of the initial situation:
Efficient document flow with a minimal error rate requires storing electronic versions of documents and rapidly digitizing printed ones.
Processing a printed document means converting its image into a structured electronic representation with high accuracy.
The existing business process does not scale, because printed documents are recognized manually.
This posed significant risks to the planned business growth.
Project goals:
create a system that recognizes the main fields of documents with a sufficient level of quality;
create a tool for generating new document formats.
MIL Team solution: the team reused its existing neural network models for detecting and recognizing text in images, which made it possible to quickly deliver a solution with the required level of quality.
Results: recognition has been automated for the following document types: checks and receipts; invoices; work orders; contracts.
To build the model we used:
A marked-up document template;
Reference fields of the document with their characteristics;
A set of document images.
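As an illustration only, the inputs listed above could be represented roughly as follows. This is a minimal sketch, not the MIL Team implementation: the names `Box`, `ReferenceField`, `DocumentTemplate` and `assign_blocks`, the relative page coordinates, and the overlap-based matching rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in relative page coordinates (0..1). Hypothetical."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def overlap(self, other: "Box") -> float:
        """Area of the intersection with another box (0.0 if disjoint)."""
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


@dataclass
class ReferenceField:
    """A reference field of the template with its characteristics."""
    name: str    # e.g. "number", "total" (illustrative names)
    kind: str    # e.g. "text", "amount", "date" (illustrative kinds)
    region: Box  # where the field is expected on the marked-up template


@dataclass
class DocumentTemplate:
    """A marked-up template: a document type plus its reference fields."""
    doc_type: str
    fields: List[ReferenceField]


def assign_blocks(
    template: DocumentTemplate,
    blocks: List[Tuple[Box, str]],
) -> Dict[str, Optional[str]]:
    """Map each field name to the text of the best-matching block.

    A block's score is the share of its own area that lies inside the
    field region. Fields with no overlapping block map to None.
    """
    result: Dict[str, Optional[str]] = {}
    for f in template.fields:
        best_text, best_score = None, 0.0
        for box, text in blocks:
            a = box.area()
            score = box.overlap(f.region) / a if a > 0 else 0.0
            if score > best_score:
                best_text, best_score = text, score
        result[f.name] = best_text
    return result
```

Here `blocks` stands for the output of a text detection and recognition step: pairs of a bounding box and the recognized string. A production system would likely also validate each value against the field's characteristics, which this sketch leaves out.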
Modeling results:
OCR model for character extraction;
Text Detection model for locating text blocks;
Table Detection model;
Model for constructing an electronic version of a document.