Enabling Document Recognition and Automating Invoice Processing

Industry: Healthcare, Business, AI/ML


  • Invoice processing sped up 8–9 times by automating manual work
  • Parallel processing of 12–15 documents per minute, compared with 5–10 minutes to aggregate a single document manually
  • Recognition precision of 70%

Technologies Used: JavaScript, PHP, MySQL, Selenium, Tesseract OCR

Methodology: Agile


Based in the USA, the customer provides invoice processing services to healthcare organizations. Under the US healthcare system, insurance coverage is split between the government and private insurance companies. The customer aggregates the invoices submitted by medical institutions to estimate the share to be covered by the state and by insurers.


The submitted invoices, sent to the customer as scanned PDF documents, had to be uploaded to a third-party system for further analysis and invoice splitting. However, the organizations all used different templates to structure their documents, which significantly complicated recognition. Furthermore, some documents were illegible even to the human eye.

When the customer turned to HQSoftware, all the information from the submitted invoices was being entered into the third-party database manually, which dramatically slowed down processing and tied up a large number of staff. As a result, an immense backlog of unprocessed documents had built up over four years.

Collaborating with HQSoftware, the customer wanted to automate document recognition and upload, and to handle the different document templates submitted for estimation in a unified way.


Powered by machine learning, the delivered solution is an application with a set of pre-trained templates that recognize the information in the submitted PDF invoices and organize it into an e-form with editable fields. The solution supports a variety of cloud file hosting services, such as Google Drive and Dropbox, and can be customized to support any other hosting service of choice.

Once documents are uploaded to the delivered application, it automatically transfers the data to the third-party system. However, this system had no API and was accessible only through a website. For this reason, developers at HQSoftware used Selenium WebDriver to emulate user interaction with the website and run the upload in the background.
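The browser-driven upload can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the delivered code: the form URL, the field names, and the submit-button selector are hypothetical, and the actual third-party site is not described in this case study. The locator strings "name" and "css selector" are the values behind Selenium's By.NAME and By.CSS_SELECTOR constants.

```python
from typing import Mapping


def make_headless_driver():
    """Create a Chrome driver that runs without a visible window.

    Needs the third-party `selenium` package and a local Chrome install.
    """
    # Imported lazily so upload_invoice can be exercised without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # background mode, no browser window
    return webdriver.Chrome(options=options)


def upload_invoice(driver, form_url: str, fields: Mapping[str, str]) -> None:
    """Open the form page, type each value into its field, and submit."""
    driver.get(form_url)
    for field_name, value in fields.items():
        # Hypothetical assumption: each input is addressable by its `name` attribute.
        box = driver.find_element("name", field_name)
        box.clear()
        box.send_keys(value)
    # Hypothetical assumption: the form has a standard submit button.
    driver.find_element("css selector", "button[type='submit']").click()
```

Passing the driver in as a parameter keeps the form-filling logic separate from browser setup, so the same function can run against a headless browser in production or a stub in tests.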

Employing Tesseract OCR, our engineers ensured that the system recognized and distinguished between five types of document templates with a precision of 70%. To achieve this result, experts at HQSoftware compiled a collection of the most common templates submitted by healthcare institutions and performed up to 30 training iterations per document type.

For cases when an invoice lacked particular information, such as an address, our team delivered an algorithm that checks other invoices submitted by the organizations, matches the missing data to the appropriate institution, and extracts the necessary information from the available documents. The algorithm can also remove duplicates.
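One way such gap filling and deduplication could work is sketched below. The record fields, the exact-name match on the institution, and the use of an invoice ID as the duplicate key are all assumptions made for illustration; the case study does not specify how the delivered algorithm matches records.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    # Hypothetical record layout; real invoices carry many more fields.
    invoice_id: str
    institution: str
    address: Optional[str] = None


def fill_missing_addresses(invoices):
    """Copy a known address onto invoices from the same institution that lack one."""
    known = {}
    for inv in invoices:
        # Remember the first address seen for each institution.
        if inv.address and inv.institution not in known:
            known[inv.institution] = inv.address
    return [
        inv if inv.address else replace(inv, address=known.get(inv.institution))
        for inv in invoices
    ]


def drop_duplicates(invoices):
    """Keep the first occurrence of each invoice_id, preserving input order."""
    seen = set()
    unique = []
    for inv in invoices:
        if inv.invoice_id not in seen:
            seen.add(inv.invoice_id)
            unique.append(inv)
    return unique
```

An invoice whose institution has no address on any other document keeps `address=None`, so it can still be flagged for manual review.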

If the solution was unable to recognize a document, the system would move the file to a separate folder, mark it for manual processing, and send a notification to an administrator.
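The fallback path can be illustrated with a short Python sketch. The folder layout, the message text, and the `notify` callback are hypothetical; here, placing the file in the manual-processing folder serves as the mark, and the notification channel (email, chat, and so on) is left to the caller.

```python
import shutil
from pathlib import Path
from typing import Callable


def route_unrecognized(pdf_path: Path, manual_dir: Path,
                       notify: Callable[[str], None]) -> Path:
    """Move an unrecognized file to the manual-processing folder and notify an admin."""
    manual_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    target = manual_dir / pdf_path.name
    shutil.move(str(pdf_path), str(target))
    notify(f"Manual processing needed: {target.name}")
    return target
```

For example, `route_unrecognized(Path("inbox/scan.pdf"), Path("manual"), print)` would move the file and print the notification instead of sending it.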


Partnering with HQSoftware, the customer automated invoice recognition with a precision of 70%. By digitizing the workflow, the company sped up document processing 8–9 times. The automation allows 12–15 documents to be processed in parallel per minute, whereas it previously took 5–10 minutes to aggregate a single invoice.
