Classifying and extracting data from semi-structured documents for review and downstream system updates

Use Case Description

The finance department of a company handles incoming documents such as invoices, purchase orders, and utility bills. A team of three people manages these documents on a daily basis. The documents arrive in different formats, such as PDF files, scanned images, and photographs, and some of them contain handwritten information. In addition, a single PDF file may contain more than one document. The finance team goes through these documents and sorts them into a shared drive according to the document type and the company or person who sent the file. If a PDF file contains multiple types of documents, the team also splits the file based on the same criteria.
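The splitting and filing rules above are what the automated solution must reproduce. The following is a minimal sketch, not the UiPath implementation. It assumes a classifier that reports one document type per page, a simplification because the real classifier output format is not described here. Under that assumption, consecutive pages of the same type are grouped into one output document, and a destination folder is built from the document type and the sender. The names `PageClass`, `split_by_type`, and `destination_folder`, and the folder layout `<root>/<type>/<sender>`, are hypothetical.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple


@dataclass
class PageClass:
    page: int      # 1-based page number in the source PDF
    doc_type: str  # hypothetical labels, e.g. "invoice", "purchase_order", "utility_bill"


def split_by_type(pages: List[PageClass]) -> List[Tuple[str, int, int]]:
    """Group consecutive pages of the same type into (type, first_page, last_page) ranges.

    Assumption: a document boundary only occurs where the type changes, so two
    back-to-back invoices would be merged; a real splitter needs boundary detection.
    """
    ranges: List[Tuple[str, int, int]] = []
    for p in sorted(pages, key=lambda x: x.page):
        if ranges and ranges[-1][0] == p.doc_type and ranges[-1][2] == p.page - 1:
            doc_type, first, _ = ranges[-1]
            ranges[-1] = (doc_type, first, p.page)  # extend the current range
        else:
            ranges.append((p.doc_type, p.page, p.page))  # start a new range
    return ranges


def destination_folder(root: str, doc_type: str, sender: str) -> PurePosixPath:
    """Build a hypothetical <root>/<type>/<sender> path with a filesystem-safe sender name."""
    safe_sender = re.sub(r"[^A-Za-z0-9._-]+", "_", sender.strip()) or "unknown_sender"
    return PurePosixPath(root) / doc_type / safe_sender
```

For example, pages classified as invoice, invoice, purchase order, invoice yield three output files: pages 1-2, page 3, and page 4.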

The purpose of sorting the documents is to simplify downstream activities and maintain clear references to the documents. The team also extracts key information from each document and sends it for approval based on a set of business rules. The approvals are done by the team's immediate manager. Some of the business rules require contacting the person who sent the document to obtain more information or to correct errors. Once verification is complete, the data is stored in up to two applications: purchase order information is stored both in SAP and in an internally developed, web-based ERP system used for tracking, whereas invoice and utility bill details are stored only in the internal ERP system.
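The storage rule above amounts to a small routing table keyed by document type. The following is a minimal sketch of that table. The type labels and the names `TARGET_SYSTEMS` and `target_systems` are hypothetical, and the sketch does not call SAP or the internal ERP system. Unknown types raise an error rather than silently defaulting to a system.

```python
from typing import Dict, Tuple

# Hypothetical routing table derived from the rule: purchase orders go to both
# SAP and the internal ERP system; invoices and utility bills go to the ERP only.
TARGET_SYSTEMS: Dict[str, Tuple[str, ...]] = {
    "purchase_order": ("SAP", "Internal ERP"),
    "invoice": ("Internal ERP",),
    "utility_bill": ("Internal ERP",),
}


def target_systems(doc_type: str) -> Tuple[str, ...]:
    """Return the downstream systems that must be updated for a document type."""
    try:
        return TARGET_SYSTEMS[doc_type]
    except KeyError:
        # Fail loudly so an unclassified document is never written to the wrong system.
        raise ValueError(f"No downstream system configured for {doc_type!r}") from None
```

Keeping the rule in data rather than in branching logic means that adding a new document type is a one-line change.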

The challenge with the existing process is the amount of manual effort it requires, given the data volume and the processing involved. In addition, the business team reports data accuracy issues and delays in downstream activities caused by slow data entry and approval.

The proposed RPA solution captures all incoming documents by monitoring a specified email account. All received documents are downloaded into a specified folder and passed to a Document Understanding process, which classifies them and extracts the required information. The robot automatically validates the extracted data and applies the business rules. Users are notified only when the robot requires manual intervention to decide which action to take under the business rules. The robot then processes the data further by automatically updating the downstream SAP and internal ERP systems.
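The decision of when to involve a human can be sketched as a check over the extracted fields. The following is a minimal sketch under stated assumptions, not the solution's actual logic. It assumes that each extracted field carries a confidence score between 0 and 1. It also assumes a hypothetical confidence threshold of 0.80 and hypothetical required-field lists and a positive-total rule standing in for the real business rules. A document with no issues is updated automatically. Any issue routes it to manual review, which in the real solution would presumably be a task in UiPath Action Center. No UiPath API is called here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CONFIDENCE_THRESHOLD = 0.80  # hypothetical cut-off; would be tuned per field and document type

# Hypothetical stand-ins for the real business rules.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "invoice": ["invoice_number", "vendor", "total"],
    "purchase_order": ["po_number", "vendor", "total"],
    "utility_bill": ["account_number", "billing_period", "total"],
}


@dataclass
class ExtractedField:
    value: Optional[str]
    confidence: float  # 0.0-1.0, as assumed to be reported by the extractor


@dataclass
class ExtractedDocument:
    doc_type: str
    fields: Dict[str, ExtractedField] = field(default_factory=dict)


def review_reasons(doc: ExtractedDocument) -> List[str]:
    """Collect every reason why this document needs a human decision."""
    if doc.doc_type not in REQUIRED_FIELDS:
        return ["unknown document type"]
    reasons: List[str] = []
    for name in REQUIRED_FIELDS[doc.doc_type]:
        f = doc.fields.get(name)
        if f is None or not f.value:
            reasons.append(f"missing {name}")
        elif f.confidence < CONFIDENCE_THRESHOLD:
            reasons.append(f"low confidence on {name} ({f.confidence:.2f})")
    total = doc.fields.get("total")
    if total is not None and total.value:
        try:
            if float(total.value) <= 0:
                reasons.append("total must be positive")
        except ValueError:
            reasons.append("total is not a number")
    return reasons


def route(doc: ExtractedDocument) -> str:
    """Return "auto_update" when no rule fires, otherwise "manual_review"."""
    return "manual_review" if review_reasons(doc) else "auto_update"
```

Returning the list of reasons, rather than only a flag, gives the reviewer the context needed to decide whether to approve the document or contact the sender.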


Invoice PO.pdf

Other information about the use case

Industry categories for this use case: Finance, Logistics

Skill level required: Advanced

UiPath Products that were used: UiPath Studio, UiPath Action Center, UiPath AI Center, UiPath Document Understanding, UiPath Orchestrator

Other applications that were used: SAP, Internal ERP

Other resources: DevCon 2020:

YouTube Playlist:


UiPath Academy Links:

What is the top ROI driver for this use case?: Accelerate growth and operational efficiency