PDF Document Form, Data Extraction, (Intelligent) Form Extractor

VanjaV · August 14, 2020, 7:47am

Hi,

I need to extract data regarding:

1. Specifications
2. Typical Properties
3. Technical Features
4. Application

from the attached example documents:
ACURE-500_EN_A4.pdf (198.6 KB)
ACURE-510-100_EN_A4.pdf (194.1 KB)
ACURE-510-170_EN_A4.pdf (195.4 KB)

The middle of the document is set in two columns.
Sometimes the content to be extracted is plain text, other times it is a table.
The content under each of the four headings above (which could serve as anchors) varies in length.

How can I do this with UiPath? I have already tried the Intelligent Form Extractor,
but I can't find the functionality to support it. Am I using it wrong?

Thx for any suggestions,
Vanja

shetanshudhar · August 14, 2020, 10:50am

@VanjaV Please create multiple taxonomies and train your intelligent classifier, and then the Intelligent Form Extractor, on all 3 types of PDFs. The more types/variations of documents you train on, the easier it will be for Document Understanding to extract data correctly.
Please have a look at the video below and make sure you are using all components and following all steps.

VanjaV · August 14, 2020, 4:16pm

@shetanshudhar Many thanks for the advice.

I don't understand the logic of the UiPath ML Trainer. Let's say:

I define a 1st template with Technical Features and a corresponding text box (custom area) 6 wide and 2 tall.
Then I define a 2nd template with Technical Features and a corresponding text box (custom area) 6 wide and 4 tall.
If the workflow receives a 3rd type of document (one not defined as a template) with Technical Features and a corresponding text box (custom area) 6 wide and 3 tall, how will it respond? Will it be recognized?

shetanshudhar · August 17, 2020, 11:46am

@VanjaV It will only recognise the templates on which it has been trained. For other documents it might recognise some content, but the result would not be 100% accurate.

VanjaV · August 18, 2020, 7:25am

@shetanshudhar thx.

Sorry, but I don't get it. Why do you have to train it on documents if it only works with templates?

To cover all possible combinations in the case mentioned, I would need to prepare around 30-50 different templates.
On top of that, the same type of document has to be processed for more than 100 companies, which multiplies the number of templates by an indefinite number of combinations.

Why can't I just define the custom area between two paragraph headers (used as anchors), either as text or as a table? And what is the difference between the Form Extractor and the Intelligent Form Extractor?
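
To make the anchor idea above concrete, here is a minimal sketch in plain Python, outside UiPath. It is not how the Form Extractor or the Intelligent Form Extractor works. It assumes the PDF text has already been extracted in reading order (for a two-column layout that is not guaranteed) and that each heading sits alone on its own line. The function name `split_by_headings` and the sample text are made up for illustration, and tables are returned as raw text, not parsed.

```python
import re

# Section headings from the first post, used here as anchors.
HEADINGS = ["Specifications", "Typical Properties", "Technical Features", "Application"]


def split_by_headings(text, headings=HEADINGS):
    """Return {heading: body}; body is the text between a heading and the next heading found."""
    # A heading counts as an anchor only when it stands alone on a line.
    pattern = re.compile(
        r"^[ \t]*(" + "|".join(re.escape(h) for h in headings) + r")[ \t]*$",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # The section ends where the next anchor starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


# Invented sample text (not taken from the attached PDFs).
sample = """Product X
Specifications
Viscosity: 500 mPa.s
Solids: 50 %
Technical Features
Fast drying
Good hardness
Very long third line
Application
Wood coatings
"""

sections = split_by_headings(sample)
print(sections["Technical Features"])
```

Because each section is cut at the next heading and not at a fixed box size, a section that is 2, 3 or 4 lines tall is handled the same way. Headings missing from a document are simply absent from the result.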
