Invoice data extraction using Document Understanding

Christodoulos · June 14, 2023, 3:55pm

Greetings community, I would like your input on a project I am working on.

I have some multi-page PDF invoices.
All of them have a summary on the first page: some general information and a small table (2 or 3 rows) with the expense categories and their amounts, followed by a total amount after the table.
The remaining pages cover those categories in detail: 2 or 3 lines of text at the top, then a table listing the expenses one by one.

The problem is with the detail pages, because the tables are not clearly laid out: there are no headers and no lines separating the rows, just spacing, and the ML extractor does not seem able to identify them correctly.
In the template setup, the IDs suggested for matching against the taxonomy fields are very wrong.

I cannot use a form extractor because the placement of the table depends on its size…

The PDF is split into its pages, and I made 2 taxonomy types: one for the first page and one for the rest.
Do I just need more invoices to train the ML model, or have I approached this wrong?

Any input is welcome, thanks!

Nitya1 · June 14, 2023, 5:19pm

Hi @Christodoulos

Training the ML model with more invoices can definitely help improve the extraction accuracy. The model needs to learn from a diverse range of invoices to better understand the patterns and structures of the tables on the detailed pages.
You could also consider alternative extraction approaches: since the tables on the detailed pages are not clearly defined, a different extraction method may handle them better.

Thanks!!

Christodoulos · June 15, 2023, 7:20am

Thanks for the input!

By other methods, do you mean something outside Document Understanding extraction? I ask because I have tried all the extractors in that scope.
I will try to get my hands on more training samples, and if that doesn't work I am thinking of manipulating the string returned by the digitization step.
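
As a rough illustration of that string-manipulation fallback, here is a minimal Python sketch. It assumes the digitized text of a detail page keeps wide gaps (two or more spaces) between columns and that the last column of each expense row is the amount. The sample text, the function names (`parse_amount`, `parse_rows`) and the number format are all hypothetical; they are not part of UiPath or of the actual invoices.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumption: two or more whitespace characters mark a column boundary.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_amount(text):
    """Return a Decimal for strings like '1,234.50', or None if not numeric."""
    try:
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return None


def parse_rows(page_text):
    """Split each line on wide gaps; keep lines whose last cell is an amount."""
    rows = []
    for line in page_text.splitlines():
        cells = [c for c in COLUMN_GAP.split(line.strip()) if c]
        if len(cells) < 2:
            continue  # intro text or blank line, not a table row
        amount = parse_amount(cells[-1])
        if amount is None:
            continue  # last cell is not a number, so skip the line
        rows.append({"cells": cells[:-1], "amount": amount})
    return rows


# Hypothetical digitized text of one detail page.
sample = """Category: Travel
Details for the period

Taxi to airport      12/05/2023      45.00
Hotel, 2 nights      13/05/2023   1,230.50
"""

rows = parse_rows(sample)
detail_total = sum(row["amount"] for row in rows)
```

Since the first page carries a total, `detail_total` could be compared against the summary amounts as a sanity check on the parsed rows. Whether the real digitized text preserves enough spacing for this to work would need to be checked on actual samples.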

Thanks again!

Christodoulos · June 16, 2023, 12:26pm

Do you know how I can further train the existing public endpoints provided by UiPath?

system · June 26, 2023, 6:19am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
