Unable to extract data from Invoices

I am trying to extract the following data:

  • InvoiceNo
  • InvoiceDate
  • Order Information
    • ItemNo
    • Description
    • Quantity
    • Price
  • SubTotal
  • GST
  • Total

I tried using Document Understanding, but it is not working: it does not recognize any attribute. If I use Get Text instead, how will I get the table data?

Can anyone please help me out? I am attaching sample invoices.

JaneDoe_01092020_130792.pdf (17.6 KB) JoeyTribbiani_01102020_281092.pdf (25.7 KB) MonicaGeller_09052020_87654.pdf (15.9 KB) RachelGreen_04042020_40874.pdf (20.6 KB)

Hi Aishwarya,

First use CV Screen Scope, then use CV Get Text. If the position of the elements on your PDF does not change, the CV activities are the more reliable option.

Hope it works.

How will I get the data from the table? I also need to add all this data to one Excel file.

Hi @Aishwarya_Bhargava
Can you please share your current approach?
Have you used IntelligentOCR activities or the Document Understanding module available through ML models on Cloud?

So far I have tried 2 approaches:

  1. Document Understanding
  2. Data scraping

(Both failed.)

Project2.zip (89.9 KB)

Aishwarya,

Instead of adding more complexity, simply use the Computer Vision activities: install the CV activities package in Studio, then use CV Screen Scope followed by CV Get Text. I have looked at your PDF files and their layout is stable, so there is no need for a complex method like ML models on Cloud.

Will the Get Text activity work for the table I need to get data from? One of the tables spans 2 pages.

The reason for me to ask about ML cloud was the same as @ghazanfar stated. These documents do not warrant a complicated approach.

However, CV activities are designed for remote environments such as Citrix.

IntelligentOCR, on the other hand, can produce stellar results without much upfront effort or time investment.
You can also map tables for extraction with relative ease. See this section for a demo of table selection during IntelligentOCR training.

Also, you have all the resources you need to get started (including the project that you can readily use!)

The problem I am facing with Document Understanding is that none of the data is being recognized and captured for verification in the human validation step.

Just try it.

Okay, I will try it and share the output.

Hi @Aishwarya_Bhargava

There is nothing wrong with your code/approach.

You just need to make one change.
The output of the Extraction Scope (extractionResults) is not being passed to the Validation Station.
I made that one change and could see the results.

The ML extractor (or any other extractor which you use to train) should then pass the extraction details to validation. Otherwise, the extraction step is basically not doing anything productive as its results are never used.
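
To make the wiring issue concrete outside of UiPath, here is a minimal Python sketch. It is not UiPath code: `ExtractionResult`, `extract`, and `validate` are hypothetical stand-ins for the extraction step and the validation step, and the sample text is made up. It only shows that a validation step handed an empty result has nothing to display.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    # Hypothetical container for extracted field values.
    fields: dict = field(default_factory=dict)


def extract(document_text: str) -> ExtractionResult:
    """Stand-in extractor: collects 'Key: value' pairs from plain text."""
    result = ExtractionResult()
    for line in document_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only keep lines that actually contain a colon
            result.fields[key.strip()] = value.strip()
    return result


def validate(result: ExtractionResult) -> dict:
    """Stand-in validation step: it can only show what it is handed."""
    return dict(result.fields)


sample = "InvoiceNo: INV-001\nTotal: 110.00"  # made-up sample data

# Extraction output never passed on: validation sees nothing.
unwired = validate(ExtractionResult())

# Extraction output passed on: validation sees the fields.
wired = validate(extract(sample))
```

In the workflow, the equivalent fix is to set the validation activity's input to the same variable that the extraction step writes to.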

I hope that clears the issue you have.


Yes, that clears up a lot of things, thank you.
But I have a question: how can I get the table extracted? I want the information from all the rows, and in some cases the table is spread across 2 pages.

In my reply above, I have quoted the link to Intelligent Form Extractor.
Please scroll down to the section Configuring a template with table selection.

The GIFs provided in this section explain how to easily extract a table.
I haven’t worked much with multi-page documents. Going by the info in the same article, though, you should be able to use ‘Page 1 Matching Info’ and ‘Page 2 Matching Info’ to your advantage.
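
Once the rows from each page have been extracted, combining them into a single sheet is straightforward. The Python sketch below is only an illustration of that idea, not UiPath code: `merge_page_tables` and `to_csv` are hypothetical helpers, the sample rows are made up, and it assumes the header row repeats on every page. It writes CSV, which Excel can open.

```python
import csv
import io

# Column names taken from the fields listed in the question.
HEADER = ["ItemNo", "Description", "Quantity", "Price"]


def merge_page_tables(page_tables):
    """Concatenate per-page row lists, skipping repeated header rows."""
    merged = []
    for rows in page_tables:
        for row in rows:
            if row == HEADER:
                continue  # the header repeats on each page; keep data only
            merged.append(row)
    return merged


def to_csv(rows):
    """Render the header plus all data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: one table split across two pages.
page_1 = [HEADER, ["1", "Widget", "2", "10.00"]]
page_2 = [HEADER, ["2", "Gadget", "1", "25.50"]]

all_rows = merge_page_tables([page_1, page_2])
csv_text = to_csv(all_rows)
```

The same merged rows could be appended per invoice to build one combined file for all documents.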

In my experience, when the documents get complicated, such as tables spanning across multiple pages, an advanced data extraction platform such as ABBYY or UiPath’s Document Understanding module is ideal to achieve maximum accuracy.

IntelligentOCR is still developing, but some of its limitations will probably stay the same, given that the Document Understanding piece is quite capable of achieving these results accurately.

@Ioana_Gligan May we have an expert weigh in here? :)
Cheers!

I got the expected result using Document Understanding.

Thank you, everyone, for the help :)
