PDF - Invoice Data extraction only of product name and Quantity

Hi Team,

How are you?
I have a question that came up while I am trying to do some automation on CSR.

I have to extract details such as Item description, Quantity and Price from a PDF file. Can you please advise how IntelligentOCR can help me with this? I am trying to do it with IntelligentOCR only, but I am not able to create the workflow.

If you have any other suggestions on how to extract the details, please let me know.

Below is an example of how the data appears in the PDF; the layout may vary.

@lakshman
@Palaniyappan
@RishiVC1

image

Hi @Rahulsinha

I think you can make use of Intelligent OCR for this one. It has position based extractors, which you can easily configure to capture the data in the grids if the format of the document is always the same.

The first thing you need to do is create a template using the Taxonomy Manager and classify the document. Next, use the position based extractors if the structure is the same. You can also try the machine learning extractor to see how well it extracts your fields…

For how to build the workflow, you can refer to these sample workflows and create your own…

@Lahiru.Fernando
Thanks. I have two queries here:
1- How do I use the Taxonomy Manager? I have read about it in the portal, but I am not clear on what needs to be done next after it creates the JSON file. Can you please explain in detail?

2- If it is position based, that might be a problem, as different PDFs may have different formats. In that case, how will it pick up the data?

Thanks!

@Rahulsinha

You can get a much better understanding of how to use Intelligent OCR and build workflows if you follow the 2019.04 updates course in the UiPath Academy. It explains well how to use the Taxonomy Manager and how to build Intelligent OCR workflows…

For your second question, if the position changes from time to time along with the format, I would suggest trying the Machine Learning extractor, the regex based extractor, or a combination of these…

Go through the course first to get a better understanding…
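To illustrate the regex idea outside of UiPath, here is a minimal Python sketch. It is not the Regex Based Extractor itself, only the same principle applied to plain text. The sample text, the label formats ("PO Number:", "Date:") and the function name `extract_header_fields` are all assumptions made up for illustration, so the patterns would need adjusting to your own documents.

```python
import re

# Hypothetical sample text; a real invoice will look different.
SAMPLE_TEXT = """\
Purchase Order
PO Number: PO-12345
Date: 14/03/2019
"""

# Assumed label formats; adjust these patterns to the labels in your documents.
PO_PATTERN = re.compile(
    r"\bPO\s*(?:Number|No\.?)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
)
DATE_PATTERN = re.compile(
    r"\bDate\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
)


def extract_header_fields(text):
    """Return the first PO number and date found in the text, or None for each."""
    po_match = PO_PATTERN.search(text)
    date_match = DATE_PATTERN.search(text)
    return {
        "po_number": po_match.group(1) if po_match else None,
        "date": date_match.group(1) if date_match else None,
    }


fields = extract_header_fields(SAMPLE_TEXT)
print(fields)
```

Because the patterns key on the label rather than on the position, the field can move around the page and still be found, as long as the label wording stays within what the pattern allows.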

@Lahiru.Fernando
This course is really helpful, but I am not able to work out how many times we need to define the fields for the extractor. I mean, how can this be resolved? Also, the output does not contain the data which I selected in the PDF.
Kindly let me know if my query is not clear.

Hello @Rahulsinha,

The machine learning extractor uses a pre-trained model that does not adjust to the manual corrections submitted by users, so it does not learn at the moment.

Ioana

Thanks for the response. :slight_smile:
Could you please advise on how I should proceed with this?
I need to extract the PO number, the date and the data in the table. How can I extract all of this data into an Excel sheet?

Could you send me a real document to have a look at?

Sorry, I am not able to share the real document, but here is a copy (of sorts) of it.

Hello @Rahulsinha,

Without a few real-life documents, I cannot propose a solution. Try looking into the Regex Based Extractor and the Position Based Extractor if you have just a few variations of where / how the data appears.

Ioana

It's not working :frowning:
How can I extract all the data available in the table from the PDF?

@Rahulsinha ,

Please share the data in plain text (txt) format.

Just use the Read PDF Text activity, put the output in a variable, and then share the data here.

Based on that, a solution can be proposed.
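As a rough illustration of what can be done once the PDF text is in a variable, here is a minimal Python sketch that picks table rows out of plain text and writes them as CSV, which Excel can open. Everything specific in it is an assumption: the sample text is invented, the row shape (description, integer quantity, price with two decimals, columns separated by at least two spaces) may not match your invoices, and `parse_line_items` and `to_csv` are hypothetical helper names.

```python
import csv
import io
import re

# Hypothetical text, as a PDF text reader might return it; real output will differ.
SAMPLE_TEXT = """\
Item Description    Quantity    Price
Blue Widget         2           10.50
Steel Bolt M8       100         0.25
Total                           46.00
"""

# Assumed row shape: description, 2+ spaces, integer quantity, decimal price.
ROW_PATTERN = re.compile(
    r"^(?P<description>.+?)\s{2,}(?P<quantity>\d+)\s+(?P<price>\d+\.\d{2})\s*$"
)


def parse_line_items(text):
    """Return one dict per line that matches the assumed row shape."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            rows.append(
                {
                    "description": match["description"].strip(),
                    "quantity": int(match["quantity"]),
                    # Price is kept as text to avoid float rounding surprises.
                    "price": match["price"],
                }
            )
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["description", "quantity", "price"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


items = parse_line_items(SAMPLE_TEXT)
print(to_csv(items))
```

The header line and the "Total" line are skipped simply because they do not fit the row pattern. If the text extracted from a real PDF does not keep the columns on one line, this line-by-line approach will not work as is.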

Thanks
Minal

Hi Ioana,
I am facing an issue with the ML extractor. It is able to fetch all the details of an invoice, but somehow it is not able to fetch the invoice quantity details, which are in a table format.
Initially I thought it could be a PDF issue, but it is not working for any PDF.
Is this a known issue?

Thanks
Abhishek

Hello @abhishek.tyagi, and Welcome to our forum!

Any chance you could share a sample file with me, in private, for analysis? The workflow failing to extract the quantity would also be of great help.

Meanwhile, do check out our examples here: How to use the IntelligentOCR Package. Do these samples work for you once you upgrade the packages?

Ioana

Hi Ioana,

Thanks for the reply, but I think I resolved the issue by myself.

Abhishek