Read a label value (with varying label names) from PDFs in different formats

I am starting with UiPath and I want to read the Invoice Number from PDFs in different formats, where

  • Invoice Number could be mentioned as Invoice Number or Invoice No. or Invoice # or Invoice#
  • Invoice Number could be in any place in the pdf as formats are different.

Can I do this using regex, or should I use an intelligent OCR tool?

If the invoice number can be at different places then Regex might not work…

But if that's the only place Invoice is mentioned on a page, then Regex may work, since the pattern can cover the label variants (Invoice #, No. or Number)…

Or you can try Document Understanding with the ML extractor, which can identify the field anywhere on the page…
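As a rough illustration of the "only place on the page" condition mentioned above, here is a minimal Python sketch that counts how many invoice-number labels appear in the extracted page text. The function name `label_is_unambiguous` and the label pattern are assumptions made for this example; they are not part of UiPath or of any reply in this thread.

```python
import re

# Hypothetical label pattern covering the variants from the question:
# "Invoice Number", "Invoice No.", "Invoice #" and "Invoice#".
LABEL_RE = re.compile(r"Invoice\s*(?:Number\b|No\b\.?|#)", re.IGNORECASE)


def label_is_unambiguous(page_text):
    """Return True when exactly one invoice-number label appears in the text."""
    return len(LABEL_RE.findall(page_text)) == 1


if __name__ == "__main__":
    print(label_is_unambiguous("ACME Corp\nInvoice No. 77\nTotal 5"))
    print(label_is_unambiguous("Invoice # 1\nSee also Invoice Number 2"))
```

If the check fails for many documents, that is a hint that a plain regex will be fragile and an ML-based extractor may be the better fit.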

Can I use a combination of Document Understanding (using the ML Extractor) and Regex to achieve this?

@mojuneja Yes… You can use multiple extractors in Document Understanding (inside the Data Extraction Scope activity).

@prasath17 Thank you. I was looking for a way to implement this using only one template. This helps. I will dig more into Document Understanding.

@mojuneja, but you receive this same type of document in large numbers, right?

I don’t have a count of the invoices of each type, but the target is to achieve the following with a single template:

  1. Invoice number is mentioned with different labels in different invoices like Invoice Number or Invoice No. or Invoice # or Invoice#
  2. Invoice number is not in a fixed place

Hi @mojuneja

You can try using the regex extractor with a regex like the one below

        (?=Invoice No\.).*|(?=Invoice Number).*|(?=Invoice ?#).*

It can extract the invoice number even when the label appears in any of the forms you described

Or

You can also try the machine learning extractor, as suggested by @prasath17

That would be a good option too

Regards

Nived N :robot:

Happy Automation :relaxed::relaxed:

@NIVED_NAMBIAR Do you mean that with regex we can extract the invoice number even if it is not in the same place in different invoices (in addition to the labels being different)?

Yes @mojuneja

The regex identifies the label pattern and finds the required data next to it, wherever that label appears in the text
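To make the idea concrete, here is a small Python sketch of label-based extraction that does not depend on where the label sits in the text. It is only an illustration under assumptions: the function name `extract_invoice_number` and the set of characters allowed in the invoice number are invented for this example, and the pattern would still need to be adapted and tested in the UiPath regex extractor, which runs on the .NET regex engine.

```python
import re

# Label variants from the question: "Invoice Number", "Invoice No.",
# "Invoice #" and "Invoice#". The capture group holds the value; the
# characters allowed in it (letters, digits, "-", "/") are an assumption.
INVOICE_RE = re.compile(
    r"Invoice\s*(?:Number\b|No\b\.?|#)\s*[:\-]?\s*([A-Za-z0-9][A-Za-z0-9\-/]*)",
    re.IGNORECASE,
)


def extract_invoice_number(text):
    """Return the first value found after a known label, or None."""
    match = INVOICE_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "ACME Corp\nInvoice No. 12345\nTotal: 10",
        "Total due 50\nInvoice#INV-778\nThank you",
        "Invoice Number: A/2021/009",
    ]
    for sample in samples:
        print(extract_invoice_number(sample))
```

Unlike the lookahead pattern earlier in the thread, which returns the label together with the rest of the line, the capture group here returns only the value. It will still pick up wrong text if a label is followed by something other than the number, so real invoices should be checked by hand first.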

Thanks @NIVED_NAMBIAR. I will do the setup and try this.

Hi Nived, how can I use this expression in a workflow, using the Is Match activity or any other way? Please share a workflow, I would appreciate it.

Hi @Aleem_Khan

In Document Understanding, there is an extractor called the regex extractor, which extracts data based on a regex

You can check out a video on Document Understanding that shows how to extract text with the regex extractor

Regards

Nived N :robot:

Happy Automation :relaxed::relaxed:

I have the same issue, with around 15 templates. Please suggest an approach for working with different PDF templates to fetch eight different values from each PDF. The fields are the same, but they appear in different templates and in different languages.