Extract data from a PDF document

I have a PDF document that is an invoice, and I want to extract information such as Invoice No, Issue Date, Total Amt, Tax Amt, etc. Can anyone please tell me how I can automate this using UiPath?

@waseem

  1. Use the Read PDF Text activity to read the data from the PDF file; it gives you the output as a String.

  2. Then use String manipulation methods or Regex to get the required details from it.

Hi,
We can use either the READ PDF or the READ PDF with OCR activity: pass the file path of the PDF as input and store the output in a String variable named str_output.
Now use an ASSIGN activity, where we can use a REGEX, SPLIT or SUBSTRING method to get the term we want.
This can be done only after reviewing the output that we get from the PDF activities.
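For anyone following along, here is a minimal sketch of the "read the text, then pull out labelled values with a regex" idea. It is written in Python only to illustrate the pattern; in UiPath the same pattern could be used in an Assign with the .NET Regex.Match method. The sample text, its labels and amounts, and the helper name amount_after are all made up for illustration, so the pattern will need adjusting once you see the real output of the PDF activity.

```python
import re

# Hypothetical text standing in for the String returned by the PDF activity.
# The labels and amounts below are invented for this example.
pdf_text = """Invoice No: INV-001
Issue Date: 01/02/2020
Subtotal 1,000.00
Tax 150.00
Total 1,150.00"""


def amount_after(label, text):
    """Return the amount that follows `label` at the start of a line, or None."""
    # ^ with MULTILINE anchors the label to the start of a line, so "Total"
    # does not accidentally match inside "Subtotal".
    pattern = rf"^{re.escape(label)}\s*:?\s*([\d,]+\.\d{{2}})"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


subtotal = amount_after("Subtotal", pdf_text)
tax = amount_after("Tax", pdf_text)
total = amount_after("Total", pdf_text)
print(subtotal, tax, total)
```

The helper returns None when a label is missing, which is easier to check for than an exception when invoices differ in layout.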

Cheers @waseem

Hey, can you help me out with how to use Substring to extract data, e.g. Subtotal, Tax?

Sure.
Can I have a string obtained from the PDF, if possible with those terms in it,
so that I can come up with an expression?

Cheers @waseem

BILL_2020_0002.pdf (32.0 KB)

From this I need to extract the Reference no, i.e. 2020/0002, and the PO no, i.e. P00013.

Fine.
I don’t have my system with me.
Can you do me a favour?
Kindly read this PDF with READ PDF or READ PDF with OCR and share the string output, so that I can give you an expression based on it.
Cheers @waseem

Is this text fixed: Vendor Bill BILL/ 2020/0002?

Here you go.
Check this sample workflow; I’ve used your sample PDF: ReadPDF.xaml (6.0 KB)

No, this is not fixed; all invoices have different reference numbers.

Is the format fixed?
4 digits/4 digits?

Yes, the format is fixed.

To get that value, you can use the regex below:
\d{4}/\d{4}
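As a quick check of that pattern, here is a small Python sketch run against the line quoted earlier in the thread. The PO pattern and the "Purchase Order:" line around it are assumptions based only on the single example P00013 (a "P" followed by five digits), so verify them against the real text before relying on them.

```python
import re

# Line quoted earlier in the thread from the sample invoice.
ref_line = "Vendor Bill BILL/ 2020/0002"

# Four digits, a slash, four digits.
ref_match = re.search(r"\d{4}/\d{4}", ref_line)
reference_no = ref_match.group(0) if ref_match else None

# Hypothetical line for the PO number; only the value P00013 comes from the thread.
po_line = "Purchase Order: P00013"

# Assumed format: the letter P followed by exactly five digits.
po_match = re.search(r"\bP\d{5}\b", po_line)
po_no = po_match.group(0) if po_match else None

print(reference_no, po_no)
```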

Hi @waseem

Check this

\d{1,4}/\d{1,4}

Thanks
Ashwin S

Thank you :grinning: :grinning:

Is it working?
@waseem

I tried using Substring and it worked.
Thank you for your help.
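The exact Substring expression that worked was not posted, so the following is only a guess at one way to do it, sketched in Python: find a fixed marker, then take the rest of that line. The marker "BILL/" comes from the line quoted in the thread; the second line of sample text and the helper name text_after_marker are hypothetical.

```python
def text_after_marker(text, marker):
    """Return the rest of the line after the first `marker`, or None if absent."""
    start = text.find(marker)
    if start == -1:
        return None
    start += len(marker)
    # Stop at the end of the current line, or at the end of the text.
    end = text.find("\n", start)
    if end == -1:
        end = len(text)
    return text[start:end].strip()


# First line is quoted in the thread; the second line is made up.
sample = "Vendor Bill BILL/ 2020/0002\nSome other line"
reference_no = text_after_marker(sample, "BILL/")
print(reference_no)
```

This relies on the marker text staying the same on every invoice, so it breaks more easily than the regex if the layout changes.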
