How to capture data from scanned PDFs with different layouts

I want to capture the invoice ID, date, customer ID, etc. from PDFs with different layouts.
Does anyone have any idea how to do this?

Please see the attached file as reference.
All files will be PDF files.

Invoice 1.pdf (28.8 KB)

Hello @vaibhav2.chavan,

I work for a company that receives many different supplier invoices, so I'm in a similar boat to you. The way I got round it was as follows:

Get all the PDFs into one folder, then run a For Each loop over each file in the directory.
Use a Read PDF activity (you might have to download it from the manage activities area) and set the output to a String variable (let's call this variable OUTPUT).
Then use a string split method to isolate the desired string. Think of it as finding an anchor word: a word that keeps the same position relative to the desired value on every invoice from the same supplier. Split on the anchor, then split the remainder of the String variable again to narrow it down to the value you're after.
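
The anchor idea above can be sketched roughly like this. It is written in Python rather than as a UiPath expression, and `text_after_anchor` is a hypothetical helper name, not an activity or library function. It assumes the value you want sits on the first non-empty line after the anchor.

```python
def text_after_anchor(text: str, anchor: str) -> str:
    """Return the first non-empty line that follows `anchor`.

    Returns an empty string when the anchor is missing, so a layout
    change shows up as a blank value instead of an exception.
    """
    _, found, rest = text.partition(anchor)
    if not found:
        return ""
    for line in rest.splitlines():
        if line.strip():
            return line.strip()
    return ""


# Shortened, made-up stand-in for the Read PDF output string.
output = "TERMS\r\nNet 30 Days\r\nMellicent Ivoshin\r\n"
print(text_after_anchor(output, "TERMS"))      # Net 30 Days
print(text_after_anchor(output, "SHIP TO"))    # (empty: anchor not present)
```

Returning an empty string for a missing anchor is a design choice for this sketch; in a real workflow you may prefer to log or flag the file so that it can be reviewed by hand.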

Here is what the Read PDF OUTPUT variable looks like for your invoice:

INVOICE
1 Main Road
Johannesburg
South Africa DATE
leon@robopro.co 2017/09/29
TERMS
Net 30 Days
Mellicent Ivoshin
Dynazzy
37 Carpenter Court
Sinilian First
560-390-2703
mivoshincp@gravatar.com
DESCRIPTION QTY (hours) UNIT PRICE () AMOUNT ()
Service Fee 6 200,00 1 200,00
Additional Services 7 75,00 525,00
1 725,00
If you have any questions about this invoice, please contact
[Leon, leon@robopro.co]
2170
CUSTOMER ID
INVOICE #
Thank you for your business! TOTAL
279
BILL TO

For your attached invoice, I found the following splits:

INVOICE = Output.ToString.Split({"co]"+vbCrLf},2,StringSplitOptions.RemoveEmptyEntries)(1).Split({vbCrLf},2,StringSplitOptions.RemoveEmptyEntries)(0).Trim

DATE = Output.ToString.Split({"DATE"+vbCrLf},2,StringSplitOptions.RemoveEmptyEntries)(1).Split({" "},2,StringSplitOptions.RemoveEmptyEntries)(1).Split({vbCrLf},2,StringSplitOptions.RemoveEmptyEntries)(0).Trim
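
If you want to check these two splits outside UiPath, here is a rough Python equivalent (Python is a swap-in here, not what the workflow itself runs). `SAMPLE` is a shortened copy of the Read PDF output shown above, with CRLF line endings assumed, and the invoice split is anchored on the `co]` that ends the contact line in that output. The function names are made up for illustration.

```python
# Shortened copy of the Read PDF output above; CRLF line endings are assumed.
SAMPLE = "\r\n".join([
    "INVOICE",
    "South Africa DATE",
    "leon@robopro.co 2017/09/29",
    "TERMS",
    "Net 30 Days",
    "If you have any questions about this invoice, please contact",
    "[Leon, leon@robopro.co]",
    "2170",
    "CUSTOMER ID",
    "INVOICE #",
])


def invoice_number(text: str) -> str:
    # Everything after the contact line's closing "co]" + CRLF, then the first line of that.
    after_anchor = text.split("co]\r\n", 1)[1]
    return after_anchor.split("\r\n", 1)[0].strip()


def invoice_date(text: str) -> str:
    # Everything after "DATE" + CRLF, skip the e-mail address up to the first space,
    # then keep the rest of that line.
    after_anchor = text.split("DATE\r\n", 1)[1]
    after_email = after_anchor.split(" ", 1)[1]
    return after_email.split("\r\n", 1)[0].strip()


print(invoice_number(SAMPLE))  # 2170
print(invoice_date(SAMPLE))    # 2017/09/29
```

Like the UiPath expressions, both functions raise an index error when the anchor is not found, so they only suit invoices that follow this supplier's layout.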

I hope you can use this to find the rest of the variables you’re after.

Cheers
MikeB

@MikeBlades - Thanks for your response. I will try and let you know in case of any issues.

Hi,
Even if we use regex, it can only handle a known set of PDF templates. If the text format or its position changes, regex won't be able to handle it.
Have you tried ABBYY FlexiCapture for this?
I hope this video gives you some insights.
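
To illustrate the regex limitation mentioned above, here is a small Python sketch (not a UiPath activity). The pattern is an assumption based on the `2017/09/29` date format in the sample invoice, and `find_date` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed date format: four-digit year, then month and day, separated by slashes.
DATE_PATTERN = re.compile(r"\b\d{4}/\d{2}/\d{2}\b")


def find_date(text: str) -> Optional[str]:
    """Return the first yyyy/mm/dd date in the text, or None if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


print(find_date("leon@robopro.co 2017/09/29"))   # 2017/09/29
print(find_date("Invoice date: 29 Sept 2017"))   # None
```

The pattern finds the date wherever it sits in the text, but a supplier who writes dates differently needs a pattern of their own, which is the maintenance cost described above.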

Cheers @vaibhav2.chavan