PDF automation to extract a specific value

Hello all,

I have unscanned PDFs in a folder and I want to extract specific data from them. The catch is that the keyword for the value I want varies from PDF to PDF. For example, in the first PDF the keyword is Amount, in the second it may be Due, and in the third it may be Total. So, regardless of case and of which keyword is used, I need to extract the value and store it in a text file.

Please help.

Are you using document understanding?

This sounds like something you would want to use document understanding with Intelligent Extractor.

@SenzoD no I’m not using document understanding.

Mh, you will have a hard time ensuring your solution works without using DU.

  1. Anyway, install the UiPath PDF activities package
  2. Use the Read PDF Text activity
  3. Then you will need a regex to get the value you want; an example would be: (?:Amount|Due|Total):?\s+([^\s]+)
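
To see how the example pattern behaves, here is a minimal Python sketch. Python is used only for illustration, and the function name find_values and the first three sample strings are made up for this example; the last string is a fragment of the kind of text a PDF read returns. The pattern is applied case-insensitively and returns whatever token follows the keyword.

```python
import re

# Pattern from step 3, compiled case-insensitively so that
# "Amount", "amount" and "AMOUNT" are all accepted.
KEYWORD_VALUE = re.compile(r"(?:Amount|Due|Total):?\s+([^\s]+)", re.IGNORECASE)

def find_values(text):
    """Return every token that directly follows one of the keywords."""
    return KEYWORD_VALUE.findall(text)

print(find_values("Amount: 120.50"))   # hypothetical sample
print(find_values("due 75.00"))        # hypothetical sample
print(find_values("TOTAL 6300.00"))    # hypothetical sample
# The value part accepts any non-space token, so a label such as
# "Due Date" is picked up as well.
print(find_values("Due Date 05/31/2024"))
```

The last call returns the word Date rather than a number, which is why the value part usually needs tightening (for example to digits with a decimal part) once real extracts are available.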

I would like to be more helpful, but I do not have enough information about your use case. If you need help with the regex, run the Read PDF activity, paste the output here as text, and let me know which values you want from it.

@SenzoD
“(?i)(?<!Sub\s)(?<!TOTAL\sDISCOUNT\s)(?<!Tax\sTotal\s)\b(TOTAL\s+DUE|Total\s+Amount|Amount\s+Due|TOTAL|Total|Amount\s+Due)\sį\s([0-9,]+.\d{2})”

This is my regex, which is unable to fetch the exact values.

Share a sample of the data that your regex is unable to pick values from.

The text extracted from the PDF is below. I need to store the Total value, 6300.00, in the FindTotal string variable, but FindTotal ends up with 6000.00.
MR Systems Invoice BILLED TO: SSK Stores, Bandar 258, street, INVOICE DETAILS: Chennai-600 001 Invoice # 582222 Date of Issue 05/27/2024 Due Date 05/31/2024 ITEM/SERVICE DESCRIPTION QTY/HOURS RATE AMOUNT Dell Keyboard 10 200 2000 Dell Mouse 10 100 1000 SDD 10 300 3000 TERMS Sub Total 6000.00 Discount 00.00 Tax Rate 5% Tax Total 300.00 Total 6300.00

@bc265810a045804d705fa4a36

It would be good if you could give multiple samples instead of one; that way we can give a regex that fits all of them.

But this looks like a good case for Document Understanding, as you have many variations.

cheers

Need Total 6300.00
Invoice1: MR Systems Invoice BILLED TO: SSK Stores, Bandar 258, street, INVOICE DETAILS: Chennai-600 001 Invoice # 582222 Date of Issue 05/27/2024 Due Date 05/31/2024 ITEM/SERVICE DESCRIPTION QTY/HOURS RATE AMOUNT Dell Keyboard 10 200 2000 Dell Mouse 10 100 1000 SDD 10 300 3000 TERMS Sub Total 6000.00 Discount 00.00 Tax Rate 5% Tax Total 300.00 Total 6300.00

Need TOTAL DUE 1022.49
Invoice2: Fish Friends INVOICE in the Life fishbowl 12,3rd,Siruseri Phone BILL TO: SHIP TO: Welliam Plank Quinn Campbell Downtown Pets Downtown Pets Ramu Theater 321 Sycamore velampalayam Albany, NY 54321 96314-9638 (+91) 96321 - 46450 COMMENTS OR SPECIAL INSTRUCTIONS: Live animals handle - with care SALESPERSON P.O. NUMBER REQUISITIONER SHIPPED VIA F.O.B. POINT TERMS Diego Sagese 789 Jens Martensson Air express Warehouse Due on receipt QUANTITY DESCRIPTION UNIT PRICE TOTAL 10 Exotic fish 65.00 650.00 100 Standard goldfish 3.00 300.00 SUBTOTAL 950.00 SALES TAX 47.50 SHIPPING & HANDLING 24.99 TOTAL DUE 1022.49
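
One way to cover both samples with a single pattern is sketched below in Python, purely as an illustration. It assumes that the labels are limited to the ones seen so far, that "Sub" and "Tax" are separated from "Total" by exactly one whitespace character, and that the wanted value always has two decimals. The names GRAND_TOTAL and find_total are made up for this sketch, and the two strings are the tail ends of the invoices posted above.

```python
import re

# Labels to accept, longest first so "TOTAL DUE" wins over plain "TOTAL".
# The lookbehinds reject "Sub Total" and "Tax Total"; \b rejects "SUBTOTAL".
# Requiring two decimals skips "TOTAL 10" in the second invoice's table header.
GRAND_TOTAL = re.compile(
    r"(?<!Sub\s)(?<!Tax\s)\b(Total\s+Due|Total\s+Amount|Amount\s+Due|Total)"
    r"\s+([\d,]+\.\d{2})",
    re.IGNORECASE,
)

def find_total(text):
    """Return (label, value) for the first accepted total, or None."""
    match = GRAND_TOTAL.search(text)
    return (match.group(1), match.group(2)) if match else None

invoice1_tail = ("TERMS Sub Total 6000.00 Discount 00.00 Tax Rate 5% "
                 "Tax Total 300.00 Total 6300.00")
invoice2_tail = ("QUANTITY DESCRIPTION UNIT PRICE TOTAL 10 Exotic fish 65.00 650.00 "
                 "100 Standard goldfish 3.00 300.00 SUBTOTAL 950.00 SALES TAX 47.50 "
                 "SHIPPING & HANDLING 24.99 TOTAL DUE 1022.49")

print(find_total(invoice1_tail))
print(find_total(invoice2_tail))
```

On these two extracts the sketch yields Total 6300.00 and TOTAL DUE 1022.49. Every new invoice layout may need another label or another exclusion, which is the maintenance cost of the regex route.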

Hi @bc265810a045804d705fa4a36

Try this

System.Text.RegularExpressions.Regex.Matches(Input.ToString,"((?<=((Total)|(TOTAL DUE))\s+)\d+[.,]?\d+)").Last.ToString

Regards,
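
For anyone who wants to try the "take the last match" idea outside UiPath, here is a rough Python equivalent of the expression above. It is only a sketch: the names TOTAL_NUMBER and last_total are made up, and the test strings are the tail ends of the two invoices posted earlier in the thread.

```python
import re

# Python's re module does not allow the variable-width lookbehind used in the
# .NET expression, so the label is matched normally and the number is taken
# from a capture group instead. Matching stays case-sensitive, as in the original.
TOTAL_NUMBER = re.compile(r"(?:TOTAL\s+DUE|Total)\s+(\d+[.,]?\d+)")

def last_total(text):
    """Return the number after the last 'Total' / 'TOTAL DUE' label, or None."""
    values = TOTAL_NUMBER.findall(text)
    return values[-1] if values else None

invoice1_tail = ("TERMS Sub Total 6000.00 Discount 00.00 Tax Rate 5% "
                 "Tax Total 300.00 Total 6300.00")
invoice2_tail = ("SUBTOTAL 950.00 SALES TAX 47.50 "
                 "SHIPPING & HANDLING 24.99 TOTAL DUE 1022.49")

print(last_total(invoice1_tail))
print(last_total(invoice2_tail))
```

This relies on the grand total being the last labelled figure in the extracted text, which holds for both posted samples but is an assumption for any other layout.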