Extract specific fields from PDF file

Dear All

My PDF consists of several pages of forms, all in the same format. Each page contains a “Duty Amount” field.

(Page 1)
Registration No : AAB112233
Customer Name: ZZZKKY
Duty Amount: 10 Dollars
Sales Amount: 15 Dollars
Ref No: 098765

(Page 2)
Registration No : AAB115533
Customer Name: ZOOYYY
Duty Amount: 28 Dollars
Sales Amount: 91 Dollars
Ref No: 123456

(Page n)
Registration No : PPP113333
Customer Name: RRZYYY
Duty Amount: 65 Dollars
Sales Amount: 129 Dollars
Ref No: 123456

I would like to extract only the Duty Amount values: 10, 28, and 65. How should this be done: Regular Expressions, Split, a combination of both, or a completely different approach?

Thanks in advance!

Hi @u2018dem0528

You can try with Regex



Hi @u2018dem0528

In the For Each activity, use this expression


Try this expression inside the For Each activity


Check out this XAML file

MatchesRegex1.xaml (10.9 KB)
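
As a rough illustration of the match-then-loop idea, here is a minimal sketch in Python (not UiPath/VB.NET). The sample text and the names `pdf_text` and `duty_amounts` are made up for the example, and it assumes the PDF has already been read into one string.

```python
import re

# Hypothetical text, standing in for what a PDF text-extraction step might return.
pdf_text = """Registration No : AAB112233
Customer Name: ZZZKKY
Duty Amount: 10 Dollars
Sales Amount: 15 Dollars
Ref No: 098765
Registration No : AAB115533
Customer Name: ZOOYYY
Duty Amount: 28 Dollars
Sales Amount: 91 Dollars
Ref No: 123456
Registration No : PPP113333
Customer Name: RRZYYY
Duty Amount: 65 Dollars
Sales Amount: 129 Dollars
Ref No: 123456
"""

# Capture only the digits that follow the "Duty Amount:" label.
pattern = re.compile(r"Duty Amount:\s*(\d+)")

# Loop over every match in the document and keep the captured digits.
duty_amounts = []
for match in pattern.finditer(pdf_text):
    duty_amounts.append(match.group(1))

print(duty_amounts)  # ['10', '28', '65']
```

Because the label is part of the pattern, the Sales Amount lines are skipped even though they look similar.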



Hi @u2018dem0528

The simplest way would be to use the built-in Text to Left/Right activity. Just give it the text to the left and to the right of the value (“Duty Amount:” and “Dollars”), and it will return the amount. Alternatively, use one of the expressions below.

System.Text.RegularExpressions.Regex.Match(str, "(?<=Duty Amount: )\d+").ToString


str.Split({"Duty Amount:"}, 2, StringSplitOptions.None)(1).Split({"Dollar"}, 2, StringSplitOptions.None)(0).Trim

Use either of these inside the For Each loop and you will have your solution.
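
For comparison, the two approaches can be sketched side by side in Python (not VB.NET). This is only an illustration: `page_text` and the helper `text_between` are hypothetical names, and the sample string is invented.

```python
import re

# Hypothetical text of a single page.
page_text = "Customer Name: ZZZKKY\nDuty Amount: 10 Dollars\nSales Amount: 15 Dollars"

# Approach 1: a lookbehind regex that returns only the digits after the label.
found = re.search(r"(?<=Duty Amount: )\d+", page_text)
by_regex = found.group(0) if found else ""


# Approach 2: cut the text at a left marker, then at a right marker.
def text_between(text, left, right):
    """Return the trimmed text between two markers, or "" if left is missing."""
    _, marker, rest = text.partition(left)
    if not marker:
        return ""
    return rest.partition(right)[0].strip()


by_split = text_between(page_text, "Duty Amount:", "Dollar")

print(by_regex, by_split)  # 10 10
```

The regex version is stricter about what counts as an amount (digits only), while the split version returns whatever sits between the two markers.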


Terrific! It worked! However, when I applied the same logic to a different but similar document containing "SST Amount (MYR) :" followed by many spaces and then XX.XX, nothing was picked up.


Is this because of the brackets, or the many spaces before the amount?
I attempted to use the expression below, but nothing was returned.


Hi @u2018dem0528

You can try with this expression

System.Text.RegularExpressions.Regex.Match("SST Amount (MYR) :     73.69", "(?<=SST\s*Amount\s*\(MYR\)\s*:\s*)\d+(\.\d+)?").ToString
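
On the question of brackets versus spaces: the attempted pattern is not shown, so this is a guess, but both can cause a miss. Parentheses are grouping characters in a regex, so a literal "(MYR)" has to be escaped, and a variable run of spaces is best matched with `\s*`. Here is a small Python sketch of the difference (not VB.NET). It uses a capture group instead of a lookbehind, because Python's `re` only accepts fixed-width lookbehinds.

```python
import re

line = "SST Amount (MYR) :     73.69"

# Unescaped parentheses form a group, so this pattern looks for
# "SST Amount MYR :" without brackets and finds nothing.
unescaped = re.search(r"SST Amount (MYR) :\s*(\d+\.\d+)", line)

# Escaping the brackets and allowing any amount of whitespace matches the line.
escaped = re.search(r"SST\s*Amount\s*\(MYR\)\s*:\s*(\d+(?:\.\d+)?)", line)
sst_amount = escaped.group(1) if escaped else ""

print(unescaped)   # None
print(sst_amount)  # 73.69
```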


Thank you so much, it worked well!

Thank you so much for suggesting 2 different ways, I will study both!

