How to extract the required data from a text file that has duplicate values?

Hi Team,


Value Date Bank Value Date Deal No Instrument Amount Balance

Bank : ABJBJHDBHJA Account Number : ABCD
Currency : CHN Account Name : ABCDEFG
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : 0.00

Bank : ABCDEFG Account Number : HJGJHGJHGJH
Currency : INR Account Name : JHGJHGJHGJ
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (268,235.15)

Bank : TEST Account Number : gdjhagdjhsagd
Currency : EUR Account Name : fgjahgdjhasg
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (50,000.30)

Bank : qwrtqwe Account Number : dgjasgjdhsa
Currency : USD Account Name : bdgjshagdjhasgdagsjh
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (40,000.60)

Printed on 01/3/2020 Page 1 of TEST

This is a sample of the PDF. I extracted the PDF file and stored its output in a text file.
I need all the Bank, Account Number, Currency, Account Name, and Closing balance values from the text file. Is this possible?

I am able to extract only one value, but I need all of them. I believe a loop is needed, but I am not sure how to write it.

Hi @Reddy_1

Follow the steps below:

Say the text is stored in a variable named str.

  1. Drag a For Each activity, change its TypeArgument to String, and in 'List of items' pass str.Split({Environment.NewLine}, StringSplitOptions.None)
  2. Inside the loop, add an If activity that checks whether the current line contains 'Account Number', e.g. CurrentItem.Contains("Account Number")
  3. On the Then (True) branch, add an Assign activity that extracts the account number by splitting the current line again: CurrentItem.Split({"Account Number :"}, 2, StringSplitOptions.None)(1).Trim

This should give you all the account numbers.
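The same three steps (split into lines, keep the lines that mention the field, split once more and trim) can be sketched in Python for a quick check outside UiPath. This is a minimal illustration, not the UiPath workflow itself; SAMPLE and account_numbers are hypothetical names, and SAMPLE holds two blocks copied from the question.

```python
# Hypothetical stand-in for the text stored in str (two blocks from the sample).
SAMPLE = """Bank : ABJBJHDBHJA Account Number : ABCD
Currency : CHN Account Name : ABCDEFG
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : 0.00

Bank : TEST Account Number : gdjhagdjhsagd
Currency : EUR Account Name : fgjahgdjhasg
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (50,000.30)
"""


def account_numbers(text: str) -> list[str]:
    numbers = []
    # Step 1: split the text into lines (splitlines handles both \n and \r\n).
    for line in text.splitlines():
        # Step 2: keep only the lines that mention the field.
        if "Account Number" in line:
            # Step 3: take everything after "Account Number :" and trim it.
            numbers.append(line.split("Account Number :", 1)[1].strip())
    return numbers


print(account_numbers(SAMPLE))  # ['ABCD', 'gdjhagdjhsagd']
```

Splitting on the full label "Account Number :" rather than on "Account Number" alone means the colon is not left at the front of the value.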



Hope the following sample helps you.

mc = System.Text.RegularExpressions.Regex.Matches(strPdf,"Bank\s+:\s+(?<BANK>.*?)\s+Account Number\s+:\s+(?<ACCOUNTNUMBER>.*?)\s+Currency\s+:\s+(?<CURRENCY>.*?)\s+Name\s+:\s+(?<NAME>.*?)\r?\n[\s\S]+?Closing balance\s+:\s+(?<CLOSINGBALANCE>[.,\d()]+)")



First, split the text into its parts (1, 2, …) using the Split and IndexOf() methods, then apply a regex to each part to fetch the values.
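The split-then-regex idea can be sketched in Python as follows. It uses re.finditer to locate each "Bank :" line (standing in for IndexOf), slices the text into one part per account, and then searches each part with a small per-field pattern. The FIELDS dictionary, its patterns, and the helper names are illustrative assumptions written against the sample layout only.

```python
import re

# Two blocks and the footer copied from the question's sample text.
SAMPLE = """Bank : ABJBJHDBHJA Account Number : ABCD
Currency : CHN Account Name : ABCDEFG
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : 0.00

Bank : TEST Account Number : gdjhagdjhsagd
Currency : EUR Account Name : fgjahgdjhasg
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (50,000.30)

Printed on 01/3/2020 Page 1 of TEST
"""

# Hypothetical per-field patterns; each one is searched inside a single block.
FIELDS = {
    "bank": r"Bank\s*:\s*(.+?)\s+Account Number",
    "account_number": r"Account Number\s*:\s*(\S+)",
    "currency": r"Currency\s*:\s*(\S+)",
    "account_name": r"Account Name\s*:\s*(.+?)\s*$",
    "closing_balance": r"Closing balance\s*:\s*(\S+)",
}


def split_blocks(text: str) -> list[str]:
    # Find where each "Bank :" line starts (the IndexOf step).
    starts = [m.start() for m in re.finditer(r"^Bank\s*:", text, re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    # Each slice runs from one "Bank :" to the next, or to the end of the text.
    return [text[s:e] for s, e in zip(starts, ends)]


def parse_block(block: str) -> dict:
    record = {}
    for name, pattern in FIELDS.items():
        # MULTILINE lets "$" in the account-name pattern stop at the end of its line.
        m = re.search(pattern, block, re.MULTILINE)
        record[name] = m.group(1) if m else None
    return record


records = [parse_block(b) for b in split_blocks(SAMPLE)]
print(records)
```

Because every pattern only ever sees one account's block, a field that repeats across blocks cannot be picked up from the wrong account.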

Thanks for the quick reply.

Everything is working fine. The only problem is the currency: it also captures the next field name.

For example, we need only EUR but are getting EUR Account.
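The symptom can be reproduced outside UiPath. The minimal Python sketch below applies only the Currency part of the first pattern to one line of the sample; Python writes named groups as (?P<name>...) instead of .NET's (?<name>...), and the variable names are illustrative.

```python
import re

line = "Currency : EUR Account Name : fgjahgdjhasg"

# Currency part of the first pattern, translated to Python named-group syntax.
first_try = re.search(r"Currency\s+:\s+(?P<CURRENCY>.*?)\s+Name\s+:", line)

# The lazy .*? grows one character at a time until "\s+Name\s+:" can match next.
# The only "Name" on the line comes after "Account", so the group ends there.
print(first_try.group("CURRENCY"))  # EUR Account
```

The lazy quantifier stops at the first position where the rest of the pattern can match, and because the label is "Account Name", that position lies after the word "Account".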


Sorry, I made a mistake. Can you try the following?

System.Text.RegularExpressions.Regex.Matches(strPdf,"Bank\s+:\s+(?<BANK>.*?)\s+Account Number\s+:\s+(?<ACCOUNTNUMBER>.*?)\s+Currency\s+:\s+(?<CURRENCY>.*?)\s+Account\s+Name\s+:\s+(?<NAME>.*?)\r?\n[\s\S]+?Closing balance\s+:\s+(?<CLOSINGBALANCE>[.,\d()]+)")
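In UiPath, the resulting MatchCollection can be looped over with a For Each whose TypeArgument is System.Text.RegularExpressions.Match, reading each field as item.Groups("BANK").Value, item.Groups("CURRENCY").Value, and so on. To check the corrected pattern outside UiPath, here is a Python translation run against the full sample from the question. The only change to the pattern is Python's (?P<name>...) named-group syntax; SAMPLE, PATTERN, and records are illustrative names.

```python
import re

# Sample text copied from the question (header and footer included).
SAMPLE = """Value Date Bank Value Date Deal No Instrument Amount Balance

Bank : ABJBJHDBHJA Account Number : ABCD
Currency : CHN Account Name : ABCDEFG
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : 0.00

Bank : ABCDEFG Account Number : HJGJHGJHGJH
Currency : INR Account Name : JHGJHGJHGJ
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (268,235.15)

Bank : TEST Account Number : gdjhagdjhsagd
Currency : EUR Account Name : fgjahgdjhasg
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (50,000.30)

Bank : qwrtqwe Account Number : dgjasgjdhsa
Currency : USD Account Name : bdgjshagdjhasgdagsjh
10/1/2022 10/1/2022 OB 0.00 0.00

Closing balance : (40,000.60)

Printed on 01/3/2020 Page 1 of TEST
"""

# The corrected .NET pattern, with (?<X>...) rewritten as Python's (?P<X>...).
PATTERN = re.compile(
    r"Bank\s+:\s+(?P<BANK>.*?)\s+Account Number\s+:\s+(?P<ACCOUNTNUMBER>.*?)"
    r"\s+Currency\s+:\s+(?P<CURRENCY>.*?)\s+Account\s+Name\s+:\s+(?P<NAME>.*?)\r?\n"
    r"[\s\S]+?Closing balance\s+:\s+(?P<CLOSINGBALANCE>[.,\d()]+)"
)

# One dict per bank block, the equivalent of looping over mc in UiPath.
records = [m.groupdict() for m in PATTERN.finditer(SAMPLE)]

for r in records:
    print(r["BANK"], r["ACCOUNTNUMBER"], r["CURRENCY"], r["NAME"], r["CLOSINGBALANCE"])
```

Each match covers one Bank block, and the lazy [\s\S]+? stops at that block's own "Closing balance", so the four accounts come out as four separate records.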


