I am trying to extract fields using a regular expression. Can anyone help me with how to do this for the sample input shown below?

The input I am using is a native PDF, as shown below.

After converting it to text, I get this text file:
Output_Text.txt (486 Bytes)

The required output in Excel is shown below.
Output Excel.xlsx (9.1 KB)

I am not considering the Description column.

Hi @adarsh_kotagiri

Check out these threads.


Can you try the following sample?

mc = System.Text.RegularExpressions.Regex.Matches(strData,"^(?<INVREF>\S+)\s+(?<POSTDATE>\d{2}/\d{2}/\d{4}).*?(?<INVDATE>\d{2}/\d{2}/\d{4})\s+(?<GROSSAMOUNT>[-\d.,]+)\s+(?<TDS>[-\d.,]+)\s+(?<AMOUNT>[-\d.,]+)",System.Text.RegularExpressions.RegexOptions.Multiline)

Then set ArrayRow to the following:

{m.Groups("INVREF").Value,DateTime.ParseExact(m.Groups("POSTDATE").Value,"dd/MM/yyyy",System.Globalization.CultureInfo.InvariantCulture),DateTime.ParseExact(m.Groups("INVDATE").Value,"dd/MM/yyyy",System.Globalization.CultureInfo.InvariantCulture),m.Groups("GROSSAMOUNT").Value,m.Groups("TDS").Value,m.Groups("AMOUNT").Value}
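To make it easier to see what the pattern captures, here is a minimal sketch of the same idea outside UiPath, written in Python. Note that Python spells named groups `(?P<NAME>...)` where .NET uses `(?<NAME>...)`. The sample lines and the `parse_rows` helper are hypothetical (the real text file is not reproduced in this thread), so treat this as an illustration rather than the attached workflow.

```python
import re
from datetime import datetime

# Same pattern as the .NET expression above, with Python's (?P<NAME>...) syntax.
PATTERN = re.compile(
    r"^(?P<INVREF>\S+)\s+(?P<POSTDATE>\d{2}/\d{2}/\d{4}).*?"
    r"(?P<INVDATE>\d{2}/\d{2}/\d{4})\s+(?P<GROSSAMOUNT>[-\d.,]+)\s+"
    r"(?P<TDS>[-\d.,]+)\s+(?P<AMOUNT>[-\d.,]+)",
    re.MULTILINE,  # lets ^ match at the start of every line
)


def parse_rows(text):
    """Return one list per matched line, mirroring the ArrayRow expression."""
    rows = []
    for m in PATTERN.finditer(text):
        rows.append([
            m.group("INVREF"),
            datetime.strptime(m.group("POSTDATE"), "%d/%m/%Y"),
            datetime.strptime(m.group("INVDATE"), "%d/%m/%Y"),
            m.group("GROSSAMOUNT"),
            m.group("TDS"),
            m.group("AMOUNT"),
        ])
    return rows


# Hypothetical sample text; the description column sits between the two dates
# and is skipped by the lazy ".*?" in the pattern.
sample = (
    "INV001 05/01/2021 Consulting fee 01/01/2021 1,000.00 100.00 900.00\n"
    "INV002 06/01/2021 Support 02/01/2021 2,500.50 250.05 2,250.45\n"
)
rows = parse_rows(sample)
```

The amounts are kept as strings here, as in the ArrayRow expression, so thousands separators survive unchanged.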



Please go through this video.


Hi @Yoichi

Thank you so much for the workflow, it is working fine. The only challenge I am facing is that I am unable to get the multi-line data into a single row, as shown below.

The output I am getting is in this format:


But my expected output is:
Expected Output

Please help me with this.

It seems difficult to identify which string should be appended to the INV.REF string from the previous line.

If the string to be added always begins with numeric characters at the start of the line, the following will work.
(However, I suppose it may not work in every case, as there are various cases…)
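As a rough illustration of that rule (not the attached workflow itself), the sketch below appends the leading digits of a continuation line to the INV.REF of the row before it. It is written in Python, and the sample text, the `merge_inv_refs` helper, and the way a "record line" is recognized (a token followed by a dd/MM/yyyy date) are all assumptions made for the example.

```python
import re

# A record line: a reference token followed by the posting date.
RECORD = re.compile(r"^(?P<INVREF>\S+)\s+\d{2}/\d{2}/\d{4}")
# A continuation line: digits at the very start of the line.
CONTINUATION = re.compile(r"^(?P<TAIL>\d+)")


def merge_inv_refs(text):
    """Collect INV.REF values, joining numeric continuation lines to the previous one."""
    refs = []
    for line in text.splitlines():
        if (rec := RECORD.match(line)):
            refs.append(rec.group("INVREF"))
        elif refs and (cont := CONTINUATION.match(line)):
            # Assumption: leading digits on a non-record line belong to the previous INV.REF.
            refs[-1] += cont.group("TAIL")
    return refs


# Hypothetical text in which the first reference wraps onto the next line.
sample = (
    "INV/2021/ 05/01/2021 Consulting 01/01/2021 1,000.00 100.00 900.00\n"
    "000123 fee\n"
    "INV002 06/01/2021 Support 02/01/2021 2,500.50 250.05 2,250.45\n"
)
merged = merge_inv_refs(sample)
```

If a wrapped description line also happens to start with digits, it would be merged into the reference as well, which is one of the cases where this rule breaks down.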


Hi @adarsh_kotagiri ,

Could you try checking the post below on PDF table extraction? There, the input is the native PDF text itself.

Let us know if this does not help for your case.

Thank you @Yoichi, it's working perfectly and I learned something new.

