Specific PDF Data Extraction Using Regular Expressions

Hey all,

I need to extract specific data from this PDF file: Number, PNR, Flight No, From, To, Place of Supply and Grand Total. Then I need to write it to an Excel file.

Thanks in advance

Working on it :)

one more pdf.zip (11.1 KB)

I'm just having a problem with the Grand Total, but I'm going to figure it out.

Hello @NiranjanKN,

You can use the following high level activities to achieve this:

  1. Use the Read PDF Text activity; it returns the content as a string.
  2. Perform string operations (Substring, Matches or regex) to extract the exact data from that string.
  3. Assign the values to specific variables (use the Assign activity).
  4. Use the Excel activities to write the information from those variables.
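The steps above can be sketched outside UiPath. Below is a minimal Python sketch of steps 2 to 4. It assumes step 1 has already produced the PDF text as a string. The sample text, its "Label: value" layout and all values are hypothetical, because the actual PDF isn't reproduced in the thread. Python's standard library has no Excel writer, so CSV output from the csv module stands in for the Excel activities.

```python
import csv
import io
import re

# Hypothetical sample of what Read PDF Text might return; the real
# layout is not shown in the thread, so labels and spacing are assumptions.
PDF_TEXT = """Number: INV-001
PNR: ABC123
Flight No: 6E 123
From: BLR
To: DEL
Place of Supply: Karnataka
Grand Total INR 4,560.00
"""

# One pattern per field (step 2); group 1 captures the value.
# [ \t]* is used instead of \s* so a match never runs onto the next line.
FIELD_PATTERNS = {
    "Number": r"^Number:[ \t]*(.+)$",
    "PNR": r"^PNR:[ \t]*(.+)$",
    "Flight No": r"^Flight No:[ \t]*(.+)$",
    "From": r"^From:[ \t]*(.+)$",
    "To": r"^To:[ \t]*(.+)$",
    "Place of Supply": r"^Place of Supply:[ \t]*(.+)$",
    "Grand Total": r"^Grand Total[ \t]+(.+)$",
}

def extract_fields(text):
    """Step 3: map each field name to its value (None if the pattern doesn't match)."""
    values = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        values[name] = match.group(1).strip() if match else None
    return values

def to_csv(rows):
    """Step 4 stand-in: write the extracted rows as CSV text (opens in Excel)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_PATTERNS))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

record = extract_fields(PDF_TEXT)
print(record["Flight No"])  # 6E 123
print(to_csv([record]))
```

Keeping one pattern per field means a layout change in one section of the PDF only breaks that field's pattern, and a missing field shows up as an empty cell instead of an exception.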

Vasile

@srdjan.suc Same as you, my friend. I could also extract everything up to Place of Supply, but I couldn't get the Grand Total.

Hello @wasea,

I could extract the other data, such as Flight Number, From, To and Place of Supply. But, as in ReadPDFAirport.xaml (11.6 KB) from @srdjan.suc, the Grand Total couldn't be extracted.

I can, but it throws an Unrecognized Delimiter exception.

I tried using "\bGrand Total\s+\K(.*)", then splitting the string on spaces and selecting element (1). It should work in theory, but I need to find a replacement for \K.

It seems \K is not supported by .NET.
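Below is a minimal Python sketch of the usual workaround for the missing \K. Drop \K and read capture group 1 instead of the whole match. Python's re module also lacks \K, so the sketch reproduces the failure, although its error message differs from .NET's. The sample line is hypothetical. In .NET, a variable-width lookbehind such as (?<=\bGrand Total\s+).* is another option. Python's re only allows fixed-width lookbehind, so the sketch uses the group.

```python
import re

text = "Grand Total INR 4,560.00"  # hypothetical line from the PDF text

# Like .NET, Python's re module has no \K, so the original pattern fails to compile.
try:
    re.compile(r"\bGrand Total\s+\K(.*)")
    k_supported = True
except re.error:
    k_supported = False

# Replacement: remove \K and take capture group 1 instead of the whole match.
match = re.search(r"\bGrand Total\s+(.*)", text)
value = match.group(1) if match else None
print(k_supported, value)  # False INR 4,560.00
```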

one more pdf.zip (11.5 KB)

There, done.
Please mark it as the solution <3

For the last one (the Grand Total), I used this:

Then I selected the first capture group:
LongValue = RegexResult(0).Groups(1).Value

After that, I split LongValue into a string array:

StrArr = LongValue.Split(" "c)

Finally, I used index 1 to select the desired result:

Total = StrArr(1)
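The chain above (first capture group, split on spaces, take index 1) is sketched here in Python so it runs standalone. The pattern is an assumption, because the one used for the last step isn't shown in the thread. The function grand_total and the sample lines are hypothetical.

```python
import re

def grand_total(pdf_text):
    """Mirror the posted steps: match, take group 1, split on spaces, pick index 1."""
    # Assumed pattern; the exact regex from the post is not shown in the thread.
    match = re.search(r"\bGrand Total\s+(.*)", pdf_text)
    if match is None:
        return None
    long_value = match.group(1)    # LongValue = RegexResult(0).Groups(1).Value
    parts = long_value.split(" ")  # StrArr = LongValue.Split(" "c)
    # Total = StrArr(1), guarded so a value with no second token returns None.
    return parts[1] if len(parts) > 1 else None

print(grand_total("Grand Total INR 4,560.00"))  # 4,560.00
```

Python's split(" ") behaves like .NET's Split(" "c) here: it keeps empty strings between consecutive spaces. Extra spacing in the PDF text would therefore shift which index holds the amount, and the length check only prevents an out-of-range index.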

Thanks, man! @srdjan.suc