PDF Text to Table Column

Hi, I need help extracting PDF text into columns. Each record spans multiple lines, and the total I want to extract is on the last line of each record. Please refer to the attached file for a sample. I have also attached an image of the fields I want to extract.

I only want to extract

  1. Air Waybill Number (10-digit number)
  2. Shipper's Reference (may be blank, or start with “C/” or “PK/” followed by 5 digits + “/” + 2 digits)
  3. Shipment Date
  4. Total

Please advise on the best way to extract these fields from the sample above.

DHL_Det.txt (6.2 KB)

Hi @yee68

After extracting the text from the PDF, pass it to a Matches activity and use the first regex below to capture each record as a single match. Then pass the output of the Matches activity to a For Each activity and, inside the loop, apply the remaining regexes to each match to extract the fields you need. In order, they return the Air Waybill number, the shipment date, the shipper's reference, and the total.

System.Text.RegularExpressions.Regex.Match(InputString,"(\d{9,}\s+[A-Z]*\s*[A-Z]*\d*\/\d+\/\d+[\s\S]*?SINGAPORE[\s\S]*?SINGAPORE[\s\S]*?(\d+\.\d+))").Value

System.Text.RegularExpressions.Regex.Match(InputString,"\d{9,}(?=\s+[A-Z]*\s*[A-Z]*\d*\/\d+\/\d+)").Value

System.Text.RegularExpressions.Regex.Match(InputString,"(\d+\/\d+\/\d+)").Value

System.Text.RegularExpressions.Regex.Match(InputString,"(?<=\d{9,}\s+)([A-Z]+\/\d+\/\d+)").Value

System.Text.RegularExpressions.Regex.Match(InputString,"(?<=SINGAPORE[\s\S]*?SINGAPORE[\s\S]*?)(\d+\.\d+)").Value
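To see how these pieces fit together outside of UiPath, here is a minimal sketch as a plain VB.NET console program. The sample text is made up for illustration (the real layout of DHL_Det.txt may differ), and the names `pdfText`, `recordPattern`, and `ExtractDemo` are hypothetical. The `Regex.Matches` call stands in for the Matches activity and the `For Each` loop for the For Each activity; the regexes themselves are the ones above, unchanged.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ExtractDemo
    Sub Main()
        ' Hypothetical sample text, NOT the real DHL_Det.txt content.
        ' Record 1 has a shipper's reference; record 2 leaves it blank.
        Dim pdfText As String =
            "1234567890 C/12345/01 05/03/2021 SINGAPORE" & vbLf &
            "  SINGAPORE  EXPRESS WORLDWIDE" & vbLf &
            "  Total 45.60" & vbLf &
            "9876543210 07/03/2021 SINGAPORE" & vbLf &
            "  SINGAPORE  EXPRESS WORLDWIDE" & vbLf &
            "  Total 120.00" & vbLf

        ' Step 1 (Matches activity): one match per multi-line record.
        Dim recordPattern As String =
            "(\d{9,}\s+[A-Z]*\s*[A-Z]*\d*\/\d+\/\d+[\s\S]*?SINGAPORE[\s\S]*?SINGAPORE[\s\S]*?(\d+\.\d+))"

        ' Step 2 (For Each activity): pull the four fields out of each record.
        For Each record As Match In Regex.Matches(pdfText, recordPattern)
            Dim InputString As String = record.Value

            Dim awb As String =
                Regex.Match(InputString, "\d{9,}(?=\s+[A-Z]*\s*[A-Z]*\d*\/\d+\/\d+)").Value
            Dim shipDate As String =
                Regex.Match(InputString, "(\d+\/\d+\/\d+)").Value
            ' .Value is an empty string when the reference is blank.
            Dim reference As String =
                Regex.Match(InputString, "(?<=\d{9,}\s+)([A-Z]+\/\d+\/\d+)").Value
            Dim total As String =
                Regex.Match(InputString, "(?<=SINGAPORE[\s\S]*?SINGAPORE[\s\S]*?)(\d+\.\d+)").Value

            Console.WriteLine($"{awb} | {reference} | {shipDate} | {total}")
        Next
    End Sub
End Module
```

With this made-up sample, the loop should print one row per record, and the second row should show an empty reference. In UiPath, you would typically add each set of values to a DataTable row instead of writing to the console.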

Regards

Thanks a lot. I am very new to regular expressions and spent more than a day trying to figure this out without success. I really appreciate the solution.

Hi @yee68

Thank you. If this solved your problem, please mark it as the solution to close the loop.

Happy Automation!!
