Extract Data from PDF

Hi all,
I am using the Screen Scraping activity and have the data in the format below:

Emp Code:
11111
DOJ:
28-Feb-22
Emp Name:
Mahesh S
Designation:
Consultant
DOB:
14-Jan-95
Father’s/Spouse Name:
X XYZZZ
Department:
GDIC
Location:
Bangalore

How can I extract specific fields from this data, such as Emp Name and Designation?

You can use an Invoke Code activity where the input argument is the PDF text. Then, in the code:
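Try
    ' Split on CRLF or LF, since scraped PDF text may use either line ending.
    Dim rows As String() = System.Text.RegularExpressions.Regex.Split(input, "\r?\n")
    ' To use these values after Invoke Code, declare them as Out arguments instead of local variables.
    Dim Name As String = String.Empty
    Dim Designation As String = String.Empty

    ' Each label is on its own line and its value is on the next line.
    ' Stopping at Length - 2 ensures rows(lineNr + 1) always exists.
    For lineNr As Integer = 0 To rows.Length - 2
        If rows(lineNr).Trim = "Emp Name:" Then
            Name = rows(lineNr + 1).Trim
        ElseIf rows(lineNr).Trim = "Designation:" Then
            Designation = rows(lineNr + 1).Trim
        End If
    Next

Catch ex As Exception
    Console.WriteLine(ex.Message)
End Try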

Hi @happyfeat87,

If the data is always in that specific format, we could use either String or Regex operations.

Check the expression below for the Regex method:

Regex.Match(HTMLBodyStr1,"(?<=Emp Code:)\s*\n.*",RegexOptions.IgnoreCase).ToString.Trim

To extract other field values, simply replace Emp Code: in the pattern with the relevant field label.
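
As a rough sketch of that substitution, the label can be passed in as a parameter instead of being hard-coded. The names GetFieldValue and pdfText are hypothetical, and the snippet assumes each value sits on the line directly after its label. The pattern allows optional whitespace before the line break so that Windows line endings do not break the match.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module FieldLookup
    ' Hypothetical helper: returns the text on the line after the given label,
    ' or an empty string if the label is not found.
    Function GetFieldValue(text As String, label As String) As String
        ' Regex.Escape keeps characters in the label from being read as regex syntax.
        Dim pattern As String = "(?<=" & Regex.Escape(label) & ")\s*\n.*"
        Return Regex.Match(text, pattern, RegexOptions.IgnoreCase).Value.Trim()
    End Function

    Sub Main()
        ' Sample text in the same label/value layout as the question.
        Dim pdfText As String = "Emp Code:" & vbLf & "11111" & vbLf & "Emp Name:" & vbLf & "Mahesh S"
        Console.WriteLine(GetFieldValue(pdfText, "Emp Name:"))   ' prints: Mahesh S
    End Sub
End Module
```

Inside a UiPath Assign activity, only the Regex.Match expression from the function body would be needed, with the label text swapped in.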

Alternatively, we could use a Dictionary approach to build the key-value pairs dynamically.
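
One possible shape for the Dictionary approach is sketched here. It is an illustration rather than a tested workflow: ParseFields and pdfText are hypothetical names, and the code assumes every label ends with a colon and its value is on the very next line.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module FieldDictionary
    ' Hypothetical helper: pairs each line ending in ":" with the line after it.
    Function ParseFields(text As String) As Dictionary(Of String, String)
        ' Case-insensitive keys, so "emp name" and "Emp Name" find the same entry.
        Dim fields As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        Dim rows As String() = Regex.Split(text, "\r?\n")
        Dim i As Integer = 0
        While i < rows.Length - 1
            Dim label As String = rows(i).Trim()
            If label.EndsWith(":") Then
                ' Store the label without its colon; the next line is the value.
                fields(label.TrimEnd(":"c).Trim()) = rows(i + 1).Trim()
                i += 2
            Else
                i += 1
            End If
        End While
        Return fields
    End Function

    Sub Main()
        Dim pdfText As String = String.Join(vbLf, {"Emp Name:", "Mahesh S", "Designation:", "Consultant"})
        Dim fields As Dictionary(Of String, String) = ParseFields(pdfText)
        Console.WriteLine(fields("Emp Name"))      ' prints: Mahesh S
        Console.WriteLine(fields("Designation"))   ' prints: Consultant
    End Sub
End Module
```

With this layout, a new field in the PDF needs no code change; it simply appears as another key.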

Let us know if the Regex/String methods are working for your case.

Hi @happyfeat87,
First, after you do the data scraping, store the extracted value as a string and use the following regex expression:

Thank You
SD
