Regular expressions - Extracted text from PDF

I need to get the checked values from a PDF. Is there a regex that can do this?
[Screenshot of the PDF]


Extracted text from the PDF (attached: sample text file.txt, 318 bytes)

Hi @Lalitha_Selvaraj

Try this

(?<=0 ).*
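A quick way to see what this pattern does is to run it outside UiPath. The sketch below uses Python's `re` module rather than the .NET engine UiPath uses, but `(?<=0 )` is a fixed-width lookbehind, which both engines support. The sample text is hypothetical, since the attached file isn't reproduced here; it assumes each checked box comes out of the PDF as `0` followed by a space.

```python
import re

# Hypothetical extracted text: assumes a checked box is rendered as "0 " before its label.
SAMPLE = "Name: John\n0 Female\nRoom 10 East\n"


def checked_values_simple(text):
    # (?<=0 ) requires "0 " immediately before the match; .* then takes the
    # rest of that line ("." does not match a newline by default).
    return re.findall(r"(?<=0 ).*", text)


print(checked_values_simple(SAMPLE))  # ['Female', 'East']
```

Any `0 ` in the text triggers a match, so "Room 10 East" also yields "East"; the `\s0\s+` variant in the next answer guards against that.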

Regards,

@Lalitha_Selvaraj

(?<=\s0\s+).*

In Notepad, the checkboxes appear as 0, so I wrote the regex based on that. Use the Find Matching Patterns activity to get an IEnumerable of matches, then iterate over it with a For Each loop.
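The same approach can be sketched in Python, again on hypothetical text. One difference matters: Python's `re` only allows fixed-width lookbehind, so `(?<=\s0\s+)` cannot be used as written. This sketch keeps the fixed-width `(?<=\s)` part, matches `0\s+` directly, and captures the label in a group. `re.finditer` stands in for the IEnumerable that Find Matching Patterns returns, and the `for` loop mirrors the For Each.

```python
import re

# Hypothetical extracted text; assumes checked boxes appear as "0" plus whitespace.
SAMPLE = "Name: John\n0 Female\nRoom 10 East\n0 Reading\n"


def checked_values(text):
    values = []
    # (?<=\s) requires whitespace before the "0", so the "0" in "10" is skipped.
    # Python rejects the variable-width (?<=\s0\s+), so "0\s+" is matched
    # normally and the label is taken from capture group 1.
    for match in re.finditer(r"(?<=\s)0\s+(.*)", text):
        values.append(match.group(1))
    return values


for value in checked_values(SAMPLE):
    print(value)  # Female, then Reading
```

As with the .NET pattern, a `0` at the very start of the text has no whitespace before it and is not matched.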

@Lalitha_Selvaraj

Pattern: "(?<=0 ).*"

Output: [screenshot of the matched values]

Regards,

Hi @Lalitha_Selvaraj

=> Read Text File
Output → str_Text

=> Use the following syntax in an Assign activity:

CheckedMatches = System.Text.RegularExpressions.Regex.Matches(str_Text, "\d+[A-Za-z ].*")
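To see which lines this first pass picks up, here is the same pattern run with Python's `re.finditer` on hypothetical text (the pattern uses no lookbehind, so it should behave the same way in .NET). `\d+[A-Za-z ].*` means one or more digits, then one letter or space, then the rest of the line, so the results are candidate lines that still include the leading number.

```python
import re

# Hypothetical extracted text; assumes checked boxes appear as "0" before the label.
SAMPLE = "Name: John\n0 Female\nRoom 10 East\n"

# \d+ = one or more digits, [A-Za-z ] = one letter or space, .* = rest of the line.
candidates = [m.group() for m in re.finditer(r"\d+[A-Za-z ].*", SAMPLE)]
print(candidates)  # ['0 Female', '10 East']
```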

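CheckedMatches is of data type IEnumerable(Of System.Text.RegularExpressions.Match). You can also declare it as System.Text.RegularExpressions.MatchCollection, which is the type Regex.Matches actually returns.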

=> Run a For Each loop over CheckedMatches and use the following syntax in an Assign inside it:

For Each currentMatch In CheckedMatches
   ' Keep only the label after "0" and whitespace; Regex.Match(...).Value is String.Empty when nothing matches, so no extra If check is needed
   Assign -> CheckedMatch = System.Text.RegularExpressions.Regex.Match(currentMatch.ToString, "(?<=0\s+)[A-Za-z].*").Value
   Log Message -> CheckedMatch
End For Each
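The two-pass logic of this loop can be sketched in Python on hypothetical text. Python's `re` rejects the variable-width lookbehind `(?<=0\s+)`, so the sketch matches `0\s+` and captures the label in a group instead. A `None` result from `re.search` plays the role of `Regex.Match(...).Value` being `String.Empty`.

```python
import re

# Hypothetical extracted text; assumes checked boxes appear as "0" before the label.
SAMPLE = "Name: John\n0 Female\nRoom 10 East\n"


def extract_checked(text):
    results = []
    # First pass: candidate lines, like the Assign that fills CheckedMatches.
    for candidate in re.finditer(r"\d+[A-Za-z ].*", text):
        # Second pass: the label after "0" plus whitespace, starting with a letter.
        # Same effect as .NET's (?<=0\s+)[A-Za-z].* without variable-width lookbehind.
        found = re.search(r"0\s+([A-Za-z].*)", candidate.group())
        if found:  # no match corresponds to String.Empty in the VB version
            results.append(found.group(1))
    return results


print(extract_checked(SAMPLE))  # ['Female', 'East']
```

On this sample, "East" from "Room 10 East" still comes through, because `\d+` accepts "10" and the second pattern only needs a "0" before the whitespace.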



Hope it helps!!
