Read last lines of a PDF text

I am opening a PDF with the Read PDF Text activity and trying to select a certain string from the last few lines of the page by parsing the pdfstring. What would be a good way to accomplish this?

Regex is usually the most powerful way to handle this. UiPath has the Matches activity, which can do it.

Upload a sample or two of your text and bold what you are trying to obtain.
If there is a pattern to the text then Regex will be possible.

The portion of my PDF I am trying to extract is in this line in the middle of the page:

\nSignature: Venkategowda, Arpitha (133709) By signing this timesheet you are certifying that hours were incurred on the charge Approval: Zade, Rajesh (49554)\r\nDate: Mar 11, 2020 6:18:04 AM and day specified in accordance with company policies and procedures Date: Apr 10, 2020 9:46:09 AM

Now there are many “Date:” values in this pdf but I need to get this specific date (the Apr 10th) and the approval name Zade, Rajesh.

Can regex look for Approval and then we can do string manipulation on it to find the date somehow?

There are several expressions that would get what you need, for example:
(Approval: )(.*)( )
This returns the name Zade, Rajesh in match.Groups(2).Value (group 1 holds the literal "Approval: "). The pattern is not anchored with ^ because Approval: sits mid-line in your sample.
As for the date, I don't understand what distinguishes the one you need. Is it always the last one in the text?

Hey @tsorrill

Yes, assuming the sample text is constant then Regex will be possible. Have a look at the below :slightly_smiling_face:

For the Date you could try this:
Regex pattern:
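(?<= Date: )\w+\s\d+,\s20\w+
Description: in the sample, the literal space before the word Date makes this result unique, because the first Date: follows a line break rather than a space. Using \s in that position would also match the line break and return both dates.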
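To make the date lookbehind concrete, here is a minimal sketch in Python rather than UiPath. It assumes the PDF text contains real line breaks where the sample shows \n and \r\n, and it uses a literal space before Date: because \s would also match the line break in front of the first Date:. The variable names are just for illustration; in UiPath the same pattern would go into the Matches activity, and as far as I know these constructs behave the same way in .NET regex.

```python
import re

# Sample text from the thread, with real line breaks in place of the \n and \r\n escapes.
text = (
    "\nSignature: Venkategowda, Arpitha (133709) By signing this timesheet you are "
    "certifying that hours were incurred on the charge Approval: Zade, Rajesh (49554)\r\n"
    "Date: Mar 11, 2020 6:18:04 AM and day specified in accordance with company "
    "policies and procedures Date: Apr 10, 2020 9:46:09 AM"
)

# Literal space in the lookbehind: only a "Date:" preceded by a space qualifies.
date_pattern = re.compile(r"(?<= Date: )\w+\s\d+,\s20\w+")
approval_dates = date_pattern.findall(text)

# For comparison: \s also accepts the line break before the first "Date:".
loose_dates = re.findall(r"(?<=\sDate: )\w+\s\d+,\s20\w+", text)

print(approval_dates)
print(loose_dates)
```

If other Date: values elsewhere in the PDF are also preceded by a space, this would return more than one match, so it is worth checking against the full text.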

For the Approval name try this:
Regex pattern:
(?<=Approval: )(.*)(?=\s\(\d+\))
Description: This will capture all text between “Approval: ” and “ (49554)”. The number in the brackets can change and be any length (as long as it is at least 1 digit) :slight_smile:
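Here is a small Python sketch of the same lookbehind/lookahead idea for the approval name, again assuming real line breaks in the text. The literal parentheses around the ID have to be escaped as \( and \) so the regex engine does not read them as a group. The variable names are made up for the example.

```python
import re

# Sample text from the thread, with real line breaks in place of the \n and \r\n escapes.
text = (
    "\nSignature: Venkategowda, Arpitha (133709) By signing this timesheet you are "
    "certifying that hours were incurred on the charge Approval: Zade, Rajesh (49554)\r\n"
    "Date: Mar 11, 2020 6:18:04 AM and day specified in accordance with company "
    "policies and procedures Date: Apr 10, 2020 9:46:09 AM"
)

# Everything after "Approval: " up to (not including) " (<digits>)".
name_pattern = re.compile(r"(?<=Approval: )(.*)(?=\s\(\d+\))")
match = name_pattern.search(text)

# Guard against a missing match so a different layout does not raise an error.
approval_name = match.group(0) if match else None
print(approval_name)
```

Because the Signature name comes before Approval: in the text, its "(133709)" is never considered. If a second bracketed number could follow on the same line, a lazy .*? would be the safer choice.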

More information and samples are always helpful when building a Regex pattern :slight_smile:
