Hi @indra,
Actually my original post was not a question but a solution: I just posted it in case someone else ran into the same issue with Regex patterns that have to match across line breaks (in this case I was replacing text between a lookbehind and a lookahead).
The code that works is:
Regex.Replace(readPDFText, "(?<=" & "person—" & ")([\s\S]*?)(?=" & "Not over" & ")", Environment.NewLine)
(shown above with ampersands for clarity’s sake, but this can be reduced to:)
Regex.Replace(readPDFText, "(?<=person—)([\s\S]*?)(?=Not over)", Environment.NewLine)
In case you’re still interested, the original text looked like this:
TABLE 1—WEEKLY Payroll Period
(a) SINGLE person (including head of household) (b) MARRIED person—
(after subtracting
withholding allowances) is:
The amount of income tax
to withhold is:
(after subtracting
withholding allowances) is:
The amount of income tax
to withhold is:
Not over Over But not over Bracket Base Percentage of excess over Over But not over Bracket Base Percentage of excess over
$71 $0.00 0.00% $0 $0 $222 $0.00 0.00%
$71 $254 $0.00 10% $71 $222 $588 $0.00 10% $222
$254 $815 $18.30 12% $254 $588 $1,711 $36.60 12% $588
$815 $1,658 $85.62 22% $815 $1,711 $3,395 $171.36 22% $1,711
$1,658 $3,100 $271.08 24% $1,658 $3,395 $6,280 $541.84 24% $3,395
$3,100 $3,917 $617.16 32% $3,100 $6,280 $7,914 $1,234.24 32% $6,280
$3,917 $9,687 $878.60 35% $3,917 $7,914 $11,761 $1,757.12 35% $7,914
$9,687 $2,898.10 37% $9,687 $11,761 $3,103.57 37% $11,761
It’s tab-delimited data showing US Federal Income Tax brackets for weekly paychecks. The goal was to remove the text between “MARRIED person—” and the line starting “Not over Over But not over”. The same text I wished to delete repeats multiple times throughout the entire document (readPDFText), each time between these same keywords.
What was originally messing me up were the line breaks: UiPath’s Replace activity wasn’t working because of the line breaks, which led me to use Regex.Replace.
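To confuse matters further, I had assumed that (.*?) (or more specifically the special character “period” “.”) means “all characters including line breaks”. In .NET regex, which is what VB uses, the period matches every character except a line feed unless RegexOptions.Singleline is set. JavaScript regex behaves the same way unless the s (dotAll) flag is used, so a pattern that works in an online tester may simply have that flag switched on. That is why (.*?) (see original post) kept failing. (.\n*?) did not work either: it matches exactly one non-line-break character followed by optional line feeds, and the text right after “person—” begins with a line break, so my line breaks were not getting found by these regex patterns.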
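Here is a minimal sketch of that behaviour as a small VB.NET console program. The sample string, the START/END keywords and the CanSpan helper are made up for illustration and are not taken from my workflow:

```vb
Imports System
Imports System.Text.RegularExpressions

Module DotDemo
    ' Hypothetical helper: True if the pattern finds a match in a three-line sample.
    Function CanSpan(pattern As String, Optional options As RegexOptions = RegexOptions.None) As Boolean
        ' vbLf stands in for the line breaks that come out of the PDF text.
        Dim sample As String = "START" & vbLf & "middle" & vbLf & "END"
        Return Regex.IsMatch(sample, pattern, options)
    End Function

    Sub Main()
        ' Default options: "." stops at the line feed, so nothing matches.
        Console.WriteLine(CanSpan("(?<=START)(.*?)(?=END)"))                           ' False

        ' One non-line-break character plus optional line feeds: still nothing.
        Console.WriteLine(CanSpan("(?<=START)(.\n*?)(?=END)"))                         ' False

        ' With RegexOptions.Singleline, "." also matches the line feed.
        Console.WriteLine(CanSpan("(?<=START)(.*?)(?=END)", RegexOptions.Singleline)) ' True
    End Sub
End Module
```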
Instead, the answer I found is that you can use ([\s\S]*?), a character class that matches any whitespace character (\s) or any non-whitespace character (\S), which means every character, line breaks included.
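To try this outside of UiPath, here is a rough sketch as a VB.NET console program. The shortened sample text and the StripBetween helper name are my own stand-ins (assumptions, not part of the real document); the pattern is the same one as above. It also shows that, as far as I can tell, (.*?) with RegexOptions.Singleline gives the same result:

```vb
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    ' Hypothetical helper wrapping the pattern from this post.
    Function StripBetween(input As String) As String
        Return Regex.Replace(input, "(?<=person—)([\s\S]*?)(?=Not over)", Environment.NewLine)
    End Function

    Sub Main()
        Dim nl As String = Environment.NewLine

        ' Shortened stand-in for readPDFText.
        Dim sample As String = "(b) MARRIED person—" & nl &
                               "(after subtracting" & nl &
                               "to withhold is:" & nl &
                               "Not over Over But not over"

        ' Character-class version.
        Dim viaClass As String = StripBetween(sample)

        ' Alternative: keep "." and let it match line breaks via Singleline.
        Dim viaOption As String = Regex.Replace(sample, "(?<=person—)(.*?)(?=Not over)", nl, RegexOptions.Singleline)

        Console.WriteLine(viaClass)              ' the two remaining lines
        Console.WriteLine(viaClass = viaOption)  ' True
    End Sub
End Module
```

Because the quantifier is lazy (*?), each match stops at the first “Not over” after a “person—”, so repeated tables in the same document should be cleaned one by one rather than swallowed in a single match.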
The result simply removes the following lines, leaving a single line break in their place:
(after subtracting
withholding allowances) is:
The amount of income tax
to withhold is:
(after subtracting
withholding allowances) is:
The amount of income tax
to withhold is:
Hope that’s clearer; I just wanted to post in case anyone else ran into similar problems with whitespace and/or hidden characters while parsing text!