Substring after line containing specific text


#1

I’m working on a process that will read PDF text for multiple-page files with an invoice on each page. I was able to figure out how to set the range to read each page separately using a combo of the sample “Counter Example” in UiPath and Vvaidya’s pdfPages xml from this thread:

PDF Page Count

However, I am still having difficulty with extracting the exact text I need. The text contains many line breaks, and I need to pull back the full text of a given line after a line containing specific text. For example:

PO Number

123456

I need to extract “123456” since it is the text of the line that comes after the line containing “PO Number.” I imagine this is done using Substring, but I’m having trouble figuring out exactly how. Any help is greatly appreciated!


#2

Hi,

Sure, I think you can do this with either .Split or a Regex pattern.

Here are my 2 solutions:

text.Split({"PO Number"},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.Newline(0))(0).Trim

That should split the text at “PO Number”, then split the second part at the newline character to pull out the number.

System.Text.RegularExpressions.Regex.Match(text,"PO Number\s*[0-9]{4,8}").Value.Replace("PO Number","").Trim

That should match “PO Number” followed by 4 to 8 digits, then remove the “PO Number” part.
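As a rough sketch of the regex route, here is a variant that captures the digits in a group instead of stripping the label afterwards. In .NET, `.` does not match a line feed unless RegexOptions.Singleline is set, so `\s*` is used here to cross the line breaks between the label and the number. The sample text and the module name are made up for illustration; the 4 to 8 digit range is carried over from the expression above and may need adjusting.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PoRegexDemo
    Sub Main()
        ' Made-up sample: the label, a blank line, then the value.
        Dim pageText = "PO Number" & vbCrLf & vbCrLf & "123456" & vbCrLf & "Total"

        ' \s* skips spaces and line breaks; the parentheses capture only the digits.
        Dim m = Regex.Match(pageText, "PO Number\s*(\d{4,8})")

        If m.Success Then
            Console.WriteLine(m.Groups(1).Value)   ' prints 123456
        Else
            Console.WriteLine("no match")
        End If
    End Sub
End Module
```

In a UiPath Assign activity, only the Regex.Match(...).Groups(1).Value part would be needed.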

Hope that at least gets you headed in the right direction.

Regards.


#3

Thanks! I'm trying both options. The first one gives me an “Index out of Range” error when I try to pull the next line using it as written. If I change the quoted text to just “PO”, it does work and pulls back a value.

EDIT for clarity: If I change the quoted text to just “PO” it does work and pulls back the word “Number.”

The second one just returns a blank value after I modified the {4,8} to {12}.


#4

Hi,

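I think the “Index out of range” error means that “PO Number” was not found, which is why “PO” worked: when the separator is missing, Split returns a single element, so index (1) does not exist. There could be extra spaces between “PO” and “Number”.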

If changing it to {12} gave you a blank value, it might be because the number doesn’t have 12 digits. I would check and verify that. You can also use {10,12} if the number is between 10 and 12 digits long.
EDIT: Also, this could again come down to “PO Number” not being found.
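To make the “not found” case visible instead of throwing, the Split approach can be wrapped in a check on the number of parts. This is only a sketch: AfterLabel is a hypothetical helper name, and the sample text is invented.

```vb
Imports System
Imports System.Linq
Imports Microsoft.VisualBasic

Module SafeSplitDemo
    ' Hypothetical helper: returns the first non-blank line after the label,
    ' or Nothing when the label does not occur in the text.
    Function AfterLabel(pageText As String, label As String) As String
        Dim parts = pageText.Split({label}, StringSplitOptions.None)

        ' A missing label leaves the whole text in parts(0), so parts(1) would throw.
        If parts.Length < 2 Then Return Nothing

        Dim rest = parts(1).Split({vbCr, vbLf}, StringSplitOptions.RemoveEmptyEntries)
        Return rest.Select(Function(l) l.Trim()).FirstOrDefault(Function(l) l.Length > 0)
    End Function

    Sub Main()
        Dim sample = "PO NUMBER" & vbCrLf & vbCrLf & "123456"

        ' Split is case-sensitive, so the mixed-case label is not found here.
        Console.WriteLine(If(AfterLabel(sample, "PO Number"), "label not found"))   ' prints label not found
        Console.WriteLine(If(AfterLabel(sample, "PO NUMBER"), "label not found"))   ' prints 123456
    End Sub
End Module
```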

Hope that helps.

If you can provide a small sample of text that isn’t working, I could test it on my end.

Regards.


#5

Sent you a sample in a PM


#6

The first option actually does work - I made a stupid mistake and didn’t realize the NUMBER was all caps. Thanks again!
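For anyone hitting the same capitalization issue, one possible way to make the lookup tolerant of both casing and extra spaces is a case-insensitive regex. This is a sketch with invented sample text, not the expression used in the thread; `\d+` accepts any number of digits and can be tightened to a range such as {4,8}.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CaseInsensitiveDemo
    Sub Main()
        ' Made-up sample with the label in all caps, as in the failing invoice.
        Dim pageText = "PO NUMBER" & vbCrLf & vbCrLf & "123456"

        ' IgnoreCase matches "PO Number" and "PO NUMBER" alike;
        ' \s+ allows one or more spaces between the two words.
        Dim m = Regex.Match(pageText, "PO\s+Number\s*(\d+)", RegexOptions.IgnoreCase)

        Console.WriteLine(If(m.Success, m.Groups(1).Value, "no match"))   ' prints 123456
    End Sub
End Module
```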