Substring after line containing specific text

I’m working on a process that will read PDF text for multiple-page files with an invoice on each page. I was able to figure out how to set the range to read each page separately using a combo of the sample “Counter Example” in UiPath and Vvaidya’s pdfPages xml from this thread:

PDF Page Count

However, I am still having difficulty with extracting the exact text I need. The text contains many line breaks, and I need to pull back the full text of a given line after a line containing specific text. For example:

PO Number

123456

I need to extract “123456” since it is the text of the line that comes after the line containing “PO Number.” I imagine this is done using Substring, but I’m having trouble figuring out exactly how. Any help is greatly appreciated!

Hi,

Sure, I think you can do this with either .Split or a Regex pattern.

Here are my 2 solutions:

text.Split({"PO Number"},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.Newline(0))(0).Trim

That should split the text by “PO Number”, then split the second part by the newline character to pull in the number.
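To make the two splits easier to follow, here is a minimal Python sketch of the same idea (not the UiPath expression itself). The sample text and the function name `value_after` are made up for illustration.

```python
def value_after(text, marker):
    """Return the first line of text that follows `marker`."""
    # First split: everything after the marker is the second piece.
    # This raises IndexError if the marker is not in the text.
    after_marker = text.split(marker)[1]
    # Trim the leading line breaks, then keep only the first remaining line.
    return after_marker.strip().splitlines()[0].strip()


# Hypothetical page text with blank lines between label and value.
sample = "Invoice\r\nPO Number\r\n\r\n123456\r\nTotal\r\n99.00"
result = value_after(sample, "PO Number")
print(result)
```

The first trim matters: without it, the first "line" after the marker would be the empty remainder of the label line.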

System.Text.RegularExpressions.Regex.Match(text,"PO Number(.*)[0-9]{4,8}").Value.Replace("PO Number","").Trim

That should pull out the pattern “PO Number”, then any characters, then 4 to 8 digits, and then remove the “PO Number”.

Hope that at least gets you in the right direction.
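One thing to watch with the Regex route: in both Python and .NET, “.” does not match a line feed by default, so a pattern that has to get from the label line to the number on a later line needs something that can cross line breaks. A minimal Python sketch, assuming the number is the first thing after the label apart from whitespace; the function name `po_number` and its digit-count parameters are illustrative only.

```python
import re


def po_number(text, min_digits=4, max_digits=8):
    """Return the digits that follow 'PO Number', or None if there is no match."""
    # \s* crosses spaces and line breaks between the label and the number.
    # (?![0-9]) makes sure the number is not longer than max_digits.
    pattern = r"PO Number\s*([0-9]{%d,%d})(?![0-9])" % (min_digits, max_digits)
    match = re.search(pattern, text)
    return match.group(1) if match else None


sample = "Invoice\r\nPO Number\r\n\r\n123456\r\nTotal"
print(po_number(sample))          # the six-digit number
print(po_number(sample, 12, 12))  # no match: the number is not 12 digits long
```

Capturing the digits in a group avoids having to remove the label afterwards.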

Regards.

Thanks! Trying to play with both options - the first one is giving me an “Index out of Range” error when I try to pull the next line as it is. If I change the quoted text to just “PO” it does work and pull back number.

EDIT for clarity: If I change the quoted text to just “PO” it does work and pulls back the word “Number.”

The second one just returns a blank value after I modified the {4,8} to {12}.

Hi,

I think the Index out of range means that “PO Number” was not found, which is why “PO” worked. There could be extra spaces between “PO” and “Number”.

If changing to {12} gave you a blank value, then it might be because the number doesn’t have 12 digits. I would check and verify that. You can also use {10,12} if the number is between 10 and 12 digits.
EDIT: Also, this goes back to “PO Number” possibly not being found.
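If the label might differ in spacing or letter case, one option is to look for it more loosely and return nothing instead of failing. This is only a Python sketch of that idea; the function name `line_after_label` and the sample text are assumptions, not something from the workflow.

```python
import re


def line_after_label(text, words):
    """Find the label made of `words` (any case, any spacing) and return the next non-empty line."""
    # Allow one or more whitespace characters between the words of the label.
    label = r"\s+".join(re.escape(word) for word in words)
    match = re.search(label, text, flags=re.IGNORECASE)
    if match is None:
        return None  # label not found: no index-out-of-range style failure
    rest = text[match.end():].strip()
    return rest.splitlines()[0].strip() if rest else None


sample = "Invoice\r\nPO   NUMBER\r\n\r\n123456\r\nTotal"
print(line_after_label(sample, ["PO", "Number"]))
print(line_after_label(sample, ["Order", "Number"]))
```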

Hope that helps.

If you can provide a small sample of text that isn’t working, I could test it on my end.

Regards.

Sent you a sample in a PM

The first option actually does work - I made a stupid mistake and didn’t realize the NUMBER was all caps. Thanks again!

Hi @ClaytonM,

I am new to UiPath and I have a similar problem. Thanks for the solutions. It’s working for me to extract the “CampID” from the URL below. But if I want to get the “id”, what should I do?

http://testing.com/receipt.aspx?id=7f-b6-4d-8b-d006e&CampID=17-96-4a-9e-7e0fe

GetCurrentURL.Split({"CampID="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.NewLine(0))(0).Trim

But how can I extract the “id” number for the above URL?

GetCurrentURL.Split({"&CampID="},System.StringSplitOptions.None)(0).Split({"id="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.NewLine(0))(0).Trim

You just need to add another split to split by “id=” and take the (1) index, because it splits it into 2 parts and you want the part after “id=”.
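Here is the same two-step split written out in Python so the intermediate pieces are visible. It is only a sketch of the logic, using the URL from the question; the variable names are illustrative.

```python
url = "http://testing.com/receipt.aspx?id=7f-b6-4d-8b-d006e&CampID=17-96-4a-9e-7e0fe"

# Step 1: keep everything before "&CampID=" (index 0 of the first split).
before_camp = url.split("&CampID=")[0]

# Step 2: split that piece by "id=" and keep the part after it (index 1).
id_value = before_camp.split("id=")[1].strip()

print(before_camp)
print(id_value)
```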

I’ll also provide you a Regex solution. It may or may not be better.

System.Text.RegularExpressions.Regex.Match( GetCurrentURL, "(?<=id=)(.*?)(?=&|$)" ).Value

“?<=” is a look behind and “?=” is a look ahead, and neither is included in the extracted value. So it pulls everything “(.*?)” in between those strings. The “?” after “.*” makes it stop at the first “&” instead of running to the end of the text. I used “&|$” so it will check for either & or the end of the text.
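To see why the quantifier between the look behind and the look ahead matters, here is a small Python sketch comparing a greedy and a lazy version on the URL from the question. Python's `re` is used only for illustration; the variable names are made up.

```python
import re

url = "http://testing.com/receipt.aspx?id=7f-b6-4d-8b-d006e&CampID=17-96-4a-9e-7e0fe"

# Greedy: ".*" runs to the end of the text, where "$" satisfies the look ahead.
greedy = re.search(r"(?<=id=)(.*)(?=&|$)", url).group(0)

# Lazy: ".*?" stops at the first position where the look ahead matches, the "&".
lazy = re.search(r"(?<=id=)(.*?)(?=&|$)", url).group(0)

print(greedy)
print(lazy)
```

Note that the lowercase “id=” does not match inside “CampID=” because the match is case-sensitive.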

Hopefully that is clear and helpful.

Regards.

Hi @ClaytonM,

Thanks for the support. It works. But when I try to print the output using a Message Box, I am only able to print the “id” number. What should I do if I also want to print the “CampID”?

I suggest you have 2 Assigns: one for the id and one for the CampID.
If you would rather have both together as one variable, then you need to remove the .Split({"&CampID="} part so it only splits by “id=”.
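As an alternative to chained splits (not what this thread uses), a query-string parser can pull out both values by name. A minimal Python sketch with the standard library's `urllib.parse`, using the URL from the question; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

url = "http://testing.com/receipt.aspx?id=7f-b6-4d-8b-d006e&CampID=17-96-4a-9e-7e0fe"

# urlsplit isolates the part after "?", parse_qs turns it into a dict of lists.
params = parse_qs(urlsplit(url).query)

id_value = params["id"][0]
camp_id = params["CampID"][0]

print(id_value)
print(camp_id)
```

Each value ends up in its own variable, which corresponds to the two-Assign suggestion.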

Hi @ClaytonM

If I use 2 Assigns and try to print the “id”, it prints “id” + “CampID”. Please find the output below.

78c4d38f-b2d6-4de1-8b18-d4e2b21a006e&CampID=14614972-965c-4a63-904e-77e0fec5130d

How can I remove the “CampID” from the string.

Can I use split method?

Hi. Which code are you using to extract the “id”?

Yeah, you need to split by the “&CampID=”, and you can do so within the same line of code. Look at this for an example:
GetCurrentURL.Split({"&CampID="},System.StringSplitOptions.None)(0).Split({"id="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.NewLine(0))(0).Trim

You can also try the Regex solution too.

Regards.

Thanks @ClaytonM, it’s working. I am using the Split method.

Hi @ClaytonM, I want to fetch the 100$ from the above Word document.

So I used the split logic: word_OUTPUT_array = word_OUTPUT.Split({"(Previous Year)"},StringSplitOptions.None)

as_PY = word_OUTPUT_array(1)

But it’s not working for me.

It gives me the following output:

Untitled

But I want only 100$.
Please give a suggestion.
Thanks in advance.

Sure no problem. You need to also split it by the next newline character.

It will look like this:
as_PY = word_OUTPUT.Split({"(Previous Year)"},StringSplitOptions.None)(1).Trim.Split(System.Environment.Newline(0))(0).Trim

You can assign it straight to as_PY if you want: it takes a double split, and then you take the first item of the second split.
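Here is a Python sketch of why the second split is needed. The document text is invented for illustration (the real Word output was only posted as a screenshot), and the variable names are assumptions.

```python
# Hypothetical text read from the Word document.
word_output = "Sales (Previous Year) 100$\r\nSales (Current Year) 150$\r\n"

# One split only: everything after the marker, including the following lines.
after_marker = word_output.split("(Previous Year)")[1]

# Second split: trim, then keep just the first line of what is left.
as_py = after_marker.strip().splitlines()[0].strip()

print(repr(after_marker))
print(as_py)
```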

I’ll also provide you with a Regex solution.
It will look like this:
as_PY = Regex.Match(word_OUTPUT, "(?<=\(Previous Year\))(.*)$", RegexOptions.Multiline).Value.Trim
Hopefully, I did that right :thinking:
Basically, it uses “?<=” as a look behind, and the $ means the end of the line (with RegexOptions.Multiline; without it, $ only matches at the end of the whole text), but I’m not 100% sure I did that right. If not and it’s something you are interested in, I can get you the correct code (or you can do some searches for ideas).
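For anyone who wants to try the look-behind idea outside of Studio, here is a minimal Python sketch. It assumes the value sits on the same line as “(Previous Year)”; the sample text and variable names are invented.

```python
import re

# Hypothetical text read from the Word document.
word_output = "Sales (Previous Year) 100$\r\nSales (Current Year) 150$\r\n"

# re.MULTILINE lets "$" match at the end of each line, not only at the end of the text.
match = re.search(r"(?<=\(Previous Year\))(.*)$", word_output, flags=re.MULTILINE)

# The capture still carries the leading space and the trailing "\r", so trim it.
as_py = match.group(1).strip() if match else None

print(as_py)
```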

Regards.

Hi ClaytonM,
The 1st solution works for me.
Thanks. You are amazing.
Thanks for reply.

Hi @ClaytonM,

I am using the code below to extract the auth value from the URL below:

JsonResponse.Split({"&ts="},System.StringSplitOptions.None)(0).Split({"auth="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.NewLine(0))(0).Trim

When I try to extract the auth value from that URL, I get the error message below.

URL
https://reward/v1/gift/6147001/5?amount=5&ts=20181219192334&auth=AppEngineeringSolutions:5MUA5swWhfx359SixhirmkmVC57yhjyKpabm5M4mamM=

Error Message
18.3.1+Branch.master.Sha.4c05f670b311e90ee097c589605b399e9bee4874

Source: Assign

Message: Index was outside the bounds of the array.

Exception Type: System.IndexOutOfRangeException

An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at lambda_method(Closure , ActivityContext )
at Microsoft.VisualBasic.Activities.VisualBasicValue`1.Execute(CodeActivityContext context)
at System.Activities.CodeActivity`1.InternalExecuteInResolutionContext(CodeActivityContext context)
at System.Activities.Runtime.ActivityExecutor.ExecuteInResolutionContext[T](ActivityInstance parentInstance, Activity`1 expressionActivity)
at System.Activities.InArgument`1.TryPopulateValue(LocationEnvironment targetEnvironment, ActivityInstance activityInstance, ActivityExecutor executor)
at System.Activities.RuntimeArgument.TryPopulateValue(LocationEnvironment targetEnvironment, ActivityInstance targetActivityInstance, ActivityExecutor executor, Object argumentValueOverride, Location resultLocation, Boolean skipFastPath)
at System.Activities.ActivityInstance.InternalTryPopulateArgumentValueOrScheduleExpression(RuntimeArgument argument, Int32 nextArgumentIndex, ActivityExecutor executor, IDictionary`2 argumentValueOverrides, Location resultLocation, Boolean isDynamicUpdate)
at System.Activities.ActivityInstance.ResolveArguments(ActivityExecutor executor, IDictionary`2 argumentValueOverrides, Location resultLocation, Int32 startIndex)
at System.Activities.Runtime.ActivityExecutor.ExecuteActivityWorkItem.ExecuteBody(ActivityExecutor executor, BookmarkManager bookmarkManager, Location resultLocation)

That error means that one of the .Split() calls could not find the string you are splitting by, so the resulting array has only one item and index (1) does not exist.
In your case, you are splitting by “&ts=” and taking the first item, which does not include the next split string “auth=”.

You need to change the index to (1) so you take the second item in the split, which includes “auth=”.

Also, you may want to check that your string includes the strings you are splitting by before making the split, in case a problem like this comes up again.

The expression should be like this, I believe:
JsonResponse.Split({"&ts="},System.StringSplitOptions.None)(1).Split({"auth="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.NewLine(0))(0).Trim
I just changed the index from (0) to (1).
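To show where the index error comes from, here is a Python sketch that prints the pieces of the first split for the URL in the question, plus a small guard that returns nothing when the marker is missing. The helper name `text_after` is hypothetical.

```python
url = ("https://reward/v1/gift/6147001/5?amount=5&ts=20181219192334"
       "&auth=AppEngineeringSolutions:5MUA5swWhfx359SixhirmkmVC57yhjyKpabm5M4mamM=")

parts = url.split("&ts=")
# parts[0] is the text before "&ts=" and does not contain "auth=",
# so splitting it by "auth=" yields a single item and index 1 does not exist.
pieces_from_first_part = parts[0].split("auth=")


def text_after(text, marker):
    """Return the text after the first `marker`, or None if the marker is missing."""
    _, found, tail = text.partition(marker)
    return tail if found else None


auth = text_after(parts[1], "auth=")
print(len(pieces_from_first_part))
print(auth)
```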

Regards

Thanks @ClaytonM it worked :slight_smile:

Hi,

So I am using the split pattern and it gives me the first line of the result. How can I extract the other 5 strings of numbers that are under the header “PO Number”?

Thank you.