I’m working on a process that will read PDF text for multiple-page files with an invoice on each page. I was able to figure out how to set the range to read each page separately using a combo of the sample “Counter Example” in UiPath and Vvaidya’s pdfPages xml from this thread:
However, I am still having difficulty with extracting the exact text I need. The text contains many line breaks, and I need to pull back the full text of a given line after a line containing specific text. For example:
PO Number
123456
I need to extract “123456” since it is the text of the line that comes after the line containing “PO Number.” I imagine this is done using Substring, but I’m having trouble figuring out exactly how. Any help is greatly appreciated!
Thanks! Trying to play with both options - the first one is giving me an “Index out of Range” error when I try to pull the next line as it is. If I change the quoted text to just “PO” it does work and pull back number.
EDIT for clarity: If I change the quoted text to just “PO” it does work and pulls back the word “Number.”
The second one just returns a blank value after I modified the {4,8} to {12}.
I think the “Index out of Range” error means that “PO Number” was not found, which is why “PO” worked. There could be extra spaces between “PO” and “Number”.
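To illustrate the point about extra spaces, here is a minimal Python sketch (not UiPath code) that finds the label with a whitespace-tolerant pattern and returns the next line. The function name `line_after` and the sample text are made up for illustration; the same pattern, PO\s+Number, should also work with .NET Regex in an Assign.

```python
import re

def line_after(text, label_pattern):
    """Return the stripped line that follows the first line matching
    label_pattern, or None if the label is missing or is the last line."""
    lines = text.splitlines()
    # Skip the last line: nothing can follow it.
    for i, line in enumerate(lines[:-1]):
        if re.search(label_pattern, line):
            return lines[i + 1].strip()
    return None

# Hypothetical page text with several spaces between "PO" and "Number".
sample = "Invoice\r\nPO   Number\r\n123456\r\nTotal"
po_number = line_after(sample, r"PO\s+Number")
print(po_number)
```

Returning None when the label is not found avoids the index error, so the workflow can decide what to do with a page that has no PO number.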
If changing it to {12} gave you a blank value, it might be because the number doesn’t have 12 characters. I would check and verify that. You can also use {10,12} if the number is between 10 and 12 digits.
EDIT: Also, this could go back to the same cause: “PO Number” might not be found.
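A quick way to see how the quantifier behaves, sketched in Python and assuming the pattern applies the quantifier to digits (\d); the original pattern is not shown in this thread, so that part is a guess. The sample uses the six-digit number from the first post.

```python
import re

sample = "PO Number\n123456\n"

# {4,8} accepts a run of 4 to 8 digits, so the 6-digit number matches.
flexible = re.search(r"\d{4,8}", sample)

# {12} needs 12 digits in a row; a 6-digit number gives no match,
# which shows up as a blank value.
exact = re.search(r"\d{12}", sample)

# {10,12} needs at least 10 digits, so it does not match here either.
ranged = re.search(r"\d{10,12}", sample)

print(flexible.group() if flexible else "")
print(exact.group() if exact else "")
print(ranged.group() if ranged else "")
```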
Hope that helps.
If you can provide a small sample of text that isn’t working, I could test it on my end.
I am new to UiPath and I have a similar problem. Thanks for the solutions. It’s working for me to extract the “CampID” from the URL below. But if I want to get the “id”, what should I do?
“?<=” is a lookbehind and “?=” is a lookahead; both must be present for the match, but they are left out of the extracted value. So it pulls everything “(.*)” in between those strings. I used “&|$” so it will check for either & or the end of the line.
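Here is a small Python sketch of the lookbehind idea. The URL is hypothetical (the real one is not shown here), and `query_value` is just an illustrative helper name. One thing to watch: a greedy “(.*)” followed by “&|$” can run all the way to the end of the string, so this sketch uses [^&]* to stop at the next “&”.

```python
import re

# Hypothetical URL; the actual URL from the question is not shown.
url = "https://example.com/track?id=42&CampID=777"

def query_value(url, key):
    """Return the value after '?key=' or '&key=', up to the next '&'."""
    pattern = r"(?<=[?&]" + re.escape(key) + r"=)[^&]*"
    match = re.search(pattern, url)
    return match.group() if match else None

id_value = query_value(url, "id")
camp_id = query_value(url, "CampID")

# For comparison: greedy (.*) with a lookahead for & or end of string
# grabs everything after "id=", including the CampID part.
greedy = re.search(r"(?<=id=)(.*)(?=&|$)", url).group()

print(id_value, camp_id, greedy)
```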
Thanks for the support. It works. But when I try to print the output using a Message Box, I am able to print only the “id” number. What should I do if I want to print the “CampID” as well?
I suggest you have 2 Assigns. One for the id and one for CampID. Unless you want both together as one variable, then you need to remove the .Split(“CampID=” part so it only splits by the “id=”
Yeah, you need to split by the “&CampID=”, and you can do so within the same line of code. Look at this for an example: GetCurrentURL.Split({"&CampID="},System.StringSplitOptions.None)(0).Split({"id="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.Newline(0))(0).Trim
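The same chained split, sketched step by step in Python so each piece is visible. The URL is a made-up example with “id” before “CampID”, matching the order that the expression above assumes.

```python
# Hypothetical URL with "id" before "CampID".
url = "https://example.com/track?id=42&CampID=777\r\n"

# Everything before "&CampID=" still contains "id=42".
before_camp = url.split("&CampID=")[0]
# Take what follows "id=" and trim whitespace.
id_value = before_camp.split("id=")[1].strip()

# Everything after "&CampID=", cut at the first line break.
camp_id = url.split("&CampID=")[1].split("\n")[0].strip()

print(id_value, camp_id)
```

Keeping the two values in separate variables (two Assigns in UiPath) makes it easy to show either one, or both, in a Message Box.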
Sure no problem. You need to also split it by the next newline character.
It will look like this: as_PY = word_OUTPUT.Split({"(Previous Year)"},StringSplitOptions.None)(1).Trim.Split(System.Environment.Newline(0))(0).Trim
You can go straight to assigning it to as_PY if you want, since it requires a double split, then take the first item.
I’ll also provide you with the Regex solution as well.
It will look like this: as_PY = Regex.Match(word_OUTPUT, "(?<=\(Previous Year\))(.*)$", RegexOptions.Multiline).Value.Trim
Hopefully, I did that right.
Basically, it uses “?<=” as a lookbehind, and the $ means the end of the line (RegexOptions.Multiline is what makes $ match at the end of each line rather than only at the end of the whole text), but I’m not 100% sure I did that right. If not and it’s something you are interested in, I can get you the correct code (or you can do some searches for ideas).
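For anyone who wants to test the pattern outside UiPath, here is a rough Python equivalent. The sample text is invented; `word_output` and `as_py` mirror the variable names above. Python's re.MULTILINE plays the same role as the multiline option in .NET.

```python
import re

# Invented sample; the real document text is not shown in this thread.
word_output = (
    "Sales (Current Year) 1,200\r\n"
    "Sales (Previous Year) 1,050\r\n"
    "Notes"
)

# Lookbehind for the literal "(Previous Year)", then the rest of that line.
match = re.search(r"(?<=\(Previous Year\)).*$", word_output, re.MULTILINE)

# strip() removes the leading space and the trailing carriage return.
as_py = match.group().strip() if match else ""
print(as_py)
```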
Message: Index was outside the bounds of the array.
Exception Type: System.IndexOutOfRangeException
An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at lambda_method(Closure , ActivityContext )
at Microsoft.VisualBasic.Activities.VisualBasicValue`1.Execute(CodeActivityContext context)
at System.Activities.CodeActivity`1.InternalExecuteInResolutionContext(CodeActivityContext context)
at System.Activities.Runtime.ActivityExecutor.ExecuteInResolutionContext[T](ActivityInstance parentInstance, Activity`1 expressionActivity)
at System.Activities.InArgument`1.TryPopulateValue(LocationEnvironment targetEnvironment, ActivityInstance activityInstance, ActivityExecutor executor)
at System.Activities.RuntimeArgument.TryPopulateValue(LocationEnvironment targetEnvironment, ActivityInstance targetActivityInstance, ActivityExecutor executor, Object argumentValueOverride, Location resultLocation, Boolean skipFastPath)
at System.Activities.ActivityInstance.InternalTryPopulateArgumentValueOrScheduleExpression(RuntimeArgument argument, Int32 nextArgumentIndex, ActivityExecutor executor, IDictionary`2 argumentValueOverrides, Location resultLocation, Boolean isDynamicUpdate)
at System.Activities.ActivityInstance.ResolveArguments(ActivityExecutor executor, IDictionary`2 argumentValueOverrides, Location resultLocation, Int32 startIndex)
at System.Activities.Runtime.ActivityExecutor.ExecuteActivityWorkItem.ExecuteBody(ActivityExecutor executor, BookmarkManager bookmarkManager, Location resultLocation)
That error means on one of the .Split()'s, it could not find the string that you are splitting by.
In your case, you are splitting by “&ts=”, taking the first item which does not include the next split string “auth=”.
You need to change the index to (1) so you take the second item in the split that includes “auth=”
Also, you may want to check that your string includes the strings you are splitting by before making the split, in case a problem like this comes up again.
The expression should be like this, I believe: JsonResponse.Split({"&ts="},System.StringSplitOptions.None)(1).Split({"auth="},System.StringSplitOptions.None)(1).Trim.Split(System.Environment.Newline(0))(0).Trim
Just changed the index from (0) to (1)
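To show the “check before you split” idea, here is a minimal Python sketch. The helper names `after` and `first_line` and the sample response are hypothetical. Note one small difference: `after` keeps everything following the first occurrence of the marker, whereas taking index (1) of a full split keeps only the piece up to the next occurrence.

```python
def after(text, marker):
    """Return the text following the first occurrence of marker,
    or None if text is None or the marker is not present."""
    if text is None or marker not in text:
        return None
    return text.split(marker, 1)[1]

def first_line(text):
    """Return the first line of text, trimmed; pass None through."""
    return None if text is None else text.split("\n")[0].strip()

# Hypothetical response; the real JsonResponse is not shown here.
response = "https://example.com/cb?user=a&ts=1700000000&auth=abc123\r\nnext line"

auth = first_line(after(after(response, "&ts="), "auth="))
print(auth)

# A response without the markers yields None instead of an index error.
missing = first_line(after(after("no markers here", "&ts="), "auth="))
print(missing)
```

In UiPath the same guard can be an If activity using JsonResponse.Contains("&ts=") before the Assign that does the split.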
So I am using the split pattern and it gives me the first line of the result. How can I extract the other five strings of numbers that are under the header “PO Number”?
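One possible approach, sketched in Python and assuming the numbers sit on consecutive lines directly under the header and that each line contains only digits (the actual layout is not shown, so both are assumptions). The function name `values_under` is made up for illustration.

```python
import re

def values_under(text, header_pattern):
    """Collect consecutive digit-only lines that follow the first line
    matching header_pattern; stop at the first line that is not all digits."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if re.search(header_pattern, line):
            values = []
            for following in lines[i + 1:]:
                if not re.fullmatch(r"\d+", following):
                    break
                values.append(following)
            return values
    return []

# Invented sample with six numbers under the header.
sample = "PO Number\r\n111111\r\n222222\r\n333333\r\n444444\r\n555555\r\n666666\r\nTotal\r\n"
po_numbers = values_under(sample, r"PO\s+Number")
print(po_numbers)
```

In UiPath the equivalent would be to split the text after the header into lines and loop over them with a For Each, stopping at the first line that is not a number.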