Retrieve text between two pieces of text in a String

I have read a PDF file and generated a string from it. I would like to search this large string for a name and retrieve it. The name consistently appears between two other recurring texts ("Name: " and "/Accounting"). I would like to do this for many PDFs of different sizes, lengths, etc.

Ex:
PDFString = "Name: Bill/Accounting"

  • I want to grab the text between "Name: " and "/Accounting"
  • In this example, I want to grab the name "Bill"
  • The name might change, so its length might change, and I can't just say "read 5 characters after 'Name:'"
  • I want to be able to grab this information wherever it appears in PDFString (whose length will vary)
  • The number of pages in the PDF is not consistent
  • The number of lines in the PDF is not consistent

My preferred method of grabbing text between two fixed texts is regex. Be sure to import System.Text.RegularExpressions (using the Imports tab next to the Variables and Arguments tabs). Then the following Assign activity will pull out the value between "Name:" and "/Accounting" wherever it is found. I assume it occurs only once, or that you only want the first occurrence.

Assign OutputString = Regex.Match(PDFString,"(?<=Name:\s).+(?=/Accounting)",RegexOptions.IgnoreCase).Value.Trim
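For anyone who wants to try the pattern outside a workflow, here is a minimal console sketch of the same idea in plain VB.NET. The module name, the ExtractName helper, and the sample text are made up for illustration; only the Regex.Match call comes from the expression above. The explicit Success check is optional, since Value is already an empty string when nothing matches, but it makes the "not found" case easier to see.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ExtractNameDemo
    ' Hypothetical helper (not part of UiPath): returns the text between
    ' "Name: " and "/Accounting", or an empty string when nothing matches.
    Function ExtractName(pdfText As String) As String
        ' (?<=Name:\s)    lookbehind: the match must be preceded by "Name:" plus one whitespace character
        ' .+              the name itself ("." does not cross line breaks by default)
        ' (?=/Accounting) lookahead: the match must be followed by "/Accounting"
        Dim m As Match = Regex.Match(pdfText, "(?<=Name:\s).+(?=/Accounting)", RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Value.Trim()
        End If
        Return String.Empty
    End Function

    Sub Main()
        ' Made-up sample standing in for the text read from a PDF.
        Dim pdfString As String = "Invoice 42" & Environment.NewLine &
                                  "Name: Bill/Accounting" & Environment.NewLine &
                                  "Total: 10"
        Console.WriteLine(ExtractName(pdfString)) ' prints "Bill"
    End Sub
End Module
```

Because the lookbehind and lookahead are not part of the match, Value holds only the name and no extra Replace or Substring step is needed.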


Hello Dave,

Thank you for your response, it did the trick! This was the code I ended up using:

Regex.Match(PDFString,"(?<=Name:\s).*?(?=/Accounting)", RegexOptions.IgnoreCase).Value.Trim
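A note for later readers on quantifiers: a greedy .+ and a lazy .*? give the same result when "/Accounting" appears only once on the line, but they can differ when it appears more than once. The sketch below uses a made-up sample string to show the difference, and also shows Regex.Matches in case every occurrence is wanted rather than just the first. Treat it as an illustration under those assumptions, not as part of the accepted solution.

```vb
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazyDemo
    Sub Main()
        ' Made-up sample with two names on the same line.
        Dim sample As String = "Name: Bill/Accounting Name: Ann/Accounting"

        ' Greedy: .+ runs as far as it can, so it stops at the LAST "/Accounting" on the line.
        Dim greedyResult As String = Regex.Match(sample, "(?<=Name:\s).+(?=/Accounting)", RegexOptions.IgnoreCase).Value
        ' Lazy: .*? stops at the FIRST "/Accounting" after "Name: ".
        Dim lazyResult As String = Regex.Match(sample, "(?<=Name:\s).*?(?=/Accounting)", RegexOptions.IgnoreCase).Value

        Console.WriteLine(greedyResult) ' prints "Bill/Accounting Name: Ann"
        Console.WriteLine(lazyResult)   ' prints "Bill"

        ' Regex.Matches returns every occurrence; here it prints "Bill" and then "Ann".
        For Each m As Match In Regex.Matches(sample, "(?<=Name:\s).*?(?=/Accounting)", RegexOptions.IgnoreCase)
            Console.WriteLine(m.Value.Trim())
        Next
    End Sub
End Module
```

Note that .*? also allows an empty name (for example "Name: /Accounting"), whereas .+? would require at least one character.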

Part of the reason I was not having success was that I had not imported System.Text.RegularExpressions.
Thanks again!
