String Manipulation - Extract fields from a PDF by knowing exact indexes

Hi,

In a workflow I’m developing, I use a sequence of steps to extract some string values from a PDF. Let’s imagine the following scenario:

The PDF always has the same fields, such as Name, Age, Location, and so on. Next to each of these labels is the value I want. Example:

“Blah blah blah…
Name: John Doe
Age: 31
Location: Mars
Lorem ipsum…”

Let’s say that I want to extract just the name, John Doe.

Right now, what I’m using is something like this:

I have a ForEach activity that iterates through all the lines of the PDF, whether it has 10 lines or 1000. For each “line” in “PdfContent.Split(New String() {Environment.NewLine}, StringSplitOptions.None)”, I have an If condition inside the Body that looks like this:

If
    item.ToString.Contains("Name: ")
Assign
    varName = item.ToString.Replace("Name: ", "")
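To make the loop approach concrete, here is a minimal sketch in Python (used only so the logic runs on its own; in the workflow these are VB.NET expressions). The sample text and the function name `extract_by_loop` are hypothetical. One difference to note: this sketch stops at the first matching line, while the ForEach keeps assigning on every match, so there the last match wins.

```python
from typing import Optional

# Hypothetical sample text mirroring the PDF example above (Windows line breaks).
pdf_content = "Blah blah blah...\r\nName: John Doe\r\nAge: 31\r\nLocation: Mars\r\nLorem ipsum..."


def extract_by_loop(text: str, label: str) -> Optional[str]:
    """Scan line by line, like the ForEach + If + Assign sequence."""
    # Like PdfContent.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
    for line in text.split("\r\n"):
        # Like item.ToString.Contains("Name: ")
        if label in line:
            # Like item.ToString.Replace("Name: ", ""); returns on the first match
            return line.replace(label, "")
    return None  # label not found on any line


print(extract_by_loop(pdf_content, "Name: "))
```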

This solution already works and gives me the value John Doe, but is there a better approach?

For example, is there a Substring, Split, or similar method that can extract the values directly, since I know exactly where those values appear in the PDF?

I know that the name always comes after "Name: " and before "Age", so is there a method that extracts what lies between the index of "Name" and the index of "Age"?
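The "between two indexes" idea can be sketched as follows, in Python for a self-contained example (the sample text and `extract_between` are hypothetical). The .NET counterparts are `String.IndexOf(String)`, `String.IndexOf(String, Int32)` and `String.Substring(Int32, Int32)`, e.g. `startIdx = PdfContent.IndexOf("Name: ") + "Name: ".Length`, `endIdx = PdfContent.IndexOf("Age", startIdx)`, then `PdfContent.Substring(startIdx, endIdx - startIdx).Trim`; treat that mapping as a sketch, not tested workflow code.

```python
from typing import Optional

# Hypothetical sample text mirroring the PDF example above.
pdf_content = "Blah blah blah...\r\nName: John Doe\r\nAge: 31\r\nLocation: Mars\r\nLorem ipsum..."


def extract_between(text: str, start_label: str, end_label: str) -> Optional[str]:
    # Index of the start label (String.IndexOf in .NET); -1 means not found.
    start = text.find(start_label)
    if start == -1:
        return None
    start += len(start_label)  # move past "Name: " so it is not part of the result
    # Look for the end label only after the start position.
    end = text.find(end_label, start)
    if end == -1:
        return None
    # Slice between the two indexes (Substring(start, end - start) in .NET),
    # then strip the line break that sits just before the end label.
    return text[start:end].strip()


print(extract_between(pdf_content, "Name: ", "Age"))
```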

Thanks in advance.

Hi @samureira,

Try using Regex.

Thanks,

Instead of running through a loop, you can just use .Split twice:
PdfContent.Split({"Name: "}, StringSplitOptions.None)(1).Split({Environment.NewLine}, StringSplitOptions.None)(0)
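The two-Split idea can be sketched in Python as below (sample text and `extract_by_double_split` are hypothetical). The first split keeps everything after the label; the second keeps only the rest of that line. One assumption worth flagging: in .NET, indexing `(1)` throws an `IndexOutOfRangeException` when "Name: " does not occur, so this sketch returns `None` in that case instead.

```python
from typing import Optional

# Hypothetical sample text mirroring the PDF example above.
pdf_content = "Blah blah blah...\r\nName: John Doe\r\nAge: 31\r\nLocation: Mars\r\nLorem ipsum..."


def extract_by_double_split(text: str, label: str) -> Optional[str]:
    # First split: maxsplit=1 keeps everything after the first label.
    # (The .NET call splits on every occurrence, but index (1) still
    # starts right after the first one.)
    parts = text.split(label, 1)
    if len(parts) < 2:
        return None  # label missing; the .NET (1) index would throw here
    # Second split: keep only the rest of that line.
    # splitlines() handles both "\r\n" and "\n" line breaks.
    rest = parts[1].splitlines()
    return rest[0] if rest else ""


print(extract_by_double_split(pdf_content, "Name: "))
```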

or just use Regex, which is more powerful.
Regex.Match(PdfContent, "(?<=Name: )[^\r\n]+").Value
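The regex approach works the same way in Python's `re` module, which this sketch uses so it runs standalone (sample text and `extract_by_regex` are hypothetical). The lookbehind `(?<=...)` makes the match start right after the label without including it, and `[^\r\n]+` stops at the end of the line; in .NET, `.` also matches `\r`, so `.+` could pick up a trailing carriage return on Windows text. Unlike .NET's `Regex.Match(...).Value`, which returns an empty string when nothing matches, Python's `re.search` returns `None`.

```python
import re
from typing import Optional

# Hypothetical sample text mirroring the PDF example above.
pdf_content = "Blah blah blah...\r\nName: John Doe\r\nAge: 31\r\nLocation: Mars\r\nLorem ipsum..."


def extract_by_regex(text: str, label: str) -> Optional[str]:
    # Lookbehind on the escaped label, then everything up to the line break.
    pattern = "(?<=" + re.escape(label) + r")[^\r\n]+"
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(extract_by_regex(pdf_content, "Name: "))
```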

Regards.


@ClaytonM, I understand that Regex might be more powerful and efficient, but the two-Split solution makes it easier for me to understand what’s going on! Thanks!
