Array contains

I'm reading a PDF file. The result is assigned to a string variable with //n//n between words (in place of spaces). After this, the string variable is split on those separators and assigned to a string array, so each word ends up in its own position of the array.

I need to find specific words or phrases, for example:
String[10]:{
“23223”,
“date”,
“18/5/2020”,
“name: Juan Perez”,
“Identification 123456”,
“3453543”,
“test”,
“1”,
“Location”,
“Miami”,
…}

I want to get the name, identification, and location. The problem is that some PDFs change the order, so the name won't always be at the same position, e.g. array(5). Or the word "name" may be in array(5) while the actual name “Juan Perez” is in array(6).

Any idea?

Hello,

You'd probably be better off applying a regex directly to the string read from the PDF.
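As a rough illustration of the direct-regex idea, here is a minimal sketch in Python (not UiPath; the same pattern idea carries over to .NET). It assumes the raw text looks like the array from the question joined with blank lines, and that each value follows its label either on the same line or on the next non-empty line. The function name `extract_fields` and the sample text are made up for the example.

```python
import re

# Hypothetical raw text: the sample array from the question, joined by blank lines.
raw_text = (
    "23223\n\ndate\n\n18/5/2020\n\nname: Juan Perez\n\n"
    "Identification 123456\n\n3453543\n\ntest\n\n1\n\nLocation\n\nMiami"
)

def extract_fields(text, labels=("name", "identification", "location")):
    """Find each label anywhere in the text and capture what follows it."""
    fields = {}
    for label in labels:
        # Label as a whole word, optional colon, any whitespace (including
        # newlines), then the rest of that line as the value.
        pattern = rf"\b{re.escape(label)}\b\s*:?\s*([^\r\n]+)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            fields[label] = match.group(1).strip()
    return fields

print(extract_fields(raw_text))
```

Because the search runs over the whole string, the position of a label no longer matters, and a value on the following line is picked up as well. If a label can also occur inside other text, the pattern would need to be tightened.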

Based on your question, here is an approach. If the actual location is not in the same string as its label but in the very next array element, first apply something like the following (dirty, but it's up to you):

Dirty part

Assign (Int32)
idx = Array.IndexOf(myArray, "Location")

If idx >= 0 AndAlso idx < myArray.Length - 1

  • Assign
    myArray(idx) = myArray(idx) & " " & myArray(idx + 1)
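The merge step can be sketched outside UiPath as follows. This is a Python illustration only, with a hypothetical helper name `merge_label_with_next`. Unlike `Array.IndexOf`, which matches the exact string, this sketch compares case-insensitively, and it skips the merge when the label is missing or is the last element.

```python
def merge_label_with_next(items, label):
    """Return a copy of items where a bare label is joined with the next element."""
    merged = list(items)
    # Index of the first element that is exactly the label (ignoring case), or -1.
    idx = next(
        (i for i, s in enumerate(merged) if s.strip().lower() == label.lower()),
        -1,
    )
    # Only merge when the label exists and has an element after it.
    if 0 <= idx < len(merged) - 1:
        merged[idx] = merged[idx] + " " + merged[idx + 1]
    return merged

print(merge_label_with_next(["test", "1", "Location", "Miami"], "Location"))
```

As in the Assign above, the element after the label is left in place; it simply no longer matters once the label and value sit in one string.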

Variables

Assign (Dictionary<String, String>)
person = New Dictionary(Of String, String)

Assign (Array of String)
keys = {"name", "identification", "location"}

Assign (String)
pattern = "(?<key>\w+)[\:\s]*(?<value>.+)"

Processing

ForEach candidate in myArray (String)

  • Assign (Match)
    match = System.Text.RegularExpressions.Regex.Match(candidate, pattern)

  • If match.Success

    • Assign (String)
      key = match.Groups("key").ToString.ToLower

    • If keys.Contains(key)

      • Assign
        person(key) = match.Groups("value").ToString.Trim
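For anyone who wants to try the loop outside UiPath, here is a minimal Python sketch of the same steps. Two things differ from the .NET version and are assumptions of this sketch: Python writes named groups as `(?P<name>...)` rather than `(?<name>...)`, and `re.search` is used to mirror `Regex.Match`, which looks for the first match anywhere in the string. The sample list is the array from the question with "Location" and "Miami" already merged.

```python
import re

# Same idea as the pattern above: a word as key, optional colon/whitespace, then the value.
PATTERN = re.compile(r"(?P<key>\w+)[:\s]*(?P<value>.+)")
KEYS = {"name", "identification", "location"}

def extract_person(items):
    """Collect the wanted key/value pairs from a list of strings."""
    person = {}
    for candidate in items:
        match = PATTERN.search(candidate)
        if match:
            key = match.group("key").lower()
            if key in KEYS:
                person[key] = match.group("value").strip()
    return person

sample = [
    "23223", "date", "18/5/2020", "name: Juan Perez",
    "Identification 123456", "3453543", "test", "1", "Location Miami",
]
print(extract_person(sample))
```

Elements that are a single word, such as "date", still match the pattern, but with a truncated key ("dat"), so they are filtered out by the key check rather than by the match itself.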