Extract information through a repeated word

howto

#1

Hello everyone, I have a word that is repeated in a document, and I need to extract the information that goes with it. How can I identify each occurrence and gather all the information?

21367_140455 (1).pdf (13.7 KB)

example:
In this PDF the word “Lapesa” appears several times. Once the PDF has been read, I need the information written to a file like this

.

Regards.


#2

So you want each line that mentions the word Lapesa (or any other string)?
To solve this, I would first read the PDF, then split the output string on the newline character, and then check each entry in the resulting array. If an entry contains Lapesa (or whatever identifier you want), you can append it to a text file. Hope that helps!


#3

You can do a For Each line In txt.Split(System.Environment.NewLine(0))
Then use Write Text File and append the text for each matching line, or concatenate the lines into a string and write it once at the end.
That’s basically what jacob suggested, which works.
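
As a rough sketch, the loop described above could look like this in plain VB.NET. The sample text, the keyword variable, and the output file name lapesa_lines.txt are made up for illustration. In a workflow, txt would come from whatever activity reads the PDF. The sketch splits on both CRLF and LF instead of NewLine(0), on the assumption that the line endings of the extracted text are not known in advance.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module KeywordLines
    Sub Main()
        ' Hypothetical stand-in for the text extracted from the PDF.
        Dim txt As String = "Lapesa item A" & vbCrLf & "Other line" & vbCrLf & "  lapesa item B"
        Dim keyword As String = "Lapesa"
        Dim sb As New StringBuilder()

        ' Split on CRLF or LF so either line ending is handled.
        For Each line As String In txt.Split({vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)
            ' Keep the line if it contains the keyword, ignoring case.
            If line.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0 Then
                sb.AppendLine(line.Trim())
            End If
        Next

        ' Write all collected lines in one go (file name is an assumption).
        File.WriteAllText("lapesa_lines.txt", sb.ToString())
    End Sub
End Module
```

Collecting the matches in a StringBuilder and writing once at the end corresponds to the "concatenate and write at the end" option. Appending inside the loop would work too, but opens the file once per match.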

If there are thousands of lines I would suggest LINQ expressions.
For example,

filteredArray = txt.Split(System.Environment.Newline(0)).Where(Function(line) line.Trim.ToUpper.StartsWith("LAPESA") ).ToArray

That gives you an array you can process in a “for each” if needed. You can also wrap the array in a String.Join to write it to a file.

Write Text File => String.Join(System.Environment.Newline, filteredArray)
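
Put together outside of a workflow, the LINQ version might look roughly like the sketch below. The sample text and the file name lapesa_lines.txt are assumptions for illustration only. It keeps the StartsWith check from the expression above but uses a case-insensitive comparison instead of ToUpper, and it splits on both CRLF and LF in case the extracted text does not use Windows line endings.

```vb
Imports System
Imports System.IO
Imports System.Linq
Imports Microsoft.VisualBasic

Module KeywordFilter
    Sub Main()
        ' Hypothetical stand-in for the text extracted from the PDF.
        Dim txt As String = "Lapesa item A" & vbLf & "Other line" & vbLf & "  lapesa item B"

        ' Keep only the lines that start with the keyword (case-insensitive), trimmed.
        Dim filteredArray As String() = txt.Split({vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries).
            Select(Function(l) l.Trim()).
            Where(Function(l) l.StartsWith("LAPESA", StringComparison.OrdinalIgnoreCase)).
            ToArray()

        ' Join the matches with newlines and write them in a single call.
        File.WriteAllText("lapesa_lines.txt", String.Join(Environment.NewLine, filteredArray))
    End Sub
End Module
```

If the keyword can appear anywhere in the line rather than at the start, swapping StartsWith for an IndexOf(...) >= 0 check with the same StringComparison would match the "contains" approach from post #2.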

Regards.