Extracting text from a .txt document


I extracted the content of a PDF (a catalog containing item numbers and other descriptions) to a .txt file. I need to extract only the item numbers from the .txt file and search for each item number on a website. How can I extract only the item numbers from the .txt file?



Are the item numbers consistently in the same spot on each line? If so, you can use .Split to break each line into columns, pull the column that holds the item number, and store the results in an array. If the position is not consistent, you can use Regex to pull the numbers out and again store them in an array.

Here is an example of each approach; either result can be assigned to an Array[String] variable:

text.Split(vbLf(0)).Where(Function(row) Not String.IsNullOrWhiteSpace(row)).Select(Function(row) row.Trim.Split({" "},System.StringSplitOptions.RemoveEmptyEntries)(0)).ToArray

text.Split(vbLf(0)).Select(Function(row) System.Text.RegularExpressions.Regex.Match(row,"[0-9]{1,4}")).Where(Function(m) m.Success).Select(Function(m) m.Value).ToArray

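They might need some adjusting, though. The first assumes the item number is the first space-separated value on each line, and the pattern [0-9]{1,4} in the second only picks up the first run of one to four digits on each line.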
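
To see how the two approaches behave differently, here is a minimal console sketch that runs both on some made-up catalog text. The sample lines are purely hypothetical, since the real layout of your .txt file isn't known, and the filtering of blank lines and non-matching rows is my own addition to avoid empty entries.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ItemNumberDemo
    Sub Main()
        ' Hypothetical sample text: one row starts with the item number,
        ' one row is blank, and one row has the number in the middle.
        Dim text As String = "1001 Steel bolt M8" & vbLf & "" & vbLf & "Washer, item 2040, zinc" & vbLf

        ' Split approach: first space-separated token of every non-blank row.
        Dim byColumn As String() = text.Split(vbLf(0)).
            Where(Function(row) Not String.IsNullOrWhiteSpace(row)).
            Select(Function(row) row.Trim().Split({" "}, StringSplitOptions.RemoveEmptyEntries)(0)).
            ToArray()

        ' Regex approach: first run of 1 to 4 digits in every row that has one.
        Dim byRegex As String() = text.Split(vbLf(0)).
            Select(Function(row) Regex.Match(row, "[0-9]{1,4}")).
            Where(Function(m) m.Success).
            Select(Function(m) m.Value).
            ToArray()

        Console.WriteLine(String.Join(" | ", byColumn))
        Console.WriteLine(String.Join(" | ", byRegex))
    End Sub
End Module
```

On this sample the split version returns "Washer," for the second row because the number is not in the first column, while the regex version finds 2040. That is the trade-off to check against your own file before choosing one.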

Alternatively, you can use a For Each activity over text.Split(vbLf(0)) with the TypeArgument set to String, and use an Assign activity inside the loop to pull the number out of each row.
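
In plain VB.NET terms, the loop version would look roughly like the sketch below. It is only an approximation of what the For Each and Assign activities would do; the sample text and the itemNumbers list are hypothetical names used for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ItemNumberLoop
    Sub Main()
        ' Hypothetical sample text standing in for the contents of the .txt file.
        Dim text As String = "1001 Steel bolt M8" & vbLf & "Washer, item 2040, zinc"

        ' Collects the number found in each row (hypothetical variable name).
        Dim itemNumbers As New List(Of String)

        For Each row As String In text.Split(vbLf(0))
            ' Same pattern as the one-line example: first run of 1 to 4 digits.
            Dim m As Match = Regex.Match(row, "[0-9]{1,4}")
            If m.Success Then
                itemNumbers.Add(m.Value)
            End If
        Next

        ' Each collected number could then be used for the website search.
        For Each itemNumber As String In itemNumbers
            Console.WriteLine(itemNumber)
        Next
    End Sub
End Module
```

The loop is longer than the one-line expressions, but it is easier to debug row by row when some lines don't match as expected.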

Hope this helps.