Scraping information from a text document

Hello all. I cannot share screenshots for security reasons, and all data given is dummy test data.

I am working on a process that takes an .rtf file, gathers specific information, and stores all of it in a DataTable.

Test Data:

Family Name BAGGINS
Given Name BILBO
some other irrelevant information
Family Name TOMLINSON
Given Name BABETTE
more irrelevant information

The goal is to extract every Family Name value into one array, and every Given Name value into another.
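As a rough illustration of that goal outside UiPath, here is a minimal Python sketch, assuming the document text has already been read into a string that uses `\n` line breaks. The lookbehind patterns are the kind you could pass to the Matches activity; `text`, `family_names`, and `given_names` are hypothetical names.

```python
import re

# Dummy test data from the post, assumed to use "\n" line breaks.
text = """Family Name BAGGINS
Given Name BILBO
some other irrelevant information
Family Name TOMLINSON
Given Name BABETTE
more irrelevant information
"""

# "." does not match "\n", so each match stops at the end of its line.
family_names = re.findall(r"(?<=Family Name ).*", text)
given_names = re.findall(r"(?<=Given Name ).*", text)

print(family_names)  # ['BAGGINS', 'TOMLINSON']
print(given_names)   # ['BILBO', 'BABETTE']
```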

Currently I read the .rtf file into a String using Word Application Scope.
I then tried UiPath.Core.Activities.Matches with a regex that extracts the data correctly. However, I am having a hard time working with this activity's output type. I need to store everything in a DataTable, and if, for example, Family Name has more matches than Given Name, I run into problems when adding the DataRows in a later loop.
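If the two lists can end up with different lengths, one way to keep a row-building loop from failing is to pad the shorter list instead of indexing past its end. A minimal Python sketch, assuming the values are already in two lists; the list names and the third family name are hypothetical.

```python
from itertools import zip_longest

# Hypothetical extraction result where one Given Name is missing.
family_names = ["BAGGINS", "TOMLINSON", "GAMGEE"]
given_names = ["BILBO", "BABETTE"]

# Pad the shorter list with "" so every row has both columns.
rows = [list(pair) for pair in zip_longest(family_names, given_names, fillvalue="")]

for row in rows:
    print(row)
```

Padding only helps when the missing value is at the end. If a record in the middle lacks a Given Name, every later pair shifts by one, so matching each record as a whole (Family Name and Given Name in one pattern) is the more reliable approach.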

I then tried String.Split, which I currently have in an Assign activity:

String[] Test = RTF.Split({"Family Name", "Given Name"}, StringSplitOptions.RemoveEmptyEntries)

This isolates the required data, but it also leaves the array with many unusable elements (all the irrelevant information).
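The leftover text appears because each split piece still contains everything up to the next label. A minimal Python sketch of one way to trim it, assuming each value sits on the same line as its label: split on either label, then keep only the first line of each piece. `re.split` stands in for `String.Split` here, and the variable names are hypothetical.

```python
import re

text = (
    "Family Name BAGGINS\n"
    "Given Name BILBO\n"
    "some other irrelevant information\n"
    "Family Name TOMLINSON\n"
    "Given Name BABETTE\n"
    "more irrelevant information\n"
)

# Split on either label, similar to RTF.Split({"Family Name", "Given Name"}, ...).
pieces = re.split(r"Family Name|Given Name", text)

# Skip the text before the first label, then keep only the first line of
# each piece; the irrelevant lines that follow it are discarded.
values = [p.split("\n", 1)[0].strip() for p in pieces[1:]]

# Assumes the labels strictly alternate: Family, Given, Family, Given ...
families = values[0::2]
givens = values[1::2]

print(values)  # ['BAGGINS', 'BILBO', 'TOMLINSON', 'BABETTE']
```

The weak point of this approach is that the label is lost once the string is split, so the family/given assignment relies entirely on the order of the pieces.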

Is there a way either to drop the irrelevant entries in my Split approach, or to easily build a DataTable from multiple Matches outputs?

Use the regular expression below.

((?<=Family Name ).*)\nGiven Name (.*)

This puts the Family Name value in the first group and the Given Name value in the second group.
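A minimal Python sketch of a two-group pattern in use, assuming the text has been read into a single string. Each match is one record, so the two values can never get out of step. `text`, `pattern`, and `pairs` are hypothetical names.

```python
import re

text = (
    "Family Name BAGGINS\n"
    "Given Name BILBO\n"
    "some other irrelevant information\n"
    "Family Name TOMLINSON\n"
    "Given Name BABETTE\n"
    "more irrelevant information\n"
)

# Group 1 = Family Name value, group 2 = Given Name value.
pattern = re.compile(r"((?<=Family Name ).*)\nGiven Name (.*)")

# strip() removes a trailing "\r" left behind if the text uses "\r\n" line endings.
pairs = [(m.group(1).strip(), m.group(2).strip()) for m in pattern.finditer(text)]

print(pairs)  # [('BAGGINS', 'BILBO'), ('TOMLINSON', 'BABETTE')]
```

With `\r\n` line endings, `.*` also captures the `\r`, so trimming each captured value (`.Trim` on the VB.NET side) is a cheap safeguard.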

@GoldCutlery

Inside a For Each over the Matches output (with DataRegex as the loop variable), use the Add Data Row activity and enter the following in its ArrayRow property:
{DataRegex.Groups(1).ToString, DataRegex.Groups(2).ToString}

Note: this stores the regex groups as one array row in the DataTable. Reply if you need more help.
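To address the original worry about unequal counts, one hedged variant is to make the Given Name line optional, so every Family Name still produces a row, with an empty cell when its Given Name is missing. A minimal Python sketch; `rows` stands in for the DataTable, and the GAMGEE record is hypothetical.

```python
import re

text = (
    "Family Name BAGGINS\n"
    "Given Name BILBO\n"
    "some other irrelevant information\n"
    "Family Name GAMGEE\n"  # hypothetical record with no Given Name
    "more irrelevant information\n"
    "Family Name TOMLINSON\n"
    "Given Name BABETTE\n"
)

# The non-capturing (?:...)? group makes the Given Name line optional.
pattern = re.compile(r"Family Name (.*)(?:\nGiven Name (.*))?")

rows = []
for m in pattern.finditer(text):
    # group(2) is None when the optional part did not match.
    rows.append([m.group(1).strip(), (m.group(2) or "").strip()])

for row in rows:
    print(row)
```

The `(?:...)?` syntax is also valid in .NET regular expressions. In .NET, a group that did not participate in the match has an empty `Value`, so the `{DataRegex.Groups(1).ToString, DataRegex.Groups(2).ToString}` row would get an empty Given Name instead of failing.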