Remove specific duplicated rows

Hi, I have a PDF file with multiple pages containing the following data:

Page 1
Officers
Name
Tom
Jerry
Ben

Page 2
Officers
Name
Mary
Jack

Because of the page breaks, when I use the Read PDF Text activity, the “Officers” and “Name” headers are repeated for every page in the extracted text.

How can I remove the repeated headers after the first one so that my text file looks like this:

Officers
Name
Tom
Jerry
Ben
Mary
Jack

@Areto_taco

Welcome to the community

Try this:

"Officers" + Environment.NewLine + "Name" + str.Replace("Officers" + Environment.NewLine + "Name","")

cheers
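
For anyone who wants to test the replace idea outside a workflow, here is a minimal sketch. It assumes the extracted text uses Environment.NewLine for line breaks, which may not hold for every PDF. The names KeepFirstHeader and pdfText are made up for illustration. The search string includes the line break after "Name" so that removing a header block does not leave an empty line behind.

```vb
Imports System

Module KeepFirstHeader
    Sub Main()
        Dim nl As String = Environment.NewLine

        ' Hypothetical sample standing in for the Read PDF Text output
        Dim pdfText As String = String.Join(nl, {"Officers", "Name", "Tom", "Jerry", "Ben", "Officers", "Name", "Mary", "Jack"})

        ' Header block including its trailing line break
        Dim header As String = "Officers" & nl & "Name" & nl

        ' Remove every header block, then put a single one back at the top
        Dim result As String = header & pdfText.Replace(header, "")

        Console.WriteLine(result)
    End Sub
End Module
```

On this sample the output should be the header once, followed by Tom, Jerry, Ben, Mary and Jack, one per line. If the PDF text uses a bare line feed instead of Environment.NewLine, the header string will not match and nothing is removed.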

Hi,

You can also use yourString.Split(Environment.NewLine.ToCharArray).Distinct().ToArray()

Is it possible to remove only the duplicated header rows? I may have duplicated officer names as well, and I want to keep those.

@Areto_taco,

Give this a try:
Areto.xaml (6.7 KB)

Yes, it is possible. You can do it this way:

strOfficers = System.Text.RegularExpressions.Regex.Replace(yourString, "[\n]", " ")
then
strOfficers = strOfficers.Replace("Officers", "").Replace("Name", "")
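
If the first header block should stay and the line breaks should be preserved, a line-based filter is another option. This is only a sketch, assuming the headers are exactly the lines "Officers" and "Name". The names RemoveRepeatedHeaders, DropRepeatedHeaders and pdfText are made up for illustration. It keeps the first occurrence of each header line, drops later repeats, and never touches other lines, so repeated officer names survive.

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module RemoveRepeatedHeaders
    ' Keeps the first occurrence of each header line and drops later repeats.
    ' Non-header lines (including repeated officer names) are always kept.
    Function DropRepeatedHeaders(text As String, headers As IEnumerable(Of String)) As String
        Dim headerSet As New HashSet(Of String)(headers)
        Dim seen As New HashSet(Of String)()
        Dim kept As New List(Of String)()

        ' Split on both Windows and Unix line breaks and skip empty lines
        For Each row As String In text.Split({vbCrLf, vbLf}, StringSplitOptions.RemoveEmptyEntries)
            Dim trimmed As String = row.Trim()
            If headerSet.Contains(trimmed) Then
                ' seen.Add returns False when this header was already kept once
                If Not seen.Add(trimmed) Then Continue For
            End If
            kept.Add(trimmed)
        Next

        Return String.Join(Environment.NewLine, kept)
    End Function

    Sub Main()
        ' Hypothetical sample with a repeated officer name ("Tom")
        Dim pdfText As String = String.Join(vbLf, {"Officers", "Name", "Tom", "Jerry", "Ben", "Officers", "Name", "Mary", "Tom"})
        Console.WriteLine(DropRepeatedHeaders(pdfText, {"Officers", "Name"}))
    End Sub
End Module
```

On this sample the output should be Officers, Name, Tom, Jerry, Ben, Mary, Tom, one per line: the second header block is gone and the second "Tom" is kept. One limitation is that an officer whose name is literally "Name" or "Officers" would be treated as a header.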

Regards