Hi, I have a PDF file with multiple pages containing the following data:
Page 1
Officers
Name
Tom
Jerry
Ben
Page 2
Officers
Name
Mary
Jack
Because of the page break, when I use the Read PDF Text activity, the “Officers” and “Name” headers are repeated in the extracted text, once per page.
How can I remove every repeated header after the first one, so that my text file becomes:
Officers
Name
Tom
Jerry
Ben
Mary
Jack
Anil_G
(Anil Gorthi)
March 27, 2023, 9:54am
2
@Areto_taco
Welcome to the community
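Try this (str is the text read from the PDF; the trailing Environment.NewLine is part of the pattern so that no blank line is left where a header was removed):
"Officers" + Environment.NewLine + "Name" + Environment.NewLine + str.Replace("Officers" + Environment.NewLine + "Name" + Environment.NewLine, "")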
cheers
zaqq
(Michał Olesiński)
March 27, 2023, 9:59am
3
Hi,
you can also use yourString.Split(Environment.NewLine.ToCharArray, StringSplitOptions.RemoveEmptyEntries).Distinct().ToArray()
Is it possible to remove only the duplicated header rows? I may have duplicated officer names as well, and I want to keep those.
@Areto_taco,
Give this a try:
Areto.xaml (6.7 KB)
zaqq
(Michał Olesiński)
March 27, 2023, 11:27am
6
Yes, it is possible. You can do it this way:
strOfficers = System.Text.RegularExpressions.Regex.Replace(yourString, "[\n]", " ")
then
strOfficers.Replace("Officers", "").Replace("Name", "")
Regards
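For anyone who wants to keep the first header and every officer name (including repeated names), a line-by-line filter may be easier to reason about than chained Replace calls. The following is only a minimal sketch in plain VB.NET, not the content of the attached workflow. The function name DropRepeatedHeaders and the sample text are made up for illustration, and it assumes that no officer is literally named "Officers" or "Name". The function body could probably be adapted for an Invoke Code activity.

```vb
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module RemoveRepeatedHeaders

    ' Hypothetical helper: keeps the first occurrence of each header line
    ' and drops later repeats. Lines that are not headers (officer names)
    ' are always kept, even when they appear more than once.
    Function DropRepeatedHeaders(text As String, headers As String()) As String
        Dim seenHeaders As New HashSet(Of String)()
        Dim kept As New List(Of String)()

        ' Split on any common line ending and skip empty lines.
        Dim separators As String() = New String() {vbCrLf, vbLf, vbCr}
        Dim lines As String() = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)

        For Each line As String In lines
            Dim trimmed As String = line.Trim()
            If Array.IndexOf(headers, trimmed) >= 0 Then
                ' HashSet.Add returns False when the header was already seen.
                If Not seenHeaders.Add(trimmed) Then Continue For
            End If
            kept.Add(trimmed)
        Next

        Return String.Join(Environment.NewLine, kept)
    End Function

    Sub Main()
        ' Sample text (an assumption): two pages, with "Tom" on both.
        Dim pdfText As String = String.Join(vbCrLf, New String() {
            "Officers", "Name", "Tom", "Jerry", "Ben",
            "Officers", "Name", "Mary", "Tom"})

        Console.WriteLine(DropRepeatedHeaders(pdfText, New String() {"Officers", "Name"}))
    End Sub

End Module
```

With the sample text above, the second "Officers" and "Name" lines are dropped while both "Tom" entries are kept. Unlike Distinct(), only the lines listed in headers are de-duplicated.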