Remove specific lines/rows from a txt file

Hello All,

I currently have a robot with the following process flow:

  • Extract PDF to txt file
  • Read txt file and get value from specific lines/rows
  • Store the data in Excel

But I am facing an issue: the PDF produces additional lines/rows when it is extracted to txt.

Sample additional lines/rows:

1 of 1176 8/25/2022, 12:59 PM
Idxxxx Idxxxxx+ xxxxx.com

The question is: how can I match the pattern of these values and remove the lines/rows?

Thank you

Hi,

Can you elaborate? What is the condition for removing lines/rows?

Regards,

Please upload two sample PDF files and mark which lines you want to remove.

@aliaga
Here are two ways.
1- We can write a regex for the extra text and replace it with an empty string:
System.Text.RegularExpressions.Regex.Replace(str_input, "(\W…)", "").ToString

2- We can write a regex for the required data and extract it by matching:

System.Text.RegularExpressions.Regex.Match(strInput, "…")
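
To make the two options concrete, here is a minimal sketch in Python (a stand-in for the .NET expressions used in this thread). The sample text, the footer pattern and the "Total:" field are hypothetical, chosen only to show the difference between removing what you don't want and matching what you do want.

```python
import re

# Hypothetical sample: one unwanted footer line mixed into the real data.
text = "Invoice: 1001\n1 of 1176 8/25/2022, 12:59 PM\nTotal: 250\n"

# Option 1: describe the unwanted text and replace it with an empty string.
# The pattern consumes the trailing newline, so the whole row disappears.
cleaned = re.sub(r"^\d+ of \d+ .*\n", "", text, flags=re.MULTILINE)

# Option 2: describe the wanted data and read it out directly,
# ignoring everything else in the file.
match = re.search(r"^Total:\s*(\d+)", text, flags=re.MULTILINE)
total = match.group(1) if match else None

print(cleaned)
print(total)
```

Option 1 tends to fit when the noise is predictable, and option 2 when the wanted values have a stable label or layout.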

Hi @Yoichi ,
Basically, I need to remove these lines/rows from the txt file.

Regards,

Hi @raja.arslankhan ,

Sorry, I can't upload the PDF file because it's confidential, but here is a sample screenshot of the extracted txt.

Regards,

Hi,

How about the following? This expression removes the line even if the page numbers and date are dynamic.

yourString = System.Text.RegularExpressions.Regex.Replace(yourString,"(?<=^|\n)\d+\s+of\s+\d+\s+\d+/\d+/\d+.*\n","")
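
For anyone who wants to try the pattern outside the robot, here is a rough Python equivalent. It is a sketch, not the exact expression above: Python's re module is stricter about lookbehinds, so it uses ^ with re.MULTILINE to anchor at the start of each line. The function name and sample text are made up for illustration.

```python
import re

# "<number> of <number> <m/d/yyyy>" at the start of a line, then the rest
# of that line including its newline, so the whole row is removed.
PAGE_FOOTER = re.compile(r"^\d+\s+of\s+\d+\s+\d+/\d+/\d+.*\n", re.MULTILINE)

def remove_page_footers(text: str) -> str:
    """Return text with every page-footer row deleted."""
    return PAGE_FOOTER.sub("", text)

# Hypothetical extracted text with two footer rows with different values.
sample = (
    "first row\n"
    "1 of 1176 8/25/2022, 12:59 PM\n"
    "second row\n"
    "23 of 1176 8/25/2022, 1:02 PM\n"
    "third row\n"
)
result = remove_page_footers(sample)
print(result)
```

Remember that the replace returns a new string, so the result has to be assigned back to a variable before it is written to the file.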

Regards,

Hi @raja.arslankhan ,

Thank you for your reply.
For this option, I need to delete the rows too, not only replace the text with an empty string, so I'm not sure I can achieve that with this option.

Regards,
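
If the goal is to delete whole rows rather than blank them out, one alternative is to filter the text line by line. The sketch below is in Python and assumes the footer always starts with "<number> of <number> <date>". The function name and sample are hypothetical.

```python
import re

# Assumed shape of the unwanted row: "<n> of <n> <m/d/yyyy> ...".
FOOTER = re.compile(r"^\d+\s+of\s+\d+\s+\d+/\d+/\d+")

def drop_footer_rows(text: str) -> str:
    """Keep only the lines that do not look like a page footer."""
    kept = [line for line in text.splitlines() if not FOOTER.match(line)]
    return "\n".join(kept)

sample = "row A\n1 of 1176 8/25/2022, 12:59 PM\nrow B"
result = drop_footer_rows(sample)
print(result)
```

Since the footer line is never added to the output, no empty row is left behind.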

Please upload the txt file shown in your image.

Hi @Yoichi ,

I already used that regex, but the text value didn't change.


Regards,

Hi,

The text in the above image seems to differ from the first text.
Can you clarify the condition for removing a line? For example, does it include lines containing https, etc.?

Regards,

Hi @Yoichi ,

I'm sorry, I just realized that there are two patterns in the file.

[Screenshots of the two patterns]

Regards,

Hi,

Can you try the following? This removes lines which start with "https://" or "(number) of (number) (date)".

System.Text.RegularExpressions.Regex.Replace(yourString,"(?<=^|\n)(https?://|\d+\s+of\s+\d+\s+\d+/\d+/\d+).*\n","")
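
As a quick check of what this does and does not remove, here is a Python sketch of the same idea, using ^ with re.MULTILINE in place of the lookbehind. The sample lines and the example.com URLs are invented for illustration.

```python
import re

# A row is dropped when it STARTS with a URL or with "<n> of <n> <date>".
UNWANTED = re.compile(
    r"^(https?://|\d+\s+of\s+\d+\s+\d+/\d+/\d+).*\n",
    re.MULTILINE,
)

sample = (
    "https://example.com/viewer?id=1\n"
    "customer name\n"
    "1 of 1176 8/25/2022, 12:59 PM\n"
    "see https://example.com for details\n"
)
cleaned = UNWANTED.sub("", sample)
print(cleaned)
```

The last sample row is kept because the URL is not at the start of the line. Only rows that begin with one of the two patterns are removed.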

Regards,
