aliaga
(Ali Aga Mustofa)
September 13, 2022, 1:46am
1
Hello all,
I currently have a robot with the following process flow:
Extract the PDF to a txt file
Read the txt file and get values from specific lines/rows
Store the data in Excel
However, I'm facing an issue: the txt file contains additional lines/rows from the PDF after extraction.
Sample of the additional lines/rows:
1 of 1176 8/25/2022, 12:59 PM
Idxxxx Idxxxxx+ xxxxx.com
My question is: how can I match the pattern of these values and remove the lines/rows?
Thank you
Yoichi
(Yoichi)
September 13, 2022, 1:50am
2
Hi,
Can you elaborate? What is the condition for removing lines/rows?
Regards,
Mr.H
September 13, 2022, 1:54am
3
Please upload 2 sample PDF files and mark which lines you want to remove.
@aliaga
Here are two ways.
1- We can write a regex that matches the extra text and replace it with an empty string:
System.Text.RegularExpressions.Regex.Replace(str_input,"(\W…)","").ToString
2- We can write a regex for the required data and extract it by matching:
System.Text.RegularExpressions.Regex.Match(strInput,"…")
aliaga
(Ali Aga Mustofa)
September 13, 2022, 2:15am
5
Hi @Yoichi ,
Basically, I need to remove these lines/rows from the txt file.
Regards,
aliaga
(Ali Aga Mustofa)
September 13, 2022, 2:16am
6
Hi @raja.arslankhan ,
Sorry, I can't upload the PDF file because it's confidential, but here is a sample screenshot of the extracted txt.
Regards,
Yoichi
(Yoichi)
September 13, 2022, 2:19am
7
Hi,
How about the following? This expression removes the line even if its values are dynamic.
yourString = System.Text.RegularExpressions.Regex.Replace(yourString,"(?<=^|\n)\d+\s+of\s+\d+\s+\d+/\d+/\d+.*\n","")
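For anyone who wants to try the pattern outside UiPath, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not the expression above: Python's `re` module rejects the variable-width lookbehind `(?<=^|\n)`, so the sketch uses `re.MULTILINE` with `^` instead, and the name `remove_page_footer` and the sample text are made up.

```python
import re

# Assumed Python equivalent of the .NET pattern: a line that starts with
# "<number> of <number> <m/d/y>", matched together with its line break.
PAGE_FOOTER = re.compile(r"^\d+\s+of\s+\d+\s+\d+/\d+/\d+.*\n", re.MULTILINE)


def remove_page_footer(text: str) -> str:
    """Delete every page-footer line, including its trailing line break."""
    return PAGE_FOOTER.sub("", text)


# Invented sample; only the footer line follows the format from the first post.
sample = "Invoice A\n1 of 1176 8/25/2022, 12:59 PM\nInvoice B\n"
print(remove_page_footer(sample))
```

Because the page number and timestamp are matched with `\d+` rather than literal values, the same call should work for any page of the extracted file.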
Regards,
aliaga
(Ali Aga Mustofa)
September 13, 2022, 2:19am
8
Hi @raja.arslankhan ,
Thank you for your reply.
With this option, I need to delete the rows too, not only replace the text with an empty string, so I'm not sure I can achieve that this way.
Regards,
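On deleting a row versus blanking it: whether a regex replace leaves an empty row behind depends on whether the match includes the line break. A small Python sketch of the difference, using an invented sample and assuming the "N of M date" footer format from the first post:

```python
import re

text = "row 1\n1 of 1176 8/25/2022, 12:59 PM\nrow 2\n"
footer = r"^\d+\s+of\s+\d+\s+\d+/\d+/\d+.*"

# Matching only the line content leaves an empty row behind.
blanked = re.sub(footer, "", text, flags=re.MULTILINE)

# Matching the trailing line break as well removes the row itself.
deleted = re.sub(footer + r"\n", "", text, flags=re.MULTILINE)

print(repr(blanked))  # 'row 1\n\nrow 2\n'
print(repr(deleted))  # 'row 1\nrow 2\n'
```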
Mr.H
September 13, 2022, 2:34am
9
Please upload the txt file shown in your image.
aliaga
(Ali Aga Mustofa)
September 13, 2022, 3:01am
10
Hi @Yoichi ,
I already tried that regex, but the text value didn't change.
Regards,
Yoichi
(Yoichi)
September 13, 2022, 3:06am
11
Hi,
The text in the above image seems to differ from the text in your first post.
Can you clarify the condition for removing a line? For example, should lines that include "https" be removed?
Regards,
aliaga
(Ali Aga Mustofa)
September 13, 2022, 3:19am
12
Hi @Yoichi ,
I'm sorry, I just realized that there are 2 patterns in the file.
Regards,
Yoichi
(Yoichi)
September 13, 2022, 3:24am
13
Hi,
Can you try the following? This removes lines which start with “https://” or “(number) of (number) (date)”
System.Text.RegularExpressions.Regex.Replace(yourString,"(?<=^|\n)(https?://|\d+\s+of\s+\d+\s+\d+/\d+/\d+).*\n","")
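An alternative way to apply the same two conditions is to filter the text line by line, which also covers a final line that has no trailing line break. This is a Python sketch under assumptions, not part of the expression above: `UNWANTED` and `drop_unwanted_lines` are hypothetical names, and the sample text is invented.

```python
import re

# Lines to drop: those starting with http:// or https://,
# or with "<number> of <number> <m/d/y>".
UNWANTED = re.compile(r"(https?://|\d+\s+of\s+\d+\s+\d+/\d+/\d+)")


def drop_unwanted_lines(text: str) -> str:
    """Keep only the lines that do not start with an unwanted prefix."""
    # re.match anchors at the start of each line, like (?<=^|\n) does in .NET.
    kept = [line for line in text.splitlines() if not UNWANTED.match(line)]
    return "\n".join(kept)


sample = "https://example.com/doc\nrow 1\n2 of 1176 8/25/2022, 1:00 PM\nrow 2"
print(drop_unwanted_lines(sample))
```

Note that this version rejoins the kept lines with `\n`, so a trailing line break in the input is not preserved; lines that merely contain a URL further along are kept.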
Regards,
system
(system)
Closed
September 16, 2022, 3:25am
14
This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.