Hi
I am using regex to extract certain parts of a PDF, but it also picks up the page number and other lines that I don't need. I am not sure how to remove those words, since they can change from PDF to PDF.
The only thing that stays constant across all the PDFs is the page number format.
I have attached the image below.
I want to remove everything circled in red; the highlighted parts are not constant.
Thanks in advance
Yoichi
(Yoichi)
February 2, 2023, 8:11am
2
Hi,
How about the following expression?
System.Text.RegularExpressions.Regex.Replace(yourString,"Page[\s\S]+?Invoice\s+No[\s\S]+?Period:.*","")
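To check the pattern outside UiPath, here is a minimal sketch in Python, whose `re` module should handle this particular pattern the same way .NET does. The sample text is made up, because the real layout was only shown in the screenshot, and `strip_page_header` is just an illustrative name.

```python
import re

# Hypothetical sample text; the real PDF content may look different.
SAMPLE = (
    "Item A 10.00\n"
    "Page 1 of 2\n"
    "ACME Ltd\n"
    "Invoice No 12345\n"
    "Period: 01/01/2023 - 31/01/2023\n"
    "Item B 20.00\n"
)

# Same pattern as in the Regex.Replace expression.
PATTERN = r"Page[\s\S]+?Invoice\s+No[\s\S]+?Period:.*"


def strip_page_header(text: str) -> str:
    """Remove everything from 'Page' through the end of the 'Period:' line."""
    return re.sub(PATTERN, "", text)


cleaned = strip_page_header(SAMPLE)
print(cleaned)
```

The lazy `+?` quantifiers make each match stop at the nearest "Invoice No" and "Period:", so a header repeated on every page should be removed once per page. An empty line is left behind where the header used to be.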
Regards,
Anil_G
(Anil Gorthi)
February 2, 2023, 8:16am
3
@sunilkanth
Please try this
System.Text.RegularExpressions.Regex.Replace(str,"(?=Page)[\s\S]*(?=Due Date)","")
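Two things about this pattern are worth checking on real data. The lookahead `(?=Due Date)` does not consume text, so "Due Date" itself stays in the result. Also, `[\s\S]*` is greedy, so on a multi-page string it runs from the first "Page" to the last "Due Date". The Python sketch below illustrates this on made-up text; the lazy variant with `*?` is an alternative to compare against, not something taken from the expression itself.

```python
import re

# Hypothetical two-page text; the real PDF content may look different.
TEXT = (
    "Row 1\n"
    "Page 1 of 2\n"
    "Header junk\n"
    "Due Date 01/02/2023\n"
    "Row 2\n"
    "Page 2 of 2\n"
    "Header junk\n"
    "Due Date 01/02/2023\n"
    "Row 3\n"
)

# Greedy: one match from the first "Page" up to the last "Due Date".
greedy = re.sub(r"(?=Page)[\s\S]*(?=Due Date)", "", TEXT)

# Lazy: one match per page, each stopping at the nearest "Due Date".
lazy = re.sub(r"(?=Page)[\s\S]*?(?=Due Date)", "", TEXT)

print(greedy)
print(lazy)
```

With the greedy version "Row 2" disappears together with the headers, while the lazy version keeps it. If the string only ever contains one page, both behave the same.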
cheers
Anil_G
(Anil Gorthi)
February 2, 2023, 8:25am
5
@sunilkanth
Can you paste the exact data? I tried it on my side and it works.
Are "Due Date" and "Page" not constant?
[Screenshots: input and output]
cheers
Anil_G
(Anil Gorthi)
February 2, 2023, 8:42am
7
sunilkanth:
everything
@sunilkanth
Please try this
System.Text.RegularExpressions.Regex.Replace(str,"(?=Page)[\s\S]*(?=Period).*","")
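Here the trailing `.*` consumes the rest of the "Period" line after the lookahead has located it, so the whole header goes away, but the line break after it remains. A small Python sketch on made-up text shows the effect, plus an optional cleanup step for the leftover blank line (the cleanup is an assumption about what the final output should look like).

```python
import re

# Hypothetical single-page text; the real PDF content may look different.
TEXT = (
    "Row 1\n"
    "Page 1 of 2\n"
    "Invoice No 12345\n"
    "Period: Jan 2023\n"
    "Row 2\n"
)

# Same pattern as in the Regex.Replace expression.
removed = re.sub(r"(?=Page)[\s\S]*(?=Period).*", "", TEXT)

# Optional: collapse the blank line left where the header was.
tidy = re.sub(r"\n{2,}", "\n", removed)

print(repr(removed))
print(repr(tidy))
```

Note that the cleanup step would also collapse blank lines that were intentional in the original text, so it is only appropriate if empty lines carry no meaning in the extracted data.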
cheers