How to remove a set of words with a keyword

Hi

I am using regex to extract certain parts of a PDF, but it also picks up the page number and other lines that I don’t need. I am not sure how to remove them, since the text can change from PDF to PDF.

The only thing that remains constant across all the PDFs is the page number format.

I have attached the image below.

I want to remove everything circled in red. The highlighted parts are not constant.

Thanks in advance.

Hi,

How about the following expression?

System.Text.RegularExpressions.Regex.Replace(yourString,"Page[\s\S]+?Invoice\s+No[\s\S]+?Period:.*","")
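To show what this pattern removes, here is a minimal sketch in Python rather than VB.NET, since the constructs used (`[\s\S]+?`, `\s+`, `.*`) behave the same way in both engines. The sample text is hypothetical: the real layout was only posted as an image, so I am assuming the unwanted block runs from "Page" through the "Period:" line.

```python
import re

# Hypothetical page text; the actual PDF content is an assumption.
SAMPLE = (
    "Item A 10.00\n"
    "Page 1 of 2\n"
    "Invoice No 12345\n"
    "Period: Jan 2020\n"
    "Item B 20.00\n"
)

# Same pattern as the expression above. The lazy quantifiers (+?) stop at
# the nearest "Invoice No" and the nearest "Period:", and the trailing .*
# consumes the rest of the "Period:" line (but not the newline).
HEADER = re.compile(r"Page[\s\S]+?Invoice\s+No[\s\S]+?Period:.*")

def strip_header(text: str) -> str:
    """Remove each block from 'Page' through the end of the 'Period:' line."""
    return HEADER.sub("", text)

cleaned = strip_header(SAMPLE)
print(cleaned)
```

The newline after the "Period:" line is not consumed, so an empty line is left where the block used to be.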

Regards,

@sunilkanth

Please try this

System.Text.RegularExpressions.Regex.Replace(str,"(?=Page)[\s\S]*(?=Due Date)","")
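Note that `(?=Due Date)` is a lookahead: it only checks that "Due Date" follows and does not consume it, so the "Due Date" line itself stays in the result. A small sketch in Python (the lookahead syntax is the same as in .NET), using made-up sample text:

```python
import re

# Hypothetical text; only the "Page" and "Due Date" markers come from the thread.
SAMPLE = (
    "Line 1\n"
    "Page 1 of 2\n"
    "Some header text\n"
    "Due Date 01/02/2020\n"
    "Line 2\n"
)

# Both lookaheads assert a position without consuming any characters, so the
# match starts at "Page" and ends just before "Due Date".
PATTERN = re.compile(r"(?=Page)[\s\S]*(?=Due Date)")

result = PATTERN.sub("", SAMPLE)
print(result)
```

If the "Due Date" line also needs to go, the pattern has to consume it instead of only looking ahead at it.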

cheers

No, it didn’t work.

@sunilkanth

Can you paste the exact data? I tried it on my side and I can see it working.

Are "Due Date" and "Page" not constant?

Input
image

Output
image

cheers

@sunilkanth

Please try this

System.Text.RegularExpressions.Regex.Replace(str,"(?=Page)[\s\S]*(?=Period).*","")
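One caveat: `[\s\S]*` is greedy, so if the string holds several pages, the match runs from the first "Page" to the last "Period" and removes everything in between. The sketch below (Python, with hypothetical two-page text) compares it with a lazy variant. The lazy pattern is only my assumption of what you would want for multi-page input; it is not something tested on your file.

```python
import re

# Hypothetical two-page text; the real data was only shown as an image.
SAMPLE = (
    "Line 1\n"
    "Page 1 of 2\n"
    "Period: Jan\n"
    "Line 2\n"
    "Page 2 of 2\n"
    "Period: Jan\n"
    "Line 3\n"
)

# Expression from the post above: greedy, spans first "Page" to last "Period".
GREEDY = re.compile(r"(?=Page)[\s\S]*(?=Period).*")

# Lazy variant (an assumption, not from the thread): stops at the nearest "Period".
LAZY = re.compile(r"Page[\s\S]*?Period.*")

greedy_result = GREEDY.sub("", SAMPLE)
lazy_result = LAZY.sub("", SAMPLE)

print(repr(greedy_result))
print(repr(lazy_result))
```

With the greedy pattern "Line 2" is removed along with both headers, while the lazy variant keeps it. If you process one page at a time, both give the same result.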

cheers

Thank you, it worked! :grinning:
