Remove string with known start and end string

Hello everyone

I need help with something. I have a text string that I extract from a PDF. It is a data table that spans several pages, so when I pass the content to a string, the headers and footers of each following page end up between the rows. How can I remove them, knowing that they all start with one keyword and end with another? For example:
1036943 770202511947 SMALL CHRISTMAS BAG NOEL 12/UN 1.00 12.00 0.92000 11.0400
1039857 770200705898 CHOCOBALLS 150GR NOEL 22/150 GR 1.00 22.00 1.11000 24.4200
1042601 770202514300 NOE CHRISTMAS COOKIE NOEL LUNCHBOX 6/300 GR 1.00 6.00 5.30000 31.8000
Printing Date: 03/7/2022 Time: 2:53PM Receipt Date: 03/9/2022RIBA SMITH, S. A. Order No.: 22987321
Receipt Date: PURCHASE ORDER - 8082 - ALIMENTOS CARNICOS DE PANAMA S.A. Bod.: RIBA SMITH, S. A
03/09/2022 MARCH 7, 2022 TRANSISTHMIC
Pages: 2
---------- Order ---------- Cost
Supplier Code Bar/Internal Description Brand Pack/Size Boxes Units Unit Ext. Cost
1043120 770202513488 CHRISTMAS COOKIE BAG NOEL 12/220 GR 1.00 12.00 0.97000 11.6400
1054848 770200706266 CHRISTMAS CHOCOBALLS 125G NOEL 24/125 GR 1.00 24.00 0.94000 22.5600
1055644 770202511068 INTREGRAL SALTIN BISCUIT BOWL NOEL 24/216 GR 2.00 48.00 1.38000 66.2400
1055664 770202513312 NOEL DUCAL BISCUITS 24/189 GR 1.00 24.00 1.26000 30.2400

I need to remove all the text between “Printing Date:” and “Ext. Cost”

Assuming “Printing Date” should be removed as well:

Printing Date(.|\n)*?(?=Ext\. Cost)

Use this pattern in a Regex.Replace.
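To see what this pattern leaves behind, here is a minimal sketch in Python (used only to illustrate the regex; the sample text is a shortened, made-up stand-in for the extracted PDF string, and `text` and `cleaned` are illustrative names). The lazy quantifier stops at the first “Ext. Cost”, and the lookahead keeps that marker in the output.

```python
import re

# Shortened stand-in for the extracted PDF text (rows abbreviated for the example).
text = (
    "1042601 770202514300 NOE CHRISTMAS COOKIE 31.8000\n"
    "Printing Date: 03/7/2022 Time: 2:53PM\n"
    "Pages: 2\n"
    "Supplier Code Bar/Internal Description Unit Ext. Cost\n"
    "1043120 770202513488 CHRISTMAS COOKIE BAG 11.6400\n"
)

# Lazy match from "Printing Date" up to, but not including, "Ext. Cost".
pattern = r"Printing Date(.|\n)*?(?=Ext\. Cost)"
cleaned = re.sub(pattern, "", text)

# "Ext. Cost" is still present because the lookahead does not consume it.
print(cleaned)
```

The same pattern string should carry over to a .NET Regex.Replace call, since lazy quantifiers and lookaheads are supported there as well.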

Thank you very much. How could I also remove “Ext. Cost”?

Hi

If the input is in a string variable named strinput, use an Assign activity like this:

stroutput = System.Text.RegularExpressions.Regex.Replace(strinput.ToString, "Printing Date(.|\r|\n)*?(?=Ext\. Cost)", " ")
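One thing worth checking when the table has more than one page break is whether the quantifier is greedy (`*`) or lazy (`*?`). The sketch below, written in Python purely to illustrate the regex behavior, uses a made-up three-page sample; `greedy` and `lazy` are illustrative names. A greedy match runs from the first “Printing Date” to the last “Ext. Cost” and takes the rows in between with it, while a lazy match removes each header separately.

```python
import re

# Made-up sample with two page breaks between three data rows.
text = (
    "row 1\n"
    "Printing Date: page 2 header\n"
    "Unit Ext. Cost\n"
    "row 2\n"
    "Printing Date: page 3 header\n"
    "Unit Ext. Cost\n"
    "row 3\n"
)

# Greedy: one match spanning from the first start marker to the last end marker.
greedy = re.sub(r"Printing Date(.|\r|\n)*(?=Ext\. Cost)", "", text)

# Lazy: one match per header block, stopping at the nearest end marker.
lazy = re.sub(r"Printing Date(.|\r|\n)*?(?=Ext\. Cost)", "", text)

print("row 2" in greedy)  # False: the middle row was swallowed
print("row 2" in lazy)    # True: only the header blocks were removed
```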

Cheers @Juan_Esteban_Valencia

If you want to remove “Ext. Cost” as well, use this pattern in the same Regex.Replace expression:

Printing Date(.|\r|\n)*?Ext\. Cost
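As a rough illustration of removing the end marker too, the Python sketch below matches “Ext. Cost” as part of the pattern instead of using a lookahead. It also consumes an optional line break after the marker so that no empty line is left behind; that addition is an assumption about the desired output, not something stated in the thread. The sample text is a shortened, made-up stand-in.

```python
import re

# Shortened stand-in for the extracted PDF text (rows abbreviated for the example).
text = (
    "1042601 770202514300 NOE CHRISTMAS COOKIE 31.8000\n"
    "Printing Date: 03/7/2022 Time: 2:53PM\n"
    "Pages: 2\n"
    "Supplier Code Bar/Internal Description Unit Ext. Cost\n"
    "1043120 770202513488 CHRISTMAS COOKIE BAG 11.6400\n"
)

# Match the end marker itself, plus an optional trailing line break
# (assumption: the header line ends right after "Ext. Cost").
pattern = r"Printing Date(.|\r|\n)*?Ext\. Cost\r?\n?"
cleaned = re.sub(pattern, "", text)

# Only the data rows remain.
print(cleaned)
```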

Cheers @Juan_Esteban_Valencia


Thank you very very much

