Need to remove the data between two words by using regex

Hi Team,

I need to remove the data between “upc” and “TAP” in the text below using a regular expression. Can anyone please suggest how?

input:

Ordered Shipped Measure Description Manufacturer Item Your Item Price
6 6 EA 65044737 1825200 14.15 84.90 N
upc code: 1825200
125 8-32 H2 OSG STI SP PT PLUG TAP
12 12 EA 41113150 41113150 2.83 33.96 N
upc code: 728012327235
1/4-20 H3 2FL HERTEL HSS SP PT PLUG TAP
CPN : 41113150
10 10 EA 77430155 5.27 52.70 N
upc code: TAA51270E
3/8-16 4FL H3 HERTEL HSS HAND PLUG TAP
10 10 EA 77430098 4.67 46.70 N
upc code: TAA51260B
5/16-18 4FL H3 HERTEL HSS HAND PLUG TAP
10 10 EA 04544136 MSC-04544136 7.26 72.60 N
upc code: 101P062
1/2-13 H3 HSS SP PT PLUG TAP5/16-18 4FL H3 HERTEL HSS HAND PLUG TAP
10 10 EA 04544136 MSC-04544136 7.26 72.60 N
upc code: 101P062
1/2-13 H3 HSS SP PT PLUG TAP

Regards,
Manoj.

Hi @manojmanu.rpa ,

To be more precise in providing the right regex, could you share the expected output for the input provided?

@supermanPunch I need the output in the format below.
output

Regards,
Manoj.

@manojmanu.rpa

I hope it's a PDF. First try reading the PDF with the Read PDF activity with the preserve formatting option checked, then find the column separator in the result and use the Generate Datatable activity.

cheers

@manojmanu.rpa ,

As suggested, check the PreserveFormat option when using the Read PDF Text activity, and let us know whether you can get the output directly with the Generate Datatable activity. If not, please share the updated data here so we can look for an alternative solution.

@supermanPunch @Anil_G I tried with preserve format as well, but it's not working. Can you please suggest how to remove the upc line and the line below it using a regular expression?

Regards,
Manoj.

@manojmanu.rpa ,

Could you share the resulting text that was produced when the PreserveFormat option was enabled?

We can work from that and check whether a regex solution is possible. Also, let us know if any cells in the PDF table are empty.

@manojmanu.rpa

Please try this to remove the text in between:

Regex.Replace(str,"(?<=upc).*\s.*(?=TAP)","")

cheers
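
For anyone who wants to check the pattern outside UiPath, here is a minimal sketch in Python. The expression above is .NET, but Python's `re` module treats `.`, `\s` and the lookarounds the same way for this particular pattern. The sample is taken from the first two items in the question and assumes plain `\n` line endings. The second pattern, `UPC_AND_NEXT_LINE`, is a hypothetical variant that is not from this thread: it drops the whole "upc code" line and the description line below it, in case that is the result actually wanted.

```python
import re

# Same pattern as in the reply above, written as a Python raw string.
# (?<=upc)   start right after the literal "upc"
# .*         rest of that line ("." does not cross a line break by default)
# \s         the line break itself
# .*(?=TAP)  the next line, up to (not including) the last "TAP" on it
BETWEEN_UPC_AND_TAP = re.compile(r"(?<=upc).*\s.*(?=TAP)")

# Hypothetical variant (an assumption, not from the thread): remove the whole
# "upc code" line together with the line that follows it.
UPC_AND_NEXT_LINE = re.compile(r"^upc code:.*\n.*(?:\n|\Z)", re.MULTILINE)

# First two items from the question, assuming "\n" line endings.
sample = (
    "6 6 EA 65044737 1825200 14.15 84.90 N\n"
    "upc code: 1825200\n"
    "125 8-32 H2 OSG STI SP PT PLUG TAP\n"
    "12 12 EA 41113150 41113150 2.83 33.96 N\n"
    "upc code: 728012327235\n"
    "1/4-20 H3 2FL HERTEL HSS SP PT PLUG TAP\n"
    "CPN : 41113150\n"
)

# Each upc/description pair collapses to the single line "upcTAP".
between_removed = BETWEEN_UPC_AND_TAP.sub("", sample)

# Only the item rows and the CPN line remain.
lines_removed = UPC_AND_NEXT_LINE.sub("", sample)

print(between_removed)
print(lines_removed)
```

Note that the original pattern keeps the words "upc" and "TAP" themselves, because both sit inside lookarounds and are not part of the match. It also relies on "TAP" appearing on the line directly after the "upc" line.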

Thanks @Anil_G it works.
