I want to delete parts of text data that follow a repeating pattern

Hello.
While processing data, I want to delete only the unnecessary parts of text that follows a repeating pattern. Is there a way to do this?

For a similar task, someone taught me the following way to extract only the middle part, excluding the top and bottom:
str.Split({"a","b"},StringSplitOptions.None).Where(Function(x,i) Not (i Mod 2) = 0).ToArray
When I tried this, I didn't get the data I wanted, so I would like to know how to do the extraction in a similar way.
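
For readers unfamiliar with the Split/Where expression above, here is a rough Python sketch of what it appears to do: split on either delimiter and keep only the odd-numbered pieces, i.e. the text between one delimiter and the next. The delimiters `<START>` and `<END>`, the sample string, and the helper name `keep_odd_pieces` are made up for illustration. Note that this approach keeps the enclosed text and discards the rest, which is the opposite of deleting a block.

```python
import re

def keep_odd_pieces(text, delimiters):
    # Split on any of the delimiters, keeping empty pieces
    # (comparable to StringSplitOptions.None in the .NET call).
    pattern = "|".join(re.escape(d) for d in delimiters)
    pieces = re.split(pattern, text)
    # Keep pieces at odd positions (1, 3, 5, ...): the text that sits
    # between one delimiter and the next.
    return pieces[1::2]

# Hypothetical sample, not taken from the thread's data.
sample = "head<START>keep 1<END>skip<START>keep 2<END>tail"
print(keep_odd_pieces(sample, ["<START>", "<END>"]))
```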

The data I want to delete is the part marked in yellow below.

I’d appreciate it if you could tell me how.

Hi,

Can you share your input text and expected result?
Also, what is your purpose?
It might be better to extract the necessary part than to delete the unnecessary part.

Regards,

PDF.TXT (19.3 KB)

The uploaded file contains the same text as the screenshot above.
The same pattern is repeated over and over again.

I want to delete everything from INVOICE to R.TON,
not just in one place but throughout the entire data.

Using an Assign activity,
is it possible to delete only the parts I want?

Hi,

How about the following expression? Does this work for you?

 System.Text.RegularExpressions.Regex.Replace(strPdf,"INVOICE[\s\S]+?R\.TON","")
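
As a rough illustration of why the `+?` (lazy) quantifier matters here, the following Python sketch applies the same pattern with `re.sub`. The sample text is invented, since the contents of PDF.TXT are not shown in the thread. `[\s\S]` matches any character including line breaks, so each match can span several lines, and the lazy quantifier stops at the nearest R.TON instead of the last one.

```python
import re

# Hypothetical sample text; the real PDF.TXT content is not shown in the thread.
str_pdf = (
    "ITEM A 10 PCS\n"
    "INVOICE NO. 001\n"
    "WEIGHT 1.5 R.TON\n"
    "ITEM B 20 PCS\n"
    "INVOICE NO. 002\n"
    "WEIGHT 2.0 R.TON\n"
    "ITEM C 30 PCS\n"
)

# Lazy: each INVOICE ... R.TON block is removed separately.
cleaned = re.sub(r"INVOICE[\s\S]+?R\.TON", "", str_pdf)

# Greedy (for contrast): one match from the first INVOICE to the last R.TON,
# which also swallows the ITEM B line in between.
greedy = re.sub(r"INVOICE[\s\S]+R\.TON", "", str_pdf)

print(repr(cleaned))
print(repr(greedy))
```

The removed blocks leave empty lines behind, because the line break after R.TON is not part of the match.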

Regards,

Thank you. It works.
How do I remove (××× ~ R.TON)?

Also, to apply it to a similar case, how should I modify this part?
("INVOICE[\s\S]+?R\.TON")

Hi,

It’s necessary to clarify the requirement.

If R.TON always appears twice in the line, the following works.

System.Text.RegularExpressions.Regex.Replace(strPdf,"INVOICE[\s\S]+?R\.TON.*?R\.TON","")

OR

If R.TON always exists at the end of the line, the following will work.

  System.Text.RegularExpressions.Regex.Replace(strPdf,"INVOICE[\s\S]+?R\.TON(?=\r?\n|$)","")
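
To compare the two variants, here is a small Python sketch using the same patterns. It assumes a last line that contains R.TON twice; the sample text is hypothetical. It also shows what the earlier single-R.TON pattern leaves behind in that situation.

```python
import re

# Hypothetical sample: the closing line contains R.TON twice.
str_pdf = (
    "ITEM A\n"
    "INVOICE NO. 001\n"
    "TOTAL 1.5 R.TON 2.0 R.TON\n"
    "ITEM B\n"
)

# Earlier pattern: stops at the first R.TON and leaves the rest of the line.
first_only = re.sub(r"INVOICE[\s\S]+?R\.TON", "", str_pdf)

# Variant 1: require a second R.TON later on the same line.
two_on_line = re.sub(r"INVOICE[\s\S]+?R\.TON.*?R\.TON", "", str_pdf)

# Variant 2: require R.TON to be followed by a line break or the end of text.
at_line_end = re.sub(r"INVOICE[\s\S]+?R\.TON(?=\r?\n|$)", "", str_pdf)

print(repr(first_only))
print(repr(two_on_line))
print(repr(at_line_end))
```

Variant 2 does not depend on how many times R.TON appears in the line, which may make it the more forgiving choice if the count varies.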

Regards,

It works.
I would like to apply the method you taught me to the following as well:

FREIGHT COLLECT CONTAINER FCL 20DRx1
FREIGHT COLLECT CONTAINER FCL 40HCx4
FREIGHT COLLECT CONTAINER FCL 40HCx1
I want to delete this part, too.

In these lines, only the part after FCL changes.

System.Text.RegularExpressions.Regex.Replace(strPdf,"FREIGHT[\s\S]+FCL(\s\S)","")

Can I use this approach?

Hi,

Does the above exist in the PDF.TXT you shared?
If not, can you share the input and expected output, to make sure?

Regards,

PDF.TXT (24.7 KB)


I did the first one as you taught me.

The part that I want to get rid of is in
INCOTERMS FOB FREIGHT COLLECT CONTAINER FCL 20DRx1
This is a recurring pattern.
Here, the part I want to remove is
FREIGHT ~ FCL XXXX
where XXXX keeps changing.

The other place I want to get rid of is
SUB TOTAL ~ XXX
Here XXX is a number at the end that changes, so how can I apply the method you taught me?

Hi,

If you need to delete everything from a specific keyword to the end of the line, the following will work.

System.Text.RegularExpressions.Regex.Replace(strPdf,"FREIGHT\s+COLLECT.*","")

System.Text.RegularExpressions.Regex.Replace(strPdf,"KEYWORD.*","")
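
A short Python sketch of the same idea, with made-up sample lines modelled on the ones quoted earlier in the thread. Because `.` does not match a line feed, `.*` stops at the end of each line, so every line containing the keyword is trimmed independently. The second call uses `SUB\s+TOTAL` as one possible value for KEYWORD, which is an assumption rather than something confirmed in the thread.

```python
import re

# Hypothetical sample lines modelled on those quoted in the thread.
str_pdf = (
    "INCOTERMS FOB FREIGHT COLLECT CONTAINER FCL 20DRx1\n"
    "INCOTERMS FOB FREIGHT COLLECT CONTAINER FCL 40HCx4\n"
    "SUB TOTAL 1,234\n"
)

# Remove the keyword and everything after it, up to the end of each line.
step1 = re.sub(r"FREIGHT\s+COLLECT.*", "", str_pdf)

# Same technique with a different keyword (assumed here to be SUB TOTAL).
step2 = re.sub(r"SUB\s+TOTAL.*", "", step1)

print(repr(step1))
print(repr(step2))
```

The space before the keyword and the line break after it are not part of the match, so a trailing space or an empty line can remain and may need trimming afterwards.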

Regards,

Thank you. It works.
