Remove unwanted text from PDF extraction

Hi all,

In the example above, we need to keep the highlighted part and remove the rest of the string data.
The highlighted part is the pattern we get in the PDF file:
“320151IGV, -320152PartB, -320153, -25 rev 02,
-320157, -320161, -320168 Part A, -320170, -320171, -320172, -320174
SI 18-2022, SI 53-2021.
SIL-019.
Special instruction SI53-2021.c
None”
Only this type of pattern needs to be extracted, and everything else needs to be removed.
The part that is not highlighted is not fixed; it changes from file to file.
How can I do this?
Please help. Thanks in advance.

Thanks,
Lakshmi

Hey,

Use this one:
Split(strVar, "Disposition")(0)

Thanks
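For reference, here is a minimal Python sketch of what Split(strVar, "Disposition")(0) does. The thread's expressions are VB.NET inside UiPath Assign activities; Python is used here only for illustration. The function name keep_before and the "unwanted text" in the sample are hypothetical placeholders.

```python
def keep_before(text: str, keyword: str = "Disposition") -> str:
    """Return the text before the first occurrence of keyword (all of it if absent)."""
    # maxsplit=1 splits only at the first occurrence; element [0] is the same
    # piece that the (0) index returns from VB.NET's Split.
    return text.split(keyword, 1)[0]


# Hypothetical sample: part numbers from the post followed by placeholder text.
sample = "-320157, -320161, -320168 Part A\nDisposition unwanted text"
print(repr(keep_before(sample)))  # prints '-320157, -320161, -320168 Part A\n'
```

If "Disposition" does not occur, the whole text is returned unchanged. That fits the expression working for one file but not for files that use a different keyword.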

Hi @Rounak_Kumar1,
I tried the expression above, but it only works for a single file. It does not work for the other files because the non-highlighted data is not fixed; it changes from file to file. The highlighted part is fixed.
What changes do I need to make?

Thanks,
Lakshmi

Hey,

Is there any keyword that is constant across every document?
Please let me know.
Thanks

@Rounak_Kumar1,
Possible keywords:
Cause and Corrective action/Certificate
Justified Removal:
Disposition

The keywords change from document to document and can be repeated.

Thanks,
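Because the keyword varies, one option is to cut the text at whichever listed keyword appears first. This uses a plain string search, not the regex approach used later in the thread. Below is a minimal Python sketch, assuming the three keywords above are the complete list; KEYWORDS and cut_at_first_keyword are hypothetical names.

```python
# Hypothetical keyword list, taken from the reply above.
KEYWORDS = (
    "Cause and Corrective action/Certificate",
    "Justified Removal:",
    "Disposition",
)


def cut_at_first_keyword(text: str, keywords=KEYWORDS) -> str:
    """Return the text before the earliest keyword; return it unchanged if none occurs."""
    # str.find returns -1 when a keyword is missing, so drop those positions.
    positions = [pos for pos in (text.find(k) for k in keywords) if pos != -1]
    return text[:min(positions)] if positions else text


# Hypothetical sample: part numbers from the post plus placeholder text.
sample = "-320170, -320171\nJustified Removal: unwanted text\nDisposition unwanted text"
print(repr(cut_at_first_keyword(sample)))  # prints '-320170, -320171\n'
```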

Hey,

Are you getting "None" in each document, as highlighted in the image below?

Thanks

Hi,
I get it only in a few files.
Thanks,

Hi @lakshmi.mp

Can you try this in an Assign activity?
Left Assign:
Str_Result

Right Assign:
System.Text.RegularExpressions.Regex.Replace(Str_Input, "[\s\S]+(?=Cause and Corrective action/Certificate|Justified Removal:|Disposition)", "")

This should replace the highlighted text with nothing ("").

Cheers

Steve
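To see what this first pattern does, here is a Python sketch that runs the same regex with the standard re module. For this pattern, Python and .NET should behave the same. The sample text is made up from the part numbers above plus placeholder text.

```python
import re

# First suggestion: greedy [\s\S]+ followed by a lookahead for any keyword.
FIRST_PATTERN = (
    r"[\s\S]+(?=Cause and Corrective action/Certificate|Justified Removal:|Disposition)"
)

# Hypothetical sample: part numbers from the post plus placeholder text.
sample = "-320170, -320171\nJustified Removal: unwanted text\nDisposition unwanted text"
result = re.sub(FIRST_PATTERN, "", sample)
print(repr(result))  # prints 'Disposition unwanted text'
```

Because [\s\S]+ is greedy, the match runs up to the last keyword. The wanted part before it is removed, and the last keyword with its trailing text is kept, which is the opposite of what was asked.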


Hi @Steven_McKeering ,
The highlighted part needs to be extracted, and the rest needs to be removed.
What changes do I need to make?
Thanks,
Lakshmi

Hey

My mistake.

Try this in an Assign activity:
Left Assign:
Str_Result

Right Assign:
System.Text.RegularExpressions.Regex.Replace(Str_Input, "(?<=Cause and Corrective action/Certificate|Justified Removal:|Disposition)[\s\S]+", "")

This should replace everything after the keyword (the non-highlighted text) with nothing ("").

Cheers

Steve
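Below is a Python sketch of this lookbehind version. One difference applies: .NET accepts a lookbehind whose alternatives have different lengths, but Python's re requires each lookbehind to be fixed-width. In this sketch, each keyword therefore gets its own lookbehind. The sample text is made up.

```python
import re

# .NET allows (?<=A|B|C) with alternatives of different lengths; Python's re
# needs fixed-width lookbehinds, so each keyword gets its own (?<=...).
SECOND_PATTERN = (
    r"(?:(?<=Cause and Corrective action/Certificate)"
    r"|(?<=Justified Removal:)"
    r"|(?<=Disposition))[\s\S]+"
)

# Hypothetical sample: part numbers from the post plus placeholder text.
sample = "-320170, -320171\nJustified Removal: unwanted text\nDisposition unwanted text"
result = re.sub(SECOND_PATTERN, "", sample)
print(repr(result))  # prints '-320170, -320171\nJustified Removal:'
```

A lookbehind only checks what precedes the match, so the keyword itself stays in the result. This is consistent with the output reported in the next reply.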


@Steven_McKeering,
SBs embodied during shop visit: -320151IGV, -320152PartB, -320153,
-320157, -320161, -320168 Part A, -320170, -320171, -320172, -320174
Cause and Corrective action/Certificate
It's extracting "Justified Removal:" and "Cause and Corrective action/Certificate" along with the required values (output from two files).
Thanks,

Hi,

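System.Text.RegularExpressions.Regex.Replace(SB_incorp, "(?=Cause and Corrective action/Certificate|Justified Removal:|Disposition)[\s\S]+", "")

I used this expression to remove the unwanted text. With a lookahead (?=...) instead of a lookbehind (?<=...), the match starts at the keyword itself, so the keyword is removed together with everything after it.
It's working now. Thanks for helping and providing the solution, @Steven_McKeering @Rounak_Kumar1.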

Thanks & Regards,
Lakshmi
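Here is the working lookahead version, sketched in Python with re.sub. The name strip_after_keyword is hypothetical. The .rstrip() call, which trims leftover whitespace, is an addition that is not part of the thread's expression.

```python
import re

# Lookahead version: the match starts AT the first keyword, so the keyword
# and everything after it are removed.
FINAL_PATTERN = (
    r"(?=Cause and Corrective action/Certificate|Justified Removal:|Disposition)[\s\S]+"
)


def strip_after_keyword(text: str) -> str:
    """Remove the first keyword and all text after it, then trim trailing whitespace."""
    # .rstrip() is an extra step, not part of the thread's expression.
    return re.sub(FINAL_PATTERN, "", text).rstrip()


# Hypothetical sample: part numbers from the post plus placeholder text.
sample = "-320170, -320171\nJustified Removal: unwanted text\nDisposition unwanted text"
print(strip_after_keyword(sample))  # prints -320170, -320171
```

If none of the keywords appears, the pattern does not match, and the text is returned unchanged apart from trailing whitespace.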

