How to extract specific text when it gets distorted after using the Read PDF Text activity

I am using the Read PDF Text activity to extract the text from a PDF. The PDF looks like this:

image

After extracting the text from the PDF, the text comes out distorted like this:

image

Now I need the date that appears next to the GRN Date label, which in this case is:

image

Any ideas on what regex could be applied to this?

I am attaching the text file that contains the text extracted from the PDF, for reference:
test.txt (139 Bytes)

Can you please upload a sample PDF as well? It seems that when one field has a blank value, it gets replaced by the next one. For example, after RMS PO No. (10240290) there is supposed to be a blank line, but it's not there.

I can't attach the full PDF because it's confidential, sorry :frowning: As for the replacement, the values are not getting replaced; those fields are simply blank.

Hi,

Can you share the PreserveFormatting setting in ReadPDFText?

image

If it's blank or False now, can you also share the text you get when it is set to True?

Regards,

With the PreserveFormatting option ticked, I got output like this:

image

Hi,

In this case, we can easily get each value using regex.
Can you share the text file for this output?

Regards,

Sure @Yoichi

Here you go -

test.txt (194 Bytes)

Hi @Ishan_Shelke,
Please find a screenshot of the solution below:
image

Hi,

Can you try the following sample?

System.Text.RegularExpressions.Regex.Match(strText,"(?<=GRN\s+Date\s+:\s+)\w.*").Value

Sample20230623-2L.zip (2.7 KB)

Regards,
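
For readers who want to try the idea outside UiPath, here is a minimal standalone Python sketch of the same label-then-value extraction. It is not the code from the attached sample: the sample text, the date value, the `Supplier` line, and the names `sample_text`, `GRN_DATE_PATTERN`, and `extract_grn_date` are all made up for illustration, since the thread's test.txt is not reproduced here. Python's `re` module does not accept the variable-width lookbehind used in the .NET expression above, so this version uses a capture group instead. It also limits the whitespace around the colon to spaces and tabs, on the assumption that a blank GRN Date should yield an empty result rather than the next line.

```python
import re

# Hypothetical stand-in for the text extracted with PreserveFormatting enabled.
# Only the "RMS PO No." value comes from the thread; the rest is invented.
sample_text = (
    "RMS PO No.   : 10240290\n"
    "GRN Date     : 23-06-2023\n"
    "Supplier     : Example Ltd\n"
)

# Match the label, then capture the rest of the line after the colon.
# [ \t]* (not \s*) keeps the match on the same line as the label.
GRN_DATE_PATTERN = re.compile(r"GRN\s+Date[ \t]*:[ \t]*(\S.*)")


def extract_grn_date(text: str) -> str:
    """Return the value after 'GRN Date :', or '' if it is missing or blank."""
    match = GRN_DATE_PATTERN.search(text)
    return match.group(1).strip() if match else ""


if __name__ == "__main__":
    print(extract_grn_date(sample_text))
```

If the extracted value still needs to become a real date, it would have to be parsed separately with whatever format the PDF actually uses, which the thread does not show.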
