Extract company name using regex expression

I would like to extract the company name and registration number from a PDF, and would like to know whether there is an alternative regex for this. I am still figuring out how to use regular expressions.

I previously raised a thread on this: Regex expression to extract company name

The solution provided
(System.Text.RegularExpressions.Regex.Match(yourString,"(?<=COMPANY NAME:\s*).*").Value)

was great, but for some files the two variables (company name and registration number) could not be extracted, and I am not sure why.

Therefore, I would like to know whether there is an alternative solution.

See screenshot below:

Hi @Justine

Can you share the 2(new)_.pdf file here so that I can check the regex pattern?

Regards
Gokul

Hi @Gokul001, attaching the file.
2.pdf (123.5 KB)

This expression is working correctly @Justine

Regards
Gokul

Haha, this is weird. I have no idea why my UiPath is not extracting the specific text for some PDFs, since the format and pattern are the same. Anyway, thanks for the help!

Can you add a Message Box and check the value before writing it to the Excel file, @Justine?

Hi @Justine

Check out this expression for the company name:

YourVariableString.Substring(YourVariableString.IndexOf("COMPANY NAME:")+"COMPANY NAME:".Length).Split(Environment.NewLine.ToCharArray)(0).Trim

Check out this expression for the registration number:

YourVariableString.Substring(YourVariableString.IndexOf("REGISTRATION NO.:")+"REGISTRATION NO.:".Length).Split(Environment.NewLine.ToCharArray)(0).Trim

Read PDF properties

Hope this helps.

Regards
Sudharsan

Updated the expression above @Justine

YourVariableString is the string that you read from the PDF.
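
For anyone who wants to see the logic of the two expressions step by step, here is a minimal sketch in Python (not a UiPath expression). The function name `value_after_label` and the sample text are made up for illustration. It finds the label, takes everything after it, keeps only the first line, and trims it. It also returns an empty string when the label is missing, which the one-line expressions above do not check for.

```python
def value_after_label(text: str, label: str) -> str:
    """Return the rest of the line that follows `label`, trimmed."""
    start = text.find(label)
    if start == -1:
        # Label not present in the text: return an empty string
        return ""
    # Everything after the label
    rest = text[start + len(label):]
    # Keep only the first line of what follows
    lines = rest.splitlines()
    return lines[0].strip() if lines else ""


# Hypothetical sample text, standing in for the string read from the PDF
sample = "COMPANY NAME: ACME SDN BHD\r\nREGISTRATION NO.: 123456-X\r\n"

company = value_after_label(sample, "COMPANY NAME:")
reg_no = value_after_label(sample, "REGISTRATION NO.:")
print(company)
print(reg_no)
```

The label still has to match the text exactly, so an extra space or a different colon in the PDF text would make it return an empty string.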

Regards
Sudharsan

@Justine ,

Please inspect the data using a text editor that shows formatting characters, or an online tool such as regex101, to look for hidden characters, which do show up in some cases.

You could also log the text to the Output panel using a Write Line activity and check which escape characters are present.
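
As a rough illustration of that check, here is a small Python sketch (again not a UiPath expression). The helper names `show_hidden` and `extract_field` and the sample string are assumptions for the example, and a non-breaking space is only one possible hidden character. The first helper makes invisible characters visible as escapes; the second replaces non-breaking spaces and then matches the label with flexible spacing around the colon.

```python
import re


def show_hidden(text: str) -> str:
    """Return the text with control and non-ASCII characters shown as escapes."""
    return ascii(text)


def extract_field(text: str, label_words: list) -> str:
    """Capture the rest of the line after a label, tolerating odd spacing."""
    # Non-breaking spaces look like normal spaces but do not match " "
    cleaned = text.replace("\u00a0", " ")
    # Allow any run of spaces or tabs between the words of the label
    label = r"[ \t]+".join(re.escape(word) for word in label_words)
    # Allow optional spaces or tabs on both sides of the colon
    pattern = label + r"[ \t]*:[ \t]*(.*)"
    match = re.search(pattern, cleaned, flags=re.IGNORECASE)
    return match.group(1).strip() if match else ""


# Hypothetical sample with a non-breaking space inside the label
sample = "COMPANY\u00a0NAME :  ACME SDN BHD\r\nREGISTRATION NO.: 123456-X\r\n"

print(show_hidden(sample))
company = extract_field(sample, ["COMPANY", "NAME"])
reg_no = extract_field(sample, ["REGISTRATION", "NO."])
print(company)
print(reg_no)
```

If the escaped output shows something unexpected between or after the label words, that would explain why a pattern that works on one file fails on another that looks identical.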

Hi @Sudharsan_Ka, it works perfectly. Thank you for the help.

I would like to ask: under what conditions should we set Preserve Formatting to True?

Setting Preserve Formatting to True tells the activity to read the text with the same layout as it has in the PDF, @Justine.

That is why we set it to True here.

Regards
Sudharsan

Thanks for the help. Much appreciated!
