I would like to extract the company name and registration number from a PDF, so I would like to know whether there is an alternative regex expression. I am still trying to figure out how to use regex.
Previously I raised a thread on this: Regex expression to extract company name
The solution provided there,
System.Text.RegularExpressions.Regex.Match(yourString, "(?<=COMPANY NAME:\s*).*").Value
worked well, but for some files the two values (company name and registration number) were not extracted, and I am not sure why.
Is there an alternative solution?
See screenshot below:
Gokul001
(Gokul Balaji)
October 21, 2022, 1:16pm
2
Hi @Justine
Can you share the 2(new)_.pdf file here so I can check the regex pattern?
Regards
Gokul
Hi @Gokul001, I am attaching the file.
2.pdf (123.5 KB)
Gokul001
(Gokul Balaji)
October 22, 2022, 7:00am
4
This expression is working correctly @Justine
Regards
Gokul
Haha, this is weird. I have no idea why UiPath is not extracting the text for some PDFs, since the format and pattern are the same. Anyway, thanks for the help!
Gokul001
(Gokul Balaji)
October 22, 2022, 7:07am
6
Can you try adding a Message Box and checking the value before writing it to the Excel file, @Justine?
Hi @Justine
Check out this expression for the company name:
YourVariableString.Substring(YourVariableString.IndexOf("COMPANY NAME:")+"COMPANY NAME:".Length).Split(Environment.NewLine.ToCharArray)(0).Trim
Check out this expression for the registration number:
YourVariableString.Substring(YourVariableString.IndexOf("REGISTRATION NO.:")+"REGISTRATION NO.:".Length).Split(Environment.NewLine.ToCharArray)(0).Trim
Read PDF properties
Hope this helps
Regards
Sudharsan
Updated the expression above @Justine
YourVariableString is the string read from the PDF.
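One thing to keep in mind with the Substring approach: IndexOf returns -1 when the label is not found exactly as typed, and the expression then returns unrelated text instead of failing. A minimal sketch of a more tolerant variant is below. It assumes the value sits on the same line as the label, as in the expressions above. The helper name ValueAfterLabel and the sample text are made up for illustration and are not part of UiPath.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Hypothetical helper: returns the text that follows a label on the same line,
    ' or an empty string when the label is not found.
    Function ValueAfterLabel(text As String, label As String) As String
        ' Regex.Escape makes the "." in "REGISTRATION NO.:" literal and turns each
        ' space into "\ ", which is then widened to "\s+" so that double spaces,
        ' tabs, or no-break spaces inside the label still match.
        Dim labelPattern As String = Regex.Escape(label).Replace("\ ", "\s+")
        ' After the label: skip spaces/tabs (not line breaks), then capture the rest of the line.
        Dim m As Match = Regex.Match(text, labelPattern & "[^\S\r\n]*([^\r\n]*)", RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return String.Empty
    End Function

    Sub Main()
        ' Made-up sample text, not taken from the attached PDF.
        Dim sample As String = "COMPANY  NAME:   Example Sdn Bhd" & Environment.NewLine &
                               "REGISTRATION NO.: 12345-X" & Environment.NewLine
        Console.WriteLine(ValueAfterLabel(sample, "COMPANY NAME:"))      ' Example Sdn Bhd
        Console.WriteLine(ValueAfterLabel(sample, "REGISTRATION NO.:"))  ' 12345-X
    End Sub
End Module
```

Returning an empty string on a miss makes it easy to check the result in an If activity before writing to Excel. Whether this fixes the files that failed depends on what the text actually contains, so treat it as something to try rather than a confirmed fix.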
Regards
Sudharsan
@Justine ,
Please inspect the extracted text in a text editor that can show formatting characters, or in an online tool such as regex101, and look for hidden characters, which do turn up in some cases.
You could also log the text to the Output panel using a Write Line activity and check which escape characters are present.
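If the hidden characters are hard to spot by eye, one option is to rewrite them as visible escapes before logging. The sketch below is only an illustration: ShowHidden is a made-up helper name, the sample string is invented, and in UiPath the loop would need to live in something like an Invoke Code activity rather than a single Assign expression.

```vb
Imports System
Imports System.Text

Module HiddenCharDemo
    ' Hypothetical helper: rewrites control characters and non-ASCII characters
    ' as \uXXXX escapes so they show up in a log line.
    Function ShowHidden(text As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In text
            Dim code As Integer = Convert.ToInt32(c)
            If code < 32 OrElse code > 126 Then
                ' Not a printable ASCII character: emit its code point in hex.
                sb.Append("\u" & code.ToString("X4"))
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Made-up sample with a no-break space (U+00A0) where a normal space is expected.
        Dim sample As String = "COMPANY" & Convert.ToChar(160) & "NAME: Example" & Environment.NewLine
        ' The no-break space is printed as \u00A0, and the line break as \u000D and/or \u000A.
        Console.WriteLine(ShowHidden(sample))
    End Sub
End Module
```

A label that looks identical on screen but contains something like \u00A0 would explain why an exact match on "COMPANY NAME:" works for one PDF and not another, although that is only one possible cause.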
Justine
October 22, 2022, 5:20pm
10
Hi @Sudharsan_Ka, it works perfectly. Thank you for the help.
I would like to ask: under what conditions should we set Preserve Formatting to True?
Setting Preserve Formatting to True tells the activity to read the text in the same layout it has in the PDF, @Justine.
That is why we set it to True here.
Regards
Sudharsan
Justine
October 23, 2022, 9:30pm
12
Thanks for the help. Much appreciated!