I want to read specific text from a PDF. How should I read it?

I want to read a PDF that has a certain header with its own value. How should I read it?
E.g. ABC
123
In this case I want to read 123.

Hello @Bhushan_Nagaonkar, try the Regex method.
After reading the PDF, save the data in a String variable and pass it to Regex:

System.Text.RegularExpressions.Regex.Match("Your String", "\d+").ToString.Trim



OK, I will try. What does \d+ mean? I don't know anything about regex.

@Bhushan_Nagaonkar

Hi, check here:

https://forum.uipath.com/t/extracted-specific-text-from-pdf-paste-to-excel/539589


@Bhushan_Nagaonkar
Regex means Regular Expression.
It is a sequence of characters that specifies a match pattern in text.
Refer to this post to learn more.
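
To make the \d+ pattern from the earlier reply concrete, here is a minimal sketch. It is written in Python rather than as a UiPath expression so it can be run on its own; the pattern string itself is the same one passed to Regex.Match in the thread. The sample text is hypothetical and only mirrors the ABC / 123 example from the question.

```python
import re

# Hypothetical text standing in for the string read from the PDF.
text = "ABC\n123"

# \d matches a single digit; + means "one or more of the preceding token".
# Together, \d+ matches the first unbroken run of digits in the text.
match = re.search(r"\d+", text)

# Fall back to an empty string when no digits are found.
value = match.group(0).strip() if match else ""
print(value)
```

Note that an unanchored \d+ returns the first run of digits anywhere in the text, so it only gives the intended value when no other number appears earlier in the document.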


Hi @Bhushan_Nagaonkar - assuming your PDF is not a scanned one, use:

  • Read PDF Text activity
  • Then apply string operations or Regex to the text it returns
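
As a rough illustration of the "string operations" route, the sketch below finds a header line and returns the next non-empty line, which is the ABC / 123 case from the original question. It is written in Python for the sake of a runnable example; the function name value_after_header and the sample text are hypothetical, and it assumes the text read from the PDF keeps the header and its value on separate lines.

```python
def value_after_header(text, header):
    """Return the first non-empty line that follows a line equal to `header`."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line == header:
            # Skip blank lines between the header and its value.
            for candidate in lines[i + 1:]:
                if candidate:
                    return candidate
    # Header not found, or nothing follows it.
    return None


# Hypothetical text standing in for the Read PDF Text output.
sample = "Report\nABC\n123\nXYZ\n456"
print(value_after_header(sample, "ABC"))
```

This avoids regex entirely, but it depends on the exact line layout, so it is worth checking the raw text of a real sample first.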

Please check the video below:

How to extract data from PDF’s with RegEx in UiPath - Full Tutorial - YouTube


Hi @Bhushan_Nagaonkar,

In addition to the tutorials and post suggested, you could also check the post below to understand when to apply regex and for which scenarios:

When extracting data from documents, you first need to know whether the same data pattern will appear in all the samples you receive. Once that is confirmed, you can understand the structure of the documents and find the keywords/constants that can be tagged to the values.

We then leverage these keywords/constants to extract the required values.
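
A minimal sketch of the keyword/constant idea, in Python for the sake of a runnable example. The helper extract_fields, the labels, and the sample text are all hypothetical; the sketch assumes each value is a run of digits and hyphens that directly follows its label, separated only by colons or whitespace.

```python
import re


def extract_fields(text, labels):
    """Map each label to the digits-and-hyphens value that follows it, or None."""
    results = {}
    for label in labels:
        # re.escape keeps characters in the label from being read as regex syntax.
        # [:\s]* allows a colon, spaces, or a line break between label and value.
        # (\d[\d-]*) captures a value that starts with a digit.
        pattern = re.escape(label) + r"[:\s]*(\d[\d-]*)"
        match = re.search(pattern, text, re.IGNORECASE)
        results[label] = match.group(1) if match else None
    return results


# Hypothetical text standing in for the string read from the PDF.
text = "Employee Name: J. Doe\nEmployee ID: 7\nSocial Security Number: 000-00-0000"
fields = extract_fields(text, ["Employee ID", "Social Security Number", "Department"])
print(fields)
```

Because every value is tied to its own label, other numbers in the document do not interfere, which is the point of finding stable keywords first.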


This is very detailed, thanks for this. @supermanPunch @ushu @Manju_Reddy_Kanughula


I applied the condition that you all suggested.
MicrosoftTeams-image (77)

I want the employee's social security number as the output. This is the only section I want in the output.

Hi @Bhushan_Nagaonkar,

Can you paste the text here after you read the PDF? You can jumble the sensitive numbers if security is a concern.

Regards,


MicrosoftTeams-image (78)

I want the number.

@Bhushan_Nagaonkar,

If you want the social security number, which has hyphens in it, then try the following regex:

[0-9-]+

This will match numbers with hyphens in them, e.g. xxx-xxxx
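
A short sketch of how [0-9-]+ behaves on a whole document, in Python for the sake of a runnable example. The sample text is hypothetical. The pattern matches any run of digits and hyphens, so a first-match call returns whichever run comes first in the text, not necessarily the social security number.

```python
import re

# Hypothetical text standing in for the string read from the PDF.
text = "Page 1 of 3\nSocial Security Number: 000-00-0000"

# A first-match search stops at the earliest run of digits/hyphens.
first = re.search(r"[0-9-]+", text).group(0)

# Listing every match shows all the candidates the pattern can hit.
all_runs = re.findall(r"[0-9-]+", text)

print(first)
print(all_runs)
```

If the wanted value is not the first run in the document, the pattern needs an anchor such as the label that precedes it.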

Regards,

The PDF has other data as well, and this is the only data I want from it. How should I do it?
Can you show me in screenshot format?

Thank you

Hi @Bhushan_Nagaonkar,

We will not be able to provide a proper regex unless the sample data format is provided. However, try the regex below:

(?<=social security number.*\s*)[\d-]+
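
The pattern above relies on a variable-length lookbehind, which the .NET regex engine used by UiPath supports. As a hedged illustration of the same idea in a runnable form, the Python sketch below uses a capture group instead, because Python's built-in re module only accepts fixed-width lookbehinds. The sample text is hypothetical, and the IGNORECASE flag is an assumption that the label in the real PDF may be capitalized.

```python
import re

# Hypothetical text; the real document's layout is not shown in the thread.
text = "Employee ID: 7\nSocial Security Number: 000-00-0000\nDepartment: 12"

# Label, then anything on the same line (lazy), then optional whitespace or
# line breaks, then the captured run of digits and hyphens.
pattern = re.compile(r"social security number.*?\s*([\d-]+)", re.IGNORECASE)

match = pattern.search(text)
ssn = match.group(1) if match else ""
print(ssn)
```

The lazy .*? matters in the capture-group form: a greedy .* would swallow most of the number and leave only its last character for the group.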


@Bhushan_Nagaonkar, in the expression provided by @Gokul_Jayakumar, just change the regex pattern and you are good to go.

Assign it to any String variable:

System.Text.RegularExpressions.Regex.Match("Your String", "[0-9-]+").ToString.Trim

Regards,


It worked, thanks.

I just have one question: what if I don’t use Get Text and only want to use the Read PDF Text activity, then pass its output to the regex? How should I do it in that case?

@Bhushan_Nagaonkar, after you read the PDF, pass the extracted String variable directly to the above expression.

Regards,

I tried, but the output I got was 7, which belongs to other elements present in the document.

I'm getting an output, but this is the error:

MicrosoftTeams-image (82)
MicrosoftTeams-image (81)

@Bhushan_Nagaonkar,

You might have missed the + sign at the end of the pattern.
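
A small sketch of why the trailing + matters, in Python for the sake of a runnable example. The sample text is hypothetical. Without the quantifier, the character class matches exactly one character, so the result is a single digit such as 7 rather than the whole number.

```python
import re

# Hypothetical text standing in for the string read from the PDF.
text = "Social Security Number: 700-00-0000"

# Without +, the class [0-9-] matches a single digit or hyphen.
without_plus = re.search(r"[0-9-]", text).group(0)

# With +, it matches the whole run of digits and hyphens.
with_plus = re.search(r"[0-9-]+", text).group(0)

print(without_plus)
print(with_plus)
```

A single-character result is a quick sign that the quantifier is missing; a complete but wrong number usually means the pattern matched an earlier value in the document instead.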