I want to read a PDF that has a certain header with its own value. How should I read it?
E.g. ABC
123
In this case I want to read 123.
Hello @Bhushan_Nagaonkar, try the Regex method.
After reading the PDF, save the data in a String variable and pass it to Regex:
System.Text.RegularExpressions.Regex.Match("Your String","\d+").ToString.Trim
OK, I will try. What is \d+? I don't know anything about regex.
@Bhushan_Nagaonkar
Regex means Regular Expression.
It is a sequence of characters that specifies a match pattern in text.
Refer to this post to learn more.
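To answer the question about the pattern itself: `\d` matches a single digit and `+` means "one or more of the preceding item", so `\d+` matches the first run of consecutive digits. Below is a minimal sketch in Python, used here only because it is easy to run outside UiPath. The sample text is the one from the question, and for plain ASCII digits this simple pattern should behave the same way in .NET.

```python
import re

# Sample text from the original question: a header line followed by its value.
text = "ABC\n123"

# \d = any single digit, + = one or more of the preceding item.
match = re.search(r"\d+", text)

# Fall back to an empty string when nothing matches.
first_number = match.group().strip() if match else ""

print(first_number)  # 123
```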
Hi @Bhushan_Nagaonkar - Assuming your PDF is not a scanned one, use:
- Read PDF Text Activity
- Then, you can apply string operations or Regex
Please check the below video
How to extract data from PDF’s with RegEx in UiPath - Full Tutorial - YouTube
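The "string operations" route can be sketched as follows for the header/value layout in the original question (a header line such as ABC with its value on the next line). This is only an illustration in Python, not UiPath code: the function name `value_below_header` and the sample text are made up for the example, and it assumes the text read from the PDF keeps the header and its value on separate lines.

```python
def value_below_header(text: str, header: str) -> str:
    """Return the first non-empty line that follows the line equal to `header`."""
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line == header:
            # Skip blank lines between the header and its value.
            for candidate in lines[i + 1:]:
                if candidate:
                    return candidate
    return ""  # header not found, or nothing after it


# Hypothetical text standing in for the output of Read PDF Text.
sample = "Some title\nABC\n123\nOther text"
print(value_below_header(sample, "ABC"))  # 123
```

In UiPath the same idea would be expressed with the usual string methods (splitting on line breaks and taking the line after the header).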
Hi @Bhushan_Nagaonkar ,
In addition to the tutorials and posts suggested, you could check the post below as well to understand when to apply regex and for which scenarios:
When extracting data from documents, you first need to understand whether the same data pattern will appear in all the samples you receive. Knowing that, you can understand the structure of the documents and find the keywords/constants that are tied to the values.
We then use these keywords/constants to extract the required values.
This is very detailed, thanks for this. @supermanPunch @ushu @Manju_Reddy_Kanughula
I applied the condition that you suggested.
I want to output the employee's social security number. This is the only section I want as output.
Hi @Bhushan_Nagaonkar ,
Can you paste the text here after you read the PDF? You can scramble the sensitive numbers if security is a concern.
Regards,
I want the number.
If you want the social security number, which has hyphens in it, then try the following regex:
[0-9-]+
This matches any run of digits and hyphens, e.g. xxx-xxxx. Note that it also matches plain numbers that contain no hyphen.
Regards,
The PDF has other data as well, and this is the only data I want from it. How should I do it?
Can you show me with a screenshot?
Thank you
Hi @Bhushan_Nagaonkar ,
We will not be able to provide a proper regex unless the sample data format is provided. However, check the regex below:
(?<=social security number.*\s*)[\d-]+
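To try the keyword-anchored idea outside UiPath, here is a hedged Python sketch. Python's `re` module does not accept a variable-length lookbehind like the one above, so the sketch swaps in a capture group instead. The sample text is hypothetical, and matching the label case-insensitively is an addition for the example: the .NET pattern as posted is case-sensitive unless `RegexOptions.IgnoreCase` is passed.

```python
import re

# Hypothetical text standing in for the output of Read PDF Text.
text = (
    "Employee ID: 7\n"
    "Employee Social Security Number: 123-45-6789\n"
    "Department: 42"
)

# Label, then any non-digit characters on the same line, then optional
# whitespace (including a line break), then the value captured in group 1.
pattern = r"social security number[^\d\n]*\s*([\d-]+)"

match = re.search(pattern, text, re.IGNORECASE)
ssn = match.group(1) if match else ""

print(ssn)  # 123-45-6789
```

Anchoring on the label is what keeps unrelated numbers elsewhere in the document (such as the `7` and `42` in this sample) out of the result.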
@Bhushan_Nagaonkar, in the expression provided by @Gokul_Jayakumar, just change the regex pattern and you are good to go.
Assign the result to any String variable:
System.Text.RegularExpressions.Regex.Match("Your String","[0-9-]+").ToString.Trim
Regards,
It worked thanks.
I just have one question: what if I don’t use Get Text and just want to use the Read PDF Text activity, then pass its output to the regex? How should I do it in that case?
@Bhushan_Nagaonkar, after you read the PDF, pass the extracted String variable directly to the expression above.
Regards,
I tried, but the output I got was 7, which came from other elements present in the document.
I'm getting an output, but this is the error:
You might have missed the + sign at the end of the pattern.
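A short Python sketch of why the `+` matters, and of a second possible reason for getting a short value such as 7. The sample strings are hypothetical, since the actual PDF text was not posted, so this only illustrates the two likely causes rather than diagnosing the real document.

```python
import re

# Hypothetical line containing the value we want.
text = "SSN 123-45-6789"

# Without +, the character class matches exactly one character.
single_char = re.search(r"[0-9-]", text).group()

# With +, it matches the whole run of digits and hyphens.
whole_run = re.search(r"[0-9-]+", text).group()

print(single_char)  # 1
print(whole_run)    # 123-45-6789

# Second possible cause: only the FIRST match is returned, so an earlier
# number in the document wins even when the + is present.
noisy_text = "Page 7\nSSN 123-45-6789"
first_match = re.search(r"[0-9-]+", noisy_text).group()

print(first_match)  # 7
```

If the second case applies, anchoring the pattern on a nearby keyword, as in the lookbehind regex suggested earlier in the thread, avoids picking up the earlier number.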