I have this data and would like to extract the values (email, mobile number, etc.) that appear beside their respective keywords. For example, input: email: xxx, output: xxx.
Hey,
You can use the Read PDF Text activity or, if the PDF is a scanned image, the Read PDF With OCR activity. After that, you can pull out the values with string manipulation.
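As a rough idea of what the string manipulation could look like, here is a minimal sketch in plain VB.NET. The helper name `ValueAfter` and the sample text are made up for illustration; in a workflow, `PDFText` would be the output of the Read PDF activity, and the body of the helper could be split across Assign activities.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module StringManipulationSketch
    ' Hypothetical helper: returns the text that follows "keyword" on the same line,
    ' or an empty string if the keyword is not found.
    Function ValueAfter(source As String, keyword As String) As String
        Dim start As Integer = source.IndexOf(keyword, StringComparison.OrdinalIgnoreCase)
        If start < 0 Then Return String.Empty
        start += keyword.Length
        ' Stop at the first line break after the keyword (or at the end of the text).
        Dim lineEnd As Integer = source.IndexOfAny(New Char() {ControlChars.Cr, ControlChars.Lf}, start)
        If lineEnd < 0 Then lineEnd = source.Length
        Return source.Substring(start, lineEnd - start).Trim()
    End Function

    Sub Main()
        ' Made-up sample text standing in for the Read PDF Text output.
        Dim PDFText As String = "Email: someone@example.com" & vbCrLf & "Mobile: 91234567"
        Console.WriteLine(ValueAfter(PDFText, "Email:"))   ' someone@example.com
        Console.WriteLine(ValueAfter(PDFText, "Mobile:"))  ' 91234567
    End Sub
End Module
```

This assumes each value sits on the same line as its keyword; if the PDF text is laid out differently, the line-break logic would need adjusting.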
Thanks,
Rounak
Hi,
You can follow the workflow I attached:
image.pdf (17.1 KB)
Sample2.xaml (7.4 KB)
Thanks,
Rounak
- Convert the PDF to text using the Read PDF Text activity
- Then apply regular expressions to extract the data
- To get the email address, use an Assign activity:
Variable of type string Email = System.Text.RegularExpressions.Regex.Match(PDFText, "(?<=Email:).*\.com").ToString
- To get the mobile number:
Variable of type string Mobile = System.Text.RegularExpressions.Regex.Match(PDFText, "(?<=Mobile:)\d+").ToString
- PDFText is the output variable of the Read PDF Text activity
If this doesn't work, please share a sample PDF.
Hi!
The email extraction worked but the mobile number did not.
Could you also help me extract the rest of the fields? Thank you!
Raheem Mohamed Resume.pdf (64.3 KB)
Hey @audrxyx
Try this:
Variable of type string Mobile = System.Text.RegularExpressions.Regex.Match(PDFText, "(?<=Mobile:\s*)\d+").ToString
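The likely reason the earlier pattern returned nothing is that `(?<=Mobile:)\d+` needs a digit immediately after the colon, while the PDF text probably has a space there. Adding `\s*` inside the lookbehind allows for that whitespace (.NET regex supports variable-length lookbehind). A small sketch to compare the two, using a made-up sample string rather than the real PDF text:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindSketch
    Sub Main()
        ' Made-up sample; note the space between the colon and the digits.
        Dim PDFText As String = "Mobile: 91234567"

        ' Digits must follow the colon directly, so this finds no match.
        Dim strict As String = Regex.Match(PDFText, "(?<=Mobile:)\d+").ToString()

        ' Optional whitespace is allowed between the colon and the digits.
        Dim tolerant As String = Regex.Match(PDFText, "(?<=Mobile:\s*)\d+").ToString()

        Console.WriteLine("strict:   '" & strict & "'")    ' strict:   ''
        Console.WriteLine("tolerant: '" & tolerant & "'")  ' tolerant: '91234567'
    End Sub
End Module
```

A failed match gives an empty string rather than an error, which is why the Assign runs fine but the variable ends up blank.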
You can learn Regex by checking out my Regex MegaPost
Cheers
Steve
@audrxyx Try the one below for the mobile number:
Variable of type string Mobile = System.Text.RegularExpressions.Regex.Match(PDFText, "(?<=Mobile:\s+).*").ToString
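Note that `.*` takes everything up to the end of the line, so it also handles numbers written with spaces or a leading `+`. In .NET, `.` only stops at `\n`, so with Windows line endings the match can carry a trailing `\r`. The same idea could be generalised to the other fields. Below is a minimal sketch, assuming every field appears as `Label: value` on one line; the helper name `FieldValue`, the sample text and the `Address` label are made up for illustration and are not taken from the attached resume.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module KeywordExtractionSketch
    ' Hypothetical helper: returns the rest of the line after "label:",
    ' with surrounding whitespace removed. Returns "" if the label is absent.
    Function FieldValue(source As String, label As String) As String
        ' Regex.Escape guards against labels containing regex metacharacters.
        ' [^\r\n]* stops before any line break, so no trailing \r is captured.
        Dim pattern As String = "(?<=" & Regex.Escape(label) & ":[ \t]*)[^\r\n]*"
        Return Regex.Match(source, pattern, RegexOptions.IgnoreCase).Value.Trim()
    End Function

    Sub Main()
        ' Made-up sample text standing in for the Read PDF Text output.
        Dim PDFText As String = "Email: someone@example.com" & vbCrLf &
                                "Mobile: +65 9123 4567" & vbCrLf &
                                "Address: 1 Example Street"

        For Each label As String In {"Email", "Mobile", "Address"}
            Console.WriteLine(label & " -> " & FieldValue(PDFText, label))
        Next
        ' Email -> someone@example.com
        ' Mobile -> +65 9123 4567
        ' Address -> 1 Example Street
    End Sub
End Module
```

Inside a workflow, the equivalent would be one Assign per field using the same pattern with the label swapped in, followed by `.Trim`.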