Hi All,
I have a number of unstructured resumes in a folder.
I need to capture the phone numbers from them and store those in an Excel file.
Can anyone help me with how to retrieve the phone number from unstructured documents (DOCX or PDF)?
It's an urgent requirement for me.
Can you please try the logic below:
- Read PDF Text activity: this reads your unstructured document and converts it into text.
- Assign (String): phoneNumber = System.Text.RegularExpressions.Regex.Match(OutPDFText, "((?<=PhoneNumber.).*)").Value
Please try this and let me know whether it's useful or not.
Thanks,
Arunachalam.
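The lookbehind idea above can be sketched in Python as follows, assuming the PDF text has already been extracted into a string (the sample text is made up for illustration). Python's `re` module, like .NET's regex engine, supports a fixed-width lookbehind, so `(?<=PhoneNumber.)` behaves the same way: the match starts right after the label plus one character, and `.*` takes the rest of that line.

```python
import re

# Hypothetical extracted text; in the workflow above this would be OutPDFText.
out_pdf_text = "Name: John Doe\nPhoneNumber:+91 11111 11111\nEmail: john@example.com"

# (?<=PhoneNumber.) is a fixed-width lookbehind: the match begins after
# "PhoneNumber" plus one more character (here the colon). ".*" then takes the
# rest of that line, because "." does not match a newline by default.
match = re.search(r"((?<=PhoneNumber.).*)", out_pdf_text)

# .NET's Match(...).Value returns "" when nothing matches; mirror that here.
phone_number = match.group(0) if match else ""

print(phone_number)  # +91 11111 11111
```

Note that this only finds numbers preceded by the exact label `PhoneNumber`, which is the limitation raised in the next reply.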
Hi @Arunachalam,
Thanks for the reply. Yes, it will work if the document contains a string like PhoneNumber, but in my documents the label varies: "cell", "mobile", "phone", "phone number", "contact number", "contact" and many more. Sometimes the phone number is written directly without any label at all. In that case, how can I proceed?
Hello @Abhinandan, can you give some sample text containing phone numbers in different formats?
The phone number format can be like +91 11111 11111, +911111111111, +91 11111-11111, etc.
Can you please give me a sample document or PDF so that I can work on it and send it back to you?
Thanks,
Arunachalam.
Documents.zip (321.4 KB)
Please find the attached zip file for your reference
+971 56 926960, +971-125-52758, +97150249167, +45-(0)77-0400-321, (+92-54-2727137)
These are the sample formats.
Can you try the following:
str = text of the PDF or DOC
str = str.Replace(" ", "").Replace("-", "").Replace("(", "").Replace(")", "")
Your result will be like: +97156926960,+97112552758,+97150249167,+450770400321,+92542727137
Then use a regex to get the phone numbers:
matches (System.Text.RegularExpressions.MatchCollection) = System.Text.RegularExpressions.Regex.Matches(str, "\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}")
For Each item In matches
    Log Message: item.ToString
Next
But please note that in this case you will lose all "()-<space>" characters in the phone numbers.
Update: sample XAML attached… Test.xaml (7.8 KB)
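The normalize-then-match approach above can be sketched in Python like this, using the sample formats posted in the thread (the "Call" prefix is made up). The key detail is that `+` is a regex metacharacter, so it has to be escaped as `\+`; listing the 12-digit alternative first keeps a longer number from being cut short.

```python
import re

# Sample formats taken from the thread; the surrounding text is made up.
text = ("Call +971 56 926960, +971-125-52758, +97150249167, "
        "+45-(0)77-0400-321, (+92-54-2727137)")

# Step 1: strip spaces, hyphens and brackets, mirroring the chained Replace calls.
normalized = (text.replace(" ", "").replace("-", "")
                  .replace("(", "").replace(")", ""))

# Step 2: match "+" followed by 10 to 12 digits, longest alternative first.
pattern = r"\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}"
matches = re.findall(pattern, normalized)

for item in matches:
    print(item)
```

This prints the same five numbers listed in the post, and, as noted there, the original spaces, hyphens and brackets are gone.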
Can you please try it this way:
matches = System.Text.RegularExpressions.Regex.Matches(str, "((\+[0-9]).*)")
I got the expected output for your case; please check and let me know.
Don't reassign str with str.Replace(" ", "").Replace("-", "").Replace("(", "").Replace(")", "") this time; run the pattern on the original text.
Test (1).xaml (10.2 KB)
Thanks,
Arunachalam
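A small Python sketch of this pattern on un-normalized text (the resume lines are made up, using two of the formats from the thread). Each match starts at a `+` followed by a digit and runs to the end of that line, so the number keeps its original spaces, hyphens and brackets, but anything else after it on the same line is captured too.

```python
import re

# Made-up resume lines using two of the formats posted in the thread.
text = "Mobile: +971 56 926960\nPhone: +45-(0)77-0400-321\nEmail: someone@example.com"

# \+[0-9] finds a "+" followed by a digit; ".*" then takes everything up to the
# end of that line ("." does not cross a newline), so the number's own
# formatting is preserved.
matches = [m.group(0) for m in re.finditer(r"((\+[0-9]).*)", text)]

for item in matches:
    print(item)
```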
Agreed, "((\+[0-9]).*)"
this pattern will work…
but it gives the correct result only if the phone number is at the end of the line and there is no text after it on that line.
For example, take the string below:
ABCDEFG
Ref: LC242-5054
Mobile: +971-125-527858 sfdascsd
Salman.xyz@gmail.com
Your pattern will give you "+971-125-527858 sfdascsd", which is wrong.
I modified your pattern a bit, like below:
((\+[0-9])(.*[0-9]))
It is working fine now.
Just make sure there is no digit after the phone number on the same line, otherwise it will fail as well.
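Here is a Python sketch comparing the two patterns on the example text from this post. The modified pattern works because `.*[0-9]` is greedy but must end on a digit, so the engine backtracks to the last digit on the line. The last line (made up for illustration) shows the caveat just mentioned: a later digit on the same line gets pulled into the match.

```python
import re

# Example text from the post above.
text = ("ABCDEFG\n"
        "Ref: LC242-5054\n"
        "Mobile: +971-125-527858 sfdascsd\n"
        "Salman.xyz@gmail.com")

# Original pattern: everything from "+digit" to the end of the line.
loose = re.search(r"((\+[0-9]).*)", text).group(0)

# Modified pattern: the match must end on a digit, so trailing text is dropped.
tight = re.search(r"((\+[0-9])(.*[0-9]))", text).group(0)

print(loose)  # +971-125-527858 sfdascsd
print(tight)  # +971-125-527858

# The caveat: any later digit on the same line extends the match.
caveat = re.search(r"((\+[0-9])(.*[0-9]))", "Mobile: +971-125-527858 ext 12").group(0)
print(caveat)  # +971-125-527858 ext 12
```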
IRC59810.zip (3.1 MB)
Thanks all for your replies.
Please find the attached zip file with documents from which the phone numbers are not extracted as expected.
Please help me complete this requirement.
If you look closely, the last 2 digits of the phone numbers are missing.
Thanks
I would recommend sticking to my original solution (remove the spaces, hyphens and brackets first, then match) with this regex pattern: "\+[0-9]{13}|\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}"
It is working fine for both PDF and Word.
If this gives a wrong result, kindly let us know the specific PDF/Word file.
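To see what the extra `{13}` alternative changes, here is a Python sketch on already-normalized text; the 13-digit number is made up for illustration and is not taken from the attached files. With a 12-digit cap, a 13-digit number is matched only up to its 12th digit and the rest is silently dropped.

```python
import re

# Hypothetical, already-normalized text; the 13-digit number is made up.
normalized = "Tel:+4412345678901,Alt:+97150249167"

up_to_12 = r"\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}"
up_to_13 = r"\+[0-9]{13}|\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}"

# With a 12-digit cap, the 13-digit number is cut short by one digit.
short = re.findall(up_to_12, normalized)
# Listing {13} first lets the whole number match.
full = re.findall(up_to_13, normalized)

print(short)  # ['+441234567890', '+97150249167']
print(full)   # ['+4412345678901', '+97150249167']
```

The same effect can be written more compactly as `\+[0-9]{10,13}`, since a greedy bounded quantifier also prefers the longest run. Numbers with more than 13 digits would still be truncated, so the upper bound has to fit the longest format in the documents.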
Thanks @Arunachalam and @AkshaySandhu,
The problem is solved with your inputs. Thanks a lot!
Wow, good to hear! Can you please mark it as the solution so that other folks can find this thread?
Thanks,
Arunachalam