Retrieve Phone Numbers from Unstructured Documents

Hi All,
I have a number of unstructured resumes in a folder.
I need to capture the phone numbers and store them in an Excel file.
Can anyone help me with how to retrieve the phone number from unstructured documents (DOCX or PDF)?


Can anyone please help? It's an urgent requirement for me.

@Abhinandan,

Can you please try the logic below:

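  1. Read PDF Text activity: this reads your unstructured document and converts it into text (here stored in OutPDFText).
  2. Assign: phoneNumber (String) = System.Text.RegularExpressions.Regex.Match(OutPDFText,"((?<=PhoneNumber.).*)").Value
     This captures the rest of the line that follows "PhoneNumber" plus one separator character.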
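The lookbehind in step 2 can be tried outside UiPath with a small Python sketch (Python's `re` treats this pattern the same way as .NET). The sample text and the variable name `out_pdf_text` are hypothetical stand-ins for the output of the Read PDF Text activity.

```python
import re

# Hypothetical stand-in for the text returned by the Read PDF Text activity.
out_pdf_text = "Name: Jane Doe\nPhoneNumber: +91 11111 11111\nEmail: jane@example.com"

# (?<=PhoneNumber.) requires "PhoneNumber" plus any one character (here ':')
# right before the match; .* then takes the rest of that line, because '.'
# does not match a newline.
match = re.search(r"((?<=PhoneNumber.).*)", out_pdf_text)

# .NET's Match(...).Value is "" when nothing matches; Python returns None instead.
phone_number = match.group(0).strip() if match else ""
print(phone_number)
```

Note that the match starts right after the separator character, so any extra space after the colon ends up in the result unless it is trimmed.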

Please try this and let me know whether it's useful or not.

Thanks,
Arunachalam.


Hi @Arunachalam,

Thanks for the reply. Yes, it will work if the document contains a string like "PhoneNumber", but in my documents the label varies: "cell", "mobile", "phone", "phone number", "contact number", "contact" and many more. Sometimes the phone number appears directly, without any label. In that case, how can I proceed?
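One way to extend the single-label idea to the labels listed above is a case-insensitive alternation with a capture group for the number. This is a hypothetical Python sketch (not the approach the thread ends up using), and it still misses numbers that appear without any label. Python's lookbehind must be fixed-width, so a capture group replaces the lookbehind here.

```python
import re

# Hypothetical pattern: labels taken from the question, then an optional
# separator and opening parenthesis, then the number itself.
LABELED_PHONE = re.compile(
    r"\b(?:phone number|contact number|mobile|phone|cell|contact)\b"
    r"\s*[:\-]?\s*\(?"                  # optional ':' or '-' and '('
    r"(\+?[0-9][0-9 ()\-]{6,}[0-9])",   # digits, spaces, (), hyphens; ends on a digit
    re.IGNORECASE,
)

# Sample lines built from formats quoted in the thread.
text = "Mobile: +971-125-52758\nContact number - (+92-54-2727137)\nCell +971 56 926960"
phones = LABELED_PHONE.findall(text)
print(phones)
```

Because the number must end on a digit, a trailing ")" is left out of the captured value.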

Hello @Abhinandan, can you give some sample text containing phone numbers in different formats…
Phone number formats can be like +91 11111 11111, +911111111111, +91 11111-11111, etc.

@Abhinandan,

Can you please give me the sample document or PDF so that I can work on it and send it to you.

Thanks,
Arunachalam.

Documents.zip (321.4 KB)

Please find the attached zip file for your reference

Hi @AkshaySandhu

+971 56 926960, +971-125-52758, +97150249167, +45-(0)77-0400-321, (+92-54-2727137)
These are the sample formats.

Can you try the following…
str = text of the PDF or DOC

str = str.Replace(" ","").Replace("-","").Replace("(","").Replace(")","")

Your result will look like: +97156926960,+97112552758,+97150249167,+450770400321,+92542727137

Then use a regex to get the phone numbers (the literal + must be escaped as \+):
matches (MatchCollection) = System.Text.RegularExpressions.Regex.Matches(str,"\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}")

For Each item In matches
    Log Message: item.ToString
Next

But please note that in this case you will lose all "(", ")", "-" and space characters in the phone numbers.
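The two steps above (strip separators, then match) can be checked with a short Python sketch; Python's `re` handles this pattern the same way as .NET. The input is the sample list quoted in the thread.

```python
import re

# Sample formats quoted in the thread.
text = "+971 56 926960, +971-125-52758, +97150249167, +45-(0)77-0400-321, (+92-54-2727137)"

# Step 1: strip spaces, hyphens and parentheses, as in the Replace chain.
normalized = text.replace(" ", "").replace("-", "").replace("(", "").replace(")", "")

# Step 2: match '+' followed by 12, 11 or 10 digits. The leftmost alternative
# that fits wins, so listing {12} first keeps a 12-digit number from being
# cut short by the {10} branch.
pattern = r"\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}"
phones = re.findall(pattern, normalized)
for phone in phones:
    print(phone)
```

The output matches the result listed in the post, one number per line.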

Update: sample XAML attached… Test.xaml (7.8 KB)


@AkshaySandhu,

Can you please try it this way:

match = System.Text.RegularExpressions.Regex.Matches(str,"((\+[0-9]).*)")

I got the expected output for your case. Please check and let me know.

Don't apply str.Replace(" ","").Replace("-","").Replace("(","").Replace(")","") to str first; run the pattern on the original text.

Test (1).xaml (10.2 KB)

Thanks,
Arunachalam

Agreed, the "((\+[0-9]).*)" pattern will work…
but it gives the correct result only if the phone number is at the end of the line, with no text after it on that line…
For example, with the string below:
ABCDEFG
Ref: LC242-5054
Mobile: +971-125-527858 sfdascsd
Salman.xyz@gmail.com

your pattern will give you "+971-125-527858 sfdascsd", which is wrong…

I modified your pattern a bit, like below:
((\+[0-9])(.*[0-9]))
It is working fine now…
Just make sure there is no digit after the phone number on the same line, or it will fail as well.
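The difference between the two patterns, and the remaining caveat, can be reproduced with a Python sketch (same regex semantics as .NET here) on the sample text from the post. The "ext 12" line is a made-up example of a digit following the number.

```python
import re

# Sample text from the post.
text = "ABCDEFG\nRef: LC242-5054\nMobile: +971-125-527858 sfdascsd\nSalman.xyz@gmail.com"

# Original pattern: '.' stops at a newline, so .* runs to the end of the line.
loose = [m.group(0) for m in re.finditer(r"((\+[0-9]).*)", text)]

# Modified pattern: the final [0-9] makes the greedy .* back off to the last
# digit on the line.
tight = [m.group(0) for m in re.finditer(r"((\+[0-9])(.*[0-9]))", text)]

# Caveat: a later digit on the same line is swallowed into the match.
caveat = re.search(r"((\+[0-9])(.*[0-9]))", "Mobile: +971-125-527858 ext 12").group(0)

print(loose)   # ['+971-125-527858 sfdascsd']
print(tight)   # ['+971-125-527858']
print(caveat)  # +971-125-527858 ext 12
```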

IRC59810.zip (3.1 MB)

Thanks all for your replies.
Please find the attached zip file with the documents where the phone number is not captured as expected.
Please help me complete this requirement.
Also, if you look closely, the last 2 digits of the phone numbers are missing.
Thanks

I would recommend sticking to my original solution, using this regex pattern: "\+[0-9]{13}|\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}". It works fine for both PDF and Word.

If this gives a wrong result, kindly let us know which specific PDF/Word file…
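Putting the final pattern together with the original goal (phone numbers into a spreadsheet) might look like the Python sketch below. The file names and texts in `documents` are hypothetical, and the output is a CSV (which Excel can open) rather than an .xlsx; in UiPath itself this step would more likely be a DataTable written with the Excel Write Range activity. `\+[0-9]{10,13}` would match the same strings, since a greedy `{10,13}` also tries 13 digits first and then backs off.

```python
import csv
import io
import re

# Final pattern from the thread; longest alternative first.
PHONE_PATTERN = re.compile(r"\+[0-9]{13}|\+[0-9]{12}|\+[0-9]{11}|\+[0-9]{10}")


def normalize(text: str) -> str:
    """Remove spaces, hyphens and parentheses, like the thread's Replace chain."""
    for ch in " -()":
        text = text.replace(ch, "")
    return text


def extract_phones(text: str) -> list[str]:
    return PHONE_PATTERN.findall(normalize(text))


# Hypothetical file names and texts, standing in for the text read from
# each resume in the folder.
documents = {
    "resume_a.pdf": "Mobile: +971 56 926960\nEmail: a@example.com",
    "resume_b.docx": "Contact (+92-54-2727137)",
}

# One row per number found.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["File", "PhoneNumber"])
for name, text in documents.items():
    for phone in extract_phones(text):
        writer.writerow([name, phone])

print(buffer.getvalue())
```

Because spaces are removed across the whole text, a number that is followed by other digits separated only by spaces or hyphens gets merged with them, and the match is then capped at 13 digits.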


Thanks @Arunachalam and @AkshaySandhu,
Problem solved with your inputs. Thanks a lot.

@Abhinandan,

Wow, good to hear :grinning: Can you please mark it as the solution, so that others can find the answer in this thread.

Thanks,
Arunachalam

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.