How to get only numbers from a PDF file?

Suppose there is a PDF containing:

Name: xyz
Phone number: 9663743

I want to get “9663743”, i.e. the numbers.

So the program should be able to extract all the numbers present in the PDF.

I tried the “Number Only” option from the dropdown menu of Google OCR, but it gives a different result.

-Get the content of the PDF into a string using the “Read PDF Text” activity.
-Use the “Matches” activity with the PDF string as the input and “\d{length}” as the pattern.
Replace length with the number of digits you want to extract, e.g. for 9663743 the regex is “\d{7}”.
If the number has 7 or more digits, use the expression “\d{7,}”.
-The output will be an IEnumerable of Match; you can get each value from it with a For Each activity whose type argument is Match, reading each item's Value property.
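The Matches step can be approximated outside UiPath with Python's `re` module. This is only a sketch, not the UiPath activity itself: `pdf_text` is a hypothetical stand-in for the string returned by Read PDF Text (the "Ref" line is invented sample data), and `re.finditer` plays the role of the IEnumerable of Match that the For Each loop walks over. One caveat worth knowing: `\d{7}` on its own also matches the first seven digits of a longer run (the "1234567" inside "12345678"), so the exact-length pattern below adds `\b` word boundaries.

```python
import re

# Hypothetical stand-in for the string produced by the "Read PDF Text" activity.
pdf_text = "Name: xyz\nPhone number: 9663743\nRef: 12345678\n"

# Exactly 7 digits; \b keeps the pattern from matching inside a longer digit run.
exact_seven = [m.group() for m in re.finditer(r"\b\d{7}\b", pdf_text)]

# 7 or more digits, like the "\d{7,}" pattern above.
seven_or_more = [m.group() for m in re.finditer(r"\d{7,}", pdf_text)]

# Iterating Match objects mirrors a For Each over the IEnumerable of Match.
for match in re.finditer(r"\d{7,}", pdf_text):
    print(match.group(), "at position", match.start())
```

With this sample, the word-bounded pattern keeps only 9663743, while `\d{7,}` returns both digit runs.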

@chinmay_dhabal

Read the entire PDF text and extract the number values from it.

Please find 2 methods below.

Hope it helps 🙂

Regards
Madhura
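A minimal Python sketch of pulling every number out of the text, assuming the PDF text has already been read into a string; the sample text is hypothetical, and this is not necessarily either of the methods referred to above. `re.findall` with `\d+` returns every run of consecutive digits in order.

```python
import re

def extract_numbers(text: str) -> list[str]:
    """Return every run of consecutive digits in the text, in order of appearance."""
    return re.findall(r"\d+", text)

# Hypothetical PDF text used only for illustration.
sample = "Name: xyz\nPhone number: 9663743\nAge: 42\n"
numbers = extract_numbers(sample)
```

Because `\d+` treats each digit run separately, a time such as 12:15:00 comes back as three matches; decimals or numbers written with separators would need a wider pattern.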


Thanks @palindrome @Madhuraj

SAMPLE.pdf (84.4 KB)
I want to get whatever number is present in the PDF, e.g. only the number “9663743” from this PDF.
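If the PDF also contains other numbers and only the phone number is wanted, one option is to anchor the pattern on the label in front of it. This sketch assumes the number always follows the text "Phone number:", as in the example at the top of the thread; the actual contents of SAMPLE.pdf are not shown here, so the label and the sample string are assumptions. The same pattern should work in a .NET regex, where the captured digits would be read from the first group (Groups(1).Value in VB.NET) rather than from the whole match.

```python
import re
from typing import Optional

# Capture the digits that follow the "Phone number:" label (assumed label text).
PHONE_PATTERN = re.compile(r"Phone number:\s*(\d+)")

def phone_number(text: str) -> Optional[str]:
    """Return the digits after the first 'Phone number:' label, or None if absent."""
    match = PHONE_PATTERN.search(text)
    return match.group(1) if match else None

# Hypothetical text with an unrelated number that should be ignored.
sample = "Invoice 2021-0042\nName: xyz\nPhone number: 9663743\n"
```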


@chinmay_dhabal,

Refer to the screenshot below.

Regards
Madhura


I get a permission error when I try this. Any suggestions why? Thanks.

Hi Sam,

What is the error? “Permission missing: Launcher”?
If so, please refer to this post:

That post was helpful, thank you.

Hi Everyone

I have a question. For example, I have to run a cycle in a data processing tool, and after every run the tool appends the start time and finish time below the previous run's data.
Now I want to read the latest finish time and write it into an Excel sheet. Can anyone help me with this? Below is an example.

Start:
Start time:12:00:00PM
Finish time: 12:15:00PM

Restart1:
Start time:01:00:00PM
Finish time: 01:15:00PM

Restart2:
Start time:03:00:00PM
Finish time: 03:15:00PM
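Since each run is appended below the previous one, one approach is to collect every "Finish time:" value and keep the last. A minimal Python sketch, assuming the tool's output has already been read into a string in the format shown above; writing the result to Excel is left out (in UiPath that step would be an Excel activity such as Write Cell).

```python
import re
from typing import Optional

# Matches e.g. "Finish time: 03:15:00PM"; whitespace after the colon is optional.
FINISH_PATTERN = re.compile(r"Finish time:\s*(\d{1,2}:\d{2}:\d{2}\s*[AP]M)")

def latest_finish_time(log_text: str) -> Optional[str]:
    """Return the last 'Finish time' value, i.e. the most recently appended run."""
    times = FINISH_PATTERN.findall(log_text)
    return times[-1] if times else None

# The example data from the question above.
log_text = """Start:
Start time:12:00:00PM
Finish time: 12:15:00PM

Restart1:
Start time:01:00:00PM
Finish time: 01:15:00PM

Restart2:
Start time:03:00:00PM
Finish time: 03:15:00PM
"""
```

Taking the last match relies on new runs always being written at the bottom, as described in the question; if that ordering cannot be trusted, the captured strings could instead be parsed (for example with `datetime.strptime(value, "%I:%M:%S%p")`) and compared as times.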