How to get only numbers from PDF file?

pdf
ocr
code
activities

#1

Suppose there is a PDF containing:

Name: xyz
Phone number: 9663743

I want to get “9663743”, i.e. only the numbers.

So the program should be able to extract all the numbers present in the PDF.

I tried the “Number Only” option from the dropdown menu of Google OCR, but that gives a different result.


#2

-Get the content of the PDF into a string using the “Read PDF Text” activity.
-Use the “Matches” activity with the PDF string as the input and "\d{length}" as the pattern.
Replace length with the number of digits you want to extract, e.g. for 9663743 the regex is "\d{7}".
If the number has 7 or more digits, use the expression "\d{7,}".
-The output will be an IEnumerable of Match. You can get each value from the IEnumerable using the For Each activity with the type set to Match.
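The pattern logic in the steps above can be tried outside UiPath. Below is a minimal sketch in Python rather than a UiPath workflow; `pdf_text` stands in for the string returned by the PDF reading step, and `extract_numbers` is a hypothetical helper name, not a UiPath activity. It assumes the text is already extracted and that "a number" means a run of consecutive digits.

```python
import re

# Stand-in for the text read from the PDF (sample taken from the question).
pdf_text = "Name: xyz\nPhone number: 9663743"

def extract_numbers(text, min_digits=1):
    """Return every run of at least `min_digits` consecutive digits."""
    # Equivalent to \d{7,} when min_digits is 7; \d{1,} matches any digit run.
    pattern = r"\d{%d,}" % min_digits
    return re.findall(pattern, text)

print(extract_numbers(pdf_text))     # any length of digits
print(extract_numbers(pdf_text, 7))  # only runs of 7 or more digits
```

Note that a fixed-length pattern such as "\d{7}" also matches the first 7 digits of a longer number, so the open-ended form "\d{7,}" (or "\d+" for any length) is usually the safer choice when the length is not known in advance.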


#3

@chinmay_dhabal

Read the entire PDF text and extract the number values from it.

Please find 2 methods below.

Hope it helps…

Regards
Madhura


#4

Thanks @palindrome @Madhuraj

SAMPLE.pdf (84.4 KB)
I want to get whatever numbers are present in the PDF, e.g. only the number “9663743” from this PDF.


#5

@chinmay_dhabal,

Refer to the screenshot below.

Regards
Madhura


#6

I get a permission error when I try this. Any suggestions why? Thanks.


#7

Hi Sam,

What is the error? “Permission missing: Launcher”?
If so, please refer to this post:


#8

That post was helpful, thank you.


#9

Hi everyone,

I have a question. For example, I have to run a cycle in a data processing tool. After every run, the tool appends the start time and finish time below the data from the previous run.
Now I want to read the latest finish time and write it into an Excel sheet. Can anyone help me with this? Below is an example.

Start:
Start time:12:00:00PM
Finish time: 12:15:00PM

Restart1:
Start time:01:00:00PM
Finish time: 01:15:00PM

Restart2:
Start time:03:00:00PM
Finish time: 03:15:00PM
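
One possible way to approach this, sketched in Python rather than as a UiPath workflow: match every "Finish time" entry with a regex and keep the last one. This assumes the tool always appends new runs at the bottom, so the last match is the latest. `log_text` and `latest_finish_time` are hypothetical names used only for illustration, and writing the value to Excel is left out.

```python
import re

# Sample output copied from the example above.
log_text = """Start:
Start time:12:00:00PM
Finish time: 12:15:00PM

Restart1:
Start time:01:00:00PM
Finish time: 01:15:00PM

Restart2:
Start time:03:00:00PM
Finish time: 03:15:00PM
"""

def latest_finish_time(text):
    """Return the last 'Finish time' value in the text, or None if absent."""
    # Capture times such as 03:15:00PM, allowing optional spaces after the colon.
    matches = re.findall(r"Finish time:\s*(\d{1,2}:\d{2}:\d{2}\s*[AP]M)", text)
    # Runs are assumed to be appended in order, so the last match is the latest.
    return matches[-1] if matches else None

print(latest_finish_time(log_text))
```

In UiPath the same pattern could be tried with the “Matches” activity, taking the last item of the resulting collection.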