Hi all,
I’m having a problem getting text from a PDF.
I want to get specific text from the PDF, for example the name and the IC number (the text in the text boxes).
Which method should I use?
Regards,
Lean
@lyjun550 If the format is always the same, try using the Read PDF Text activity to get the text in the PDF as a String, and output the String in a Message Box. We might then be able to apply Regex to extract the details needed.
This is the output from Read PDF Text.
The name and IC No. are not in the same location.
Sorry, I don’t have an example…
The payee name and purpose have also moved, and the tick can’t be read.
How does regex work?
Click the icon below in the Output panel.
Save to desktop and upload.
De-identify the request if needed.
Copy the contents and paste them here.
We can then do our best to assist.
Can you please bold what you are trying to obtain?
07/14/2020 14:44:39 => [Debug] Execution started for file: test
07/14/2020 14:44:44 => [Info] Extract PDF execution started
07/14/2020 14:44:48 => [Info] Authorization For Salary Deduction
Date:
The Human Resource Officer
Tokio Marine Life Insurance Malaysia Bhd. (457556-X)
Menara Tokio Marine Life, Ground Floor,
189, Jalan Tun Razak,
50400 Kuala Lumpur.
Dear Sir,
Abu Bakar 010101000011
I , (I/C No ) ,
hereby authorise TOKIO MARINE LIFE INSURANCE MALAYSIA BHD. to deduct the sum of
RM 9999 from my salary and remit on my behalf to the following:
Payee’s Name
Chin Chan
Purpose
1231414
Salary deduction
4 October
Once; Salary deduction only for the month of __________________
Please Select
Recurring; Salary deduction effective from the month of ________________ and this authorization will
remain in force until and unless revoked by me in writing.
Thank you
Yours truly,
Employee No : 1
Dept / Branch : Information Technology
@Steven_McKeering
Date:
The Human Resource Officer
Tokio Marine Life Insurance Malaysia Bhd. (457556-X)
Menara Tokio Marine Life, Ground Floor,
189, Jalan Tun Razak,
50400 Kuala Lumpur.
Dear Sir,
Abu Bakar 010101000011
I , (I/C No ) ,
hereby authorise TOKIO MARINE LIFE INSURANCE MALAYSIA BHD. to deduct the sum of
RM 9999 from my salary and remit on my behalf to the following:
Payee’s Name
Chin Chan
Purpose
1231414
Salary deduction
4 October
Once; Salary deduction only for the month of __________________
Please Select
Recurring; Salary deduction effective from the month of ________________ and this authorization will
remain in force until and unless revoked by me in writing.
Thank you
Yours truly,
Employee No : 1
Dept / Branch : Information Technology
Date : 12 12 12
Abu Bakar and the numbers have to be separated.
Hello
Please provide the text result for the tick box and the month sample.
The patterns for the other fields are below. Put each one into a Matches activity.
To get the digits after RM:
(?<=RM)\s+(\d+)
The amount must be digits only, because \d+ stops at the first comma or decimal point.
Then use an Assign activity with the following expression to get group 1:
INSERTVARIABLE(0).Groups(1).Tostring
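A quick way to try the RM pattern outside UiPath is a small Python sketch. It assumes Python's `re` module behaves like the .NET engine behind the Matches activity for this pattern (it only uses a lookbehind, `\s` and `\d`), uses `re.finditer` as a stand-in for Matches, and feeds it an excerpt of the Read PDF Text output posted above. The variable names are illustrative.

```python
import re

# Excerpt of the Read PDF Text output posted in this thread (sample input).
pdf_text = (
    "hereby authorise TOKIO MARINE LIFE INSURANCE MALAYSIA BHD. to deduct the sum of\n"
    "RM 9999 from my salary and remit on my behalf to the following:\n"
)

# Same pattern as in the Matches activity: whitespace after "RM", digits in group 1.
matches = list(re.finditer(r"(?<=RM)\s+(\d+)", pdf_text))

# Counterpart of INSERTVARIABLE(0).Groups(1).ToString: first match, group 1.
amount = matches[0].group(1) if matches else ""
print(amount)  # 9999
```

The `if matches` guard is there because indexing an empty match collection fails in both Python and .NET, for example when a PDF has no RM line.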
Payee’s Name:
(?<=Payee.s Name)\s+\n(.*)
You will need to use group 1 to get the result.
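Here is the payee pattern as a Python sketch, with `re.search` standing in for Matches plus index 0. One assumption worth stating: `\s+\n` needs at least one whitespace character before the line break, so the sample gives the "Payee’s Name" line a trailing space (a `\r` from Windows line endings would satisfy it too). Whether the real output has either is not confirmed in the thread.

```python
import re

# Excerpt of the Read PDF Text output. Assumption: the label line ends with a
# trailing space before "\n", which \s+\n requires.
pdf_text = (
    "RM 9999 from my salary and remit on my behalf to the following:\n"
    "Payee\u2019s Name \n"
    "Chin Chan\n"
    "Purpose \n"
)

# The "." in the lookbehind matches the curly apostrophe in "Payee’s".
match = re.search(r"(?<=Payee.s Name)\s+\n(.*)", pdf_text)

# Group 1 is the next line; strip() removes a trailing "\r" if the text has one.
payee_name = match.group(1).strip() if match else ""
print(payee_name)  # Chin Chan
```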
Purpose
(?<=Purpose)\s+\n(.*)
Get group 1 again.
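The Purpose pattern works the same way. This Python sketch (with `re` standing in for the .NET engine) again assumes the label line ends with whitespace before the line break, and shows which line group 1 picks up in the posted output.

```python
import re

# Excerpt of the Read PDF Text output; assumes "Purpose" is followed by a space
# before the line break, as \s+\n requires.
pdf_text = (
    "Chin Chan\n"
    "Purpose \n"
    "1231414\n"
    "Salary deduction\n"
)

match = re.search(r"(?<=Purpose)\s+\n(.*)", pdf_text)

# Group 1 is whatever sits on the line directly below "Purpose".
purpose = match.group(1).strip() if match else ""
print(purpose)  # 1231414
```

In the posted output the line directly under "Purpose" is 1231414 rather than "Salary deduction", so it is worth confirming which of the two lines you actually need.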
4 October - Double check this one.
\d+\s\w+\n(?=Once;)
This will work as long as the next line starts with “Once;”
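A Python sketch of the month pattern, using `re.search` in place of Matches. This pattern has no capture group, so the whole match (group 0) is the result, including its line break. The sample assumes plain `\n` line breaks: the pattern expects `\n` immediately after the month name, so a `\r` before it would stop the match.

```python
import re

# Excerpt of the Read PDF Text output, assuming plain "\n" line breaks.
pdf_text = (
    "Salary deduction\n"
    "4 October\n"
    "Once; Salary deduction only for the month of __________________\n"
)

# The lookahead requires the following line to start with "Once;".
match = re.search(r"\d+\s\w+\n(?=Once;)", pdf_text)

# Group 0 ends with "\n", so strip it off.
month_text = match.group(0).strip() if match else ""
print(month_text)  # 4 October
```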
Dept / Branch
(?<=Dept / Branch)\s+:\s+(.*)
Get Group 1 for this one.
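The Dept / Branch pattern as a Python sketch (`re` standing in for the .NET engine). `(.*)` stops at the end of the line, so the Date line that follows is not pulled in.

```python
import re

# Excerpt of the Read PDF Text output.
pdf_text = (
    "Employee No : 1\n"
    "Dept / Branch : Information Technology\n"
    "Date : 12 12 12\n"
)

# Whitespace, a colon, whitespace, then the rest of the line in group 1.
match = re.search(r"(?<=Dept / Branch)\s+:\s+(.*)", pdf_text)
dept = match.group(1).strip() if match else ""
print(dept)  # Information Technology
```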
Date
(?<=Date)\s+:\s+(\d+\s\d+\s\d+)
Get group 1 for the result.
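The posted text contains two "Date" lines: an empty "Date:" at the top of the letter and "Date : 12 12 12" at the bottom. This Python sketch (with `re` as a stand-in for the Matches activity) shows that only the second one matches, because `\s+` requires at least one space before the colon.

```python
import re

# Both "Date" lines from the Read PDF Text output.
pdf_text = (
    "Date:\n"
    "The Human Resource Officer\n"
    "Dept / Branch : Information Technology\n"
    "Date : 12 12 12\n"
)

# "Date:" has no space before the colon, so \s+ fails there and it is skipped.
match = re.search(r"(?<=Date)\s+:\s+(\d+\s\d+\s\d+)", pdf_text)
date_text = match.group(1) if match else ""
print(date_text)  # 12 12 12
```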
Employee No
(?<=Employee No)\s+:\s+(\d+)
Get group 1 for the result.
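And the Employee No pattern, again as a Python sketch with `re.search` in place of Matches and an excerpt of the posted output as input.

```python
import re

# Excerpt of the Read PDF Text output.
pdf_text = (
    "Yours truly,\n"
    "Employee No : 1\n"
    "Dept / Branch : Information Technology\n"
)

# Digits after "Employee No :" end up in group 1.
match = re.search(r"(?<=Employee No)\s+:\s+(\d+)", pdf_text)
employee_no = match.group(1) if match else ""
print(employee_no)  # 1
```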
Abu Bakar AND 010101000011
(?<=Dear Sir.)\s+\n\s*([\D]+)\s(\d+)
Get group 1 for Abu Bakar
To get the number 010101000011
Regex pattern: (?<=Dear Sir.)\s+\n\s*([\D]+)\s(\d+)
Use group 2 for 010101000011
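One match gives both values, through two groups. This Python sketch (using `re` in place of the .NET engine) assumes, as with the payee pattern, that the "Dear Sir," line ends with whitespace before the line break, since `\s+\n` needs at least one whitespace character there.

```python
import re

# Excerpt of the Read PDF Text output; the trailing space after "Dear Sir," is an
# assumption that \s+\n relies on.
pdf_text = (
    "Dear Sir, \n"
    "Abu Bakar 010101000011\n"
    "I , (I/C No ) ,\n"
)

# "." in the lookbehind matches the comma after "Dear Sir".
# Group 1: the run of non-digits (the name). Group 2: the digits (the IC number).
match = re.search(r"(?<=Dear Sir.)\s+\n\s*([\D]+)\s(\d+)", pdf_text)

name = match.group(1) if match else ""
ic_number = match.group(2) if match else ""
print(name)       # Abu Bakar
print(ic_number)  # 010101000011
```

`[\D]+` first grabs "Abu Bakar " including the space, then gives the space back so that `\s` can match it, which is why group 1 comes out without a trailing space.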
If this helped, please mark as solved.
Thanks! But how do I separate group 1 and group 2, like Abu Bakar and 01010101?
You will need to clean the string first.
UiPath thinks there are invisible characters…
System.Text.RegularExpressions.Regex.Replace(INSERTVARIABLE, "[^a-z A-Z 0-9]", "")
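A Python sketch of the same clean-up, with `re.sub` standing in for `Regex.Replace` and the same character class. The thread does not say which invisible character was in the PDF text, so the zero-width space (U+200B) in the sample is a hypothetical stand-in. Note that this class deletes everything except ASCII letters, digits and spaces, including line breaks and punctuation.

```python
import re

# Hypothetical raw value: a zero-width space (U+200B) hidden after the name.
raw = "Abu Bakar\u200b 010101000011"

# Keep only a-z, A-Z, spaces and 0-9; everything else (invisible or not) is removed.
cleaned = re.sub(r"[^a-z A-Z 0-9]", "", raw)

# Split the cleaned string into the name (non-digits) and the number (digits).
parts = re.search(r"([\D]+)\s(\d+)", cleaned)
name = parts.group(1) if parts else ""
ic_number = parts.group(2) if parts else ""

print(repr(name))  # 'Abu Bakar'
print(ic_number)   # 010101000011
```

Without the clean-up, the same split still matches, but the invisible character ends up inside group 1, so the name no longer compares equal to "Abu Bakar".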
So I had to clean the invisible/illegal characters out of the string before UiPath would accept it.
Workflow attached.
Main.xaml (15.4 KB)
If this helped, please mark as solved.
It works, thanks!