PDF Extraction of specific fields

I have the following text extracted from a PDF. Can someone help me with a precise regex to extract the fields?

@"LAND TITLE CERTIFICATE
LINC SHORT LEGAL TITLE NUMBER
0018 489 989 406180;10;8 212 049 869
LEGAL DESCRIPTION
PLAN 4061EO0
BLOCK 10
LOT 8
EXCEPTING THEREOUT ALL MINES AND MINERALS
ESTATE: LEASEHOLD , FOR A TERM OF 042 YEARS
COMMENCING ON THE 01 DAY OF APRIL , 2010
TERMINATING ON THE 31 DAY OF MARCH , 2052
ATS REFERENCE: 6;1;45;16
MUNICIPALITY: MUNICIPALITY OF JASPER
REFERENCE NUMBER: 162 188 380
REGISTERED OWNER(S)
REGISTRATION DATE (DMY) DOCUMENT TYPE VALUE CONSIDERATION
212 049 869 24/02/2021 TRANSFER OF $760,000 $760,000
LEASEHOLD TITLE
OWNERS
JAE HONG PARK
AND
S00 AH PARK
BOTH OF:
617 GEIKIE STREET
JASPER
ALBERTA TOE 1EO
AS JOINT TENANTS
( CONTINUED )
ENCUMBRANCES, LIENS & INTERESTS
PAGE 2
REGISTRATION 4 212 049 869
NUMBER DATE (D/M/Y) PARTICULARS
SEE TITLE FOR ESTATE OF LARGER EXTENT,
IF ANY, FOR REGISTRATIONS PRIOR TO LEASE
212 049 870 24/02/2021 MORTGAGE
MORTGAGEE - SERVUS CREDIT UNION LTD.
151 KARL CLARK ROAD NW
EDMONTON
ALBERTA T6N1HS
ORIGINAL PRINCIPAL AMOUNT: $570,000
TOTAL INSTRUMENTS: 001
THE REGISTRAR OF TITLES CERTIFIES THIS TO BE AN
ACCURATE REPRODUCTION OF THE CERTIFICATE OF
TITLE REPRESENTED HEREIN THIS 10 DAY OF
FEBRUARY, 2023 AT 07:04 A.M.
ORDER NUMBER: 46467180
CUSTOMER FILE NUMBER:  6490343-s CN
*END OF CERTIFICATE*
THIS ELECTRONICALLY TRANSMITTED LAND TITLES PRODUCT IS INTENDED
FOR THE SOLE USE OF THE ORIGINAL PURCHASER, AND NONE OTHER,
SUBJECT TO WHAT IS SET OUT IN THE PARAGRAPH BELOW.
THE ABOVE PROVISIONS DO NOT PROHIBIT THE ORIGINAL PURCHASER FROM
INCLUDING THIS UNMODIFIED PRODUCT IN ANY REPORT, OPINION,
APPRAISAL OR OTHER ADVICE PREPARED BY THE ORIGINAL PURCHASER AS
PART OF THE ORIGINAL PURCHASER APPLYING PROFESSIONAL, CONSULTING
OR TECHNICAL EXPERTISE FOR THE BENEFIT OF CLIENT (S)."

Hello @Umer_Shahid, welcome to the UiPath Community Forum!

Can you please provide the following details:

  1. Which digits belong to the Title Number: the ones starting with 0018, or the ones starting with 212?
  2. Please confirm whether the pattern will always be the same.
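To make question 1 concrete: the line under the LINC / SHORT LEGAL / TITLE NUMBER header appears to hold three columns. The sketch below splits that line with one regex. It is written in Python only for illustration, and the column layout (LINC as 4-3-3 digits, short legal as plan;block;lot, title number as 3-3-3 digits) is an assumption inferred from this single sample. `parse_title_line` is a hypothetical helper name; only numbered groups are used so the pattern itself should also carry over to a .NET regex in UiPath.

```python
import re
from typing import Dict, Optional

# Excerpt of the sample text: the header line and the values line below it.
SAMPLE = """LINC SHORT LEGAL TITLE NUMBER
0018 489 989 406180;10;8 212 049 869
LEGAL DESCRIPTION"""

# Assumed layout of the values line (inferred from this one sample only).
TITLE_LINE = re.compile(
    r"TITLE NUMBER[ \t]*\r?\n"      # header ends the previous line
    r"(\d{4} \d{3} \d{3})[ \t]+"    # group 1: LINC (4-3-3 digits)
    r"(\d+;\d+;\d+)[ \t]+"          # group 2: short legal (plan;block;lot)
    r"(\d{3} \d{3} \d{3})"          # group 3: title number (3-3-3 digits)
)


def parse_title_line(text: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: split the line under the header into its three columns."""
    m = TITLE_LINE.search(text)
    if m is None:
        return None  # layout differs from the sample; handle separately
    return {"linc": m.group(1), "short_legal": m.group(2), "title_number": m.group(3)}


print(parse_title_line(SAMPLE))
# {'linc': '0018 489 989', 'short_legal': '406180;10;8', 'title_number': '212 049 869'}
```

If the real PDFs use a different digit grouping, the fixed counts (`\d{4}`, `\d{3}`) are the first thing to relax.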

Thanks

Hi @Umer_Shahid

Can you mark the expected output in bold? Also, will the layout of the PDFs always be the same? If so, I can help you with the regular expressions.

Regards

@Umer_Shahid
Based on my understanding, I have created a workflow. I am still not sure whether the address format will be fixed, because this is a land agreement and the tenancy can vary (joint, single, co-op, etc.).
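For the address, one option is to anchor on the lines around it rather than on the address itself. This is a minimal Python sketch, assuming the address starts after a "BOTH OF:" or "ALL OF:" line and ends at an "AS JOINT TENANTS" or "AS TENANTS IN COMMON" line. Only "BOTH OF:" and "AS JOINT TENANTS" actually appear in the sample, so the other variants are guesses, and a single-owner title may not have these markers at all (the function then returns None). `extract_address` is a hypothetical helper name.

```python
import re
from typing import Optional

# Excerpt of the owners block from the sample text.
SAMPLE = """OWNERS
JAE HONG PARK
AND
S00 AH PARK
BOTH OF:
617 GEIKIE STREET
JASPER
ALBERTA TOE 1EO
AS JOINT TENANTS
( CONTINUED )ENCUMBRANCES, LIENS & INTERESTS"""

# Assumed markers: "ALL OF:" and "TENANTS IN COMMON" are not in the sample.
ADDRESS_BLOCK = re.compile(
    r"^(?:BOTH|ALL) OF:[ \t]*\r?\n"
    r"(.*?)\r?\n"                                  # group 1: address lines (lazy)
    r"(AS (?:JOINT TENANTS|TENANTS IN COMMON))",  # group 2: tenancy wording
    re.DOTALL | re.MULTILINE,
)


def extract_address(text: str) -> Optional[dict]:
    """Return the address lines and tenancy wording, or None if the markers are missing."""
    m = ADDRESS_BLOCK.search(text)
    if m is None:
        return None
    lines = [ln.strip() for ln in m.group(1).splitlines() if ln.strip()]
    return {"address_lines": lines, "tenancy": m.group(2)}


print(extract_address(SAMPLE))
# {'address_lines': ['617 GEIKIE STREET', 'JASPER', 'ALBERTA TOE 1EO'], 'tenancy': 'AS JOINT TENANTS'}
```

Returning None instead of a partial match makes it easy to route titles with an unexpected layout to manual review.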

As for the specific regexes, all of them depend on keywords (and must be run case-insensitively, e.g. with RegexOptions.IgnoreCase, because the PDF text is uppercase):

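  1. (?s)(?<=OWNERS)(.*?)(?=BOTH OF:) - It extracts everything between these two keywords; (?s) lets . match line breaks, so the match can span several lines.
  2. Title Number\s+([\d\s;]+) - This captures every digit, space and semicolon after Title Number. Because \s also matches line breaks, it skips past the header and returns the whole values line (0018 489 989 406180;10;8 212 049 869). The further string manipulation is done in the XAML.
  3. Plan\s+(\w+) - This is a straightforward regex that extracts the value after Plan. Replace Plan with “Block” or “Lot” to get those values as well.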
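To check the three patterns above outside UiPath, here is a small Python sketch that runs them unchanged (with IGNORECASE) on excerpts of the sample. The splitting of owners on the "AND" lines and the "last three digit groups" rule for the title number are assumptions standing in for the string manipulation done in the XAML, not a copy of it; `first_group` is a hypothetical helper.

```python
import re

# Two excerpts of the sample text, joined: legal lines and the owners block.
TEXT = """LINC SHORT LEGAL TITLE NUMBER
0018 489 989 406180;10;8 212 049 869
LEGAL DESCRIPTION
PLAN 4061EO0
BLOCK 10
LOT 8
OWNERS
JAE HONG PARK
AND
S00 AH PARK
BOTH OF:
617 GEIKIE STREET
"""

FLAGS = re.IGNORECASE  # the patterns are mixed case, the PDF text is uppercase


def first_group(pattern: str, s: str) -> str:
    """Return group 1 of the first match, stripped, or '' if nothing matches."""
    m = re.search(pattern, s, FLAGS)
    return m.group(1).strip() if m else ""


# 1. Owners: everything between OWNERS and BOTH OF:, split on lines that are just "AND".
owners_raw = first_group(r"(?s)(?<=OWNERS)(.*?)(?=BOTH OF:)", TEXT)
owners = [
    name.strip()
    for name in re.split(r"^[ \t]*AND[ \t]*$", owners_raw, flags=re.MULTILINE)
    if name.strip()
]

# 2. Values line after TITLE NUMBER; assumed rule: the last three groups are the title number.
title_raw = first_group(r"Title Number\s+([\d\s;]+)", TEXT)
title_number = " ".join(title_raw.split()[-3:])

# 3. Plan / Block / Lot: the same pattern with a different keyword.
legal = {key: first_group(rf"{key}\s+(\w+)", TEXT) for key in ("Plan", "Block", "Lot")}

print(owners)        # ['JAE HONG PARK', 'S00 AH PARK']
print(title_raw)     # 0018 489 989 406180;10;8 212 049 869
print(title_number)  # 212 049 869
print(legal)         # {'Plan': '4061EO0', 'Block': '10', 'Lot': '8'}
```

On the full certificate text, adding a word boundary (for example \bLot\s+(\w+)) would keep the keyword from matching inside a longer word.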

Attached workflow (the query file contains the same data you posted here):
Test.xaml (15.3 KB)
query.txt (1.9 KB)

Thanks

Hi @Umer_Shahid

Check out the workflow below; I created it based on the input you provided.
Input.txt (1.8 KB)
Sequence1.xaml (10.1 KB)

Output: [screenshot]

Happy to help if you face any difficulties.
Regards


Can you please send the file again? It's not opening. Could you send it as a zip?

Hi @Umer_Shahid

Check the zip file below:
BlankProcess25.zip (87.1 KB)

Happy to help if you face any difficulties.
Regards
