How to find a fixed starting position in a PDF when the format is not fixed

Hi, I have a question about how to find a starting position in a PDF file.
Background: I am trying to extract Identity Card information such as name, identity number and date of birth from my PDF. The PDF is a scanned document, so I can't select individual elements.

I want to find the phrase “REPUBLIC OF SINGAPORE” and use it as the starting position for extracting the specific elements. Is there any way to find this position when I can't select elements one by one? I have tried scraping the information with Read PDF With OCR (Tesseract), but the results are inaccurate across different samples.

Please provide me with some guidance as I am really new to UiPath! Thank you.

@Palaniyappan @pavanh003 Could you please help? I have come across your explanations for other users and found them really clear and precise. Thank you, and I hope you can share this post with your network so it reaches more people.

Hello

Have you thought about using a Regex expression to extract your information?

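A Regex expression lets you match a pattern of text, so you can locate “REPUBLIC OF SINGAPORE” in the OCR output and capture the fields that follow it, even when their exact position on the page varies.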
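A minimal Python sketch of the idea, assuming the OCR step has already produced the page as one plain string (the sample text below is made up, and `text_after_anchor` is a hypothetical helper name). The pattern ignores case and accepts any whitespace between the words, because OCR often breaks lines or adds spaces; `match.end()` is the character offset just past the phrase, which can serve as the starting position.

```python
import re
from typing import Optional

# Tolerant anchor pattern: case-insensitive, and any run of whitespace
# (spaces or newlines) is allowed between the words.
ANCHOR = re.compile(r"REPUBLIC\s+OF\s+SINGAPORE", re.IGNORECASE)


def text_after_anchor(ocr_text: str) -> Optional[str]:
    """Return the text following the anchor phrase, or None if it is absent."""
    match = ANCHOR.search(ocr_text)
    if match is None:
        return None
    # match.end() is the offset just after the phrase: the "starting position".
    return ocr_text[match.end():]


# Invented sample standing in for OCR output (note the line break inside the phrase).
sample = "xx  Republic of\nSingapore\nIDENTITY CARD NO. S1234567D"
print(repr(text_after_anchor(sample)))
```

Everything after the returned offset can then be searched for the individual fields, so text printed above the phrase is never mistaken for card data.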

If you want to try Regex, could you provide a sample of the extracted text, the expected output, and some information on the pattern?
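To give an idea of what a pattern-based extraction could look like once a real sample is available, here is a hedged Python sketch. The sample text, the field labels and every pattern are assumptions invented for illustration (an identity number of one letter, seven digits and one letter; a DD-MM-YYYY date; the name on the line after "Name"), not the actual card layout, and `extract_fields` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns: the real card layout and OCR output may differ,
# so each of these must be tuned against actual samples.
ANCHOR = re.compile(r"REPUBLIC\s+OF\s+SINGAPORE", re.IGNORECASE)
ID_NUMBER = re.compile(r"\b[A-Z]\d{7}[A-Z]\b")    # assumed: letter, 7 digits, letter
DOB = re.compile(r"\b\d{2}-\d{2}-\d{4}\b")        # assumed: DD-MM-YYYY
NAME = re.compile(r"Name[ \t]*\n[ \t]*([^\n]+)")  # assumed: value on the line after "Name"


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Search only the text after the anchor phrase (or all text if it is missing)."""
    anchor = ANCHOR.search(ocr_text)
    body = ocr_text[anchor.end():] if anchor else ocr_text
    id_match = ID_NUMBER.search(body)
    dob_match = DOB.search(body)
    name_match = NAME.search(body)
    return {
        "name": name_match.group(1).strip() if name_match else None,
        "identity_number": id_match.group(0) if id_match else None,
        "date_of_birth": dob_match.group(0) if dob_match else None,
    }


# Invented sample standing in for real OCR output.
sample = (
    "REPUBLIC OF SINGAPORE\n"
    "IDENTITY CARD NO. S1234567D\n"
    "Name\n"
    "TAN AH KOW\n"
    "Date of birth\n"
    "01-02-1990\n"
)
print(extract_fields(sample))
```

A field that OCR fails to read comes back as `None` instead of raising an error, which makes it easy to flag documents for manual review.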

If you want to learn Regex, check out my Regex MegaPost.
