How to extract text as different variables

I am currently splitting the line below (read from a PDF) on spaces and using the token index to assign the parts to different variables (i.e., name, ID, nationality, address, ACRA, date). However, the name and address span a different number of tokens from line to line, which is causing a lot of trouble.

How can I extract each variable individually?

EXAMPLE 1:
CHANDRA M PARESH S1234567G SINGAPORE CITIZEN 11 HOLLY CRESCENT ACRA 16/01/1999

EXAMPLE 2:
LILATI W/O J MANHAN PARESH S1234567B SINGAPORE CITIZEN 11 LUCKY PANDA STREET 2 ACRA 01/01/1910

Do we have a specific format for the ID?

No, it can be 7, 8, or 9 characters.

What about the number of digits inside it? Is there any specific rule/count?

The thing is if we can get the ID, we will be able to catch what’s before it (name) and what’s after it (country and address)
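To illustrate the idea of anchoring on the ID, here is a minimal Python sketch (the same pattern should carry over to a Regex activity, though I have not tried it there). It assumes the ID is the first space-separated token that is 7 to 9 letters/digits long and contains at least one digit, and that names never contain digits. The function name `split_on_id` is made up for this example.

```python
import re

# Assumption: an ID is 7-9 alphanumeric characters with at least one digit.
ID_TOKEN = re.compile(r"(?=.*\d)[A-Za-z0-9]{7,9}")

def split_on_id(line):
    """Return (name, id, rest) or None when no ID-like token is found."""
    tokens = line.split()
    for i, token in enumerate(tokens):
        if ID_TOKEN.fullmatch(token):
            name = " ".join(tokens[:i])       # everything before the ID
            rest = " ".join(tokens[i + 1:])   # country, address, ACRA, date
            return name, token, rest
    return None

line = "CHANDRA M PARESH S1234567G SINGAPORE CITIZEN 11 HOLLY CRESCENT ACRA 16/01/1999"
print(split_on_id(line)[0])  # CHANDRA M PARESH
```

This only helps if the ID really does follow some rule like the one assumed here; without one, a different anchor is needed.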

Hi @Leo88

Please check the attached xaml; it extracts the values as expected.

Using Regex, we can extract the name, address, street, etc., and assign each one to a variable.

ExtractAddressUsingRegex.xaml (8.7 KB)

Thanks,
Boopathi.

@Charbel1
No specific rule/count, as the ID format differs from country to country.

We need something standard in order to write code that works for all cases…

Do you have a list of countries that are used, maybe?

@Charbel1
One thing that is common is ACRA/OSCAR, which will appear in every line of text that contains the key person’s information.
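For illustration, the ACRA/OSCAR marker could be used to pick out the relevant lines and peel off the trailing date. This is only a rough Python sketch: it assumes the marker is always followed by a dd/mm/yyyy date at the end of the line (as in the two examples) and that OSCAR lines have the same layout as the ACRA ones, which I have not shown here. `split_trailer` is a made-up name.

```python
import re

# Assumption: "<everything else> ACRA|OSCAR dd/mm/yyyy" with the date last.
TRAILER = re.compile(
    r"^(?P<body>.*?)\s+(?P<source>ACRA|OSCAR)\s+(?P<date>\d{2}/\d{2}/\d{4})\s*$"
)

def split_trailer(line):
    """Return (body, source, date), or None if the line has no marker."""
    m = TRAILER.match(line)
    if m is None:
        return None  # not a key-person line
    return m.group("body"), m.group("source"), m.group("date")

line = "CHANDRA M PARESH S1234567G SINGAPORE CITIZEN 11 HOLLY CRESCENT ACRA 16/01/1999"
body, source, date = split_trailer(line)
print(source, date)  # ACRA 16/01/1999
```

The `body` part still contains the name, ID, country and address, so this narrows the problem but does not separate the name on its own.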

@Boopathi.M the ID format and country names always vary. The xaml file provided only works for people from Singapore… Is there any other way I can solve this?

Hi @Leo88

Could you please provide some sample formats or patterns for the different countries?

Thanks,

Some countries are:

  1. India
  2. United Arab Emirates
  3. Virgin Islands, British
  4. Chinese
  5. Australia

I don’t have to extract the ID anymore; I just need the name and country.
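Since only the name and country are needed, one possible approach is to anchor on a list of known country labels instead of the ID format. A rough Python sketch, assuming the layout is always NAME, then a single ID token, then the country label, and assuming the labels appear in the PDF spelled as in the list above (only "SINGAPORE CITIZEN" is confirmed by the examples). The names `COUNTRY_LABELS` and `name_and_country` are made up.

```python
import re

# Labels from this thread; their exact spelling in the PDF is an assumption.
COUNTRY_LABELS = [
    "SINGAPORE CITIZEN",
    "INDIA",
    "UNITED ARAB EMIRATES",
    "VIRGIN ISLANDS, BRITISH",
    "CHINESE",
    "AUSTRALIA",
]

# Try longer labels first so a longer label wins over a shorter one.
_alternatives = "|".join(
    re.escape(label) for label in sorted(COUNTRY_LABELS, key=len, reverse=True)
)
# name (lazy), one token for the ID (any format), then a known country label.
LINE = re.compile(
    rf"^(?P<name>.+?)\s+(?P<id>\S+)\s+(?P<country>{_alternatives})\b",
    re.IGNORECASE,
)

def name_and_country(line):
    """Return (name, country), or None if no known label is found."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("country")

line = "LILATI W/O J MANHAN PARESH S1234567B SINGAPORE CITIZEN 11 LUCKY PANDA STREET 2 ACRA 01/01/1910"
print(name_and_country(line))  # ('LILATI W/O J MANHAN PARESH', 'SINGAPORE CITIZEN')
```

The ID token is skipped without checking its format, so varying ID lengths are not a problem. The list would need to cover every country that can appear, and a name that itself contains one of the labels would be split in the wrong place.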