Help with a Regex activity

Hi all, I have multiple PDF files that use different templates. When I read them with the PDF activities, it is difficult to use regex since all the values are on a single line.

For example:
PDF1
1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000

PDF2
3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000

PDF3

  1. Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000

My current regex in the data table is (?<=1a.\s+Nama\s+Kantor\s+:\s+).+ which reads the whole rest of the line, including US DOLLAR 15,574.0000.
Could I get some help with this? Thank you.
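To see where the original pattern falls short, here is a minimal Python sketch run against the three sample lines (the thread itself targets UiPath's .NET regex engine). Python's re module only accepts fixed-width lookbehind, so the (?<=...) part is rewritten as a capture group, which selects the same text for these samples. It assumes each field comes out of the PDF on its own line, as shown above.

```python
import re

# The three sample lines from the question.
samples = {
    "PDF1": "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "PDF2": "3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "PDF3": "  1. Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000",
}

# Question's pattern (?<=1a.\s+Nama\s+Kantor\s+:\s+).+ rewritten with a
# capture group, because Python rejects variable-width lookbehind.
ORIGINAL = re.compile(r"1a.\s+Nama\s+Kantor\s+:\s+(.+)")


def original_result(line):
    match = ORIGINAL.search(line)
    return match.group(1) if match else None


for name, line in samples.items():
    print(name, repr(original_result(line)))
```

For PDF1 the result still carries US DOLLAR 15,574.0000, because .+ runs to the end of the line. PDF2 and PDF3 return None, because the literal 1a. prefix only exists in PDF1 (and the unescaped dot matches any character, not just a period).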

@Binti_Sulaiman_Nurulain_A

Do you need each value separated into a different field?

If yes: the number is always last, so extract the number first with the regex [\d,]+\.\d+

Next comes what looks like the currency name, so split on spaces and take only the last word: str.Trim.Split(" "c).Last

Whatever remains after that is the name, so you can take that.

Cheers
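The number-first approach above can be sketched in Python as follows (for illustration only; the thread uses UiPath/.NET). Assumptions: each line holds exactly one amount, and the currency is the two-word US DOLLAR, so the sketch drops the last two words rather than only the last one, which would leave US attached to the name. The function name split_fields and the currency_words parameter are hypothetical.

```python
import re

# Amount such as 15,574.0000 (same pattern as in the answer above).
AMOUNT = re.compile(r"[\d,]+\.\d+")


def split_fields(line, currency_words=2):
    """Hypothetical helper: split a 'Nama Kantor' line into name, currency, amount.

    Assumes the line contains one amount and that the currency is
    `currency_words` words long (2 for "US DOLLAR").
    """
    # Keep only the value after the first colon ("... Nama Kantor : <value>").
    value = line.split(":", 1)[1].strip()
    # 1) The amount is always last, so extract it first.
    amount = AMOUNT.findall(value)[-1]
    rest = value[: value.rfind(amount)].strip()
    # 2) The currency words come next, counted from the end.
    words = rest.split()
    currency = " ".join(words[-currency_words:])
    # 3) Whatever remains is the company name.
    name = " ".join(words[:-currency_words])
    return name, currency, amount


for line in [
    "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "  1. Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000",
]:
    print(split_fields(line))
```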

Yes sir, I only need the “Dewi… Bhd” part and want to exclude US DOLLAR 15,574.0000.

If it's in a data table, what should I put as my value?

@Binti_Sulaiman_Nurulain_A

Try this

(?<=Nama\s+Kantor\s+\:).*(?=\s+\w+\s+\w+\s+[\d,]+\.\d*)

Cheers
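A quick Python check of the lookahead pattern above against the three sample lines (Python is used only for illustration). Python's re rejects the variable-width lookbehind (?<=Nama\s+Kantor\s+\:), so this sketch turns it into a capture group and adds \s* so the result has no leading space; the lookahead is kept unchanged. The name name_before_amount is hypothetical.

```python
import re

# Lookbehind replaced by a capture group; the lookahead stops the match
# before "<word> <word> <amount>", i.e. before "US DOLLAR 15,574.0000".
NAME_BEFORE_AMOUNT = re.compile(
    r"Nama\s+Kantor\s+:\s*(.*)(?=\s+\w+\s+\w+\s+[\d,]+\.\d*)"
)


def name_before_amount(line):
    match = NAME_BEFORE_AMOUNT.search(line)
    return match.group(1) if match else None


for line in [
    "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "  1. Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000",
]:
    print(name_before_amount(line))
```

Because the lookahead expects exactly two words before the amount, it fits US DOLLAR but would need adjusting for a currency written with one or three words.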

@Binti_Sulaiman_Nurulain_A

You can use the regex below:
(?<=Nama\sKantor\s:\s)(.*)(?=\s+US)

Before using this directly as the value, use an Assign activity to store the output of this regex in a variable; then you can use that variable in the data table.
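A Python sketch of the same idea, for illustration only. The answer's pattern compiles as-is in Python because each \s in the lookbehind matches exactly one character, which makes it fixed width; that also means it assumes single spaces around Kantor and the colon. The Assign-then-use step is mirrored by storing the match in a variable before building the row. The list of dicts standing in for the DataTable and the variable name nama_kantor are hypothetical.

```python
import re

# The answer's pattern, unchanged: fixed-width lookbehind, greedy name,
# and a lookahead that stops before the whitespace preceding "US".
NAME_PATTERN = re.compile(r"(?<=Nama\sKantor\s:\s)(.*)(?=\s+US)")

lines = [
    "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "  1. Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000",
]

table = []  # hypothetical stand-in for the DataTable
for line in lines:
    # "Assign" step: store the regex output in a variable first...
    match = NAME_PATTERN.search(line)
    nama_kantor = match.group(1) if match else ""
    # ...then use that variable when adding the row.
    table.append({"Nama Kantor": nama_kantor})

for row in table:
    print(row)
```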


Thanks sir, it worked!

May I know what I should do with the regex if the information I need is repeated in the PDF? The conditions are the same as above.
For example, I need to extract all of the values below, but the label Nama is the same and appears multiple times.

PDF1
Vendor
1a. Nama : Dewi Pancar Sdn Bhd US DOLLAR 15,574.0000
Entity
2a. Nama : Seri Dinamik Sdn Bhd

PDF2
Vendor
1a. Nama : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000
Entity
2a. Nama : Seri. Dinamik Sdn. Bhd.

Thank you sir

@Binti_Sulaiman_Nurulain_A

You need a separate regex for each type. Here there is no trailing number or US DOLLAR, so use only Nama as the anchor and take the value.

Cheers
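One way to apply a separate regex per type, sketched in Python and assuming the extracted text keeps the Vendor and Entity headings on their own lines as in the samples. Since Nama alone appears twice, each pattern here is also anchored on its heading; that extra anchor and the function name extract_by_type are assumptions, not something the thread specifies. The Vendor pattern stops before US DOLLAR, while the Entity pattern takes the rest of the line.

```python
import re

# Hypothetical anchors: section heading, item number (e.g. "1a."), then "Nama :".
VENDOR = re.compile(r"Vendor\s+\S+\s+Nama\s*:\s*(.*?)\s+US\s+DOLLAR")
ENTITY = re.compile(r"Entity\s+\S+\s+Nama\s*:\s*(.+)")


def extract_by_type(text):
    vendor = VENDOR.search(text)
    entity = ENTITY.search(text)
    return {
        "Vendor": vendor.group(1) if vendor else None,
        # "." does not match a newline, so this stops at the end of the line.
        "Entity": entity.group(1).strip() if entity else None,
    }


PDF1 = """Vendor
1a. Nama : Dewi Pancar Sdn Bhd US DOLLAR 15,574.0000
Entity
2a. Nama : Seri Dinamik Sdn Bhd"""

PDF2 = """Vendor
1a. Nama : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000
Entity
2a. Nama : Seri. Dinamik Sdn. Bhd."""

print(extract_by_type(PDF1))
print(extract_by_type(PDF2))
```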

Yes sir, but in some of the PDFs the amount is included and in others it is not; the number only appears when there is more information.

I am afraid that if I use Nama only, it will read the wrong value, so I was thinking of including the number as well.
Do you have any advice on this?
Thank you

@Binti_Sulaiman_Nurulain_A

Then first extract the full line using only Nama in the regex.

Then check whether it contains US DOLLAR and a number. If yes, remove them; if no, leave it as is.

For the above, one regex might not suit, so a little logic is needed: extract the whole value, then check for the extra values.

Cheers
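The two-step logic above, sketched in Python under the assumption that each Nama value sits on a single line: first capture the whole value after Nama :, then check for a trailing US DOLLAR amount and cut it off only if it is present. The pattern and function names are hypothetical.

```python
import re

# Step 1: everything after "Nama :" up to the end of that line.
FULL_VALUE = re.compile(r"Nama\s*:[ \t]*(.+)")
# Step 2: an optional "US DOLLAR <amount>" at the end of the value.
CURRENCY_SUFFIX = re.compile(r"\s+US\s+DOLLAR\s+[\d,]+\.\d+\s*$")


def extract_names(text):
    names = []
    for value in FULL_VALUE.findall(text):
        suffix = CURRENCY_SUFFIX.search(value)
        # Remove the currency and amount if present; otherwise keep the value as is.
        name = value[: suffix.start()] if suffix else value
        names.append(name.strip())
    return names


PDF1 = """Vendor
1a. Nama : Dewi Pancar Sdn Bhd US DOLLAR 15,574.0000
Entity
2a. Nama : Seri Dinamik Sdn Bhd"""

PDF2 = """Vendor
1a. Nama : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000
Entity
2a. Nama : Seri. Dinamik Sdn. Bhd."""

print(extract_names(PDF1))
print(extract_names(PDF2))
```

The same function handles a Vendor line that has no amount, since the suffix check simply finds nothing and the full value is kept.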
