Hi all, I have multiple PDFs that use different templates. When I read the text from a PDF, it is difficult to use regex because each record comes out on a single line.
For example:
PDF1
1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000
PDF2
3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000
PDF3
Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000
My current regex in the data table is (?<=1a.\s+Nama\s+Kantor\s+:\s+).+ which reads the whole line.
May I get some help with this? Thank you.
Anil_G
(Anil Gorthi)
December 17, 2024, 4:40am
2
@Binti_Sulaiman_Nurulain_A
May I know if you need each value separated into different fields?
If yes: the last value is always a number, so extract the number first with the regex [\d,]+\.\d+
Next comes what looks like the currency name, so split on spaces and take only the last word: str.Trim.Split(" "c).Last
Whatever remains looks like the name, so you can take that.
Cheers
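The steps above can be sketched roughly as follows. This is plain Python for illustration only, not UiPath VB.NET. The function name split_fields and the currency_words parameter are hypothetical, and the sketch assumes the currency is always two words (as in "US DOLLAR"), which goes slightly beyond "take the last word".

```python
import re

def split_fields(line, currency_words=2):
    """Split one extracted line into (name, currency, amount)."""
    # Step 1: find the amount, e.g. "15,574.0000".
    amount_match = re.search(r"[\d,]+\.\d+", line)
    if amount_match is None:
        raise ValueError("no amount found in line")
    amount = amount_match.group(0)

    # Step 2: the words just before the amount are assumed to be the currency.
    before_amount = line[:amount_match.start()].strip()
    words = before_amount.split(" ")
    currency = " ".join(words[-currency_words:])

    # Step 3: what remains after the colon is treated as the name.
    remaining = " ".join(words[:-currency_words])
    _, _, name = remaining.partition(":")
    return name.strip(), currency, amount

name, currency, amount = split_fields(
    "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000"
)
print(name)      # Dewi Pancar Sdn Bhd.
print(currency)  # US DOLLAR
print(amount)    # 15,574.0000
```

If the currency can be one word or three, this fixed word count will cut the name in the wrong place, so it only fits data shaped like the samples.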
Yes sir, I only need the “Dewi… Bhd” part, excluding the US DOLLAR 15,574.0000.
If it is in a data table, what should I put as the value?
Anil_G
(Anil Gorthi)
December 17, 2024, 6:13am
4
@Binti_Sulaiman_Nurulain_A
Try this
(?<=Nama\s+Kantor\s+\:).*(?=\s+\w+\s+\w+\s+[\d,]+\.\d*)
Cheers
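For anyone who wants to try the idea outside UiPath, here is a rough Python sketch of the same pattern. Python's re module only accepts fixed-width lookbehind, while the .NET engine accepts the variable-width one above, so this version swaps the lookarounds for a capture group. The function name office_name is hypothetical.

```python
import re

# Same idea as the lookaround regex: anchor on "Nama Kantor :", then stop
# before "<word> <word> <amount>". A capture group replaces the lookbehind.
OFFICE_NAME = re.compile(
    r"Nama\s+Kantor\s*:\s*(.*?)\s+\w+\s+\w+\s+[\d,]+\.\d*"
)

def office_name(line):
    """Return the name between the label and the trailing currency/amount."""
    match = OFFICE_NAME.search(line)
    return match.group(1) if match else None

samples = [
    "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000",
]
for sample in samples:
    print(office_name(sample))
```

Because the label is the anchor rather than the "1a." prefix, the same pattern covers all three templates in the question.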
@Binti_Sulaiman_Nurulain_A
You can use the regex below:
(?<=Nama\sKantor\s:\s)(.*)(?=\s+US)
Before using this directly as a value, use an Assign activity to store the output of this regex; then you can use that variable in the data table.
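A small Python sketch of that flow, for illustration only: the regex is applied first, the result is stored in a variable, and only then is it placed in a row. The lookbehind here is fixed-width, so Python's re accepts it unchanged. The function name extract_office_name and the list of dicts standing in for the data table are assumptions, not UiPath objects.

```python
import re

PATTERN = re.compile(r"(?<=Nama\sKantor\s:\s)(.*)(?=\s+US)")

def extract_office_name(line):
    """Return the text between 'Nama Kantor : ' and the following ' US'."""
    match = PATTERN.search(line)
    return match.group(0).strip() if match else ""

lines = [
    "1a. Nama Kantor : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "3a. Nama Kantor : Dewi. Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "Nama Kantor : Dewi Pancar Sdn. Bhd. US DOLLAR 15,574.0000",
]

rows = []
for line in lines:
    # Store the regex output in a variable first (the "Assign" step) ...
    office_name = extract_office_name(line)
    # ... then use that variable when building the row.
    rows.append({"Nama Kantor": office_name})

print(rows)
```

Note that this pattern depends on the currency starting with "US" and on exactly one whitespace character around the colon, so it would need adjusting for other currencies or spacing.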
Thanks sir, it worked.
May I know what I should do with the regex if the information I need is repeated in the PDF? The conditions are the same as above.
For example, I need to extract all of the names below, but the label “Nama” is the same and appears multiple times.
PDF1
Vendor
1a. Nama : Dewi Pancar Sdn Bhd US DOLLAR 15,574.0000
Entity
2a. Nama : Seri Dinamik Sdn Bhd
PDF2
Vendor
1a. Nama : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000
Entity
2a. Nama : Seri. Dinamik Sdn. Bhd.
Thank you sir
Anil_G
(Anil Gorthi)
December 18, 2024, 3:35am
7
@Binti_Sulaiman_Nurulain_A
For each type you need to use a separate regex. Here you don't have a trailing currency and number, so use only Nama as the anchor and get the value.
Cheers
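A rough Python sketch of one regex per type, for illustration only. It assumes the "1a." and "2a." prefixes reliably identify the vendor and entity lines, which holds for the two samples above but may not hold for every template. The names PATTERNS and extract_by_type are hypothetical.

```python
import re

# One pattern per type. The vendor pattern also drops the trailing
# "<word> <word> <amount>"; the entity pattern anchors on Nama only.
PATTERNS = {
    "vendor": re.compile(
        r"^1a\.\s+Nama\s*:\s*(.+?)\s+\w+\s+\w+\s+[\d,]+\.\d*\s*$",
        re.MULTILINE,
    ),
    "entity": re.compile(r"^2a\.\s+Nama\s*:\s*(.+)$", re.MULTILINE),
}

def extract_by_type(text):
    """Return a dict with the vendor and entity names (None if not found)."""
    result = {}
    for kind, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[kind] = match.group(1).strip() if match else None
    return result

pdf1 = "\n".join([
    "Vendor",
    "1a. Nama : Dewi Pancar Sdn Bhd US DOLLAR 15,574.0000",
    "Entity",
    "2a. Nama : Seri Dinamik Sdn Bhd",
])
print(extract_by_type(pdf1))
```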
Yes sir, but in some of the PDFs the amount is included and in others it is not; the number is only there when there is more information.
I am afraid that if I anchor on Nama only, it will read the wrong text, so I was thinking of including the number as well.
Do you have any advice on this?
Thank you
Anil_G
(Anil Gorthi)
December 18, 2024, 3:51am
9
@Binti_Sulaiman_Nurulain_A
Then first extract the full line using only Nama as the anchor in the regex.
Then check whether US DOLLAR and a number are present. If yes, remove them; if no, leave the value as is.
A single regex might not suit this case, so a little logic is needed: extract the whole value, then check for the extra values.
Cheers
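The extract-then-check logic could look something like this in Python (illustration only, not UiPath activities). It assumes the optional tail is always two words followed by an amount, as in "US DOLLAR 15,574.0000". The names TRAILING_AMOUNT and extract_names are hypothetical.

```python
import re

# Optional tail: "<word> <word> <amount>" at the end of the value.
TRAILING_AMOUNT = re.compile(r"\s+\w+\s+\w+\s+[\d,]+\.\d+\s*$")

def extract_names(text):
    """Return every value that follows a 'Nama :' label, without the tail."""
    names = []
    # Step 1: grab the rest of the line after each "Nama :" label.
    for raw in re.findall(r"Nama\s*:\s*(.+)", text):
        value = raw.strip()
        # Step 2: if a currency and amount are present, remove them;
        # otherwise leave the value as is.
        if TRAILING_AMOUNT.search(value):
            value = TRAILING_AMOUNT.sub("", value)
        names.append(value)
    return names

pdf2 = "\n".join([
    "Vendor",
    "1a. Nama : Dewi Pancar Sdn Bhd. US DOLLAR 15,574.0000",
    "Entity",
    "2a. Nama : Seri. Dinamik Sdn. Bhd.",
])
print(extract_names(pdf2))
```

The names come back in document order, so the first entry is the vendor and the second the entity for PDFs laid out like the samples.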