Hi All,
I need to extract the items in the “Deskripsi” and “Tipe/Kode” columns, as shown below.
I used System.Text.RegularExpressions.Regex.Match(match.Value, "(?<=^\d+\s)[A-Za-z\s.,()]+(?=\s[A-Z]+\d+)").Value for “Deskripsi”, but I only get results for item numbers 8, 10, 11, and 12:
8 mm tube
10 in))
11 in))
12 in))
Also, any ideas for a regex that matches the various “Tipe/Kode” codes?
Thanks a lot
Yoichi, March 13, 2023, 2:29am
Hi,
Are you using the Read PDF Text activity? If so, can you share the extracted text as a text file?
Regards,
Hi @Yoichi,
Yes, I used the Read PDF Text activity. Here’s the text file:
test.txt (636 Bytes)
Yoichi, March 13, 2023, 2:48am
Hi,
In this case, I recommend using Regex.Replace and the Generate Data Table activity as follows, because each value is easy to handle once it is in a DataTable.
strData = System.Text.RegularExpressions.Regex.Replace(strData,"(?<=(^|\r?\n)\d+)\s+|\s+(?=\w+(\r?\n|$))",chr(9))
strData = System.Text.RegularExpressions.Regex.Replace(strData,"^.*\r?\n","No."+chr(9)+"Deskripsi"+chr(9)+"Tipe / Kode"+vbCrLf)
Then build the DataTable with the Generate Data Table activity (the columns are now separated by tabs).
Sample20230313-2L (2).zip (8.8 KB)
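For anyone who wants to try the idea outside UiPath, here is a rough Python sketch of the same two replacements. It is only an illustration under assumptions: the sample text is made up (the attached test.txt is not reproduced in this thread), the function names are hypothetical, and because Python's re module does not accept a variable-length lookbehind, the row number is captured and re-inserted instead. It also uses [ \t]+ rather than \s+ so that a line break is never turned into a tab.

```python
import re

# Hypothetical sample text; the real test.txt from the thread is not shown.
SAMPLE = (
    "No. Deskripsi Tipe / Kode\n"
    "1 Steel pipe (2 in) SP200\n"
    "2 Rubber hose, 8 mm RH08\n"
    "3 Valve V3\n"
)


def to_tab_separated(text: str) -> str:
    # Put a tab after the leading row number of each line.
    text = re.sub(r"(?m)^(\d+)[ \t]+", r"\1\t", text)
    # Put a tab before the last word-only token of each line (the code column).
    text = re.sub(r"(?m)[ \t]+(?=\w+$)", "\t", text)
    # Replace the first line with a clean, tab-separated header.
    return re.sub(r"\A.*\n", "No.\tDeskripsi\tTipe / Kode\n", text, count=1)


def to_rows(text: str) -> list:
    # Split every tab-separated line into its three columns.
    return [line.split("\t") for line in to_tab_separated(text).splitlines()]


if __name__ == "__main__":
    for row in to_rows(SAMPLE):
        print(row)
```

Like the original pattern, this assumes the code in the last column consists of word characters only (letters, digits, underscore); a code containing a hyphen or slash would need a wider character class.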
Regards,
Hi @Yoichi, thanks a lot!
It seems to work, but I have one more question: how can I extract just the table? There are some sentences before the table. Maybe you can check this text file:
test1.txt (7.5 KB)
Yoichi, March 13, 2023, 6:03am
Hi,
In this case, it’s better to extract the necessary lines first. Can you check the following sample?
mc = System.Text.RegularExpressions.Regex.Matches(strData,"(?<=No. Deskripsi Tipe / Kode\r?\n)(\d+\s.*\n)+")
Then:
strData = String.Join("",mc.Cast(Of System.Text.RegularExpressions.Match).Select(Function(m) m.Value))
Sample20230313-2L (3).zip (6.4 KB)
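The same "keep only the rows that follow the header" step can be sketched in Python as below. Again, this is an assumption-laden illustration rather than the attached workflow: the sample text and the names HEADER and extract_table_rows are hypothetical, and a capture group stands in for the lookbehind because Python's re module requires fixed-width lookbehinds.

```python
import re

# Header line that marks the start of the table (taken from the thread).
HEADER = "No. Deskripsi Tipe / Kode"

# Hypothetical sample text with sentences before and after the table.
SAMPLE = (
    "Dear customer,\n"
    "Please find the items below.\n"
    "No. Deskripsi Tipe / Kode\n"
    "1 Steel pipe (2 in) SP200\n"
    "2 Rubber hose, 8 mm RH08\n"
    "Thank you.\n"
)


def extract_table_rows(text: str) -> str:
    # After each header line, capture the run of consecutive lines
    # that start with a number followed by whitespace.
    pattern = re.escape(HEADER) + r"\r?\n((?:\d+\s.*\n)+)"
    # Join the row blocks of every table found (e.g. one per page).
    return "".join(m.group(1) for m in re.finditer(pattern, text))


if __name__ == "__main__":
    print(extract_table_rows(SAMPLE), end="")
```

Escaping the header with re.escape makes the dot in "No." match literally; in the unescaped version it matches any character, which still works here but is looser than intended.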
Regards,
@Yoichi, looks great! Thanks a lot.
But there are still some unnecessary lines in between and at the end. Any ideas?
Yoichi, March 13, 2023, 6:24am
Hi,
In my environment, there seems to be no problem, as shown in the following image.
Did you use the same input data as above?
Regards,
Hi @Yoichi, yes, it’s the same data.
Yoichi, March 13, 2023, 7:25am
Hi,
That seems strange…
I’ve modified the expression as follows. Can you try this?
System.Text.RegularExpressions.Regex.Matches(strData,"(?<=No. Deskripsi Tipe / Kode\r?\n)(\d+\s.*?\n)+(?=\D|$)")
Sample20230313-2L (4).zip (6.4 KB)
Regards,
It’s working! Thanks a lot.