s.txt (1.6 KB)
I was using screen scraping for the table because Data Scraping is not possible for this structure. I need to extract the data from the text. Here I have data from three tables, and I need to extract the values and store them in an Excel file.
This is the table structure; refer to it for the field names.
The approach I used is regex, but as you can see in the table, each field has multiple values. Also, in Excel I need to store the data in table format:
if Item No is the field name, then the values from all three tables are stored under it, one per row, and I need to do the same for every field name.
Does anyone have any idea how we can handle this?
Thanks in advance,
Yoichi
August 30, 2022, 8:24am
Hi,
Hope the following sample helps you.
mc = System.Text.RegularExpressions.Regex.Matches(strData,"Item No(?<ITEMNUMBER>\d+)\D+?ModificationIndex(?<MODINDEX>[A-Z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[A-Z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quantity(?<QUANTITY>\d+)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)")
Sample20220830-4.zip (4.0 KB)
Regards,
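The pattern above treats each line item as one match and each field as one named group, so every match becomes one row of the output table, which is the layout asked for in the question. For readers outside UiPath, a simplified Python sketch of the same idea follows. The sample text, the reduced field list and the helper names (`extract_rows`, `to_csv`) are hypothetical, and it writes CSV (which Excel can open) rather than using UiPath's Excel activities.

```python
import csv
import io
import re

# Hypothetical scraped text: three line items, each field written as a label
# immediately followed by its value (a reduced version of the real layout).
SAMPLE = """Item No10 ModificationIndexA1
Description STEEL BOLT M8
Quantity25
UOMPCS
Net Price12.50
Item No20 ModificationIndexB2
Description NUT M8
Quantity100
UOMPCS
Net Price0.75
Item No30 ModificationIndexC3
Description WASHER 8MM
Quantity40
UOMPCS
Net Price0.20
"""

# One match per line item; each named group becomes one column.
# Python writes named groups as (?P<name>...) instead of .NET's (?<name>...).
PATTERN = re.compile(
    r"Item No(?P<ITEMNUMBER>\d+)\D+?ModificationIndex(?P<MODINDEX>[A-Z0-9]+)"
    r"[\s\S]*?Description (?P<DESCRIPTION>[A-Z0-9 ]+)"
    r"[\s\S]*?Quantity(?P<QUANTITY>\d+)"
    r"[\s\S]*?UOM(?P<UOM>[A-Z]+)"
    r"[\s\S]*?Net Price(?P<NETPRICE>[\d.]+)"
)

FIELDS = ["ITEMNUMBER", "MODINDEX", "DESCRIPTION", "QUANTITY", "UOM", "NETPRICE"]


def extract_rows(text):
    """Return one dict per line item, keyed by the named groups."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


def to_csv(rows):
    """Render the rows as CSV text: a header line, then one line per item."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE)))
```

To get a file, write the returned string with `open("line_items.csv", "w", newline="")`. In UiPath itself, the equivalent step would typically be a loop over `mc` that adds one DataTable row per match before writing the table to Excel.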
Thanks @Yoichi. It’s working.
Can you recommend some resources to learn more about regex?
Gokul001
(Gokul Balaji)
August 30, 2022, 11:30am
agathiyanv
(Agathiyan)
September 2, 2022, 5:32am
sampletext.txt (3.0 KB)
Your pattern is not working for this text. Please help!
Yoichi
September 2, 2022, 6:12am
Hi,
It’s because your sample seems to have a typo.
I think it should be “Quantity”; a “t” is missing.
Or, if you want to accept the misspelling as well, the following will work.
mc = System.Text.RegularExpressions.Regex.Matches(strData,"Item No(?<ITEMNUMBER>\d+)\D+?ModificationIndex(?<MODINDEX>[A-Z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[A-Z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quant?ity(?<QUANTITY>\d+)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)")
Regards,
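The only change from the first pattern is `Quant?ity`: the `?` makes the preceding `t` optional, so the label matches both spellings. A small Python check of that behaviour; the sample strings and the `find_quantity` helper are invented for illustration.

```python
import re

# "t?" makes the preceding "t" optional, so the label matches both the
# correct spelling "Quantity" and the misspelled "Quanity".
QUANTITY = re.compile(r"Quant?ity(?P<QUANTITY>\d+)")


def find_quantity(text):
    """Return the captured quantity, or None if the label is not found."""
    m = QUANTITY.search(text)
    return m.group("QUANTITY") if m else None


if __name__ == "__main__":
    for sample in ("Quantity25", "Quanity25", "Qty25"):
        print(sample, "->", find_quantity(sample))
```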
agathiyanv
(Agathiyan)
September 2, 2022, 10:00am
Can you share the workflow? I am not able to extract the data.
Yoichi
September 2, 2022, 1:44pm
Hi,
Oh, we need to modify one more thing, because there is no numeric value for Quantity. Can you try the following?
mc = System.Text.RegularExpressions.Regex.Matches(strData,"Item No(?<ITEMNUMBER>\d+)\D+?ModificationIndex(?<MODINDEX>[A-Z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[A-Z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quant?ity(?<QUANTITY>\d*)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)")
Sample20220830-4v2.zip (4.7 KB)
Regards,
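This version changes `(?<QUANTITY>\d+)` to `(?<QUANTITY>\d*)`: `+` requires at least one digit, while `*` also allows zero, so a line item with a blank quantity still matches and the group is simply empty. A minimal Python illustration with hypothetical input (Python spells named groups `(?P<name>...)`):

```python
import re

# \d+ requires at least one digit after the label; \d* also accepts an
# empty value, so a line item with a blank Quantity still matches.
STRICT = re.compile(r"Quant?ity(?P<QUANTITY>\d+)\s")
LENIENT = re.compile(r"Quant?ity(?P<QUANTITY>\d*)\s")


def quantity(pattern, text):
    """Return the captured QUANTITY, or None if the pattern does not match."""
    m = pattern.search(text)
    return None if m is None else m.group("QUANTITY")


if __name__ == "__main__":
    blank = "Quanity\nUOMPCS"      # hypothetical item with no quantity value
    filled = "Quantity25\nUOMPCS"  # hypothetical item with a quantity
    for text in (blank, filled):
        print(repr(quantity(STRICT, text)), repr(quantity(LENIENT, text)))
```

With `\d*`, downstream steps should expect an empty string for that column rather than a number.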
agathiyanv
(Agathiyan)
September 6, 2022, 5:53am
Thank you for your help. The main issue here is the data types: except for Quantity, Net Price and Discount Percentage, every field is alphanumeric. But I don’t know how to modify the pattern for those types.
Yoichi
September 6, 2022, 5:56am
Hi,
agathiyanv:
alpha numeric data type
Can you try using [0-9A-Za-z]+ or [0-9A-Za-z]* instead of \d+ or \d*?
Regards,
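Swapping `\d+` for `[0-9A-Za-z]+` widens a group from digits only to letters and digits. The Python sketch below applies that to the `PR` field of the earlier pattern (`PR(?<PR>\d+)\s`); the PR value and the `capture` helper are invented for illustration.

```python
import re

# \d+ only accepts digits; [0-9A-Za-z]+ also accepts letters.
DIGITS_ONLY = re.compile(r"PR(?P<PR>\d+)\s")
ALNUM = re.compile(r"PR(?P<PR>[0-9A-Za-z]+)\s")


def capture(pattern, text):
    """Return the captured PR value, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group("PR") if m else None


if __name__ == "__main__":
    text = "PR4500A12\nQuantity25"   # hypothetical alphanumeric PR value
    print(capture(DIGITS_ONLY, text), capture(ALNUM, text))
```

One thing to watch with the wider class: it also matches letters, so if a value runs straight into the next label with no whitespace in between (say `PR4500UOMPCS`), `[0-9A-Za-z]+` would capture `4500UOMPCS`. Once letters are allowed, what must follow each group (here `\s`) matters more.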
agathiyanv
(Agathiyan)
September 6, 2022, 6:31am
LineItem_Text.txt (3.3 KB)
When I made that modification to the pattern, the output Excel file was empty.
I don’t know what I did wrong. Please refer to the screenshot for the data I need to extract from the text for all the line items.
Yoichi
September 6, 2022, 7:11am
Hi,
Can you try the following pattern?
(Item No|Item Number )(?<ITEMNUMBER>\d+)\D+?(ModificationIndexModification Index |ModificationIndex)(?<MODINDEX>[A-Za-z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[0-9A-Za-z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quant?ity(?<QUANTITY>\d*)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)
Regards,
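This pattern handles two label layouts with alternation, e.g. `(Item No|Item Number )`. Alternatives are tried left to right, and the first one that lets the rest of the pattern succeed wins. So when one alternative is a prefix of another (as `ModificationIndex` is of `ModificationIndexModification Index `), the order decides whether the label text ends up inside the captured value. A Python sketch with invented inputs; the `named_group` helper is hypothetical.

```python
import re

# Alternation accepts either label layout for the item number.
# (?:...) is a non-capturing group, so only the named group is captured.
ITEM = re.compile(r"(?:Item No|Item Number )(?P<ITEMNUMBER>\d+)")

# Shorter alternative first: on text containing the longer label, the short
# label matches and the value class then swallows the rest of the label text.
SHORT_FIRST = re.compile(
    r"(?:ModificationIndex|ModificationIndexModification Index )"
    r"(?P<MODINDEX>[A-Za-z0-9 ]+)"
)
# Longer alternative first: only the value is captured, and the short label
# is still used as a fallback when the long one is absent.
LONG_FIRST = re.compile(
    r"(?:ModificationIndexModification Index |ModificationIndex)"
    r"(?P<MODINDEX>[A-Za-z0-9 ]+)"
)


def named_group(pattern, name, text):
    """Return the named group from the first match, or None."""
    m = pattern.search(text)
    return m.group(name) if m else None


if __name__ == "__main__":
    print(named_group(ITEM, "ITEMNUMBER", "Item No10"))
    print(named_group(ITEM, "ITEMNUMBER", "Item Number 10"))
    long_label = "ModificationIndexModification Index 01"
    print(named_group(SHORT_FIRST, "MODINDEX", long_label))
    print(named_group(LONG_FIRST, "MODINDEX", long_label))
```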
system
Closed
September 9, 2022, 7:19am
This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.