Extracting table data from screen-scraped text

s.txt (1.6 KB)
I am using screen scraping for the table because data scraping is not possible for this structure. I need to extract the data from the scraped text. There are three tables of data here, and I need to extract the values and store them in an Excel file.
This is the table structure; please refer to it for the field names:
image

The approach I used is regex, but as you can see in the table, each field has multiple values, and in Excel I need to keep the table format.
For example, if Item No is the field name, the three table values should be stored under it, one per row, and likewise for all the other field names.
Does anyone have an idea how to handle this?
Thanks in advance,

Hi,

Hope the following sample helps you.

mc = System.Text.RegularExpressions.Regex.Matches(strData,"Item No(?<ITEMNUMBER>\d+)\D+?ModificationIndex(?<MODINDEX>[A-Z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[A-Z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quantity(?<QUANTITY>\d+)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)")

Sample20220830-4.zip (4.0 KB)
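
For readers wondering how the named groups map onto the table layout asked for in the question, here is a minimal sketch. It is written in Python rather than as a VB.NET expression, so named groups appear as (?P<NAME>...) instead of (?<NAME>...). The sample text, the shortened three-field pattern, and the helper names extract_rows and rows_to_csv are all made up for illustration; the attached workflow may well do this differently.

```python
import csv
import io
import re

# Hypothetical scraped text: two line items, label and value run together.
SAMPLE_TEXT = (
    "Item No10\nDescriptionBOLT M8\nQuantity5\n"
    "Item No20\nDescriptionNUT M8\nQuantity12\n"
)

# Column order for the output table (one column per named group).
FIELDS = ["ITEMNUMBER", "DESCRIPTION", "QUANTITY"]

# Shortened pattern covering only three fields of the full pattern.
PATTERN = re.compile(
    r"Item No(?P<ITEMNUMBER>\d+)\s*"
    r"Description(?P<DESCRIPTION>[A-Z0-9 ]+)\s*"
    r"Quantity(?P<QUANTITY>\d+)"
)


def extract_rows(text):
    """Return one dict per matched line item, keyed by group name."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


def rows_to_csv(rows):
    """Write a header row plus one row per line item; return the CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE_TEXT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Each match becomes one row and each group name becomes one column header, so the values from every table end up under their field name, one row per table.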

Regards,

Thanks @Yoichi. It’s working.
Can you recommend some resources to learn more about regex?

Hi @agathiyanv

Have a look at this learning material:

How to Extract Data With RegEx in UiPath – andersjensenorg.

Regards
Gokul

sampletext.txt (3.0 KB)
Your pattern is not working for this text. Please help!

Hi,

That is because your sample seems to have a typo.

image

I think it should be “Quantity” (a “t” is missing)?

Or, if you want to accept this spelling as well, the following will work.

mc = System.Text.RegularExpressions.Regex.Matches(strData,"Item No(?<ITEMNUMBER>\d+)\D+?ModificationIndex(?<MODINDEX>[A-Z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[A-Z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quant?ity(?<QUANTITY>\d+)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)")
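
As a small illustration of what the added ? in Quant?ity does, here is a sketch in Python (where named groups are written (?P<NAME>...) rather than the .NET (?<NAME>...)). The input strings and the helper name quantity_of are hypothetical; only the Quant?ity fragment comes from the pattern above.

```python
import re

# "t?" makes the first "t" optional, so "Quantity" and "Quanity" both match.
LABEL = re.compile(r"Quant?ity(?P<QUANTITY>\d+)")


def quantity_of(line):
    """Return the digits after the label, or None if the label is not found."""
    m = LABEL.search(line)
    return m.group("QUANTITY") if m else None


correct_spelling = quantity_of("Quantity25")
missing_t = quantity_of("Quanity25")
other_typo = quantity_of("Quantty25")  # a different typo is still rejected

print(correct_spelling, missing_t, other_typo)
```

The optional character only tolerates this one misspelling; any other variation of the label would still need its own adjustment.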

Regards,

Can you share the workflow? I am not able to extract the data.

Hi,

Oh, we need to modify one more thing, because there is no numeric value for Quantity in this sample. Can you try the following?

mc = System.Text.RegularExpressions.Regex.Matches(strData,"Item No(?<ITEMNUMBER>\d+)\D+?ModificationIndex(?<MODINDEX>[A-Z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[A-Z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quant?ity(?<QUANTITY>\d*)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)")

Sample20220830-4v2.zip (4.7 KB)
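
To show why switching QUANTITY from \d+ to \d* matters, here is a minimal Python sketch (named groups are (?P<NAME>...) in Python, (?<NAME>...) in .NET). The two-field patterns and the sample strings are simplified assumptions, not the real scraped text.

```python
import re

# \d+ demands at least one digit; \d* also accepts an empty value.
STRICT = re.compile(r"Quantity(?P<QUANTITY>\d+)\s+UOM(?P<UOM>[A-Z]+)")
LENIENT = re.compile(r"Quantity(?P<QUANTITY>\d*)\s+UOM(?P<UOM>[A-Z]+)")

# Hypothetical fragments: one with a quantity, one where it is blank.
FILLED = "Quantity7\nUOMEA"
BLANK = "Quantity\nUOMEA"

strict_on_blank = STRICT.search(BLANK)    # no match: a digit is required
lenient_on_blank = LENIENT.search(BLANK)  # matches with an empty QUANTITY
lenient_on_filled = LENIENT.search(FILLED)

print(strict_on_blank)
print(lenient_on_blank.groupdict())
print(lenient_on_filled.groupdict())
```

With \d* the row is kept and QUANTITY comes back as an empty string, so anything downstream that converts the value to a number would need to handle that case.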

Regards,

Thank you for your help. The main issue here is the data type: apart from Quantity, Net Price, and Discount Percentage, every field is alphanumeric. I don’t know how to modify the pattern for those types.

Hi,

Can you try using [0-9A-Za-z]+ or [0-9A-Za-z]* instead of \d+ or \d*?
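
A minimal sketch of that substitution, using the PR field as an example. It is in Python (named groups are (?P<NAME>...) there, (?<NAME>...) in .NET), and the sample values and the helper name capture are hypothetical, since the real field values are not shown in the thread.

```python
import re

# Hypothetical scraped fragments: one numeric PR value, one alphanumeric.
NUMERIC_PR = "PR1234\n"
ALNUM_PR = "PRAB1234\n"

DIGITS_ONLY = re.compile(r"PR(?P<PR>\d+)\s")
ALPHANUMERIC = re.compile(r"PR(?P<PR>[0-9A-Za-z]+)\s")


def capture(pattern, text):
    """Return the PR group if the pattern matches, otherwise None."""
    m = pattern.search(text)
    return m.group("PR") if m else None


digits_on_numeric = capture(DIGITS_ONLY, NUMERIC_PR)
digits_on_alnum = capture(DIGITS_ONLY, ALNUM_PR)   # fails: "A" is not a digit
alnum_on_numeric = capture(ALPHANUMERIC, NUMERIC_PR)
alnum_on_alnum = capture(ALPHANUMERIC, ALNUM_PR)

print(digits_on_numeric, digits_on_alnum, alnum_on_numeric, alnum_on_alnum)
```

Note that [0-9A-Za-z] contains no space or punctuation, so values with hyphens or spaces would presumably need those characters added to the class.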

Regards,

LineItem_Text.txt (3.3 KB)
image
When I made that modification to the pattern, the output Excel file was empty.
I don’t know what I did wrong. Please refer to the screenshot for the data I need to extract from the text for all the line items.

Hi,

Can you try the following pattern?

(Item No|Item Number )(?<ITEMNUMBER>\d+)\D+?(ModificationIndex|ModificationIndexModification Index )(?<MODINDEX>[A-Za-z0-9 ]+)\s*Item No[\s\S]*?SupplierPart(?<SUPPLIERPART>[0-9A-Za-z ]+)\s*?ADDITIONAL INFORMATIONS(?<ADDITIONALINFO>.*)\s*?Description(?<DESCRIPTION>[A-Z0-9 ]+)\s*?PR(?<PR>\d+)\s[\s\S]*?Quant?ity(?<QUANTITY>\d*)\s[\s\S]*?UOM(?<UOM>[A-Z]+)\s[\s\S]*?DeliveryDate(?<DELIVERYDATE>\d+/\d+/\d+)\s*?Net Price(?<NETPRICE>[\d.]+)\s*?LotPrice(?<LOTPRICE>[A-Z]+)\s*?Discount Percentage(?<DISCOUNTPERCENTAGE>[\d.]+)\s*?AdditionalCosts(?<ADDITINALCOSTS>[\d.]+)
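
The (Item No|Item Number ) part uses alternation to accept two spellings of the same label. A minimal Python sketch of that idea follows (Python writes named groups as (?P<NAME>...) where .NET writes (?<NAME>...)). The sample lines and the helper name item_number are hypothetical, and the non-capturing (?:...) form is a choice made for this sketch; the pattern above uses an ordinary capturing group.

```python
import re

# Either label variant may precede the item number.
ITEM = re.compile(r"(?:Item No|Item Number )(?P<ITEMNUMBER>\d+)")


def item_number(line):
    """Return the item number after either label variant, or None."""
    m = ITEM.search(line)
    return m.group("ITEMNUMBER") if m else None


short_label = item_number("Item No10")
long_label = item_number("Item Number 20")
unknown_label = item_number("Item Num 30")  # neither variant matches

print(short_label, long_label, unknown_label)
```

Because the value is read through its group name, adding further label variants to the alternation does not change how ITEMNUMBER is retrieved.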

Regards,

It’s working.
