String manipulation issue using index position and Split

Hi,
I have been working on string manipulation for invoice data.
I want to extract the data in the format below:

CUSTOMER ID:: 6406484453
SHIPMENT :: SSIN9125999
CLIENT VAT # :: 4200189118
DUE DATE:- 30-Dec-18
TERMS :: 60 days from EOM
SHIPPER :: CUINS ABCDR PACIFIC PTEL LTD
CONSIGNEE :: CUMMINSDF GLOBAL LOGISTICS Imports OCN
ORDER NUMBERS / OWNER’S REFERENCE :: IND4366318
CUSTOMS ENTRY DETAILS : 00655953JSA20181008585277, 585277, JSA201810085037472


I tried the approaches above (index position and Split), but I am not getting the correct data.
Can anybody help me? String_manipulation.xaml (9.9 KB) invoice.txt (451 Bytes)

Hi @Anand_Designer,

Can you share the invoice sample as well?

It is a confidential document, so I attached invoice.txt instead: invoice.txt (451 Bytes)

I would suggest using regular expressions to extract the information. The general pattern is (?<=LABEL\s+).+, where LABEL is the field name. Use RegEx.Match() to get the first match, e.g.

CustomerID = System.Text.RegularExpressions.RegEx.Match(InvoiceText, "(?<=CUSTOMER ID\s+).+").Value

Slash is a delimiter in some regular expression flavors, so it is commonly escaped as \/ (in .NET the escape is optional but harmless):

OrderNumbers = System.Text.RegularExpressions.RegEx.Match(InvoiceText, "(?<=ORDER NUMBERS \/ OWNER'S REFERENCE\s+).+").Value
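The general pattern can also be wrapped in a small helper so each label does not need its own hand-written expression. This is only a sketch: `ExtractField` is a hypothetical name, the sample text is made up (the real layout of invoice.txt may differ), and `Regex.Escape` is used so that any special characters in a label are treated literally.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module InvoiceFieldDemo
    ' Hypothetical helper: returns the text that follows a label on the same line,
    ' or an empty string when the label is not found.
    Function ExtractField(text As String, label As String) As String
        ' Regex.Escape makes every character of the label match literally.
        Dim pattern As String = "(?<=" & Regex.Escape(label) & "\s+).+"
        ' Trim removes a trailing carriage return when lines end with CR+LF.
        Return Regex.Match(text, pattern).Value.Trim()
    End Function

    Sub Main()
        ' Made-up sample text (assumption): label, whitespace, then the value.
        Dim invoiceText As String = "CUSTOMER ID 6406484453" & Environment.NewLine &
                                    "ORDER NUMBERS / OWNER'S REFERENCE IND4366318"

        Console.WriteLine(ExtractField(invoiceText, "CUSTOMER ID"))                        ' 6406484453
        Console.WriteLine(ExtractField(invoiceText, "ORDER NUMBERS / OWNER'S REFERENCE"))  ' IND4366318
    End Sub
End Module
```

If the invoice text uses a typographic apostrophe (’) in OWNER’S, the label passed in has to use the same character, otherwise the match comes back empty.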

The only issue I can see with the input data (invoice.txt) is that the information for SHIPPER and CONSIGNEE is merged. It looks like a table with the formatting stripped away. There’s no reliable way to separate them unless there’s a pattern, e.g. if the shipper’s name always ends with “LTD”. If you have access to the original document, then you can use data scraping to retrieve the table instead.

@ptrobot Thanks for your solution. All my PDFs follow the same pattern for shipper and consignee, and the shipper always ends with LTD.
Can we split them like this using Regex?
shipper :1.CUINS ABCDR PACIFIC PTEL LTD
consignee :2.CUMMINSDF GLOBAL LOGISTICS Imports OCN

In that case we can split it. First we will extract the whole line for SHIPPER and CONSIGNEE and then we will split them using regular expressions. Or you could also use substring() to split them.

ShipperConsignee = System.Text.RegularExpressions.RegEx.Match(InvoiceText, "(?<=SHIPPER CONSIGNEE\s+).+").Value
Shipper = System.Text.RegularExpressions.RegEx.Match(ShipperConsignee, ".+LTD").Value
Consignee = System.Text.RegularExpressions.RegEx.Match(ShipperConsignee, "(?<=LTD\s+).+").Value
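One thing to watch for: .+LTD is greedy, so if the consignee name also contains LTD, the shipper match runs up to the last LTD on the line. A possible variation, assuming the shipper ends at the first standalone LTD, uses a lazy quantifier and named groups. `SplitAtLtd` is a hypothetical helper name and the sample line is taken from the expected output in the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ShipperConsigneeDemo
    ' Hypothetical helper: splits a merged "shipper consignee" line at the first standalone LTD.
    ' Returns a two-element array: {shipper, consignee}.
    Function SplitAtLtd(merged As String) As String()
        Dim line As String = merged.Trim()
        ' Lazy .+? stops at the first LTD; \b keeps LTD from matching inside a longer word.
        Dim m As Match = Regex.Match(line, "^(?<shipper>.+?\bLTD\b)\s*(?<consignee>.*)$")
        If Not m.Success Then
            ' No LTD found (assumed fallback): treat the whole line as the shipper.
            Return New String() {line, String.Empty}
        End If
        Return New String() {m.Groups("shipper").Value, m.Groups("consignee").Value.Trim()}
    End Function

    Sub Main()
        Dim parts As String() = SplitAtLtd("CUINS ABCDR PACIFIC PTEL LTD CUMMINSDF GLOBAL LOGISTICS Imports OCN")
        Console.WriteLine(parts(0))  ' CUINS ABCDR PACIFIC PTEL LTD
        Console.WriteLine(parts(1))  ' CUMMINSDF GLOBAL LOGISTICS Imports OCN
    End Sub
End Module
```

If a shipper name can itself contain LTD before its final LTD, neither the greedy nor the lazy version is reliable, and scraping the original table is the safer route.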

It’s working great, entirely Regex-based. Thanks @ptrobot

You are welcome! Great to hear that it’s working.