Extracting specific data from a txt file

studio
regex

#1

Hi,

I tried to extract invoice data from a txt file, but I am not able to extract anything, so there seems to be an issue with my regular expression, shown below:
invoice\s((no|number)&\s)?\d{4,}
I wrote the expression above to extract data from a text file; a dummy text file is pasted below.
I want to extract only INVOICE # 850113 from the txt file. Sometimes the label can be "invoice no" or "invoice number", followed by digits 0-9. Please help me with this.

INVOICE
123 Basedow Street
Leipzig, DE, 04277 DATE 6/6/2016
Phone: 341 600 800 INVOICE # 850113
Fax: 341 600 801 CUSTOMER ID A700
Website: www.tiefland.com DUE DATE 7/21/2016
BILL TO
IDES AG Frankfurt
231 Lyoner Street
Frankfurt, DE, 60441
Phone: 69 700 777
UNIT PRICE QTY TAXED AMOUNT
5,000.00 1 X 5,000.00
[42] Subtotal 5,000.00
Taxable 5,000.00
Tax rate 10%
Tax due 500.00
Other -
TOTAL € 5,500.00
123 Basedow Street
Leipzig, DE, 04277
Bank Name: Ostbank Berlin
Bank Account: 7387324
IBAN Code: DE560000997387324
Tiefland Glass AG

Total payment due in 45 days
Please include the invoice number on your check
DESCRIPTION
Seitz Freun, 010/32323, seitz.freun@tiefland.com
Professional services
Make all checks payable to
Tiefland Glass AG
If you have any questions about this invoice, please contact
OTHER COMMENTS
Invoice Template © 2013-2014 Vertex42.com


#2

Hi aamir,

Your expression is almost right. Try invoice\s((no|number|#)\s?\d{4,}) (RegexOption: IgnoreCase)
invoice_regex.zip.zip (2.2 KB)
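
For readers without the attachment, here is a minimal sketch of the suggested expression applied to a few lines of the dummy invoice. It is written in Python rather than as a Studio workflow, and it assumes `re.IGNORECASE` behaves like RegexOption: IgnoreCase for this pattern; the variable names are only illustrative.

```python
import re

# Expression from the reply above; re.IGNORECASE stands in for RegexOption: IgnoreCase.
INVOICE_RE = re.compile(r"invoice\s((no|number|#)\s?\d{4,})", re.IGNORECASE)

# A few lines copied from the dummy invoice text in the first post.
sample = """Leipzig, DE, 04277 DATE 6/6/2016
Phone: 341 600 800 INVOICE # 850113
Fax: 341 600 801 CUSTOMER ID A700
Please include the invoice number on your check"""

match = INVOICE_RE.search(sample)
# group(0) is the whole match: the label plus the digits.
invoice_text = match.group(0) if match else None
print(invoice_text)  # INVOICE # 850113
```

The sentence "Please include the invoice number on your check" is not matched, because the label has to be followed by at least four digits.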


#3

Thanks, it worked. May I know exactly what mistake I was making and why nothing was being matched?


#4

You added an “&”, which in your expression matches a literal “&” character. So you’ve done the hard work here, I just ‘polished’ it a little. :wink:
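
To see the difference side by side, here is a small Python sketch comparing the two expressions from this thread. Besides dropping the "&", the suggested expression also adds "#" to the label alternatives and is used with IgnoreCase, so all three changes matter for the line "INVOICE # 850113". The test strings other than that line are made up for illustration.

```python
import re

# Original expression from the first post (case-sensitive, requires "&" after the label).
original = re.compile(r"invoice\s((no|number)&\s)?\d{4,}")
# Suggested expression from the second post, with case-insensitive matching.
corrected = re.compile(r"invoice\s((no|number|#)\s?\d{4,})", re.IGNORECASE)

line = "Phone: 341 600 800 INVOICE # 850113"

original_hit = original.search(line)    # None: upper case, "#" not allowed, "&" expected
corrected_hit = corrected.search(line)  # matches "INVOICE # 850113"

# The original accepts a label only when a literal "&" follows it.
ampersand_hit = original.search("invoice no& 850113")
plain_hit = original.search("invoice no 850113")  # None: there is no "&" after "no"

print(original_hit, corrected_hit.group(0), ampersand_hit.group(0), plain_hit)
```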


#5

Thank you so much


#6

Hi,

Can you help me with one more thing? I have written all the code, but I am not getting any output. From my end the code seems okay, but nothing is produced.
I have loaded all the PDFs into one folder. I convert one PDF at a time into a txt file, then extract specific data and write it to the output, but it's not working. Should I send you my code?


#7

Hi,

The regular expression was working great on one PDF. For another PDF, I tried to convert it to text using the Read PDF activity, but it was not converted, so I used Read PDF with OCR instead and that worked. However, the regular expression does not seem to extract any data from the OCR output. I am pasting a small part of the PDF content from which I want to extract the invoice number along with its value, but it's not working.

.-’ Telephone : 044-42277374 Fax: 04443060622 - , _ .
‘ Email : loyal@loy’altextiles.com CIN 1 L17111TN1946PLC001361 '
‘ . Loyal Textile MinsLtd,’ . " Invoice Number :F3D170183
Dispacthed From :A I Invoice Date ‘5 : 17-Nov-2017 '
LOYAL .SVUPEB‘FABRICS, ‘ . I _ - ’ Sales Order No. : 222—170065 ~
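
One likely reason, judging only from the excerpt above: the value here is "F3D170183", which contains letters and is preceded by ":", while the earlier expression expects the label to be followed directly by four or more digits. The Python sketch below compares that expression with a more tolerant one. The tolerant pattern is a hypothetical suggestion, not something confirmed in this thread, and OCR output can vary, so treat it as a starting point. It uses only a non-capturing group and a lookahead, which .NET regular expressions also support.

```python
import re

# Expression suggested earlier in the thread.
STRICT_RE = re.compile(r"invoice\s((no|number|#)\s?\d{4,})", re.IGNORECASE)

# Hypothetical, more tolerant pattern (an assumption, not from the thread):
# optional ":" after the label, and a value made of letters and digits
# that must contain at least one digit (enforced by the lookahead).
TOLERANT_RE = re.compile(
    r"invoice\s*(?:number|no\.?|#)\s*:?\s*((?=[A-Z]*\d)[A-Z0-9]{4,})",
    re.IGNORECASE,
)

# Two lines copied from the OCR excerpt above.
ocr_text = """‘ . Loyal Textile MinsLtd,’ . " Invoice Number :F3D170183
Dispacthed From :A I Invoice Date ‘5 : 17-Nov-2017 '
"""

strict_hit = STRICT_RE.search(ocr_text)      # None: ":" and letters are not allowed
tolerant_hit = TOLERANT_RE.search(ocr_text)
# group(1) is only the value, without the label.
invoice_number = tolerant_hit.group(1) if tolerant_hit else None
print(strict_hit, invoice_number)  # None F3D170183
```

The lookahead keeps the pattern from treating ordinary words after the label as a value, so a sentence such as "Please include the invoice number on your check" still produces no match.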