How to extract specific text from an invoice

Hello everyone,
I'm working on multiple invoices, and the invoice format is like this:

From this invoice I have to extract some specific data:
Invoice No: 688/APR/24-15, Dated 27-APR-24
State name from the first block

Once the data is extracted, I have to compare it with a different invoice to check whether the values match.

Thanks.
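
The comparison step described above could be sketched as follows in Python. This is only a rough illustration: the function names (`normalize`, `compare_fields`), the dictionary keys and the example values are assumptions, and it presumes the three fields have already been extracted into one dictionary per invoice.

```python
def normalize(value):
    """Trim, collapse whitespace and ignore case so '27-APR-24' equals '27-Apr-24'."""
    if value is None:
        return None
    return " ".join(str(value).split()).casefold()


def compare_fields(first, second, keys=("invoice_no", "date", "state")):
    """Return {key: (value_in_first, value_in_second)} for every key that differs."""
    mismatches = {}
    for key in keys:
        if normalize(first.get(key)) != normalize(second.get(key)):
            mismatches[key] = (first.get(key), second.get(key))
    return mismatches


# Illustrative values only; real values would come from the extraction step.
invoice_a = {"invoice_no": "688/APR/24-25", "date": "27-APR-24", "state": "Maharashtra"}
invoice_b = {"invoice_no": "688/APR/24-15", "date": "27-Apr-24", "state": " maharashtra "}

print(compare_fields(invoice_a, invoice_b))
```

An empty result means all compared fields match after normalization. Here only the invoice number is reported, because the date and state differ only in case and spacing.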

Hi @suraj_gaikwad

Can you share the text file? Also, are the words static across all the PDFs?

Regards

I already shared it.
No, the words are not static. They are totally different, but the format is the same.

Which title do you want?

@vrdabberu

Hi @suraj_gaikwad

Are Invoice No., Dated and State Name constant, or do they vary between PDFs?

Regards

@vrdabberu

Yes. Once they are extracted, I have to check whether the values are the same in a different invoice.

Hi @suraj_gaikwad

Maybe a text file would help in writing the regular expressions. Please share that. It's fine if it's filled with dummy data.

Regards

ST TAX INVOICE-e-Invoice

IRN:3fbbc41166fa3f81898b207b63bdd3b6d77485a14d1–

dde474f9f753fba0c7f83-

Ack No.: 122421135617409-

HP-7.3)

Ack Date: 25-Apr-24-

HP <7.3)

Bytes World Invoice No. Dated-

H.O-C-12 Siddhpura Indl Estate 688/APR/24-25-27-Apr-24-

Amrut Nagar Ghatkopar West Mumbai-86 Delivery Note Mode/Terms of Payment-

y no sponsors.

nsor today!

Design and Development tips in our inbox. Every weekday.

TEL NO.-022-42349000-30 DAYS

SALES OFF: 4B, Gr. Floor Vijay Chambers Reference No. & Date. Other References

*1140 Tribhuvan Road Grant Road (E)-

Mumbai-400-004, Maharashtra Code 27-Buyer’s Order No. Dated

Tel No. 022-43216000

GSTIN/UIN: 27AAGFB5950J1Z9 D | ESL | EWMTECH | 2024-25/8-23-Apr-24-

The data comes out like this after reading the PDF.

@vrdabberu

Hi @suraj_gaikwad

Invoice No.:
Regex Pattern: (?<=[A-Za-z].*)\d+\/[A-Z]+\/\d+\-\d+

Assign activity syntax: InvoiceNo = System.Text.RegularExpressions.Regex.Match(strPDFText,"(?<=[A-Za-z].*)\d+\/[A-Z]+\/\d+\-\d+").Value.Trim()

Date:
Regex Pattern: (?<=(?<=[A-Za-z].*)\d+\/[A-Z]+\/\d+\-\d+\-)\d+\-[A-Za-z]+\-\d+

Assign activity syntax: InvoiceDate = System.Text.RegularExpressions.Regex.Match(strPDFText,"(?<=(?<=[A-Za-z].*)\d+\/[A-Z]+\/\d+\-\d+\-)\d+\-[A-Za-z]+\-\d+").Value.Trim()

State Name:
Regex Pattern: (?<=\d+\,?\s*)[A-Za-z]+(?=\s*Code)

Assign activity syntax: StateName = System.Text.RegularExpressions.Regex.Match(strPDFText,"(?<=\d+\,?\s*)[A-Za-z]+(?=\s*Code)").Value.Trim()
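
For testing the same idea outside UiPath, here is a minimal Python sketch. Python's `re` module only accepts fixed-width lookbehind, so the sketch uses named capture groups instead of the lookbehinds above. The names `extract_fields`, `INVOICE_AND_DATE` and `STATE` are illustrative, and the sample is an excerpt of the text shared earlier with the blank lines removed. The grouped patterns should carry over to .NET by reading `Match.Groups`, but that has not been verified in UiPath here.

```python
import re

# Excerpt of the text shared earlier in the thread (blank lines removed).
sample = """Bytes World Invoice No. Dated-
H.O-C-12 Siddhpura Indl Estate 688/APR/24-25-27-Apr-24-
Mumbai-400-004, Maharashtra Code 27-Buyer’s Order No. Dated
"""

INVOICE_AND_DATE = re.compile(
    r"(?P<invoice>\d+/[A-Z]+/\d+-\d+)"          # e.g. 688/APR/24-25
    r"[-\s]*"                                    # separator left by the PDF reader
    r"(?P<date>\d{1,2}-[A-Za-z0-9]+-\d{2,4})"    # e.g. 27-Apr-24
)

# Allows an optional comma between the state name and the word "Code".
STATE = re.compile(r"(?P<state>[A-Za-z]+),?\s*Code\b")


def extract_fields(text):
    """Return whichever of invoice_no, date and state can be found in text."""
    fields = {}
    match = INVOICE_AND_DATE.search(text)
    if match:
        fields["invoice_no"] = match.group("invoice")
        fields["date"] = match.group("date")
    state = STATE.search(text)
    if state:
        fields["state"] = state.group("state")
    return fields


print(extract_fields(sample))
```

Because the invoice number and the date are captured by one pattern, the date is only accepted when it directly follows something shaped like `688/APR/24-25`. Invoices that number differently would need their own pattern.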

Regards

@vrdabberu

Why is it not matching in regex?

Hi @suraj_gaikwad

Try it on the RegExr website.

Regards

@vrdabberu

It is still showing an error.

Thanks

Hi @suraj_gaikwad

Try this for Date
(?<=(?<=[A-Za-z].*)\d+\/[A-Z]+\/\d+\-\d+\s*)\d+\-[A-Za-z]+\-\d+

Regards

@vrdabberu

It works, but the date format changes between invoices, for example:
9-05-2024, 09-05-2024, 09-May-2024, 9-May-24

Thanks

Hi @suraj_gaikwad

Try this. It will work for all the date types you have mentioned.

(?<=(?<=[A-Za-z].*)\d+\/[A-Z]+\/\d+\-\d+\s*)\d+\-[A-Za-z0-9]+\-\d+
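
Since the extracted dates will later be compared across invoices, it may help to normalize them to one form first. The Python sketch below is only an illustration. It assumes the day always comes first, month names are three-letter English abbreviations, and the names `DATE_TOKEN`, `FORMATS`, `is_date_token` and `parse_invoice_date` are made up for this example.

```python
import re
from datetime import datetime

# Shape of the date part only: day, then numeric or abbreviated month, then year.
DATE_TOKEN = re.compile(r"\d{1,2}-(?:\d{1,2}|[A-Za-z]{3})-(?:\d{4}|\d{2})")

# Tried in order; four-digit years come before two-digit years.
# %b depends on the locale; English month names are assumed.
FORMATS = ("%d-%m-%Y", "%d-%b-%Y", "%d-%b-%y", "%d-%m-%y")


def is_date_token(raw):
    """True when the whole string has the shape of one of the listed dates."""
    return DATE_TOKEN.fullmatch(raw) is not None


def parse_invoice_date(raw):
    """Return a datetime.date for the first format that fits, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    return None


for raw in ("9-05-2024", "09-05-2024", "09-May-2024", "9-May-24"):
    print(raw, "->", parse_invoice_date(raw))
```

All four variants mentioned above parse to the same calendar date (9 May 2024), so they can be compared directly once parsed.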

Regards

@vrdabberu

It worked, but when I used data from a different invoice it did not work.

The same happens for the invoice number.

Thanks

Hi @suraj_gaikwad

Can you share that text, please?

Regards

IRN: fe47bdeb9804b1c94091034693f911cf1714e6b08be4e0619- 4de399c919616fc
Ack No.: 122420089794176
Ack Date: 7-Feb-24
VISUAL SOFT Invoice No. Dated
F-004, 8-05-2024
Mala 40009. Delivery Note Mode/Terms of Payment
TEL: 91 22 2843,2842
Gst/UIN: 27AVHPS3745C1ZH Reference No. & Date. Other References
State Name: Maharashtra, Code: 27
E-Mail: salesNo. Dated

C2024

Thanks

@vrdabberu

@vrdabberu

Do you have any idea?

Can you tell me the date and invoice number from this invoice? @suraj_gaikwad
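
In the meantime, one option is to anchor on the "Invoice No. Dated" header rather than on the shape of the invoice number. The Python sketch below assumes the date is always the last thing on the first non-empty line after that header, which holds for the two samples shared in this thread but may not hold for every invoice. The names `HEADER`, `DATE_AT_END` and `date_after_header` are illustrative.

```python
import re

# Excerpt of the second sample shared in this thread.
second_sample = """Ack Date: 7-Feb-24
VISUAL SOFT Invoice No. Dated
F-004, 8-05-2024
Mala 40009. Delivery Note Mode/Terms of Payment
"""

# Captures the first non-empty line after the "Invoice No. Dated" header.
HEADER = re.compile(r"Invoice No\.\s*Dated\W*\n\s*(?P<line>[^\n]+)")

# A date at the end of that line, ignoring trailing punctuation such as "-".
DATE_AT_END = re.compile(
    r"(?<!\d)(?P<date>\d{1,2}-(?:\d{1,2}|[A-Za-z]{3})-(?:\d{4}|\d{2}))\W*$"
)


def date_after_header(text):
    """Return the date found at the end of the line after the header, else None."""
    header = HEADER.search(text)
    if header is None:
        return None
    match = DATE_AT_END.search(header.group("line"))
    return match.group("date") if match else None


print(date_after_header(second_sample))
```

Anchoring on the header keeps the "Ack Date" value from being picked up by mistake. The invoice number is left out on purpose, because it is not yet clear which token in this sample is the invoice number.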

@vrdabberu

These types of dates appear in the invoices: 9-05-2024, 09-05-2024, 09-May-2024, 9-May-24.

Thanks
