Regex for extracting invoice information

Hi Team,

I have the text below from 3 different invoices and would like a regex to extract information from it. Can you please help me with the regex? Thanks.

INVOICE 647-444-1234 Your Address City, State, Country ZIP CODE Invoice Total Billed To Invoice Number $4520.00 oooo00 Client Name 1 Client Address Date Of Issue City, State, Country 10/07/14 ZIP CODE Description Unit Cost Qty Hr Rate Amount Your item Name $1000 o00 Item description goes here Your item Name $1000 o00 Item description goes here Your item Name $1000 1o00 Item description goes here Your item Name $1000 o00 Item description goes here $4000.00 Subtotal $520.00 Tax $4520.00 Total Invoice Terms Amount Due (USD) $4520.00 Ex. Please pay your invoice by.
INVOICE East Repair Inc. 1912 Harvest Lane New York, NV12210 Bill To Ship To US-001 Invoice # John Smith John Siith Invoice Date 11/02/2019 2 Court Square 3787 Pineview Drive P.O.# 23122019 Yorle, 1/Y 12210 Cambridge, MA 12210 few 26/02/2019 Due Date QTY DESCRIPTIOI UNIT PRICE AMOUNT Front and rear brake cadles 1a0.00 1go.00 15.00 30.00 New set af pedal arins Labar 3hrs 5.00 15.00 145.00 Subtolal Sales Tax 6.25% 9.06 TOTAL $154.06 zna Gu Terms & Conditions days Payment due within 15 Please make checks payable lo: East Repair Inc
INVOICE Logo Name Too brelcon oohere rjlrrscere5a5 fn 114784712 fiaae46]3354016] EO 3[ave5s 41r:[204)]552=1]44 fnairecailasarneinlplsaciarn Gill Ta: Eeluas cemsgn" Cemzary al]5= 3rlggrd3Ig! aanru |304|555-1183 Guaniin Dcseriorigr Unil pricc amaunl Isa Disaunl auulicd T3__ a5587 IeriuMal 706"4"= uuuDL ssg Ijsuil nealeues raraeste Aduegnlalaauno allgg54) ealarcgeus 10)4,72 Aoree
INVOICE Company Name [Stree: Accress] ICity, ST ZIP] LATE 12!9!2019 Phane: [uuu ugu uuuu] IIVOICE [123456] Fax: uuu uuu uugu] CUSTOMER I[ [123] DUE DATE 1/8/2020 Mebsiie: sull edumain.coli BILL TO [Manie] ICani pany Manie] [Stree: Accress] ICity, ST ZIP] IPhone] DESCRIPTION TAXD AMOUNT ISenice Fee] 230.00 ILaor: S75Jhr] 375.00 nqurs 345.00 [Parts] Subtocal 950.00 345.00 Taatl= OTHER COLJIEITS Tax rat= 6.25u# Tocal paym ent due in 3u day Tax cu= 21.56 2. Hleae include de inygice numter an ur check Ozher TOTAL 971.56 Make all checls payalle to [Your Company Mame] ju have any quesions aboui ihis inygice ease conliaci [Hani= Phon= Eni ail] Thank You For Your Business! lutrs:llriri#vetex42.cunu ExclTenplst=s 2010-2019 Dy Vetex42 lencel inveice templste hti illwels= Telcllat=


@srinivas_pradeep - you didn’t mention which fields you want to extract from this.

To extract all the information from the invoices, I would suggest trying the Document Understanding (DU) framework.

Thank you @prasath17 . I would like to extract

  1. Invoice number
  2. Invoice date
  3. Total Amount

Sure, I will try Document Understanding. But I would also like to know what form of regex we can use to extract this information.

Assuming you are extracting the text and then using RegEx, here’s something you can give a shot.
In the first RegEx you may need to do some additional parsing, like splitting the string, to extract only the Invoice # from the raw text.
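
For illustration, here is a minimal Python sketch of that approach, run against the East Repair Inc. text from the question. The patterns and the `extract_fields` helper are assumptions made for this example (they are not the patterns from the reply above), and they are tuned to this one layout only. Note that the invoice number sits before its label in the OCR output, which is why the first match is split afterwards.

```python
import re

# Excerpt of the East Repair Inc. OCR text from the question.
SAMPLE = (
    "Bill To Ship To US-001 Invoice # John Smith John Siith "
    "Invoice Date 11/02/2019 2 Court Square 3787 Pineview Drive "
    "P.O.# 23122019 Yorle, 1/Y 12210 Cambridge, MA 12210 few 26/02/2019 "
    "Due Date QTY DESCRIPTIOI UNIT PRICE AMOUNT Front and rear brake cadles "
    "1a0.00 1go.00 15.00 30.00 New set af pedal arins Labar 3hrs 5.00 15.00 "
    "145.00 Subtolal Sales Tax 6.25% 9.06 TOTAL $154.06 zna Gu"
)


def extract_fields(text):
    """Return invoice number, date and total, or None where nothing matches."""
    fields = {"invoice_number": None, "invoice_date": None, "total": None}

    # The value precedes the label ("US-001 Invoice #"), so match both
    # and keep only the first whitespace-separated token.
    m = re.search(r"\S+\s+Invoice\s*#", text)
    if m:
        fields["invoice_number"] = m.group(0).split()[0]

    # Date directly after the "Invoice Date" label.
    m = re.search(r"Invoice Date\s+(\d{1,2}/\d{1,2}/\d{2,4})", text)
    if m:
        fields["invoice_date"] = m.group(1)

    # Upper-case TOTAL only, so the garbled "Subtolal" is not picked up.
    m = re.search(r"\bTOTAL\s+\$?(\d[\d,]*\.\d{2})", text)
    if m:
        fields["total"] = m.group(1)

    return fields


print(extract_fields(SAMPLE))
```

On this excerpt the sketch picks up `US-001`, `11/02/2019` and `154.06`. The same expressions would likely need adjusting for the other invoices, since their labels and field order differ.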

But as others stated, this is a typical use case for the DU framework, extracting the information with the Form Extractor.

Hope this helps

@srinivas_pradeep - All 3 invoices are completely different, so I don’t think a single Regex can be used here. Otherwise you would have to write 9 Regex patterns (3 fields × 3 layouts) to extract the info from 3 invoices.

As I mentioned, DU is your best bet.
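
If you still want to go the regex route, the "one pattern per field per layout" idea could look roughly like the Python sketch below. The `PATTERNS` table and the `first_match` helper are hypothetical names for this example, and the patterns only cover the date and total of two of the posted layouts. Each field gets a list of candidate patterns that are tried in order.

```python
import re

# Hypothetical pattern table: one entry per field, one pattern per layout.
PATTERNS = {
    "invoice_date": [
        r"Invoice Date\s+(\d{1,2}/\d{1,2}/\d{2,4})",   # label, then date
        r"Date Of Issue\D+(\d{1,2}/\d{1,2}/\d{2,4})",  # label, other text, date
    ],
    "total": [
        r"\bTOTAL\s+\$?(\d[\d,]*\.\d{2})",   # label before the amount
        r"\$(\d[\d,]*\.\d{2})\s+Total\b",    # amount before the label
    ],
}


def first_match(patterns, text):
    """Try each pattern in order and return the first captured value."""
    for pattern in patterns:
        m = re.search(pattern, text)
        if m:
            return m.group(1)
    return None  # no layout matched, e.g. because of OCR damage


# Short excerpts taken from the OCR text in the question.
print(first_match(PATTERNS["invoice_date"],
                  "Date Of Issue City, State, Country 10/07/14 ZIP CODE"))
print(first_match(PATTERNS["total"],
                  "$4000.00 Subtotal $520.00 Tax $4520.00 Total Invoice Terms"))
print(first_match(PATTERNS["total"], "Ozher TOTAL 971.56 Make all checls"))
print(first_match(PATTERNS["invoice_date"], "LATE 12!9!2019 Phane:"))
```

The last call returns `None` because the OCR turned the date label and separators into `LATE 12!9!2019`. That is the kind of case where the pattern list keeps growing with every new layout or scan quality.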

Thank you @prasath17

Thank you @rpavanguard . Sure will try this.

@prasath17 - DU also uses some kind of regex at the backend, correct? Please correct me if I am wrong.

@srinivas_pradeep - Nope… In DU you have different extractors → ML, Form based, Intelligent Form and Regex based extractors.

Yes it does. The advantage of the RegEx extractor is that you don’t have to rely on API endpoints the way the Form Extractor does. That said, the RegEx extraction methods have to work reliably.

@srinivas_pradeep - Please start with the video below; it covers invoices using the ML extractor.