Extract item description using regex

I need to extract the item description from the inputs below. The text is extracted from PDFs whose layout varies, and the item description value varies as well.
In Input 1 the value to extract is "IT Support", but in other cases the value may have three words, e.g. "Beverages and Commodities".
The same applies to Input 2.

Input 1 :
Invoices must be paid within 20 days starting with the issue date.

Item Description Quantity Price Per Total

IT Support 1 243864 CAD 243864 CAD

Subtotal: 203220 CAD

Tax: 40644 CAD

Total: 243864 CAD

Input 2 :
Item # Item Description Unit Price Units Total

1 Waste management services 183441 USD 1 183441 USD

Subtotal: 183441 USD

Tax: 36688.2 USD

Total: 220129 USD


Please try this

System.Text.RegularExpressions.Regex.Match(str,"(?<=Total\n+[ \d]*)[A-Za-z ]+").Value.Trim
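
In case it helps anyone doing the same outside .NET, here is a rough Python sketch of the same idea: find the header line that ends in "Total", skip the line break(s) and an optional leading item number, then take the run of letters and spaces that follows. Python's built-in re module does not allow variable-length lookbehind, so this version uses a capture group instead. The name extract_item_description and the optional \r handling are my own assumptions, not part of the original expression, and the pattern will stop early if a description contains digits or punctuation.

```python
import re

# Header line ending in "Total", one or more line breaks, an optional
# item number (digits/spaces), then the description (letters and spaces).
ITEM_DESCRIPTION = re.compile(r"Total[ \t]*(?:\r?\n)+[ \d]*([A-Za-z ]+)")


def extract_item_description(text: str) -> str:
    """Return the first item description found, or "" if there is none."""
    match = ITEM_DESCRIPTION.search(text)
    return match.group(1).strip() if match else ""


# The two sample inputs from the question.
input_1 = (
    "Invoices must be paid within 20 days starting with the issue date.\n\n"
    "Item Description Quantity Price Per Total\n\n"
    "IT Support 1 243864 CAD 243864 CAD\n\n"
    "Subtotal: 203220 CAD\n\n"
    "Tax: 40644 CAD\n\n"
    "Total: 243864 CAD\n"
)
input_2 = (
    "Item # Item Description Unit Price Units Total\n\n"
    "1 Waste management services 183441 USD 1 183441 USD\n\n"
    "Subtotal: 183441 USD\n\n"
    "Tax: 36688.2 USD\n\n"
    "Total: 220129 USD\n"
)

print(extract_item_description(input_1))
print(extract_item_description(input_2))
```

The match is case-sensitive, so "Subtotal:" is not mistaken for the header, and "Total:" is skipped because it is followed by a colon and not a line break.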

Hope this helps


Yes, it works exactly how I wanted.
