Get data from read pdf output

"Invoice No. 221944 \r\n \r\nAleea Rozelor 84\r\nIasi, Romania\r\n \r\nDate: 2017-08-23\r\n \r\nVendor: Star Software\r\n \r\n \r\nClient Name \r\nACME Systems Inc. \r\nSomewhere Road 59, \r\nBucharest, Romania \r\n \r\nNotes\r\n \r\n \r\nInvoices must be paid within 20 days starting with the issue date.\r\n \r\n \r\nItem Description Quantity Price Per Total \r\nProfessional Services 1 37412.4 EUR 37412.4 EUR\r\n \r\n \r\n \r\n \r\n \r\n \r\n \r\nSubtotal: 31177 EUR\r\n \r\nTax: 6235.4 EUR\r\n \r\n \r\nTotal: 37412.4 EUR\r\n \r\n \r\n "

Here is my Read PDF activity output. I want to extract the Invoice No., Date, and Total. How can I get them?

For the invoice no., a simple regular expression like \d+ will work, as long as you take only the first match.
For the date: (\d{4}-\d{2}-\d{2})
For the total: (?<=Total: )(\d+\.\d{1})(?= )
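
For anyone who wants to try these three patterns outside of Studio, here is a minimal sketch in Python (chosen only for illustration; the patterns themselves are unchanged and should behave the same with .NET's Regex.Match). The `text` variable is a shortened excerpt of the output from the question, and the variable names are my own.

```python
import re

# Shortened excerpt of the Read PDF output posted in the question.
text = (
    "Invoice No. 221944 \r\n \r\nAleea Rozelor 84\r\nIasi, Romania\r\n \r\n"
    "Date: 2017-08-23\r\n \r\nVendor: Star Software\r\n \r\n"
    "Subtotal: 31177 EUR\r\n \r\nTax: 6235.4 EUR\r\n \r\n \r\n"
    "Total: 37412.4 EUR\r\n \r\n "
)

# re.search returns only the first match, which is what \d+ relies on here.
invoice_no = re.search(r"\d+", text).group()
date = re.search(r"(\d{4}-\d{2}-\d{2})", text).group(1)
# "Total: " is case-sensitive, so the "total: " inside "Subtotal: " is skipped.
total = re.search(r"(?<=Total: )(\d+\.\d{1})(?= )", text).group(1)

print(invoice_no, date, total)  # 221944 2017-08-23 37412.4
```

Note that the total pattern expects exactly one decimal digit followed by a space, so a value like 37412.40 would need \d+\.\d+ instead.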

To expand on bcorrea’s answer, you can use the “Matches” activity in the Programming → String section of the activity panel.

See attached screenshots for details.


But the Matches activity would return a lot of matches for \d+, and it would be easy to get lost in them. Regex.Match returns only the first match, so an Assign activity like this would be a better fit:
Dim m As Match = Regex.Match(value, "\d+", RegexOptions.IgnoreCase)
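
To make the difference between "all matches" and "first match" concrete, here is a small sketch in Python (for illustration only; `re.findall` plays the role of the Matches activity and `re.search` the role of Regex.Match). The sample text is a shortened excerpt of the question's output.

```python
import re

# First few lines of the Read PDF output from the question.
text = (
    "Invoice No. 221944 \r\n \r\nAleea Rozelor 84\r\n"
    "Iasi, Romania\r\n \r\nDate: 2017-08-23\r\n"
)

# Every run of digits in the text, in order of appearance.
all_numbers = re.findall(r"\d+", text)

# Only the first run of digits, which here is the invoice number.
first_number = re.search(r"\d+", text).group()

print(all_numbers)   # ['221944', '84', '2017', '08', '23']
print(first_number)  # 221944
```

Taking the first match only works because the invoice number happens to be the first number in the document.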

Personally I would use the following expression to get the value between “Invoice No.” and the space following the invoice number:

(?<=Invoice No. ).*?(?=\s)

You could, but you don't need to, because there is no other numeric value there without decimal places…

That’s very true for the example string they provided, but it’s useful for this solution to be posted as well in case someone else comes along with a similar question and has a more robust data set.

This way we’re not making any assumptions about the number formats following the invoice number.
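
As a rough sketch of that label-anchored approach, here it is in Python (for illustration only). The function name is hypothetical, the second sample value is made up to show a non-numeric invoice number, and I have escaped the dot after "No" so it only matches a literal period.

```python
import re

def invoice_number(text):
    """Return whatever sits between 'Invoice No. ' and the next whitespace."""
    m = re.search(r"(?<=Invoice No\. ).*?(?=\s)", text)
    return m.group() if m else None

print(invoice_number("Invoice No. 221944 \r\n \r\nAleea Rozelor 84\r\n"))
# Hypothetical invoice number format, not from the original post.
print(invoice_number("Ref 12345\r\nInvoice No. INV-2017/044 \r\n"))
print(invoice_number("No label in this text 42\r\n"))
```

Because the match is anchored to the label, earlier numbers in the document (such as the "Ref 12345" line above) are ignored, and a missing label gives no match instead of a wrong number.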

Also, that expression would still return a match (the word "NOT") for input like:
Invoice No. NOT FOUND \r\n \r
so it could be bad in some cases too…
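
The "NOT FOUND" concern can be checked with a short Python sketch (for illustration only). One possible middle ground, which is my own suggestion and not something proposed earlier in the thread, is to keep the label anchor but require digits. The function name is hypothetical.

```python
import re

sample = "Invoice No. NOT FOUND \r\n \r"

# The label-anchored pattern accepts any characters, so it returns "NOT".
loose = re.search(r"(?<=Invoice No\. ).*?(?=\s)", sample).group()

def numeric_invoice_number(text):
    """Return the invoice number only if it consists of digits, else None."""
    m = re.search(r"(?<=Invoice No\. )\d+(?=\s)", text)
    return m.group() if m else None

print(loose)                                          # NOT
print(numeric_invoice_number(sample))                 # None
print(numeric_invoice_number("Invoice No. 221944 \r\n"))  # 221944
```

This trades flexibility for safety: it rejects placeholder text, but it would also reject legitimate invoice numbers that contain letters.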

That could definitely be the case if an invoice number has a space in it.

The regex would definitely need to be modified if that were the case.
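
One way the expression might be modified for invoice numbers containing spaces is to capture everything up to the end of the line and trim it. This is only a sketch in Python under that assumption; the function name and the "AB 123" value are hypothetical.

```python
import re

def invoice_number_to_eol(text):
    """Capture the rest of the 'Invoice No.' line and trim surrounding spaces."""
    m = re.search(r"Invoice No\.[ \t]*([^\r\n]*)", text)
    return m.group(1).strip() if m else None

print(invoice_number_to_eol("Invoice No. 221944 \r\n \r\n"))    # 221944
# Hypothetical invoice number with a space in it.
print(invoice_number_to_eol("Invoice No. AB 123 \r\n \r\n"))     # AB 123
print(invoice_number_to_eol("Invoice No. NOT FOUND \r\n \r"))    # NOT FOUND
```

This assumes the invoice number is the only thing on its line, which holds for the sample in the question but may not hold for other layouts. A placeholder like "NOT FOUND" still comes back as a value, so the result would need to be validated afterwards.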
