Extract string data from text using regex

07/16/2021 22:32:06 => [Debug] Debug started for file: ExtractPDFdataRegex
07/16/2021 22:32:08 => [Info] Process_Vendor_Invoices_Performer_Project execution started
07/16/2021 22:32:09 => [Debug] Invoice No. 216060 Bulevardul Papetariei 57
Bucuresti, Romania
Date: 2017-07-13

Vendor: Office Supplies

Client Name

ACME Systems Inc.

Somewhere Road 59,

Bucharest, Romania

Notes

Invoices must be paid within 20 days starting with the issue date.

Item Description Quantity Price Per Total

Various paper supplies 1 269033 USD 269033 USD

Subtotal: 224194 USD

Tax: 44838.8 USD

Total: 269033 USD
07/16/2021 22:32:09 => [Info] Process_Vendor_Invoices_Performer_Project execution ended in: 00:00:01

I have a piece of text like that, and I only want to extract “Various paper supplies” using regex. This string changes dynamically, but its position is fixed.

@Mohamed_Abodonia - it looks like the post below (a duplicate) covers the same thing…

In that post, there were no spaces after the given text; here there are spaces…

So I adjusted the pattern as below. Use it the same way as shown in that post and it will work:

  (?<=Total\r?\n\r?\n?)(.+)\s+\d+\s+\d+\s+([A-Z]{3})

https://regex101.com/r/GZvr5S/1
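For anyone testing the idea outside UiPath: the pattern above relies on a variable-width lookbehind, which .NET accepts but Python's `re` module rejects. The minimal Python sketch below is therefore not the same pattern; it swaps the lookbehind for a plain match of the header's “Total” plus line break(s) and reads the description from capture group 1. It assumes the invoice text uses `\n` or `\r\n` line breaks, and `extract_line_item` is a hypothetical helper name. `[A-Z]{3}` is used for the currency because the range `A-z` also spans `[`, `\`, `]`, `^`, `_` and the backtick.

```python
import re

# Excerpt of the invoice text from the question.
INVOICE = (
    "Client Name\n\nACME Systems Inc.\n\n"
    "Item Description Quantity Price Per Total\n\n"
    "Various paper supplies 1 269033 USD 269033 USD\n\n"
    "Subtotal: 224194 USD\n\nTax: 44838.8 USD\n\nTotal: 269033 USD\n"
)

# Python's re rejects variable-width lookbehind such as (?<=Total\r?\n\r?\n?),
# so "Total" and the line break(s) are consumed normally and the description
# is taken from group 1; group 2 is the 3-letter currency code.
LINE_ITEM = re.compile(r"Total\r?\n(?:\r?\n)?(.+?)\s+\d+\s+\d+\s+([A-Z]{3})")


def extract_line_item(text):
    """Hypothetical helper: return (description, currency) or None."""
    match = LINE_ITEM.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(extract_line_item(INVOICE))  # ('Various paper supplies', 'USD')
print(extract_line_item(INVOICE.replace("\n", "\r\n")))  # same tuple
```

The lazy `(.+?)` stops at the first spot followed by “quantity, price, currency”, and because `.` does not cross a newline, the match stays on the item line.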


It doesn’t return the output string.

@Mohamed_Abodonia - I can see 2 spaces before your regex pattern… please look closely. Just copy and paste it from the link I provided above.


If you are building the DataTable variable the way shown in your screenshot, the pattern I gave won’t work…

For Description, use this one:

https://regex101.com/r/21l7nF/1

For Currency, use this one:

https://regex101.com/r/auKL1g/1
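The patterns behind the two links above are not reproduced here. One plausible reason a group-based pattern “won’t work” when filling a DataTable (an assumption about how the table was built) is that each cell is filled from the whole match value instead of from a capture group. The Python sketch below shows that difference; `GROUPED` mirrors the group-based idea, while `DESCRIPTION` is a hypothetical lookahead-only pattern whose entire match is the description.

```python
import re

# Excerpt of the invoice text from the question.
INVOICE = (
    "Invoice No. 216060 Bulevardul Papetariei 57\n"
    "Bucuresti, Romania\n"
    "Date: 2017-07-13\n\n"
    "Item Description Quantity Price Per Total\n\n"
    "Various paper supplies 1 269033 USD 269033 USD\n\n"
    "Total: 269033 USD\n"
)

# Group-based pattern: the description is in group 1, but the whole match
# (group 0) also includes "Total", the line breaks, the numbers and "USD".
GROUPED = re.compile(r"Total\r?\n(?:\r?\n)?(.+?)\s+\d+\s+\d+\s+([A-Z]{3})")

# Hypothetical lookahead-only pattern: match text at the start of a line that
# is followed by "<quantity> <price> <3-letter currency>", so the whole match
# is just the description.
DESCRIPTION = re.compile(r"(?m)^[^\r\n]+?(?=[ \t]+\d+[ \t]+\d+[ \t]+[A-Z]{3}\b)")

print(repr(GROUPED.search(INVOICE).group(0)))
# 'Total\n\nVarious paper supplies 1 269033 USD'
print(DESCRIPTION.search(INVOICE).group(0))
# Various paper supplies
```

Restricting the gaps to `[ \t]+` keeps the lookahead from jumping across line breaks into the next line.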

Thank you so much <3 Final question: how can I extract only Total, not Subtotal?

Subtotal: 224194 USD
Tax: 44838.8 USD
Total: 269033 USD

@Mohamed_Abodonia - Please check this…


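Note: You should uncheck the IgnoreCase option if you are using the “Matches” activity. With case-insensitive matching, a pattern keyed on “Total:” also matches the “total:” inside “Subtotal:”.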
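The screenshot with the exact pattern is not available, so the Python sketch below uses a hypothetical pattern, `(?<=Total: )\d+(?:\.\d+)?`, to show why the ignore-case setting matters: case-sensitive, it skips “Subtotal:”; case-insensitive, it also picks up the subtotal. It also shows an alternative that is not from the thread: anchoring to the start of the line with `^` and `re.MULTILINE`, which keeps “Subtotal” out even when ignore case is on.

```python
import re

# Summary lines from the question.
TOTALS = "Subtotal: 224194 USD\n\nTax: 44838.8 USD\n\nTotal: 269033 USD"

# Hypothetical pattern: the amount right after "Total: ". Python matching is
# case-sensitive by default, so the lowercase "total: " in "Subtotal: " fails.
TOTAL = re.compile(r"(?<=Total: )\d+(?:\.\d+)?")
print(TOTAL.findall(TOTALS))  # ['269033']

# With IGNORECASE, "total: " inside "Subtotal: " qualifies as well.
print(re.findall(TOTAL.pattern, TOTALS, flags=re.IGNORECASE))  # ['224194', '269033']

# Alternative: require "Total:" at the start of a line, so case no longer matters.
ANCHORED = re.compile(r"^Total:\s*(\d+(?:\.\d+)?)", re.MULTILINE | re.IGNORECASE)
print(ANCHORED.findall(TOTALS))  # ['269033']
```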

Thank you so much
