Regex to extract specific sentence

How do I extract the first line, “Tea Leave Supplier Pte Ltd”, using regex?
Thanks.

Hi @zumaho

Is the format the same every time?

Yes, all the PDF files have the same format.

Hi @zumaho

How about this expression?

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=INVOICE\n)\S.*").ToString

It's not working. I just need the regex for the company name to put into the data table, as shown in this video: https://youtu.be/uCdBC2pXPyY

Can you share the text here, @zumaho?

12/05/2022 20:56:00 => [Debug] Execution started for file: Sequence2
12/05/2022 20:56:01 => [Info] UI PATH PROJ execution started
12/05/2022 20:56:03 => [Debug] INVOICE

Tea Leave Supplier Pte Ltd

Blk 120 Bedok Avenue

Singapore 500120

Tampines Bubble Tea Pte Ltd Date: 10/1/2022

Blk 772 Tampines St 71 Invoice No.: TLS162

Singapore 520772 Your PO No.: 316

S/No. Description Qty Unit Price Total ($)
($)
1 Apple Flower Tea Leave (200g) 2 18 36
2 Phnom Penh Rose Tea Leave 2 25 50
(100g)

Amount Due 86

Please pay within 30 days

Thank you.

< This is a computer generated invoice which does not require any signature>
12/05/2022 20:56:03 => [Info] UI PATH PROJ execution ended in: 00:00:02

Hi @zumaho

Try this: (?<=INVOICE)\n+.*

Then use Trim to remove the leading line breaks and any extra spaces.

cheers

Hi @zumaho

Try this expression:

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=INVOICE\n\n)\S.*").ToString

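Since Windows line breaks can be \r\n, we can allow for both characters with either of these:

"(?<=INVOICE[\r\n]{1,})\S.*"
"(?<=INVOICE[\r\n]+)\S.*"

The \S makes the match start at the first non-blank character, so the blank line after INVOICE is skipped. Without it, .* can match an empty string right after the first line-break character. In .NET the dot also matches \r, so Trim the result to drop a trailing \r.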
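For anyone who wants to check the line-break handling outside UiPath, here is a minimal Python sketch of the same idea. It is not the UiPath expression itself: Python's re module does not accept a variable-width lookbehind, so a capture group is swapped in for the lookbehind. The sample text and the function name first_line_after_invoice are assumptions made up for illustration, based on the log posted above.

```python
import re

# Sample text with Windows (\r\n) line endings, abbreviated from the log above.
# (Assumed for illustration; the real PDF text may differ.)
text = "INVOICE\r\n\r\nTea Leave Supplier Pte Ltd\r\n\r\nBlk 120 Bedok Avenue\r\n"

# Python's re rejects variable-width lookbehind, so a capture group is used
# here in place of the lookbehind on INVOICE plus line breaks.
pattern = re.compile(r"INVOICE[\r\n]+([^\r\n]+)")


def first_line_after_invoice(s):
    """Return the first non-empty line after INVOICE, or "" if not found."""
    m = pattern.search(s)
    # strip() plays the role of Trim: it drops stray spaces around the line.
    return m.group(1).strip() if m else ""


company = first_line_after_invoice(text)
print(company)
```

The same function should behave the same way when the text uses plain \n line breaks, since the character class accepts either character.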