I need to extract text using a regex from the text output of the OCR read PDF activity.

From the text below:

C\r\nHUMIDITY: 35.3 %rh\r\nSERVICE ORDER NUMBER: 3120\r\nCUSTOMER INFORMATION\r\nFRESENIUS MEDICAL CARE NORTH\r\nAMERICA\r\nREPORT NUMBER:1111

I need to extract the customer information.
The output should be: FRESENIUS MEDICAL CARE NORTH AMERICA

Please help me with a solution.

Thanks
Likitha

Hi @vinjam_likitha

You can use the regular expression below:

(?<=CUSTOMER INFORMATION\\r\\n)[A-Za-z\s]+(?=\\r\\n)

Check the steps below:

- Assign -> Input = "C\r\nHUMIDITY: 35.3 %rh\r\nSERVICE ORDER NUMBER: 3120\r\nCUSTOMER INFORMATION\r\nFRESENIUS MEDICAL CARE NORTH\r\nAMERICA\r\nREPORT NUMBER:1111"

- Assign -> Output = System.Text.RegularExpressions.Regex.Match(Input.tostring, "(?<=CUSTOMER INFORMATION\\r\\n)[A-Za-z\s]+(?=\\r\\n)").Value.toString
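
One thing worth checking before relying on this pattern is whether the OCR text contains real carriage-return/line-feed characters or the literal characters `\r\n`. VB.NET does not process backslash escapes in string literals, so the Assign above produces the literal form, which is what `\\r\\n` matches. The Python sketch below tries both variants on the sample text. Python is used here only as a quick test harness, and the names `extract`, `PATTERN_REAL`, and `PATTERN_LITERAL` are made up for illustration. For these constructs Python's `re` and .NET regex should behave the same, but treat that as an assumption and verify it in UiPath.

```python
import re

# Case 1: the text holds real CR+LF characters (likely for OCR/PDF output).
real = ("C\r\nHUMIDITY: 35.3 %rh\r\nSERVICE ORDER NUMBER: 3120\r\n"
        "CUSTOMER INFORMATION\r\nFRESENIUS MEDICAL CARE NORTH\r\n"
        "AMERICA\r\nREPORT NUMBER:1111")

# Case 2: the text holds the four literal characters \ r \ n,
# which is what the VB.NET string literal in the Assign contains.
literal = real.replace("\r\n", "\\r\\n")

# \r\n in the pattern matches real line breaks.
PATTERN_REAL = r"(?<=CUSTOMER INFORMATION\r\n)[A-Za-z\s]+(?=\r\n)"
# \\r\\n in the pattern matches a literal backslash-r-backslash-n.
PATTERN_LITERAL = r"(?<=CUSTOMER INFORMATION\\r\\n)[A-Za-z\s]+(?=\\r\\n)"


def extract(text, pattern):
    """Return the first match of pattern in text, or an empty string."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""


# Literal form: stops at the first backslash, so only the first line matches.
print(repr(extract(literal, PATTERN_LITERAL)))
# Real line breaks: \s also matches CR/LF, so both lines are returned.
print(repr(extract(real, PATTERN_REAL)))
# Collapse the line break into a single space.
print(" ".join(extract(real, PATTERN_REAL).split()))
```

On the literal form the match is only `FRESENIUS MEDICAL CARE NORTH`, because `[A-Za-z\s]` cannot match a backslash. On real line breaks both lines come back, and the line break still has to be replaced with a space to get `FRESENIUS MEDICAL CARE NORTH AMERICA`.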

Hope it helps!!

@vinjam_likitha

(?<=CUSTOMER INFORMATION\\r\\n)[A-Z a-z]+

Sometimes there are special characters in between, for example:

FRESENIUS MEDICAL CARE NORTH(San Jose)

How should the regex be written in that case?

@vinjam_likitha

(?<=CUSTOMER INFORMATION\\r\\n)[\s\S]*?(?=\\r\\n)

Okay @vinjam_likitha

With a small modification to the regex, the expression below will also match when additional special characters are present:

(?<=CUSTOMER INFORMATION\\r\\n).*(?=\\r\\n[A-Z]+\\r\\n)

Check the workflow below for a better understanding:

- Assign -> Input = "C\r\nHUMIDITY: 35.3 %rh\r\nSERVICE ORDER NUMBER: 3120\r\nCUSTOMER INFORMATION\r\nFRESENIUS MEDICAL CARE NORTH (San Jose)\r\nAMERICA\r\nREPORT NUMBER:1111"

- Assign -> Output = System.Text.RegularExpressions.Regex.Match(Input.ToString, "(?<=CUSTOMER INFORMATION\\r\\n).*(?=\\r\\n[A-Z]+\\r\\n)").Value.ToString
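
If the expected result is both lines joined into one (for example `FRESENIUS MEDICAL CARE NORTH (San Jose) AMERICA`), another option is to capture everything between the `CUSTOMER INFORMATION` header and the next label, then collapse the line breaks. Here is a minimal Python sketch of that idea. It assumes the OCR text has real CR/LF line breaks and that `REPORT NUMBER` always follows the customer block, as it does in the sample. `customer_info` is a hypothetical helper name.

```python
import re


def customer_info(text):
    """Return the text between CUSTOMER INFORMATION and REPORT NUMBER on one line."""
    # re.DOTALL lets "." match line breaks, so the lazy group can span lines.
    m = re.search(r"CUSTOMER INFORMATION\s*(.*?)\s*REPORT NUMBER", text, re.DOTALL)
    if not m:
        return ""
    # split() with no arguments splits on any whitespace run, including CR/LF.
    return " ".join(m.group(1).split())


sample = ("C\r\nHUMIDITY: 35.3 %rh\r\nSERVICE ORDER NUMBER: 3120\r\n"
          "CUSTOMER INFORMATION\r\nFRESENIUS MEDICAL CARE NORTH (San Jose)\r\n"
          "AMERICA\r\nREPORT NUMBER:1111")

print(customer_info(sample))
```

In UiPath the same approach should be possible with `Regex.Match(..., RegexOptions.Singleline).Groups(1).Value` followed by `Regex.Replace(result, "\s+", " ")`, though that is untested here.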

Hope it helps!!

Hi @vinjam_likitha, I'm attaching some screenshots for the answer. I hope they help you get your desired output. If you find this helpful, please mark it as the solution.
Cheers
Input

Regex

Regex output
[screenshot]
Remove the unnecessary middle word
[screenshot]
This Split function gives the desired result you are looking for

Final output result
[screenshot]