Line breaks in Windows usually consist of \r\n, but not all text files are formatted like that; some use just \n as the line break. So it’s better to make the \r optional with the ? quantifier, i.e. \r?\n.
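A minimal C# sketch of that point, written as a small console program with top-level statements (SplitLines is a hypothetical helper, not from the thread). The same \r?\n pattern splits text with either kind of line break, and no stray \r is left on the pieces:

```csharp
using System;
using System.Text.RegularExpressions;

// The same three lines with Windows (\r\n) and Unix-style (\n) line breaks.
string windows = "line one\r\nline two\r\nline three";
string unix = "line one\nline two\nline three";

// Hypothetical helper: \r? makes the carriage return optional,
// so one pattern handles both line-break styles.
string[] SplitLines(string text) => Regex.Split(text, @"\r?\n");

Console.WriteLine(SplitLines(windows).Length); // 3
Console.WriteLine(SplitLines(unix).Length);    // 3
```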
Try this:
(?<=^Vat-nr\.(.|\r|\n\S)+(\r?\n){2})(.+\r?\n.+)
It uses “Vat-nr.” as the anchor and captures the two lines that follow the “Vat-nr.” paragraph and the blank line after it.
Thank you, but my input isn’t laid out as shown. It actually looks like this:
"Vat-nr. 457674456\r\nDepartment 4500 Pepsi Co\r\nEmployeeno. 6000 Avenue 5\r\nDate 18-11-11 Newark\r\n\r\nAccountant\r\nJohn Doe Smith\r\nTest Road 2\r\n1234 Test Town\r\n\r\nblabla\r\nblabla\r\nblabla\r\nblabla\r\n\r\nblablabla"
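For reference, here is a sketch of running the regex from the answer against this exact input in a C# console program (top-level statements assumed). It relies on .NET supporting variable-length lookbehind. Because . in .NET matches every character except \n, it also matches \r, so the match ends with a stray \r, which the sketch trims:

```csharp
using System;
using System.Text.RegularExpressions;

// The sample input from the question, with Windows line breaks.
string inputText =
    "Vat-nr. 457674456\r\nDepartment 4500 Pepsi Co\r\nEmployeeno. 6000 Avenue 5\r\n" +
    "Date 18-11-11 Newark\r\n\r\nAccountant\r\nJohn Doe Smith\r\nTest Road 2\r\n" +
    "1234 Test Town\r\n\r\nblabla\r\nblabla\r\nblabla\r\nblabla\r\n\r\nblablabla";

// The pattern from the answer. The lookbehind is zero-width, so m.Value is
// just the two captured lines (the groups inside the lookbehind are numbered 1 and 2).
const string pattern = @"(?<=^Vat-nr\.(.|\r|\n\S)+(\r?\n){2})(.+\r?\n.+)";

Match m = Regex.Match(inputText, pattern);

// '.' also matched the trailing '\r' of the second line; drop it.
string twoLines = m.Success ? m.Value.TrimEnd('\r') : "";
Console.WriteLine(twoLines);
```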
Thanks, this really helps, and I think it’s the right way to go, but it doesn’t work yet. I read my input from a PDF (as text), and when I do
inputText = inputText.Replace("\r\n", Environment.NewLine);
the string still shows up with \r and \n in debug mode.
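Part of why that Replace call changes nothing: on Windows, Environment.NewLine is "\r\n", so replacing "\r\n" with it leaves the string exactly as it was (on Linux and macOS it is "\n"). A small sketch that checks this on whatever platform it runs on:

```csharp
using System;

string text = "Date 18-11-11 Newark\r\n\r\nAccountant";

// The call from the question: swap "\r\n" for the platform's newline.
string replaced = text.Replace("\r\n", Environment.NewLine);

// On Windows Environment.NewLine is "\r\n", so nothing changes;
// on Linux/macOS it is "\n", so each "\r\n" becomes "\n".
bool newLineIsCrLf = Environment.NewLine == "\r\n";
bool unchanged = replaced == text;

Console.WriteLine($"NewLine is \\r\\n: {newLineIsCrLf}, string unchanged: {unchanged}");
```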
Ignore how the string looks in debug mode: the debugger shows it as a C# expression, so seeing \r\n there is normal. Try printing it to the console with Console.WriteLine, or show it in a MessageBox, instead.
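Printing shows the real line breaks. If you instead want to see exactly which line-break characters a string contains, a hypothetical helper like ShowLineBreaks (not from the thread) can spell them out before printing. A sketch:

```csharp
using System;
using System.Text;

// Hypothetical helper: writes \r and \n as visible escapes, and any other
// control or non-space whitespace character as \uXXXX.
string ShowLineBreaks(string s)
{
    var sb = new StringBuilder();
    foreach (char c in s)
    {
        switch (c)
        {
            case '\r': sb.Append("\\r"); break;
            case '\n': sb.Append("\\n"); break;
            default:
                if (char.IsControl(c) || (char.IsWhiteSpace(c) && c != ' '))
                    sb.Append($"\\u{(int)c:X4}");
                else
                    sb.Append(c);
                break;
        }
    }
    return sb.ToString();
}

string sample = "Accountant\r\nJohn Doe Smith\n";
Console.WriteLine(sample);                 // prints with real line breaks
Console.WriteLine(ShowLineBreaks(sample)); // Accountant\r\nJohn Doe Smith\n
```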
Hey ptrobot, your solution works perfectly if I print the PDF text with WriteLine, copy it into Notepad or regex101, and paste it back into my inputText string variable.
But if I read the PDF directly into the variable and run the regex on it, it doesn’t match.
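The thread doesn’t establish what the text read directly from the PDF actually contains. One possible explanation, assumed here and not confirmed: the extractor emits lone \r characters or Unicode separators (U+0085, U+2028, U+2029) instead of \r\n, and the copy-and-paste round trip turns them into ordinary line breaks. If that is the case, normalizing the breaks to \n before matching should let the original pattern work. NormalizeLineBreaks is a hypothetical helper:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper. Assumption: the PDF text may use "\r\n", lone '\r',
// NEL (U+0085) or line/paragraph separators (U+2028, U+2029).
// "\r\n" comes first in the alternation so it collapses to a single '\n'.
string NormalizeLineBreaks(string s) =>
    Regex.Replace(s, "\r\n|\r|\n|\u0085|\u2028|\u2029", "\n");

// The pattern from the answer, unchanged.
const string pattern = @"(?<=^Vat-nr\.(.|\r|\n\S)+(\r?\n){2})(.+\r?\n.+)";

// Made-up sample with lone '\r' breaks: no '\n' at all, so the pattern cannot match.
string fromPdf = "Vat-nr. 457674456\rDate 18-11-11 Newark\r\rAccountant\rJohn Doe Smith\rTest Road 2";

Console.WriteLine(Regex.IsMatch(fromPdf, pattern)); // False

Match m = Regex.Match(NormalizeLineBreaks(fromPdf), pattern);
Console.WriteLine(m.Value); // Accountant and John Doe Smith, on two lines
```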