Regex Extractor and new line

So, I have this Document Text that I’m trying to extract data from. The text contains order lines that look something like this:

When testing my regex in regexr (and others), it goes fine:

And when I build the expression in Studio’s Regex Builder, everything looks good, too:

Problem is… nothing after the line break gets extracted and saved to my results.

Anyone with any ideas?

@jjes

There is a property called Regex Options. Set that to Multiline; that should solve it.

As per the screenshot, the option is currently set to IgnoreCase. That is where you set Multiline.

Cheers
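A side note on what that option changes, as a minimal Python sketch rather than Studio itself. It assumes Python's `re.MULTILINE` and `re.DOTALL` behave like .NET's `RegexOptions.Multiline` and `RegexOptions.Singleline` for this purpose, and the text is a shortened, made-up version of one order line. Multiline only changes where `^` and `$` match; it does not let `.` cross a line break.

```python
import re

# Shortened stand-in for one order line plus its second line (assumption).
text = "1 BB22B-D Motor\nYour articleno.: 225521"

# Without MULTILINE, ^ only matches at the very start of the text.
starts_default = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches right after each line break.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# MULTILINE does not change ".": it still refuses to match "\n".
dot_multiline = re.search(r"Motor.+", text, re.MULTILINE)

# DOTALL (Singleline in .NET terms) is the flag that lets "." match "\n".
dot_dotall = re.search(r"Motor.+", text, re.DOTALL)

print(starts_default)     # ['1']
print(starts_multiline)   # ['1', 'Your']
print(dot_multiline)      # None
print(repr(dot_dotall.group(0)))
```

So if a pattern relies on `.` to reach the second line, Multiline alone would not be expected to help; an explicit line break in the pattern or a Singleline-style option would.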

Hi,

It might be a line break issue.

Can you try using (\r?\n) instead of \n?

Regards,
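To illustrate the idea, here is a small Python sketch (not Studio, and the two strings are made-up fragments of an order). It shows a pattern with a bare `\n` missing text that uses Windows-style `\r\n` line breaks, while `\r?\n` accepts both.

```python
import re

# Same fragment with Unix (\n) and Windows (\r\n) line breaks.
unix_text = "150s.\nYour articleno.: 225521"
windows_text = "150s.\r\nYour articleno.: 225521"

strict = re.compile(r"150s\.\nYour")       # expects a bare \n
tolerant = re.compile(r"150s\.\r?\nYour")  # accepts \n or \r\n

strict_results = [bool(strict.search(t)) for t in (unix_text, windows_text)]
tolerant_results = [bool(tolerant.search(t)) for t in (unix_text, windows_text)]

print(strict_results)    # [True, False]
print(tolerant_results)  # [True, True]
```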


Tried it. No difference. But thanks :slight_smile:

Nope, same result. But thanks :slight_smile:

Hi,

Can you share your input text and current pattern as a text file?

Regards,

Hi @jjes

Try this pattern:

(?m)^\d+.*(.*?)Your articleno.:\s\d+

or

(?m)^\d+.*\r?\n(.*?)Your articleno.:\s\d+

Regards
Sudharsan
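For comparison, a quick check of the two variants in Python, using the first order line from the input posted further down in this thread. This is only a sketch outside Studio: it assumes Python's `re` treats these patterns like the .NET engine does and that the line break is a plain `\n`.

```python
import re

# First order from the posted input, assuming a plain \n line break.
order = (
    "1 BB22B-D Motor, 10Nm, 20-12-21 11 200,15 190,10 * 2.091,10 "
    "on/off/1-pkt., dim 8x8, 150s.\n"
    "Your articleno.: 225521\n"
)

# Variant 1: no line break in the pattern, so "." has to reach line two.
same_line = re.compile(r"(?m)^\d+.*(.*?)Your articleno.:\s\d+")
# Variant 2: the line break is matched explicitly with \r?\n.
with_break = re.compile(r"(?m)^\d+.*\r?\n(.*?)Your articleno.:\s\d+")

first = same_line.search(order)
second = with_break.search(order)

print(first)                  # None
print(repr(second.group(0)))  # both lines of the order
```

In this sketch only the second variant matches, because `.` stops at the line break even with `(?m)` switched on.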

Pattern:
\d+\s+.{1,10}\s+(.+)\s+\d{2}-\d{2}-\d{2}.+\d+,\d{2}(.+)((?:.))(\r?\n?)(.)

Input:

Yada,yada…

1 BB22B-D Motor, 10Nm, 20-12-21 11 200,15 190,10 * 2.091,10 on/off/1-pkt., dim 8x8, 150s.
Your articleno.: 225521

2 BM21-a11.2 Rotor, 4Nm, on/off, 20-12-21 5 200,00 150,00 * 750,00 multipak, dim 8x8, spring-return.
Your articleno.: 443212

bla…bla…

Hi,

Can you try the following pattern?

 \d+\s+.{1,10}\s+(.+)\s+\d{2}-\d{2}-\d{2}.+\d+,\d{2}(.+)((?:.)*)((\r?\n)*)(.*)

Regards,
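For anyone who wants to try the pattern outside Studio, here is a minimal Python sketch run against the input posted above. It assumes plain `\n` line breaks and single spaces between columns (the forum may have collapsed the original whitespace), and Python's `re` is only a stand-in for the .NET engine.

```python
import re

# Input as posted in the thread, assuming \n line breaks.
sample = (
    "Yada,yada…\n"
    "\n"
    "1 BB22B-D Motor, 10Nm, 20-12-21 11 200,15 190,10 * 2.091,10 "
    "on/off/1-pkt., dim 8x8, 150s.\n"
    "Your articleno.: 225521\n"
    "\n"
    "2 BM21-a11.2 Rotor, 4Nm, on/off, 20-12-21 5 200,00 150,00 * 750,00 "
    "multipak, dim 8x8, spring-return.\n"
    "Your articleno.: 443212\n"
    "\n"
    "bla…bla…\n"
)

pattern = re.compile(
    r"\d+\s+.{1,10}\s+(.+)\s+\d{2}-\d{2}-\d{2}.+\d+,\d{2}(.+)"
    r"((?:.)*)((\r?\n)*)(.*)"
)

# Group 1 is the description, group 6 is whatever follows the line break.
results = [(m.group(1), m.group(6)) for m in pattern.finditer(sample)]
for description, second_line in results:
    print(description, "|", second_line)
```

With this input, group 6 holds the "Your articleno." line of each order, so the pattern itself can reach past the line break.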

Hi,

FYI, I’ve attached a sample using the above pattern:

Sample20220117-4L.zip (2.9 KB)

Regards,

So strange, still nothing from the second line of each order.

Hi,

Does the above sample (Sample20220117-4L.zip) work in your environment?
If yes, your input string probably differs from the one in the sample.

Can you share the input text as a file, written with the Write Text File activity?

Regards,
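Once the text is in a file, one way to see which line-break characters it really contains is to count them in the raw bytes. A minimal Python sketch; the helper name `count_line_endings` and the file name in the usage comment are just illustrations, not part of any workflow.

```python
def count_line_endings(raw: bytes) -> dict:
    """Count CRLF pairs, bare LF and bare CR in raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": raw.count(b"\n") - crlf,  # \n not preceded by \r
        "CR": raw.count(b"\r") - crlf,  # \r not followed by \n
    }

# Hypothetical usage on a file written by the workflow:
# with open("input.txt", "rb") as fh:
#     print(count_line_endings(fh.read()))

demo = count_line_endings(b"150s.\r\nYour articleno.: 225521\n")
print(demo)  # {'CRLF': 1, 'LF': 1, 'CR': 0}
```

If the counts show bare `\r` or something other than `\n` or `\r\n`, a pattern written for `\r?\n` would not see a line break where one appears on screen.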

Yes, your example spits out both lines.
I have attached the text file from the Digitize step, but anonymized it a bit. :slight_smile:

input.txt (265 Bytes)

Hi,

In my environment, it also works as-is when the above input.txt is used as input.

Sample20220117-4Lv2.zip (3.2 KB)

Is there any difference between the above and your workflow?

Regards,

If I use the regex from your example on that same text in my automation, the last line is still not saved in my result (I export the dataset by iterating through the Tables collection and then doing a Write Range on each one). So that is very, very strange.

What is also odd, is that if I check the ExtractionResult object’s ResultsDocument->Fields->Raw View->(3) I can see that the last line is actually NOT in the ExtractionResult, which is very, very strange, considering I’m using the same Regex on the same text target as in your example.

I’m puzzled.

No luck with these, but thanks for trying :slight_smile:

@jjes

Can you please try this

.*\n(?=Your articleno\.: \d{6}).*


Cheers
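A quick way to see what this lookahead pattern picks up, as a Python sketch on the input posted earlier in the thread. It assumes `\n` line breaks and six-digit article numbers, and Python's `re` only approximates what Studio's .NET engine does.

```python
import re

# Input as posted in the thread, assuming \n line breaks.
sample = (
    "Yada,yada…\n"
    "\n"
    "1 BB22B-D Motor, 10Nm, 20-12-21 11 200,15 190,10 * 2.091,10 "
    "on/off/1-pkt., dim 8x8, 150s.\n"
    "Your articleno.: 225521\n"
    "\n"
    "2 BM21-a11.2 Rotor, 4Nm, on/off, 20-12-21 5 200,00 150,00 * 750,00 "
    "multipak, dim 8x8, spring-return.\n"
    "Your articleno.: 443212\n"
    "\n"
    "bla…bla…\n"
)

# A line, its line break, then the following line, but only when that
# following line starts with "Your articleno.: " and six digits.
pattern = re.compile(r".*\n(?=Your articleno\.: \d{6}).*")

orders = pattern.findall(sample)
for order in orders:
    first_line, second_line = order.split("\n")
    print(second_line)
```

Each match spans both lines of an order, with the line break inside the matched value.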