Extract a line from a PDF with regex

Good morning, I'm trying to extract the following string with a regex: 104.99 0.00 0.00 0.00 0.00

It comes from the text below, but I can't find a way to make it work. What can I do?

Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 104,99 EUR

104,99 0,00 0,00 0,00 0,00

Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 26/11/2020 104,99 EUR

Hi Joseantonio,

You can use (?<=EUR)\s*\d.+ . Keep in mind that EUR is used as the anchor point here. If that text changes in other documents, paste that data so we can find a better regex.
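A minimal sketch of what this pattern matches, using Python's re module purely for illustration (UiPath runs the .NET regex engine, not Python). It assumes the sample text was extracted with "\n" line endings and one blank line between lines; real PDF output may differ, for example with "\r\n".

```python
import re

# Sample text from the question; assumes "\n" line endings and one blank
# line between the lines (real PDF extraction output may differ).
SAMPLE = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 104,99 EUR\n"
    "\n"
    "104,99 0,00 0,00 0,00 0,00\n"
    "\n"
    "Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 26/11/2020 104,99 EUR\n"
)

# (?<=EUR) starts the match right after "EUR" without including it,
# \s* skips the line breaks, \d requires a digit, and .+ takes the rest
# of that line ("." does not match "\n").
PATTERN = r"(?<=EUR)\s*\d.+"

match = re.search(PATTERN, SAMPLE)
# strip() removes the line breaks that \s* consumed.
line = match.group(0).strip() if match else None
print(line)  # 104,99 0,00 0,00 0,00 0,00
```

Only the first "EUR" is followed by a line of digits, so the second "EUR" at the end of the text never produces a match.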

Thanks for your help. I have managed to isolate the content I want into groups. My question now is: if I only wanted the third group, for example, what would the expression be? Here is the regex I use to capture that line by groups:

(?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))

Link: https://regex101.com/r/cO8lqs/23674

Hello

To get group 3, try using an Assign activity:

YourVariable = REGEXRESULT(0).Groups(3).ToString

Replace the part in capital letters (REGEXRESULT) with the output variable of your Matches activity.
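The same indexing, sketched in Python for illustration only: re.finditer plays the role of the Matches activity's output, [0] corresponds to REGEXRESULT(0), and .group(3) to .Groups(3). The pattern is the one from the question with the "*" characters restored, and the sample text assumes "\n" line endings.

```python
import re

# Sample text from the question; assumes "\n" line endings.
SAMPLE = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 104,99 EUR\n"
    "\n"
    "104,99 0,00 0,00 0,00 0,00\n"
    "\n"
    "Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 26/11/2020 104,99 EUR\n"
)

# Pattern from the question, with the "*" characters restored.
PATTERN = r"(?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))"

# finditer yields every match, much like the output of a Matches activity;
# list() makes the result indexable.
regex_result = list(re.finditer(PATTERN, SAMPLE))

first_match = regex_result[0]       # like REGEXRESULT(0)
group3 = first_match.group(3)       # like .Groups(3).ToString
whole_match = first_match.group(0)  # group 0 is the whole match, including "\n\n"
print(group3)  # 0,00
```

Note that group 0 is always the entire match; the captured values start at group 1.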

If you want to learn Regex, check out my Regex Megapost.


Thanks.

I usually use regex to extract information and add it to a DataTable, but this is the first time I have come across this level of complexity. Once the groups are extracted, how do I add each of them separately to the DataTable?

Hello

Check out this video:

Thanks for the video, but I can't see how to convert those groups into variables to add them to the DataTable I have already created for other documents.

variable = (?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s)).(0).Groups(3).Tostring

Would it look like this?

Maybe I did not explain it well. I am trying to build a DataTable that holds the regex expressions (example attached), so I would like to know how to refer to group 0, group 1, and so on inside the expression itself.


Sorry, wrong video.

Try this one:

You will need to obtain group 3 first.

Essentially, you will use an Assign activity with something like this:
Row.Item("Value") = Group3
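The same idea, sketched in Python for illustration: a list of dicts stands in for the DataTable (UiPath's DataTable is a .NET type, so this is only an analogy), and each captured group is written to its own column. The column names are hypothetical, taken from the header line of the sample text.

```python
import re

# The line of values captured earlier in the thread.
VALUES_LINE = "104,99 0,00 0,00 0,00 0,00"
match = re.search(r"(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)", VALUES_LINE)

# Hypothetical column names, taken from the header line of the sample text.
COLUMNS = ["BaseImponible", "IVA", "CuotaIVA", "Recargo", "CuotaRecargo"]

# A list of dicts stands in for the DataTable; each dict is one row.
table = []
# Group numbers start at 1 (group 0 is the whole match),
# so column i receives group i + 1.
row = {column: match.group(i + 1) for i, column in enumerate(COLUMNS)}
table.append(row)
print(table)
```

The key point is the order of operations: apply the pattern first, then read each group from the match and assign it to a column.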

Unfortunately that doesn't work, because I store the entire pattern with all its groups, and the Assign does not work inside a loop. The pattern is the following:

(?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))

Is there no way to write the expression so that the same regex string returns only the group I want?

Hello

Are you able to obtain Group 3 by itself?

You will need an interim step to get the data ready for processing.
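On the question of getting one specific value straight from the pattern: one alternative not suggested in the thread is to make only the wanted value a capturing group and skip the others with a non-capturing group (?:...) and a repeat count. A minimal Python sketch, assuming the values are separated by whitespace as in the sample; nth_value_pattern is a hypothetical helper. The same syntax should also work in .NET regex, but please verify it there.

```python
import re

# Sample text from the question; assumes "\n" line endings.
SAMPLE = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 104,99 EUR\n"
    "\n"
    "104,99 0,00 0,00 0,00 0,00\n"
    "\n"
    "Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 26/11/2020 104,99 EUR\n"
)


def nth_value_pattern(n):
    """Build a pattern whose only capturing group is the n-th value after "EUR".

    (?:\\S+\\s+){k} skips k values without capturing them, so group 1
    is always the wanted value. n is 1-based.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # "{{" and "}}" are literal braces in the f-string, giving e.g. "{2}".
    return rf"(?<=EUR)\s*(?:\S+\s+){{{n - 1}}}(\S+)"


third_pattern = nth_value_pattern(3)  # (?<=EUR)\s*(?:\S+\s+){2}(\S+)
third_value = re.search(third_pattern, SAMPLE).group(1)
print(third_value)  # 0,00
```

With this approach the Assign always reads group 1, and only the pattern string changes per value.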

Maybe it's the wrong expression? The expression I use appears in the link below.

First I use Matches: I enter the expression in Pattern and set an output variable. Then I use a Message Box with the following: MyVariable(0).Groups(3).ToString()

Perhaps it is the expression? On the website I linked, the groups work correctly for me.

Help!

Hello

Modify your syntax to remove the final "()".

I have changed the expression, but when I show the result in the Message Box, it returns the expression itself, as if it were plain text. I have also tried adding it to the DataTable and that gives an error too. I must be doing something wrong, but I don't know what.

Got a screenshot?

Thanks for the help. Hopefully you can spot my error in the screenshot, because I cannot find it.

Hello

I can't see anywhere that you are actually applying the regex.

I see the Regex patterns but they need to be applied using a Matches activity.
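This is also why the Message Box showed the expression as text: a pattern is just a string until something applies it to the input. A small Python illustration of the difference (the pattern here is a simplified one, chosen only for the example):

```python
import re

RAW_TEXT = "104,99 0,00 0,00 0,00 0,00"
PATTERN = r"(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)"

# Displaying the pattern variable only ever shows the pattern itself...
shown = PATTERN
# ...the values exist only after the pattern is applied to the input text.
match = re.search(PATTERN, RAW_TEXT)
applied = match.group(3) if match else None

print(shown)    # (\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)
print(applied)  # 0,00
```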

I strongly recommend reading section 3 of my Regex Megapost.

So insert a Matches activity.
Here is your Pattern:
([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)
Here is a preview

The input will be your raw text.
Then you will need to assign each group from your result accordingly (I hope I have not misread any of your previous posts).
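One thing worth checking, sketched in Python: the character class [\d.] only accepts dots, while the sample text pasted in the question uses commas as decimal separators (104,99). If the PDF text really contains commas (an assumption, since the question itself writes 104.99), the pattern needs [\d.,] instead. The variable names for the five groups are hypothetical, taken from the header line.

```python
import re

# Sample text from the question; assumes "\n" line endings.
RAW_TEXT = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 104,99 EUR\n"
    "\n"
    "104,99 0,00 0,00 0,00 0,00\n"
    "\n"
    "Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 26/11/2020 104,99 EUR\n"
)

# Pattern from the reply above: five runs of digits/dots separated by whitespace.
DOT_PATTERN = r"([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)"
# Variant that also accepts commas, for comma-decimal text such as "104,99".
COMMA_PATTERN = r"([\d.,]+)\s([\d.,]+)\s([\d.,]+)\s([\d.,]+)\s([\d.,]+)"

print(re.search(DOT_PATTERN, RAW_TEXT))  # None for this comma sample

match = re.search(COMMA_PATTERN, RAW_TEXT)
# Assign each group to its own (hypothetically named) variable.
base, vat_rate, vat_amount, surcharge_rate, surcharge_amount = match.groups()
print(base, vat_amount)  # 104,99 0,00
```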

Hopefully this helps you :blush:

Maybe it is a problem with how I have organized the workflow: I use Matches later, because I first enter the expressions in a table and then run the regex activities.
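For a table-driven setup like this, one possible interim step is to store the group number next to each pattern, so the loop knows which group to read after applying the pattern. A minimal Python sketch of that idea; the rule table, field names, and patterns are all hypothetical, and a list of dicts stands in for the configuration DataTable.

```python
import re

# Sample text from the question; assumes "\n" line endings.
RAW_TEXT = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 104,99 EUR\n"
    "\n"
    "104,99 0,00 0,00 0,00 0,00\n"
    "\n"
    "Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 26/11/2020 104,99 EUR\n"
)

# Hypothetical configuration table: one row per field, holding the pattern
# and the number of the group to keep.
EXTRACTION_RULES = [
    {"field": "Total", "pattern": r"Importe Total:\s*([\d.,]+)\s*EUR", "group": 1},
    {"field": "CuotaIVA", "pattern": r"(?<=EUR)\s*([\d.,]+)\s([\d.,]+)\s([\d.,]+)", "group": 3},
    {"field": "Vencimiento", "pattern": r"Vencimientos:\s*(\d{2}/\d{2}/\d{4})", "group": 1},
]


def apply_rules(text, rules):
    """Apply every rule to text and return the extracted values as one row."""
    row = {}
    for rule in rules:
        match = re.search(rule["pattern"], text)
        # A rule that finds nothing yields None instead of raising an error.
        row[rule["field"]] = match.group(rule["group"]) if match else None
    return row


result_row = apply_rules(RAW_TEXT, EXTRACTION_RULES)
print(result_row)
```

In UiPath terms, each loop iteration would run Matches with the row's pattern and then read Groups(n) using the row's group number.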