Regex, extraction problem

@joseantonio - I don’t think you can capture all the amounts in one shot without groups… You may need 6 different regexes for your case.

If your PDFs follow the same pattern… you can use one regex pattern and write all the amounts in one shot into 6 different columns in Excel easily…

Yes, I want to use 6 different expressions. My problem is that when I try to capture the first amount, it takes everything until the end of the text.
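
The expression that caused this is not shown in the thread, so the following is only a guess at the cause: a greedy `.*` inside the capture group will run to the end of the text, while a group that only accepts an amount-shaped token stops after the first amount. A minimal sketch in Python (not the .NET engine used in the thread; the syntax used here is common to both), run against sample text 1 posted later in this topic. The names `AMOUNT`, `greedy` and `first_amount` are illustrative only.

```python
import re

# Sample text 1, as posted later in this thread.
text = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: EUR "
    "835,80 21,00 175,52 0,00 0,00 Forma de pago: RECIBO DOMICILIADO A 20 D-B2B "
    "Vencimientos: 06/04/2021 1.011,32 EUR"
)

# Greedy: ".*" keeps matching until the end of the text.
greedy = re.search(r"EUR\s+(.*)", text).group(1)

# Bounded: the group only accepts something shaped like 835,80 or 1.011,32
# (assumption: "." as thousands separator, "," before exactly two decimals).
AMOUNT = r"\d{1,3}(?:\.\d{3})*,\d{2}"
first_amount = re.search(r"EUR\s+(" + AMOUNT + r")", text).group(1)

print(greedy)        # everything from 835,80 to the end of the text
print(first_amount)  # only the first amount
```

Restricting what the group may contain is usually more robust than switching `.*` to the lazy `.*?`, because a lazy group still needs something after it to tell it where to stop.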

@joseantonio - Won’t this serve the purpose…

835,80 - Regexvar(0).groups(1).tostring
21,00 - Regexvar(0).groups(2).tostring
175,52 - Regexvar(0).groups(3).tostring
0,00 - Regexvar(0).groups(4).tostring
0,00 - Regexvar(0).groups(5).tostring
1.011,32 - Regexvar(0).groups(6).tostring
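
The pattern behind this reply is not preserved in the thread. As a rough sketch of the idea (one expression, six capture groups, each amount read from its own group), here is a hypothetical pattern in Python rather than .NET; `match.group(1)` plays the role of `Regexvar(0).groups(1).tostring`. It assumes the layout of sample text 1 posted later in this topic: five amounts after "EUR", then a date, then the total.

```python
import re

# Sample text 1, as posted later in this thread.
text = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: EUR "
    "835,80 21,00 175,52 0,00 0,00 Forma de pago: RECIBO DOMICILIADO A 20 D-B2B "
    "Vencimientos: 06/04/2021 1.011,32 EUR"
)

# One capture group per amount (assumed format: 1.011,32 style numbers).
AMOUNT = r"(\d{1,3}(?:\.\d{3})*,\d{2})"

# Five amounts right after "EUR", then lazily skip to the date,
# then capture the total that follows the date.
PATTERN = re.compile(
    r"EUR\s+" + r"\s+".join([AMOUNT] * 5)
    + r".*?\d{2}/\d{2}/\d{4}\s+" + AMOUNT
)

match = PATTERN.search(text)
amounts = match.groups() if match else ()

for index, value in enumerate(amounts, start=1):
    print(index, value)
```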

@joseantonio - you can get the amounts like this…

But what would be the expression to capture only 1.011,32, for example?

I would like to use no groups and do 6 different … but they do not work for me.

This is what you asked initially?

Now this??

Your goal is to extract all 6 amounts, right??? I am saying you can do it with groups… You just have to read the different groups to print the required amounts… I also shared the example above… You don’t want to do it the effective way??

It may not be the best way to do it with 6 different expressions, but it is the only way to unify all the different types of documents that I have; that’s why I was asking.

Sorry, I didn’t get you??

Your documents have the same text pattern, right?? I would like to see… Could you please share a couple more text formats like the one you shared above… I will put them in a text file and show you how to handle multiple documents…

Regex groups or 6 different patterns, it does not matter: if your input text doesn’t follow the same pattern, regex won’t work even if I give you 6 different regexes… That’s my point… Hope you get it…

I understand you, but I think the translator is keeping me from being understood. Depending on the document, my project uses several regex expressions to extract data. With all the other documents I have had no problem: it reads the regex expressions from a database, and I use 6 expressions for each document, so I always keep the same structure. The problem is with the text I posted earlier: for some reason I am unable to create 6 expressions that follow the same process, and that is why I am asking for help. I don’t know if you understand me… Still, thank you.

@joseantonio - Sorry, I didn’t follow you…

But I will give you 6 patterns once I am back… Thanks

I appreciate it, thanks.

@joseantonio - Here you go…

835,80 - Link
21,00 - Link
175,52 - Link
0,00 - Link
0,00 - Link
1.011,32 - Link - (Assumption for this pattern: there will always be a date before this field; if not, it won’t work)
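
The linked patterns are not preserved here. To illustrate the date assumption on the last field only, this is a hypothetical pattern (Python, not the linked .NET expression) that captures the amount immediately following a dd/mm/yyyy date and finds nothing when the date is missing. `total_after_date` and the two short test strings are made up for the example.

```python
import re

# Assumed amount format: "." thousands separator, "," and two decimals.
AMOUNT = r"\d{1,3}(?:\.\d{3})*,\d{2}"

# The total is anchored on the date that precedes it.
TOTAL_AFTER_DATE = re.compile(r"\d{2}/\d{2}/\d{4}\s+(" + AMOUNT + r")")


def total_after_date(text):
    """Return the amount right after a dd/mm/yyyy date, or None if absent."""
    match = TOTAL_AFTER_DATE.search(text)
    return match.group(1) if match else None


with_date = "Vencimientos: 06/04/2021 1.011,32 EUR"
without_date = "Vencimientos: 1.011,32 EUR"

print(total_after_date(with_date))     # amount found
print(total_after_date(without_date))  # None: no date to anchor on
```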

Hope this helps… Let me know if it doesn’t work on any other texts (same pattern)… I will be happy to assist…

Thanks to your help, I have finally found why this document gives me errors. Maybe you know a solution to this; let me explain.

If you look at text 1, only “EUR” appears after “Importe Total:”, while in text 2 an amount appears in the same place.

My question is: how can we adapt the regex expressions you attached to handle this problem?

Text 1:

Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: EUR 835,80 21,00 175,52 0,00 0,00 Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 06/04/2021 1.011,32 EUR

Text 2:

Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 261,92 EUR 216,46 21,00 45,46 0,00 0,00 Forma de pago: RECIBO DOMICILIADO A 20 D-B2B Vencimientos: 05/04/2021 261,92 EUR

Thanks for everything.

@joseantonio - So in the first case you have to extract 835,80, and in the second 261,92??

If yes, here is how you can try…

Hope this helps…

Question: In the first text you have only 5 sets of amounts, whereas in the 2nd you have 6… I am not sure how you are mapping them…

I need the regex expressions to omit the amount that appears right after “Importe Total:”, so that the same expressions are valid for both cases.

In the second text an amount appears at the beginning that I want to omit. I really don’t know if I am explaining myself. I would like to skip the amount between “Importe Total:” and “EUR”; after skipping it, the expressions should work whether or not that amount appears.

I think I got it… You want to capture only what comes after EUR… 5 amounts…

I have adjusted the regex patterns and updated the links below…

835,80 - Link
21,00 - Link
175,52 - Link
0,00 - Link
0,00 - Link
1.011,32 - Link - Same as above…

I have tested the above patterns, including against your second text… They work on both…
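
The adjusted patterns behind the links are not preserved in this thread. The following is a hypothetical reconstruction of the approach in Python (not the original .NET expressions): six separate patterns, each with a single capture, that optionally skip an amount between "Importe Total:" and "EUR". The helper names `nth_amount_pattern` and `extract_amounts` are invented for this sketch, and the amount format is assumed from the two sample texts.

```python
import re

# Assumed amount format: "." thousands separator, "," and two decimals.
AMOUNT = r"\d{1,3}(?:\.\d{3})*,\d{2}"

# Optional amount between "Importe Total:" and "EUR" is matched but not captured.
PREFIX = r"Importe Total:\s*(?:" + AMOUNT + r"\s+)?EUR\s+"


def nth_amount_pattern(n):
    """Pattern that skips n amounts after EUR and captures the next one."""
    skip = r"(?:" + AMOUNT + r"\s+){" + str(n) + r"}"
    return re.compile(PREFIX + skip + r"(" + AMOUNT + r")")


# Five patterns for the amounts after EUR, one for the total after the date.
PATTERNS = [nth_amount_pattern(n) for n in range(5)]
PATTERNS.append(re.compile(r"\d{2}/\d{2}/\d{4}\s+(" + AMOUNT + r")"))


def extract_amounts(text):
    """Run the six patterns independently; None where a pattern finds nothing."""
    results = []
    for pattern in PATTERNS:
        match = pattern.search(text)
        results.append(match.group(1) if match else None)
    return results


text1 = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: EUR "
    "835,80 21,00 175,52 0,00 0,00 Forma de pago: RECIBO DOMICILIADO A 20 D-B2B "
    "Vencimientos: 06/04/2021 1.011,32 EUR"
)
text2 = (
    "Base Imponible +LPI IVA Cuota IVA Recargo Cuota Recargo Importe Total: 261,92 EUR "
    "216,46 21,00 45,46 0,00 0,00 Forma de pago: RECIBO DOMICILIADO A 20 D-B2B "
    "Vencimientos: 05/04/2021 261,92 EUR"
)

print(extract_amounts(text1))
print(extract_amounts(text2))
```

Because each pattern stands alone, the six expressions can be stored and run one by one, which matches the one-expression-per-field setup described earlier in the thread.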

It is perfect, it works correctly, thank you very much. I have more questions about this, but I will open other posts.

@joseantonio - Glad it worked…

If it relates to the same topic (i.e. the same extraction problem), you can ask here…
