Need help building a good regex for this file

Hello,

Can you help me build a good regex for this file, please?

I built a regex for everything except the part I have outlined:

Hello @Soudios. To maintain a clean learning environment, kindly show us what you have tried so far.


@AkshaySandhu

Yes, of course. I used these regexes:

Contact :
(?<=Contact : ).*

Tél :
(?<=Tél. : )[0-9 ]+

Portable :
(?<=Portable : )[0-9 ]+

Fax :
(?<=Fax : ).*

Email :
(?<=E-mail : ).*
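
For comparison, the same five fields can also be read with capture groups instead of lookbehinds. This is only a minimal Python sketch, not UiPath code: the `FIELD_PATTERNS` table and the `extract_fields` helper are names made up for illustration, and the record is copied from the sample text posted later in this thread. The phone patterns require a digit at both ends, so the trailing space before "Fax" is not captured and an empty field gives an empty string.

```python
import re

# One record copied from the sample text in this thread.
RECORD = """Tél. : 04 74 45 52 65 Fax : 04 74 45 52 01
Portable :
Contact : Claude Noel (chef de département de Génie Biologique de l’IUT Lyon 1)
E-mail : iutbourg.bio@univ-lyon1.fr
"""

# Hypothetical field table: field name -> pattern with one capture group.
# [ \t]* skips spaces but never crosses a line break.
# (\d[\d ]*\d)? is an optional run of digits and spaces that starts and ends with a digit.
FIELD_PATTERNS = {
    "tel": r"Tél\. :[ \t]*(\d[\d ]*\d)?",
    "fax": r"Fax :[ \t]*(\d[\d ]*\d)?",
    "portable": r"Portable :[ \t]*(\d[\d ]*\d)?",
    "contact": r"Contact :[ \t]*(.*)",
    "email": r"E-mail :[ \t]*(\S*)",
}


def extract_fields(record):
    """Return a dict of field name -> value, with "" for missing or empty fields."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, record)
        value = match.group(1) if match else None
        result[name] = value or ""
    return result


fields = extract_fields(RECORD)
print(fields)
```

In UiPath the same patterns should be usable with the Matches activity, reading the value from `Groups(1)` of each match, but that part is untested here.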


Hello @Soudios, here you go:
(Tél[\s\S]*?)Historique
Use this pattern and let us know whether it works for you.
If it does not work, please share some sample text.
Test.xaml (6.1 KB)


Hi @AkshaySandhu
This is not working for me. I want what comes after "Historique".

Hello @Soudios,

Regex for Historique:
(?=Historique)[A-Za-z\s:0-9(),-]+(?=\))

Thanks & Regards,
Raj Parsana


It is not working well.

This is the sample text:
Ain (01)
22 brasseries

IUT Lyon 1 - Site de Bourg-en-Bresse
Rue Henri de Boissieu 01000 BOURG-EN-BRESSE

Tél. : 04 74 45 52 65 Fax : 04 74 45 52 01
Portable :
Contact : Claude Noel (chef de département de Génie Biologique de l’IUT Lyon 1)
E-mail : iutbourg.bio@univ-lyon1.fr
Site web : iut.univ-lyon1.fr
Historique
Création : septembre 2008
Pas d’historique de production pour cette brasserie

104 Emmanuel Gillard | Projet Amertume (http://projet.amertume.free.fr)La bière en France Edition 2020

du Bugey
Lacoux 01110 HAUTEVILLE-LOMPNES

Tél. : Fax :
Portable : 06 67 36 43 36
Contact : Ludwig de Belvalet
E-mail : ludwig.debelvalet@gmail.com
Site web : la-biere-du-plateau.webnode.fr
www.facebook.com/brasseriedubugey
Historique
Création : mars 2017
La gamme de bières était brassée entre juin 2009 et décembre 2014 par La Bière du Plateau (Hauteville-Lompnes, FR), puis par La
Jacquerie (Conzieu, FR) entre décembre 2014 et mars 2017
Historique de production (1) prévision (2) estimation
2017 : 60 hl
2018 : 72 hl
2019 : 80 hl (1)


Hello @Soudios,

Use this: (?=Historique)[A-Za-z\né\s:0-9 PUT YOUR WORDS HERE ]+(?=\))

Thanks & Regards,
Raj Parsana


I can’t put all the words here because there are 1000 pages.
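
Listing every allowed character should not be necessary. A class such as `[\s\S]` accepts any character, accents included, so the pattern only has to say where the block starts and where it stops. The Python sketch below is an illustration, not UiPath code. It assumes that a block starts on a line containing only "Historique" and ends at the first blank line, which holds for the sample text above (abridged here) but may not hold across all 1000 pages. `HISTORIQUE_BLOCK` is a name made up for this example.

```python
import re

# Abridged copy of the sample text from this thread.
SAMPLE = """E-mail : iutbourg.bio@univ-lyon1.fr
Site web : iut.univ-lyon1.fr
Historique
Création : septembre 2008
Pas d’historique de production pour cette brasserie

104 Emmanuel Gillard | Projet Amertume (http://projet.amertume.free.fr)La bière en France Edition 2020

du Bugey
Lacoux 01110 HAUTEVILLE-LOMPNES

Site web : la-biere-du-plateau.webnode.fr
www.facebook.com/brasseriedubugey
Historique
Création : mars 2017
Historique de production (1) prévision (2) estimation
2017 : 60 hl
2018 : 72 hl
2019 : 80 hl (1)
"""

# ^Historique$ with re.MULTILINE matches only a line that is exactly "Historique",
# so "Historique de production ..." is not treated as a header.
# [\s\S]*? lazily takes any characters until the lookahead sees a blank line
# (ASSUMED block terminator) or the end of the text.
HISTORIQUE_BLOCK = re.compile(
    r"^Historique$\n([\s\S]*?)(?=\n[ \t]*\n|\Z)",
    re.MULTILINE,
)

blocks = [m.group(1).strip() for m in HISTORIQUE_BLOCK.finditer(SAMPLE)]
for block in blocks:
    print(block)
    print("-----")
```

If the real file uses Windows line endings, the `\n` parts would probably need to become `\r?\n`.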


The text is in a different language, so please add those letters yourself; I am not able to type them with my keyboard.


Hello

Have a look at this solution:
(?<=www.facebook.com/.*\nHistorique)[\s\S]+

It will work as long as the “www.facebook.com” line is always there.


@Steven_McKeering

Hi,

Thank you for your response, but the “www.facebook.com” line is not always there.
We need to find something else.

You can find here the output : output.txt (9.9 KB)

Also, the pattern doesn’t work for me:

Hello

It won’t work on Regex101.com because its regex flavor is not a perfect match for the one UiPath uses. It will work in UiPath.
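
One likely reason for this kind of difference: the lookbehind in that pattern contains `.*`, so its length is not fixed. The .NET regex engine, which UiPath runs on, accepts that, while several other flavors reject it. The sketch below shows Python's built-in `re` module refusing such a pattern, followed by a capture-group rewrite that avoids the lookbehind altogether. It is a Python illustration under those assumptions, not UiPath code; the dots are escaped here and the sample text is abridged.

```python
import re

# The suggested pattern, with the dots escaped.
LOOKBEHIND = r"(?<=www\.facebook\.com/.*\nHistorique)[\s\S]+"

try:
    re.compile(LOOKBEHIND)
    lookbehind_error = None
except re.error as exc:
    # Python's re module only allows fixed-width lookbehinds.
    lookbehind_error = str(exc)

# Same idea without a lookbehind: match the prefix normally and
# capture everything that follows the "Historique" line.
CAPTURE = re.compile(r"www\.facebook\.com/.*\nHistorique\n([\s\S]+)")

text = (
    "www.facebook.com/brasseriedubugey\n"
    "Historique\n"
    "Création : mars 2017\n"
    "2017 : 60 hl\n"
)

match = CAPTURE.search(text)
after_historique = match.group(1) if match else ""

print("lookbehind rejected:", lookbehind_error is not None)
print(after_historique)
```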

Hmmm I need more information on the pattern.

What more can you tell me about it?


Hello @Steven_McKeering ,

What I need is to extract information from a PDF and put it in an Excel file.
This is what I have managed so far:

Now I need a regex to extract: Name / City / Address / Zip Code and Historique

Here is some example output from the PDF file: output.txt (9.9 KB)

This is the Excel file I want to create: Projet TEST.xlsx (128.9 KB)

Hello

I should be able to help you.

Tell me about the pattern of the text; this will save me time.


Can you please provide the list of names you want from output.txt?

Is Name just “Contact”?

I am unsure which is the correct city and address field.
Sample:
(308, rue de Perruet - ZA de la Maladière 01210 ORNEX)

01210 is the zip code you want, yes?


Hello @Steven_McKeering
Thank you. You can find my answer below.

For this address example: 308, rue de Perruet - ZA de la Maladière 01210 ORNEX.

Address : 308, rue de Perruet - ZA de la Maladière

Zip code : 01210

City : ORNEX
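
The split described above can be sketched with one pattern and three named groups. This is a minimal Python illustration, not UiPath code. It assumes the zip code is the first standalone 5-digit token on the line and that everything after it is the city; `ADDRESS_LINE` and `split_address` are names invented for this example.

```python
import re

# Lazy address, then a 5-digit postal code, then the city up to the end of the line.
# ASSUMPTION: the first standalone 5-digit token on the line is the zip code.
ADDRESS_LINE = re.compile(r"^(?P<address>.+?)\s+(?P<zip>\d{5})\s+(?P<city>.+)$")


def split_address(line):
    """Return {'address', 'zip', 'city'} for one address line, or None if no zip is found."""
    match = ADDRESS_LINE.match(line.strip())
    return match.groupdict() if match else None


parts = split_address("308, rue de Perruet - ZA de la Maladière 01210 ORNEX")
print(parts)
```

The same pattern also splits the other address lines in the sample text, for example "Lacoux 01110 HAUTEVILLE-LOMPNES".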

For names, I mean the company names. For example, in this picture:
Name 1 is : Gessienne SARL STAJAM
Name 2 is : de Grilly



Hi @Soudios

I believe I have a pattern here with no false positives.
Pattern:
(.*\n.*)\n(.*)\s{2,}(\d{5})\s{2,}\b(.*)

Please let me know how it goes.
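
To show which capture group holds which value, here is a small Python sketch that runs the pattern above over one record and copies the groups into named fields. It is an illustration, not the UiPath workflow. The record comes from the sample text in this thread, but the double spaces around the zip code are an assumption made to satisfy `\s{2,}`; the forum rendering shows single spaces, and the actual spacing in output.txt is not visible here.

```python
import re

# Pattern from the post above. Groups: 1 = company (one or two lines),
# 2 = address, 3 = zip code, 4 = city.
RECORD = re.compile(r"(.*\n.*)\n(.*)\s{2,}(\d{5})\s{2,}\b(.*)")

# ASSUMPTION: the extracted PDF text separates address, zip code and city
# with two or more spaces, which is what \s{2,} requires.
text = (
    "22 brasseries\n"
    "\n"
    "IUT Lyon 1 - Site de Bourg-en-Bresse\n"
    "Rue Henri de Boissieu  01000  BOURG-EN-BRESSE\n"
    "\n"
    "Tél. : 04 74 45 52 65 Fax : 04 74 45 52 01\n"
)

rows = []
for m in RECORD.finditer(text):
    company, address, zip_code, city = (g.strip() for g in m.groups())
    rows.append({
        "company": " ".join(company.split()),  # join a two-line name into one line
        "address": address,
        "zip": zip_code,
        "city": city,
    })

print(rows)
```

In UiPath the equivalent values would come from `Groups(1)` to `Groups(4)` of each match returned by the Matches activity.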


Hi @Steven_McKeering
Perfect! It works!

Now, do you know how I can separate the information as I showed you before, please?

Hey

I have a workflow that splits out the following into string variables for you.

  • Company (Whether 1 or 2 lines)
  • Address
  • Zip
  • City

Main.xaml (21.7 KB)

I am sure there are ‘cleaner’ ways to make this work, but this should work fine.

Hopefully this helps.
