Help with RegEx on text extracted from a PDF

Hi there!

I need some help with RegEx!

I have tried using both ChatGPT and Microsoft Copilot, without much progress!

I extract a table that spans multiple pages of a PDF file. The biggest issue is that the PDF layout varies from run to run, and sometimes several times within a single run, because new PDF files arrive every day…

Let me give an example of the extracted data:
ID: 86845
Packing List Date: 20-03-2024
Xxxxx Laboratory Ltd
Courier: XXX Bag No: XXXXXXXX-XXXX

Client Y- XXXXXXXXX

Xxxxxxx XX DK-XXXX City

DK-XXXX City

Dentist Y- XXXXXXXXXXXXX

‘#’ Case ID Patient

1 869558 197905-226

2 871023 198103

3 871025 198083

4 871061 198123-210

5 871446 198237

6 871749 198194-43

7 871880 198315-9

8 872200 198374

9 872361 198435-254

10 872442 198434-226

11 872672 198509-49

12 873015 198639-89

13 873019 198644-210

14 873020 198645-223

15 873021 198647-6

16 873024 198658-259

17 873032 198693-248

18 873036 198689-246

19 873037 198688-201

20 873038 198687-248

21 873062 198623-75

22 873063 198616-262

23 873064 198618-236

24 873065 198620-236

25 873068 198617-59

26 873071 198652-210

27 873121 198676-4

28 873122 198675-246

29 873123 198674-246

30 873124 198673-246

31 873125 198672-246

20/03/2024 Page 1/4Client Y- XXXXXXXXXX

Xxxxxxxx XX DK-XXXX XXXXX

DK-XXXX Xxxxxx

Dentist Y- XXXXXXXXXXX

‘#’ Case ID Patient

32 873126 198671-246

33 873127 198670-248

34 873129 198624-4

35 873131 198679-246

36 873132 198678-4

37 873179 198562

38 873181 198566

39 873183 198528

40 873184 198519

41 873185 198483

42 873190 198592

43 873191 198598

44 873192 198556

45 873193 198588

46 873195 198582

47 873196 198574

48 873199 198589

49 873200 198603

50 873201 198538

51 873202 198564

                                          ...

It is hard to give a better example, since most of the places where I have "XXXXX" etc. are company names and addresses… But I think it gets the point across? If not, feel free to tell me so 🙂

Can you briefly describe what you want to extract from the text above using RegEx?

Yes, of course! I want to extract everything except the header lines, such as "ID: 86845
Packing List Date: 20-03-2024
Xxxxx Laboratory Ltd
Courier: XXX Bag No: XXXXXXXX-XXXX

Client Y- XXXXXXXXX

Xxxxxxx XX DK-XXXX City

DK-XXXX City

Dentist Y- XXXXXXXXXXXXX".

So the only data I want is: "1 869558 197905-226

2 871023 198103

3 871025 198083

4 871061 198123-210

5 871446 198237

6 871749 198194-43

7 871880 198315-9

8 872200 198374

9 872361 198435-254

10 872442 198434-226

11 872672 198509-49

12 873015 198639-89

13 873019 198644-210

14 873020 198645-223

15 873021 198647-6

16 873024 198658-259

17 873032 198693-248

18 873036 198689-246

19 873037 198688-201

20 873038 198687-248

21 873062 198623-75

22 873063 198616-262

23 873064 198618-236

24 873065 198620-236

25 873068 198617-59

26 873071 198652-210

27 873121 198676-4

28 873122 198675-246

29 873123 198674-246

30 873124 198673-246

31 873125 198672-246"

Try using the pattern below:

^\d+\s.*
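For reference, here is a minimal Python sketch of what this pattern does on an abbreviated, lightly cleaned copy of the sample above (not the full PDF text). The key assumption is multiline mode: `^` only anchors at the start of every line when it is enabled (`re.MULTILINE` in Python, `RegexOptions.Multiline` in .NET). Without it, `^` anchors only at the very start of the text, which begins with `ID:`, so nothing would match. The names `SAMPLE`, `ROW_PATTERN` and `extract_rows` are made up for illustration.

```python
import re

# Abbreviated copy of the extracted text from the post (blank lines removed).
# Header lines, the page footer and the repeated page-2 header should be skipped.
SAMPLE = """ID: 86845
Packing List Date: 20-03-2024
Xxxxx Laboratory Ltd
Courier: XXX Bag No: XXXXXXXX-XXXX
Client Y- XXXXXXXXX
DK-XXXX City
Dentist Y- XXXXXXXXXXXXX
# Case ID Patient
1 869558 197905-226
2 871023 198103
31 873125 198672-246
20/03/2024 Page 1/4Client Y- XXXXXXXXXX
DK-XXXX Xxxxxx
Dentist Y- XXXXXXXXXXX
# Case ID Patient
32 873126 198671-246
37 873179 198562
"""

# re.MULTILINE makes ^ match at the start of every line, not just the text.
ROW_PATTERN = re.compile(r"^\d+\s.*", re.MULTILINE)


def extract_rows(text: str) -> list[str]:
    """Return every line that starts with digits followed by whitespace."""
    # strip() drops a trailing '\r' when the text has Windows line endings,
    # because '.' in Python matches '\r' (it only excludes '\n').
    return [m.group(0).strip() for m in ROW_PATTERN.finditer(text)]


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

The header lines, the page footer (`20/03/2024 Page 1/4...` starts with digits followed by `/`, not whitespace) and the repeated headers on later pages are all skipped, so rows from every page come through in order. One caveat: `\s` can also match a line break, so a line consisting of nothing but a number would pull in the line after it.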

Would you use a match or replace?

RegEx.xaml (9.2 KB)

Please refer to the attached workflow. Below is the result of the workflow:
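On the "match or replace?" question: the attached workflow isn't reproduced in this thread, so it isn't clear which approach it uses. As a rough Python sketch of both options (the function names are hypothetical), matching collects the rows directly, while replacing deletes every line that isn't a row. The sketch also swaps in a stricter row pattern, which is an assumption based only on the sample: row number, case ID, and patient number with an optional `-suffix`.

```python
import re

# Trimmed sample from the post: header lines, a blank line and a page
# break mixed in with table rows.
TEXT = (
    "ID: 86845\n"
    "# Case ID Patient\n"
    "1 869558 197905-226\n"
    "\n"
    "2 871023 198103\n"
    "20/03/2024 Page 1/4Client Y- XXXXXXXXXX\n"
    "32 873126 198671-246\n"
)

# Assumed row shape: three numeric columns, the last with an optional
# "-suffix". [ \t] is used instead of \s so a match never spills onto
# the next line.
ROW_BODY = r"\d+[ \t]+\d+[ \t]+\d+(?:-\d+)?[ \t]*$"


def rows_by_matching(text: str) -> list[str]:
    """Approach 1 (match): collect only the lines that look like rows."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    return re.findall("^" + ROW_BODY, text, flags=re.MULTILINE)


def rows_by_replacing(text: str) -> list[str]:
    """Approach 2 (replace): delete every line that does NOT look like a row."""
    text = text.replace("\r\n", "\n")
    # At each line start the negative lookahead succeeds only for non-row
    # lines; those lines (plus their newline) are replaced with nothing.
    kept = re.sub("^(?!" + ROW_BODY + r").*\n?", "", text, flags=re.MULTILINE)
    return kept.splitlines()


if __name__ == "__main__":
    print(rows_by_matching(TEXT))
    print(rows_by_replacing(TEXT))
```

Both functions return the same three rows for this sample. Matching is usually the easier of the two to read and maintain, since the replace version needs a negative lookahead to express "every line that isn't a row". The stricter pattern assumes each row has exactly three numeric columns, as in the sample; if the layout changes, the looser `^\d+\s.*` is more forgiving.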


Thank you so much! I have been battling this issue for 2 hours now… What a great help 😃

Again, thank you so much, and have a blessed day!

Best regards,
Martin

