Extract a line from a PDF with regex

Hello

Modify your syntax to remove the last “()”

I have changed the expression, but when I try to show the result in a message box, it returns the expression itself, as if it were plain text. I also tried putting it in the datatable, and that gives an error too. I must be doing something wrong, but I don’t know what.

Got a screenshot?

Thanks for the help. Hopefully you can spot my error in the screenshot, since I cannot locate it.

Hello

I can’t see anywhere you are using Regex.

I see the Regex patterns but they need to be applied using a Matches activity.

I strongly recommend reading section 3 of my Regex Megapost.

So insert a Matches activity.
Here is your Pattern:
([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)
Here is a preview

The input will be your raw text.
Then you will need to assign each group from your result accordingly (I hope I have not misread any of your previous posts).
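For anyone who wants to try the pattern outside UiPath first, here is a minimal Python sketch of what the Matches step does with it. The sample text and the function name `extract_fields` are made up for illustration; the real input would be the raw text read from the PDF, and .NET regex can differ from Python in small details.

```python
import re

# Pattern from the post above: five numeric fields, each separated by one whitespace character.
PATTERN = r"([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)\s([\d.]+)"


def extract_fields(raw_text):
    """Return the five captured groups of the first match, or None if nothing matches."""
    match = re.search(PATTERN, raw_text)
    if match is None:
        return None
    # match.group(0) would be the full match; groups() returns groups 1..5 as a tuple.
    return match.groups()


# Hypothetical sample standing in for the text extracted from the PDF.
sample = "Totals EUR\n\n100.00 21.00 121.00 0.00 121.00\n"

fields = extract_fields(sample)
if fields is not None:
    # Group 3 in regex numbering is index 2 of the tuple.
    print("third group:", fields[2])
```

Each element of the tuple corresponds to one group that you would assign to its own variable or column.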

Hopefully this helps you.

Maybe it is a problem with how I organized the workflow: I do use Matches, but later on, since I first enter the expressions in a table and then run the regex commands.

@joseantonio - is it possible to share your xaml? I would like to take a look and check if I can fix it.

Main.xaml (22.6 KB)

I am attaching the file because I still don’t understand the error. I’m just starting out with UiPath, and I still can’t see the problem.

@joseantonio – I am really sorry, I am not able to follow your approach. Here is the sample workflow I recently developed, see if it helps. RegEx_ExtractFromInvoices.zip (193.9 KB)

You can’t add code to the Regex (String) column. Everything you enter there will be treated as string instead of code.

I would suggest that you add an extra column called Group. Then, in the For Each loop, you can check whether row("Group") is empty or not. If it is empty, assign the full matched value to row.Item("Value"). If not, assign the matched group value instead.

But in another example I have, assigning the regex expression as a string does work. What datatable column type should I assign to the regex field?

Regular expressions are just strings so it’s fine to add them to the Regex column.

(?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))

But varInf(0).Groups(3).ToString is code, and that is the part you can’t put in the datatable.

Okay, so I add the entire expression to the regex cell, and then how do I call that specific group?

In the for each loop:

You will need an If activity to check when to assign the full match and when to assign the group match (which is why the datatable needs a Group column).

But even when I call the group manually, I never manage to display it or isolate it.

Did you check the group in your match?

ienMatch(0).Groups(3).ToString()

Here’s an example: RegexGroupTest.xaml (11.2 KB)

The column Group has been added with Int32 as type and default value 0. Groups(0) returns the full match, so let it be 0 unless you want a specific group.
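The Group-column idea can be sketched in a few lines of Python, for readers who want to see the logic without opening the xaml. This is not the attached workflow. The rows, the column names other than Regex, Group and Value, and the helper `extract_value` are assumptions made up for the example. The point is that the table only holds data (a pattern string and an integer), while the group lookup stays in code.

```python
import re


def extract_value(pattern, group, text):
    """Return the requested group of the first match, or "" when nothing matches.

    Group 0 is the full match, so a default of 0 means "no specific group".
    """
    match = re.search(pattern, text)
    if match is None:
        return ""
    return match.group(group)


# Hypothetical config rows: the pattern is stored as a plain string,
# the group index as an integer with default 0.
rows = [
    {"Name": "AllTotals", "Regex": r"([\d.]+)\s([\d.]+)\s([\d.]+)", "Group": 0},
    {"Name": "ThirdTotal", "Regex": r"([\d.]+)\s([\d.]+)\s([\d.]+)", "Group": 3},
]

# Hypothetical raw text standing in for the PDF content.
raw_text = "EUR\n\n100.00 21.00 121.00\n"

for row in rows:
    # Same pattern, different result depending on the Group column.
    row["Value"] = extract_value(row["Regex"], row["Group"], raw_text)
    print(row["Name"], "->", row["Value"])
```

Because group 0 always exists for a successful match, a single lookup covers both cases and no separate branch for "full match" is needed in this sketch.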

Thanks, I’ll review it and let you know if it worked for me.

Hello Ywen,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:

2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with a column with multiple words ON the LAST column
27:00 File 5 PDF with a column with multiple words ON inside column (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces that need to be corrected
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection empty Cells
58:35 File 12 Big PDF with an empty line and Empty columns and partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF remove spaces from headers also remove space from Data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal

Code:

Thanks,
Cristian Negulescu