Extract line from pdf with regex

@joseantonio - is it possible to share your xaml? I would like to take a look and check if I can fix it.

Main.xaml (22.6 KB)

I am attaching the file because I still don’t understand the error. I’m just getting started with UiPath, and I still don’t see the problem.

@joseantonio – I am really sorry, I am not able to follow your approach. Here is the sample workflow I recently developed, see if it helps. RegEx_ExtractFromInvoices.zip (193.9 KB)

You can’t add code to the Regex (String) column. Everything you enter there will be treated as a string, not as code.

I would suggest adding an extra column called Group. Then, in the For Each loop, you can check whether row("Group") is empty. If it is empty, assign the full matched value to row.Item("Value"). If not, assign the matched group value instead.
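
A minimal sketch of that check, written in Python rather than as a UiPath workflow so it can be run on its own. The rows, patterns, and sample text are hypothetical stand-ins for the datatable; only the Regex, Group, and Value column names come from the suggestion above.

```python
import re

# Hypothetical stand-in for the config datatable: one dict per row.
rows = [
    {"Regex": r"Invoice\s+\d+", "Group": None, "Value": None},      # no group: keep full match
    {"Regex": r"Total:\s+(\d+),(\d+)", "Group": 2, "Value": None},  # keep only group 2
]

text = "Invoice 1042\nTotal: 121,50"  # hypothetical extracted PDF text

for row in rows:
    match = re.search(row["Regex"], text)
    if match is None:
        continue  # pattern not found, leave Value unset
    if row["Group"] is None:
        row["Value"] = match.group(0)             # full matched value
    else:
        row["Value"] = match.group(row["Group"])  # matched group value
```

In the workflow itself, the same branching would sit in an If activity inside the loop.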

But in another example I have, assigning the regex expression as a string does work. What data type should I give the Regex column in the datatable?

Regular expressions are just strings so it’s fine to add them to the Regex column.

(?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))

But varInf(0).Groups(3).ToString is code, and that is the part you can’t put in the datatable.
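
To illustrate the difference, here is a rough sketch in Python (not UiPath): the pattern travels as plain text, while picking group 3 happens in code after the match. It assumes every capture group in the pattern is the lazy `.*?` form, and the sample text is invented for the example.

```python
import re

# The pattern is only data: it could come from a datatable cell as a string.
pattern_from_cell = r"(?<=EUR)\n\n(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))\s(.*?(?=\s))"

# Invented sample text: five whitespace-separated values after "EUR".
sample = "Total EUR\n\n100,00 21,00 121,00 5,00 126,00\n"

match = re.search(pattern_from_cell, sample)

# Selecting a group is code, so it lives in the workflow, not in the cell.
third_value = match.group(3)
print(third_value)
```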

Okay, so I add the entire expression to the Regex cell. How do I then call that specific group of data?

In the for each loop:

You will need an If activity to check when to assign the full match and when to assign the group match (hence the need for a Group column in the datatable).

But even when I call the group manually, I never manage to display it or isolate it.

Did you check the group in your match?

ienMatch(0).Groups(3).ToString()

Here’s an example: RegexGroupTest.xaml (11.2 KB)

The column Group has been added with Int32 as type and default value 0. Groups(0) returns the full match, so let it be 0 unless you want a specific group.
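
Because group 0 is the whole match, an integer Group column that defaults to 0 lets a single lookup cover both cases. A small Python sketch of that idea, using hypothetical rows and text rather than the attached workflow:

```python
import re

text = "Amount: 121,50 EUR"  # hypothetical input

# Hypothetical rows; Group is an int and defaults to 0 when not specified.
rows = [
    {"Regex": r"(\d+),(\d+)"},              # Group omitted -> 0 -> full match
    {"Regex": r"(\d+),(\d+)", "Group": 1},  # only the first captured group
]

values = []
for row in rows:
    match = re.search(row["Regex"], text)
    group_index = row.get("Group", 0)
    values.append(match.group(group_index))

print(values)
```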

Thanks, I’ll review it and let you know if it worked for me.

Hello Ywen,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:

2:00 GitHub free code for all the files
2:20 Logic of general workflow
4:40 File 1 simple PDF
9:50 File 2 PDF with a column with multiple lines
20:10 File 3 PDF with a column with multiple words in the last column
27:00 File 5 PDF with multiple words in inner columns (2 columns)
31:40 File 6 PDF with a column with multiple lines
39:10 File 8 simple PDF
42:15 File 9 PDF with multiple spaces that need to be corrected
45:50 File 10 PDF with multiple columns that have multiple lines + multiple pages
55:50 File 11 simple PDF with protection empty Cells
58:35 File 12 Big PDF with an empty line and Empty columns and partial total
1:02:25 File 13 PDF with multiple columns that have multiple words and hard to define a rule
1:10:15 File 15 PDF with multiple columns that have multiple lines
1:12:50 File 17 simple PDF remove spaces from headers also remove space from Data
1:16:05 File 18 simple PDF
1:17:10 File 19 PDF with multiple pages and columns with multiple lines
1:22:10 File 20 PDF with multiple columns that have multiple lines
1:25:00 File 21 PDF with empty columns and subtotal

Code:

Thanks,
Cristian Negulescu

First of all, thanks: the example works for me. Unfortunately, a space has been removed from the text, and that now gives me an error. I attach the text and the regex I am trying to use:
output_infor.txt (280 Bytes)

In the screenshot I attach, the regex captures a whole blank space first, and I think that is what causes the error in the Assign.

I attach the text again with the modification that gives me an error, thanks

Hi @joseantonio, please check this.

I have the same problem …

What problem? @joseantonio

I just gave you the regex code for the new text

I keep getting that blank

I am attaching the main file. I do not understand what is happening. Thanks for your attention. Main.xaml (14.4 KB)

@joseantonio - Is this the output you are looking for ?

Here you go: RegEx_Groups.zip (35.6 KB)

Yes, that’s the output I’m looking for. What did you change?