Extract PDF data from different files into different columns of a DT

I'm trying to extract PDF data (two groups, regex matches) from different files, and I'm looking for the following result:

A-B-C-D-E-F
1-2-1-2-1-2
1-2-1-2-1-2
1-2-1-2-1-2

I create a DT with 2 columns and use For Each file in path > Matches > Add Data Row, then Write Range, but the extraction gives me all the extracted data from every file in the same columns:

1-2 (File 1)
1-2 (File 1)
1-2 (File 1)
1-2 (File 2)
1-2 (File 2)
1-2 (File 2)
1-2 (File 3)
1-2 (File 3)
1-2 (File 3)

Where is the problem?
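A rough way to see where the stacking comes from, sketched in Python (not UiPath). It assumes the workflow is essentially one two-column table plus one Add Data Row per regex match, for every file. The `matches_per_file` dictionary is hypothetical and only stands in for the regex results. Because every match is appended as a new row in the same two columns, the files end up stacked one after another, and nothing in a row records which file it came from.

```python
# Hypothetical regex results: file name -> list of (group1, group2) pairs.
matches_per_file = {
    "File 1": [("1", "2"), ("1", "2"), ("1", "2")],
    "File 2": [("1", "2"), ("1", "2"), ("1", "2")],
    "File 3": [("1", "2"), ("1", "2"), ("1", "2")],
}

# One table with two columns (A, B), like the DT in the workflow.
rows = []
for file_name, matches in matches_per_file.items():
    for group1, group2 in matches:
        # "Add Data Row": each match becomes a new row in the same two columns,
        # so rows from File 2 simply continue below those from File 1.
        rows.append((group1, group2))

for a, b in rows:
    print(f"{a}-{b}")
```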


Hi @Japp

What do A-B and 1-2 mean?

May I know

Nived N :robot:


A-B are the DT columns.

1 and 2 are the matches (Group 1 and Group 2).

Right now all matches from all files end up in columns A-B, and I want:

A: Group 1 File 1
B: Group 2 File 1
C: Group 1 File 2
D: Group 2 File 2

and so on, for an indeterminate number of files.


Try using Add Data Column instead of Add Data Row. @Japp
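The Add Data Column idea, i.e. two new columns per file, can be sketched in Python as below (this is not UiPath code). `build_wide_table` and the `"<file> G1"` / `"<file> G2"` column names are hypothetical. Row i holds the i-th match of each file, and files with fewer matches are padded with empty strings, which the requested A-B / C-D / E-F layout needs whenever files have different numbers of matches.

```python
from itertools import zip_longest


def build_wide_table(matches_per_file):
    """Return (header, rows) with two columns per file and one row per match index."""
    header = []
    columns = []
    for file_name, matches in matches_per_file.items():
        # Equivalent of adding two columns per file: one for Group 1, one for Group 2.
        header += [f"{file_name} G1", f"{file_name} G2"]
        columns.append([m[0] for m in matches])
        columns.append([m[1] for m in matches])
    # Files may have different numbers of matches; pad shorter columns with "".
    rows = [list(r) for r in zip_longest(*columns, fillvalue="")]
    return header, rows


header, rows = build_wide_table({
    "File 1": [("a1", "a2"), ("b1", "b2")],
    "File 2": [("c1", "c2")],
})
print(header)  # ['File 1 G1', 'File 1 G2', 'File 2 G1', 'File 2 G2']
print(rows)    # [['a1', 'a2', 'c1', 'c2'], ['b1', 'b2', '', '']]
```

The table's width grows with the number of files, so the column count is only known after all files have been read.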


Try to provide a screenshot or the XAML code so we can look into it.


@NIVED_NAMBIAR it doesn't work :frowning:

Here is the XAML code, thanks!

EXTRACT DATA.xaml (9.0 KB)

Could you check the XAML code?

Please… help me :frowning:

I need to extract the data of each file into different rows or columns…
I have tried many things but I can’t.

Thanks for the XAML @Japp. I'm not sure I understand your requirement correctly, so let me restate it using your code flow. You have, say, 4 files in the “C:\Pdfs” folder. You take each file, read the range 4-9, match it against a regular expression and get the groups. You then iterate over the matches and, for each one, add two strings to the DT. After that, you write the DT to a sheet.

I'm not sure what you are trying to achieve here. Can you please explain in detail, with examples? If possible, attach some sample PDFs you have worked on, so that I can debug your code.

This is one part of the process. Once I extract the data and put it in the desired format, it must be entered into a web application. Each PDF contains one client's data, which has to be copied into that client's file in the application; that is why I need the data from each PDF to be kept separate in some way, so I can transfer it later.
My idea was to extract it into separate rows or columns so that the robot could then, for each one, enter it into the corresponding client file.

Perhaps I have approached the automation process the wrong way…
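One way to keep the data separated per client without a variable number of columns: keep the two groups as they are, but add a third column holding the source file name, then group by that column when entering the data into the web application. Below is a minimal Python sketch of that idea; the function names and file names are hypothetical, and the box/amount pairs are taken from the PDF excerpt in this thread. In a DataTable the equivalent would be a third column filled with the current file's name on every Add Data Row.

```python
from collections import defaultdict


def to_long_rows(matches_per_file):
    """One row per match, tagged with its source file: (file, group1, group2)."""
    return [
        (file_name, group1, group2)
        for file_name, matches in matches_per_file.items()
        for group1, group2 in matches
    ]


def group_by_file(rows):
    """Collect rows back per file, e.g. to fill in one client file at a time."""
    grouped = defaultdict(list)
    for file_name, group1, group2 in rows:
        grouped[file_name].append((group1, group2))
    return dict(grouped)


# Hypothetical file names; box codes and amounts come from the PDF excerpt.
rows = to_long_rows({
    "client_A.pdf": [("00101", "173159,04"), ("00111", "156179,11")],
    "client_B.pdf": [("00126", "1223,27")],
})
for client_file, values in group_by_file(rows).items():
    print(client_file, values)
```

Unlike one pair of columns per file, this layout keeps a fixed three-column table no matter how many PDFs there are.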


This is part of a PDF. The 00xxx box codes are always the same, and the amounts (173159,04, for example) are what must be copied later…

ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04
Inmovilizado intangible (N, A, P) 00102
Desarrollo (N) 00103
Concesiones (N) 00104
Patentes, licencias, marcas y similares (N) 00105
Fondo de comercio (N, A, P) 00106
Aplicaciones informáticas (N) 00107
Investigación (N) 00108
Propiedad intelectual (N) 00700
Otro inmovilizado intangible (N) 00109
Resto (A, P) 00110

Inmovilizado material (N, A, P) 00111 156179,11
Terrenos y construcciones (N) 00112
Instalaciones técnicas y otro inmovilizado material (N) 00113
Inmovilizado en curso y anticipos (N) 00114

Inversiones inmobiliarias (N, A, P) 00115
Terrenos (N) 00116
Construcciones (N) 00117

Inversiones en empresas del grupo y asociadas a largo plazo (N, A, P) 00118
Instrumentos de patrimonio (N, A, P) 00119
Créditos a empresas (N) 00120
Valores representativos de deuda (N) 00121
Derivados (N) 00122
Otros activos financieros (N) 00123
Otras inversiones (N) 00124
Resto (A, P) 00125

Inversiones financieras a largo plazo (N, A, P) 00126 1223,27
Instrumentos de patrimonio (N, A, P) 00127
Créditos a terceros (N) 00128
Valores representativos de deuda (N) 00129
Derivados (N) 00130
Otros activos financieros (N) 00131
Otras inversiones (N) 00132
Resto (A, P) 00133 1223,27

Activos por impuesto diferido (N, A, P) 00134 15756,66
Deudores comerciales no corrientes (N, A, P) 00135
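For the excerpt above, one possible regular expression for the two groups (box code and amount), sketched with Python's `re` module. The pattern is an assumption derived only from these sample lines: amount last on the line, no thousands separators, comma as decimal separator, exactly two decimals. It is not the pattern from the attached XAML, which isn't shown here. The same pattern syntax should also work with .NET's `Regex` and `RegexOptions.Multiline`, but that is worth verifying against the real PDF text.

```python
import re

# Assumption based only on the excerpt: a line that carries a value ends with a
# 5-digit box code followed by an amount with a comma decimal separator and two
# decimals. Group 1 = box code, Group 2 = amount. Lines with a box code but no
# amount (e.g. 00102) do not match. [ \t\r]* tolerates trailing spaces and
# Windows line endings before the end of the line.
BOX_AMOUNT = re.compile(r"\b(\d{5})[ \t]+(-?\d+,\d{2})[ \t\r]*$", re.MULTILINE)

sample = """\
ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04
Inmovilizado intangible (N, A, P) 00102
Inmovilizado material (N, A, P) 00111 156179,11
Inversiones financieras a largo plazo (N, A, P) 00126 1223,27
Resto (A, P) 00133 1223,27
Activos por impuesto diferido (N, A, P) 00134 15756,66
Deudores comerciales no corrientes (N, A, P) 00135
"""

pairs = BOX_AMOUNT.findall(sample)
for box, amount in pairs:
    # Convert "173159,04" to a float only if a numeric value is needed later.
    print(box, amount, float(amount.replace(",", ".")))
```

Because the box codes are fixed, matching on them (rather than on the Spanish labels) keeps the pattern independent of the label text.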