Extract specific data from a txt file to a DataTable

Hello, I am trying to extract specific data from txt files into a DataTable. The txt files are accounting reports in a format similar to the following, but with more lines:

"ASSET1 … 001 XXXXXXX
ASSET2 … 002 XXXXX
Hello, im from Spain
I am learning Uipath
ASSET3 … 003 XXX
ASSET4 … 004 (EMPTY)
Hello, im from Spain
I am learning Uipath
LIABILITIES1 … 005 XXXXX
(…) "

I just want to extract the numeric data (the XXXX values, not 001, 002, …) to a data table. I thought I could remove the characters I don't need (.) with Replace(".", " ") and remove the lines that contain useless text ("Hello, im from …"). How could I erase these lines? With RemoveAt(3)?

All the txt files have the same format: the same lines with the same information in each one, except for the amounts. As you can see, when an amount is 0 the line contains no quantity at all. My goal is to transfer the amounts (XXXX) to a table (in rows or columns) so I can later copy them to an application; I think I have that last part working. Any ideas? Thank you very much! :slight_smile:

Your best bet is to use Regex to extract the numbers you need. Since your example always has 3 digits before the values, you can anchor the pattern on that.

Take a look at the Matches activity.


Hello

You can solve this with Regex, using the Matches activity with this Regex pattern:
(\d{3})\s(.*)
Regex101.com - Preview the solution here…

However, if you tell us more about the XXXX then I can improve the Regex Pattern to be more robust.

Is the XXXX always a certain length?
Is it just numbers? If so, try this pattern: (\d{3})\s(\d+)
Is it just letters?
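
For readers who want to try the numbers-only pattern outside UiPath, here is a minimal Python sketch. The sample text is hypothetical: the amounts are invented stand-ins for the XXXX placeholders, and the "…" is written as three dots.

```python
import re

# Hypothetical sample: invented amounts replace the XXXX placeholders.
sample = """ASSET1 ... 001 1500
ASSET2 ... 002 320
Hello, im from Spain
I am learning Uipath
ASSET3 ... 003 75
ASSET4 ... 004
LIABILITIES1 ... 005 980
"""

# Group 1 is the 3-digit code, group 2 is the amount that follows it.
pairs = re.findall(r"(\d{3})\s(\d+)", sample)

for code, amount in pairs:
    print(code, amount)
```

On this sample the text-only lines are skipped without any extra filtering, but so is the empty row 004, because \d+ needs at least one digit after the code.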

If you want to learn Regex check out my Regex New User MEGAPOST. There is even a Regex Demo you can download and try for yourself.

Hopefully this can help

Cheers

Steve


Thanks!!!

A real example is below. The values are always numbers with two decimals (+/-), but I need to extract the "empty boxes" too.

Example of the DT I want:

A------------- B
00101 173159,04
00102 0 (or empty cell)
00103

Only column B is important; column A is not necessary.

Example of my txt:

ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04
Inmovilizado intangible (N, A, P) 00102
Desarrollo (N) 00103
Concesiones (N) 00104
Patentes, licencias, marcas y similares (N) 00105
Fondo de comercio (N, A, P) 00106
Aplicaciones informáticas (N) 00107
Investigación (N) 00108
Propiedad intelectual (N) 00700
Otro inmovilizado intangible (N) 00109
Resto (A, P) 00110

Inmovilizado material (N, A, P) 00111 156179,11
Terrenos y construcciones (N) 00112
Instalaciones técnicas y otro inmovilizado material (N) 00113
Inmovilizado en curso y anticipos (N) 00114

Inversiones inmobiliarias (N, A, P) 00115
Terrenos (N) 00116
Construcciones (N) 00117

Inversiones en empresas del grupo y asociadas a largo plazo (N, A, P) 00118
Instrumentos de patrimonio (N, A, P) 00119
Créditos a empresas (N) 00120
Valores representativos de deuda (N) 00121
Derivados (N) 00122
Otros activos financieros (N) 00123
Otras inversiones (N) 00124
Resto (A, P) 00125

Inversiones financieras a largo plazo (N, A, P) 00126 1223,27
Instrumentos de patrimonio (N, A, P) 00127
Créditos a terceros (N) 00128
Valores representativos de deuda (N) 00129
Derivados (N) 00130
Otros activos financieros (N) 00131
Otras inversiones (N) 00132
Resto (A, P) 00133 1223,27

Activos por impuesto diferido (N, A, P) 00134 15756,66
Deudores comerciales no corrientes (N, A, P) 00135

Hi again!!

I think I have my pattern OK (thanks to your great megapost), but I can't get the two groups to also "extract" the empty boxes.
Can you help me with the sequence?

For now I have:
Read PDF Text
Matches
For Each (item in matches)
Write Line (item.ToString)

My pattern: 00\d{3}\s(\d+)\W(\d{2})

How does it look?

Thank you very much for your help!!!


Hey @Japp

I am really happy my MegaPost was able to help you! That was the primary goal. Yay! :blush: :partying_face:

Okay, so I believe I have two different options for your Regex pattern. Essentially, the + after the \d needs to be replaced by a *. The + means "one or more times", so there must always be at least one digit. The * means "zero or more times", so there might be digits, but there don't have to be.

Pattern 1:
00\d{3}\s*(\d*)\W(\d{2})

Pattern 2:
00\d{3}\s(\d*)\W(\d{2})
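
To see the difference between + and * in isolation, here is a small Python sketch. The two patterns are deliberately simplified (integer part only, no decimals) and are not the ones quoted above; the helper name groups_or_none is made up for the example. The two rows come from the sample txt.

```python
import re

rows = [
    "ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04",
    "Desarrollo (N) 00103",  # a row with no amount
]

# Simplified patterns: only the quantifier on the amount differs.
plus = re.compile(r"(00\d{3})\s*(\d+)")  # + : at least one digit required
star = re.compile(r"(00\d{3})\s*(\d*)")  # * : zero digits is acceptable

def groups_or_none(pattern, line):
    """Return the captured groups, or None when the line does not match."""
    m = pattern.search(line)
    return m.groups() if m else None

plus_results = [groups_or_none(plus, r) for r in rows]
star_results = [groups_or_none(star, r) for r in rows]

print(plus_results)
print(star_results)
```

With +, the row without an amount gives no match at all; with *, it matches and the second group is an empty string. Note that Pattern 1 and Pattern 2 above still require \W(\d{2}) after the integer part, so on rows like "Desarrollo (N) 00103" they appear to return no match either; it is worth checking them against the real text on Regex101.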

Good luck :slight_smile:

(00\d{3})\s(\d*\W*\d*)

That works, I think!!! I'll try it in UiPath tomorrow; I'm on my phone right now.
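
A quick way to check that pattern outside UiPath is a short Python sketch like the one below. The text is a few rows copied from the sample txt, including the blank line between blocks; stripping the second group is an extra step added here, not something from the thread.

```python
import re

text = (
    "Resto (A, P) 00110\n"
    "\n"
    "Inmovilizado material (N, A, P) 00111 156179,11\n"
    "Terrenos y construcciones (N) 00112\n"
)

# Group 1 is the code, group 2 is whatever follows: digits, separator, digits.
raw = re.findall(r"(00\d{3})\s(\d*\W*\d*)", text)

# \W* can swallow a stray line break, so trim the captured value.
cleaned = [(code, value.strip()) for code, value in raw]

for code, value in cleaned:
    print(code, repr(value))
```

On these rows every code is captured, and the empty boxes come back as empty strings once trimmed. One thing to watch: the \s after the code needs some character to follow it, so a code at the very end of the text with nothing after it would not match.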

Can you help me with the sequence to extract it to a DataTable?

A------------- B
00101 173159,04
00102
00103 6543,56
00104
00105
00106 363,01

Thanks!!

Finally…

(00\d{3})\s(\-*\d*\,*\d*)

It's my perfect pattern, but I'm having difficulty extracting all the information to a DT.

Okay, so each set of brackets will be a group. You can use these brackets to differentiate your results when putting them into a datatable.
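
As a rough illustration of splitting the groups into two columns, here is a minimal Python sketch (not UiPath). It assumes the pattern from the previous post reads (00\d{3})\s(\-*\d*\,*\d*) once the stray spaces are removed. The to_rows helper, the column names A and B, and the negative 00999 row are made up for the example.

```python
import re
from decimal import Decimal

# Assumed reading of the pattern from the previous post (spaces removed).
PATTERN = re.compile(r"(00\d{3})\s(\-*\d*\,*\d*)")

text = (
    "ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04\n"
    "Inmovilizado intangible (N, A, P) 00102\n"
    "Desarrollo (N) 00103\n"
    "Fila hipotetica (N) 00999 -45,10\n"  # hypothetical negative amount
)

def to_rows(source):
    """Build one row per match: group 1 -> column A, group 2 -> column B."""
    rows = []
    for m in PATTERN.finditer(source):
        code = m.group(1)
        raw = m.group(2).strip()
        # Empty boxes become None; amounts use a decimal comma in the source.
        amount = Decimal(raw.replace(",", ".")) if raw else None
        rows.append({"A": code, "B": amount})
    return rows

table = to_rows(text)
for row in table:
    print(row["A"], row["B"])
```

In UiPath the same idea would presumably be a For Each over the Matches output, reading Groups(1).Value and Groups(2).Value of each match and passing them to an Add Data Row activity. The sketch does not guard against a captured value that is only "-" or ",", which would need handling on real data.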

Watch this tutorial in full. Then use it as a guide.

Good luck :blush:


The basic DataTable concepts in this video are fine, but my problem is extracting all the matches and splitting them into two columns (group 1 and group 2)…

I watched this video too, but no luck… He has several patterns with one match each, and I have a single pattern with several matches…
