Extract specific data from txt file to dt

Japp · August 10, 2020, 10:22pm

Hello, I am trying to extract specific data from txt files into a datatable. The txt files contain accounting data in a format similar to the following, but with more lines:

"ASSET1 … 001 XXXXXXX
ASSET2 … 002 XXXXX
Hello, im from Spain
I am learning Uipath
ASSET3 … 003 XXX
ASSET4 … 004 (EMPTY)
Hello, im from Spain
I am learning Uipath
LIABILITIES1 … 005 XXXXX
(…) "

I just want to extract the numeric data (the XXXX values, not 001, 002, …) into a data table. I thought I could remove the characters I don’t need (the dots) with Replace(".", " ") and remove the lines that contain useless text (“Hello, im from …”). How could I erase these lines? With RemoveAt(3)?

All the txt files have the same format: the same lines with the same information in each, except for the amounts. As you can see, when an amount is 0, the line contains no quantity. My goal is to transfer the amounts (XXXX) to a table (in rows or columns) so that I can later copy them into an application. I think I have that part OK. Any ideas? Thank you very much!
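Rather than deleting lines by position with RemoveAt, one option is to keep only the lines that look like data rows. The Python sketch below illustrates that idea. The sample text, the amounts in it and the helper name `keep_data_lines` are invented for illustration, and it assumes every data row contains a three-digit code as a separate token.

```python
import re

# Hypothetical sample in the shape described above; the amounts are invented.
SAMPLE = """ASSET1 ... 001 1500
ASSET2 ... 002 320
Hello, im from Spain
I am learning Uipath
ASSET3 ... 003 75
ASSET4 ... 004
LIABILITIES1 ... 005 980
"""

# Assumption: a data row contains a standalone three-digit code.
CODE = re.compile(r"\b\d{3}\b")


def keep_data_lines(text):
    """Return only the lines that contain a three-digit code."""
    return [line for line in text.splitlines() if CODE.search(line)]


for line in keep_data_lines(SAMPLE):
    print(line)
```

Filtering by shape instead of by index keeps working if a file has one free-text line more or less than expected.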

TimK · August 10, 2020, 10:52pm

Your best bet would be to use Regex to extract the numbers you require. Since your example always has 3 digits before the values, you can use that as the anchor.

Take a look at the Matches activity.

Steven_McKeering · August 10, 2020, 11:17pm

Hello

You can use Regex to solve this.
You can use the Matches Activity with the Regex Pattern:
(\d{3})\s(.*)
Regex101.com - Preview the solution here…

However, if you tell us more about the XXXX then I can improve the Regex Pattern to be more robust.

Is the XXXX always a certain length?
Is it just numbers? - Try this pattern if so “(\d{3})\s(\d+)”
Is it just letters?
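To show how the two capture groups behave, here is a small Python sketch of the numbers-only pattern `(\d{3})\s(\d+)`. The sample rows and amounts are invented for illustration; they are not taken from a real file.

```python
import re

# Hypothetical rows; the amounts are invented for illustration.
text = "ASSET1 ... 001 1500\nASSET2 ... 002 320\nASSET4 ... 004\n"

pattern = re.compile(r"(\d{3})\s(\d+)")

# Group 1 is the three-digit code, group 2 is the value after it.
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(pairs)
```

In this sketch the row with code 004 produces no match at all, because `\d+` needs at least one digit after the code. Rows without an amount would need a different quantifier.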

If you want to learn Regex check out my Regex New User MEGAPOST. There is even a Regex Demo you can download and try for yourself.

Hopefully this can help

Cheers

Steve

Japp · August 11, 2020, 5:57am

Thanks!!!

A real example is below. The values are always numbers with two decimals (positive or negative), but I need to extract the “empty boxes” too.

Example of the DT I want:

A------------- B
00101 173159,04
00102 0 (or empty cell)
00103
…
Only column B is important; column A is not necessary.
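If column B should hold numbers rather than text, the captured strings need converting. The sketch below is one way to do that in Python. The helper name `to_amount` is hypothetical, and it assumes the comma is the decimal separator, there is no thousands separator, and an empty box should become 0.

```python
from decimal import Decimal


def to_amount(raw):
    """Convert a captured string such as '173159,04' to Decimal; empty means 0."""
    cleaned = raw.strip()
    if not cleaned:
        return Decimal("0")
    # Assumption: comma is the decimal separator, with no thousands separator.
    return Decimal(cleaned.replace(",", "."))


print(to_amount("173159,04"))
print(to_amount(""))
print(to_amount("-363,01"))
```

Decimal is used instead of float so that accounting amounts keep exactly two decimals.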

Example of my txt:

ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04
Inmovilizado intangible (N, A, P) 00102
Desarrollo (N) 00103
Concesiones (N) 00104
Patentes, licencias, marcas y similares (N) 00105
Fondo de comercio (N, A, P) 00106
Aplicaciones informáticas (N) 00107
Investigación (N) 00108
Propiedad intelectual (N) 00700
Otro inmovilizado intangible (N) 00109
Resto (A, P) 00110

Inmovilizado material (N, A, P) 00111 156179,11
Terrenos y construcciones (N) 00112
Instalaciones técnicas y otro inmovilizado material (N) 00113
Inmovilizado en curso y anticipos (N) 00114

Inversiones inmobiliarias (N, A, P) 00115
Terrenos (N) 00116
Construcciones (N) 00117

Inversiones en empresas del grupo y asociadas a largo plazo (N, A, P) 00118
Instrumentos de patrimonio (N, A, P) 00119
Créditos a empresas (N) 00120
Valores representativos de deuda (N) 00121
Derivados (N) 00122
Otros activos financieros (N) 00123
Otras inversiones (N) 00124
Resto (A, P) 00125

Inversiones financieras a largo plazo (N, A, P) 00126 1223,27
Instrumentos de patrimonio (N, A, P) 00127
Créditos a terceros (N) 00128
Valores representativos de deuda (N) 00129
Derivados (N) 00130
Otros activos financieros (N) 00131
Otras inversiones (N) 00132
Resto (A, P) 00133 1223,27

Activos por impuesto diferido (N, A, P) 00134 15756,66
Deudores comerciales no corrientes (N, A, P) 00135

Japp · August 11, 2020, 5:52pm

Hi again!!

I think I have my pattern OK (thanks to your great megapost), but I can’t get the two groups to “extract” the empty boxes too.
Can you help me with the sequence?

For now I have:
Read PDF Text
Matches
For Each (item in matches)
Write Line (item.ToString)

My pattern: “00\d{3}\s(\d+)\W(\d{2})”

How does that look?

Thank you very much for your help!!!

Steven_McKeering · August 11, 2020, 9:45pm

Hey @Japp

I am really happy my MegaPost was able to help you! That was the primary goal. Yay!

Okay - so I believe I have two different options for your Regex pattern. Essentially, the + after the “\d” needs to be replaced by a *. The + means one or more times, so there must always be at least one digit. The * means zero or more times, so there may be digits but there don’t have to be.

Pattern 1:
00\d{3}\s*(\d*)\W(\d{2})

Pattern 2:
00\d{3}\s(\d*)\W(\d{2})
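The difference between the two quantifiers can be checked with a shortened pair of patterns. This Python sketch is only an illustration: the patterns cover just the code and the digits before the comma, not the full patterns above, and the text is two lines from the example.

```python
import re

text = (
    "ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04\n"
    "Inmovilizado intangible (N, A, P) 00102\n"
)

# Simplified for illustration: code, one whitespace, then the digits before the comma.
with_plus = re.findall(r"(00\d{3})\s(\d+)", text)  # + requires at least one digit
with_star = re.findall(r"(00\d{3})\s(\d*)", text)  # * also accepts no digits

print(with_plus)
print(with_star)
```

With `+` only the 00101 row is found; with `*` the 00102 row is found as well, with an empty second group.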

Good luck

Japp · August 11, 2020, 10:30pm

(00\d{3})\s(\d*\W*\d*)

That works, I think!!! I will try it in UiPath tomorrow; I’m on mobile.

Can you help me with the sequence to extract to a datatable?

A------------- B
00101 173159,04
00102
00103 6543,56
00104
00105
00106 363,01

Thanks!!

Japp · August 12, 2020, 7:27pm

Finally…

“(00\d{3})\s(\-*\d*\,*\d*)”

It’s my perfect pattern, but I’m having difficulty extracting all the information to a DT.

Steven_McKeering · August 12, 2020, 8:52pm

Okay, so each set of brackets will be a group. You can use these brackets to differentiate your results when putting them into a datatable.
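As a rough sketch of that idea, the Python below loops over every match and turns group 1 and group 2 into one row with two columns. The pattern is an assumed whitespace-free reading of the one Japp posted, the helper name `extract_rows` is hypothetical, and the text is a few lines from the example in this thread. In UiPath the equivalent would presumably be a For Each over the Matches output with an Add Data Row built from item.Groups(1).Value and item.Groups(2).Value; treat that mapping as an untested assumption.

```python
import re

# Lines taken from the example in this thread.
text = (
    "ACTIVO NO CORRIENTE (N, A, P) 00101 173159,04\n"
    "Inmovilizado intangible (N, A, P) 00102\n"
    "Desarrollo (N) 00103\n"
    "\n"
    "Inmovilizado material (N, A, P) 00111 156179,11\n"
)

# Assumed whitespace-free reading of the pattern posted above.
pattern = re.compile(r"(00\d{3})\s(-*\d*,*\d*)")


def extract_rows(source):
    """Return one (code, amount) tuple per match; amount is '' for an empty box."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(source)]


rows = extract_rows(text)
for code, amount in rows:
    print(f"{code}\t{amount}")
```

One pattern yields many matches, and each match carries both groups, so a single loop is enough to fill both columns. Note that `\s` also matches a line break, so the pattern relies on the line after an empty box not starting with a digit, minus sign or comma.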

Watch this tutorial in full. Then use it as a guide.

Good luck

Japp · August 13, 2020, 6:41pm

The basic data table concepts in this video are OK, but my problem is extracting all the matches and splitting them into two columns (group 1 and group 2)…

I watched this other video too, but no luck… He has several patterns with one match for each, and I have several matches and only one pattern…

system · August 16, 2020, 6:41pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
