How to retrieve a column from a data table in PDF format?

Hello everyone, I need to retrieve information from a PDF file that contains a data table, specifically the column “Nro. Operac.”. I have seen advice suggesting regular expressions, but I do not know whether the numbers will always start with “164”, and there are also other columns that contain numbers.

Hi @ahzaradsh,
Could you please check the comment below? I think it will help. Please try it and let us know if you still face the issue :slight_smile:

Hi,

Is Nro. Operac. a numeric pattern? If so, you can easily retrieve it using regex. Please provide this table as a string, obtained with the Read PDF Text activity (part of the UiPath.PDF.Activities package).

I cannot use a Read PDF activity because I do not have the file; the document is opened from within an application. I think I need to use screen scraping and regex. I have already tried that, but it does not work the way I want, which I think is due to the regex.
It captures the column I need, but it also captures six-digit sequences from another column.

The regex I used is [0-9]{6}. I also tried ^[0-9]{6}$, but that does not give a single match.

This is the “PDF”

and this is the text I obtain with Get OCR Text:

@“Nro
Operac.
164395
164396
164397
164398
164399 AFP
HABITAT
HABITAT
INTEGRA
INTEGRA
INTEGRA Producto
INV
INV
SOB
SOB
SOB CUSSP
579301MMXUX3
520001VSHRN5
505861JLFRN2
221300LAQES3
518731VGAlRO Afilliado
MAURICIO
VICTORIANIO
JORGE
LUCY
VICTOR Fecha
Vencimiento
09/11/2021
09/11/2021
09/11/2021
09/11/2021
09/11/2021 Mod.
Cotizadas”

Try this one:

[\d]{6}\s

This regex matches only six digits that are followed by a whitespace character (a space, a new line, etc.).

Then you can remove the trailing whitespace using the String.Trim method.
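
For anyone who wants to check the patterns outside of UiPath, here is a minimal sketch in Python. It is only an illustration: the `re` module stands in for the .NET regex engine that UiPath activities are assumed to use, `ocr_text` is an abridged copy of the OCR output posted above, and the variable names are made up for this example. The word-boundary pattern `\b\d{6}\b` is an alternative that was not suggested in the thread; it avoids the trim step.

```python
import re

# Abridged from the OCR output in the thread: values arrive one per line,
# and the last value of each column shares its line with the next header.
ocr_text = """Nro
Operac.
164395
164396
164397
164398
164399 AFP
HABITAT
INTEGRA Producto
INV
SOB CUSSP
579301MMXUX3
520001VSHRN5
505861JLFRN2
221300LAQES3
518731VGAlRO Afilliado
MAURICIO
VICTOR Fecha
Vencimiento
09/11/2021
09/11/2021 Mod.
Cotizadas"""

# Original attempt: also picks up the six leading digits of each CUSSP value.
unanchored = re.findall(r"[0-9]{6}", ocr_text)

# Second attempt: without a multiline option, ^ and $ refer to the start
# and end of the whole text, so nothing matches.
anchored = re.findall(r"^[0-9]{6}$", ocr_text)

# With a multiline option, ^ and $ apply per line, but "164399 AFP" is missed
# because that line contains more than the six digits.
anchored_multiline = re.findall(r"^[0-9]{6}$", ocr_text, flags=re.MULTILINE)

# Pattern from the reply: six digits followed by whitespace, then trimmed.
with_trim = [m.strip() for m in re.findall(r"\d{6}\s", ocr_text)]

# Alternative (assumption, not from the thread): word boundaries, no trim needed.
with_boundaries = re.findall(r"\b\d{6}\b", ocr_text)

print(with_trim)
print(with_boundaries)
```

Both of the last two patterns skip the CUSSP values because the six digits there are followed directly by letters. Neither pattern depends on the numbers starting with “164”, but both would also match any other standalone six-digit number that the OCR text might contain.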
