Extract text from PDFs

Hey
I need to extract text from PDFs and save the information as one line in a text file for each new PDF.

I configured the flow following this article, but I can't work out how to extract just the information I need. https://docs.uipath.com/activities/docs/read-pdf-files

The screenshot shows exactly what I need.

Hey

I think Document Understanding with the Form Extractor will work. You can refer to the UiPath Academy videos.

Regards

Hi Rodrigo, the Read PDF Files example extracts the text from a PDF file and then splits the extracted text into multiple lines. The required data fields have to be retrieved one by one from that extracted text.

You can check the text extracted from your PDF file and then decide how to extract individual fields.

There is no shortcut or magic way to extract your selected fields, even with UiPath Document Understanding.
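
To illustrate the "check the extracted text first" step, here is a minimal Python sketch rather than a UiPath workflow. The helper names `numbered_lines` and `find_lines` and the sample text are hypothetical; in practice the text would come from whatever the PDF read step returns.

```python
def numbered_lines(text):
    """Split extracted text into non-empty lines, keeping 1-based line numbers."""
    lines = [line.strip() for line in text.splitlines()]
    return [(number, line) for number, line in enumerate(lines, start=1) if line]


def find_lines(text, keyword):
    """Return the numbered lines that contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [(number, line) for number, line in numbered_lines(text) if needle in line.lower()]


# Hypothetical extracted text, used only to show the inspection step.
sample_text = "NOTA FISCAL\n\n0.00   0.00\nVALOR TOTAL DA NF 32793\n"

for number, line in numbered_lines(sample_text):
    print(number, repr(line))
```

Printing each line with `repr` makes stray spaces and odd characters visible, which helps when deciding what a later pattern has to tolerate.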

Hi, @liu_shubin @fernando_zuluaga

I studied several articles and managed to make progress with extracting text from PDFs.
My question now is how to create a regex to get the information that is selected in the screenshot. Can you help me, please?

@Rodrigo_Buch

I assume you are trying to get the highlighted text… The last up address.

Also, will it always come after the 0.00 0.00?

@Rahul_Unnikrishnan
I looked at other samples, and yes, it seems it will always be 0.00 0.00.

Hi Rodrigo,
I assume that you want to extract “VALOR TOTAL DA NF 32793” and there is no line break in your text. Otherwise, you can remove line breaks before applying RegEx.

Here is the RegEx to extract the data. “\s+” matches one or more whitespace characters, since text extracted by OCR may contain multiple spaces. “\d+” matches the run of digits at the end of the string. You can refer to the attached flow.
Main.xaml (5.6 KB)

VALOR\s+TOTAL\s+DA\s+NF\s+\d+
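
For readers who want to try the pattern outside UiPath, the following is a rough Python sketch, not the contents of the attached Main.xaml. The function names `extract_nf_total` and `append_result`, the sample string, and the output file handling are assumptions made for illustration. It replaces line breaks with spaces, applies the pattern above, and appends one line per PDF to a text file as asked in the original question.

```python
import re

# Same pattern as in the post above.
NF_PATTERN = re.compile(r"VALOR\s+TOTAL\s+DA\s+NF\s+\d+")


def extract_nf_total(pdf_text):
    """Return the matched field with whitespace collapsed, or None if absent."""
    # Replace line breaks with spaces before matching, as suggested above.
    flattened = pdf_text.replace("\r", " ").replace("\n", " ")
    match = NF_PATTERN.search(flattened)
    if match is None:
        return None
    # Collapse runs of spaces that OCR may have introduced.
    return " ".join(match.group(0).split())


def append_result(output_path, value):
    """Append one line to the output text file (one line per PDF)."""
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(value + "\n")


# Hypothetical extracted text with an extra space and a line break.
sample = "0.00  0.00 VALOR  TOTAL DA\nNF 32793"
print(extract_nf_total(sample))  # VALOR TOTAL DA NF 32793
```

Returning `None` when nothing matches lets the caller decide whether to skip the PDF or log it, instead of writing an empty line to the file.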