Extract text from PDFs

Rodrigo_Buch · May 10, 2022, 12:27am

Hey
I need to extract text from PDFs and save the extracted information as one line in a text file for each new PDF.

I configured the flow following this article, but I don't know how to get the correct information; I can't extract only the parts I need. https://docs.uipath.com/activities/docs/read-pdf-files

The screenshot shows exactly what I need.
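Here is a rough sketch of the "one line per PDF" part. It assumes the fields have already been extracted, for example with the Read PDF Text activity from the article. The Python below appends one semicolon-separated line per PDF. The helper name `append_pdf_record`, the `;` separator, and the file names are hypothetical choices for illustration, not part of UiPath.

```python
import os
import tempfile
from typing import List


def append_pdf_record(output_path: str, pdf_name: str, fields: List[str]) -> None:
    """Append one line to output_path: the PDF name followed by its fields.

    Hypothetical helper; the ';' separator is an arbitrary choice.
    """
    # Collapse line breaks and repeated spaces inside each value so that
    # every PDF ends up on exactly one line of the output file.
    cleaned = [" ".join(value.split()) for value in fields]
    with open(output_path, "a", encoding="utf-8") as out:
        out.write(";".join([pdf_name, *cleaned]) + "\n")


if __name__ == "__main__":
    # Example run with made-up file names and values.
    path = os.path.join(tempfile.mkdtemp(), "results.txt")
    append_pdf_record(path, "invoice_001.pdf", ["32793"])
    append_pdf_record(path, "invoice_002.pdf", ["two\nlines", "7"])
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```

The file is opened in append mode, so each new PDF adds a line without rewriting the earlier ones.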

fernando_zuluaga · May 10, 2022, 12:31am

Hey

I think Document Understanding with the Form Extractor will work. You can refer to the UiPath Academy videos.

Regards

liu_shubin · May 10, 2022, 12:54am

Hi Rodrigo, the Read PDF Files example extracts the text from a PDF file and then splits the extracted text into multiple lines. The data fields you need have to be retrieved one by one from the text extracted from your PDF.

You can check the text extracted from your PDF file and then decide how to extract individual fields.

There is no shortcut or magic way to extract your selected fields, not even with UiPath Document Understanding.
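The Python sketch below shows the "split into lines, then pick fields one by one" approach in its simplest form. The sample text is made up, and `split_lines` and `find_field` are hypothetical helpers. Real PDF output will need labels that match your own documents.

```python
from typing import List, Optional


def split_lines(text: str) -> List[str]:
    """Split extracted PDF text into trimmed, non-empty lines."""
    # splitlines() handles both "\r\n" (Windows) and "\n" line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]


def find_field(lines: List[str], label: str) -> Optional[str]:
    """Return whatever follows `label` on the first line containing it."""
    for line in lines:
        idx = line.find(label)
        if idx != -1:
            return line[idx + len(label):].strip()
    return None  # label not found in any line


if __name__ == "__main__":
    sample = "NUMERO 123\r\n\r\nVALOR TOTAL DA NF 32793\n"  # made-up text
    lines = split_lines(sample)
    print(find_field(lines, "VALOR TOTAL DA NF"))
```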

Rodrigo_Buch · May 27, 2022, 2:19pm

Hi, @liu_shubin @fernando_zuluaga

I studied several articles and made progress with extracting text from PDFs.
My question now is how to create a regex that gets the information highlighted in the screenshot. Can you help me, please?

Rahul_Unnikrishnan · May 27, 2022, 2:26pm

@Rodrigo_Buch

I assume you're trying to get the highlighted text… the last up address.

Also, will it always come after the 0.00 0.00?

Rodrigo_Buch · May 27, 2022, 3:10pm

@Rahul_Unnikrishnan
I checked other models, and it seems that yes, it will always come after 0.00 0.00.
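Since the wanted text always follows "0.00 0.00", one option is to anchor the regex on that literal and capture what comes after it. This minimal Python sketch assumes the value sits on the same line as the zeros, or on the next line. `values_after_zeros` is a hypothetical name. It returns all matches so you can pick the one you need, for example the last.

```python
import re
from typing import List

# \b keeps "10.00" from matching as "0.00"; dots are escaped because "."
# alone matches any character. \s+ tolerates repeated spaces (and line
# breaks); (.+) captures the rest of that line, since "." stops at "\n".
AFTER_ZEROS = re.compile(r"\b0\.00\s+0\.00\s+(.+)")


def values_after_zeros(text: str) -> List[str]:
    """Return the text following each "0.00 0.00" occurrence."""
    return [value.strip() for value in AFTER_ZEROS.findall(text)]


if __name__ == "__main__":
    sample = "ITEM 0.00  0.00 RUA EXEMPLO 10\nOUTRA LINHA"  # made-up text
    print(values_after_zeros(sample))
```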

liu_shubin · June 4, 2022, 11:04am

Hi Rodrigo,
I assume that you want to extract “VALOR TOTAL DA NF 32793” and that there is no line break in that part of your text. If there is, you can remove the line breaks before applying the RegEx.
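Removing the line breaks can be as simple as collapsing every run of whitespace into one space. Here is a minimal Python sketch. `remove_line_breaks` is a hypothetical name and the input string is made up.

```python
import re


def remove_line_breaks(text: str) -> str:
    """Collapse line breaks, tabs and repeated spaces into single spaces."""
    # str.split() with no argument splits on any run of whitespace,
    # including "\r\n", so joining with " " leaves one space between words.
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "VALOR TOTAL\r\nDA NF   32793\n"  # made-up OCR-style output
    flat = remove_line_breaks(raw)
    # After flattening, even a pattern with literal single spaces matches.
    match = re.search(r"VALOR TOTAL DA NF (\d+)", flat)
    print(flat)
    print(match.group(1) if match else None)
```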

Here is the RegEx to extract the data. “\s+” matches one or more whitespace characters, because text extracted by OCR may contain multiple spaces. “\d+” matches the end of the string: one or more digits. You can refer to the attached flow.
Main.xaml (5.6 KB)

VALOR\s+TOTAL\s+DA\s+NF\s+\d+
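To get only the number instead of the whole "VALOR TOTAL DA NF 32793" match, you can wrap the digits in a capture group. The Python sketch below shows one way to do it. `total_nf` is a hypothetical name. `\s` also matches line breaks in both Python and .NET, so this version should tolerate a break between the words, though not one inside a word.

```python
import re
from typing import Optional

# Same pattern as in the post, with (\d+) capturing the digits.
TOTAL_NF = re.compile(r"VALOR\s+TOTAL\s+DA\s+NF\s+(\d+)")


def total_nf(text: str) -> Optional[str]:
    """Return the number after "VALOR TOTAL DA NF", or None if absent."""
    match = TOTAL_NF.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(total_nf("VALOR TOTAL DA NF 32793"))
    print(total_nf("VALOR  TOTAL DA\r\nNF 32793"))  # made-up OCR spacing
```

In a UiPath Assign activity, the .NET equivalent would be along the lines of `System.Text.RegularExpressions.Regex.Match(text, "VALOR\s+TOTAL\s+DA\s+NF\s+(\d+)").Groups(1).Value`, where `text` is the variable holding the extracted PDF text.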
