Unstructured PDF

Richarlei_Reis · March 28, 2022, 1:00pm

Hey guys,
I’m developing an RPA process to read a PDF and extract some information to Excel. I’m using Document Understanding, but I can only extract one line from the PDF. I need to extract all the lines from all the pages of this PDF and add them to an Excel file.

Can anyone help me?

Thanks.

supermanPunch · March 28, 2022, 1:15pm

Hi @Richarlei_Reis ,

Could you give us some more information about the methods or extractors you used?

Is the PDF Digital or is it a Scanned PDF?

OLAOLUWA_DARAMOLA · March 28, 2022, 1:41pm

Hi @Richarlei_Reis,

Just like @supermanPunch said, more information about the extractor you used would be helpful. UiPath should be able to extract multiple lines from the PDF, as your use case requires.

Angel_Llull · March 28, 2022, 2:19pm

@Richarlei_Reis,

As the others have said, we would need more info about what you are doing to get the data.

By the way, did you try to use data scraping?

ushu · March 28, 2022, 3:56pm

@Richarlei_Reis Which extractor are you using?

If it is the Form Extractor or the Intelligent Form Extractor, remember that you should always train on the PDF with the maximum number of row items you can get in your use case. (Example: if you train for 4 row items and then want to extract 10 line items, it can’t do that; you should train it for 10 line items instead.)

If you are using the ML Extractor, you should retrain the model with a few more invoices to get it to work as expected.

Richarlei_Reis · March 28, 2022, 5:02pm

I’m using Form Extractor

I’m sorry, I’m new to UiPath and I’m still learning. I can send my project if needed.

Screenshot 2022-03-28 135652

Richarlei_Reis · March 28, 2022, 5:15pm

@ushu @supermanPunch @OLAOLUWA_DARAMOLA @Angel_Llull

I’m using Form Extractor

It’s duplicating on the same line, and also in Excel.

I’m sorry, I’m new to UiPath and I’m still learning.

ushu · March 28, 2022, 5:33pm

@Richarlei_Reis This is not the way to extract line items. Please refer to the video and doc below.

Richarlei_Reis · March 28, 2022, 5:51pm

@ushu

The video below uses a structured PDF. My PDF is unstructured, and I only need to extract a few items.

ushu · March 28, 2022, 6:54pm

@Richarlei_Reis Do you want to extract only a few columns or a few rows from the PDF?

Richarlei_Reis · March 28, 2022, 7:12pm

@ushu I need to extract these specific items from all the PDFs.

ushu · March 28, 2022, 7:39pm

@Richarlei_Reis If you want to extract any data from a table, you have to define the variable with the type Tabledata, since you need to extract more than one row, and then define the columns you want to extract.

Also, please go through the doc below for more info.

Note: For better results, train the PDF with only 2 rows, since you want to get only the first two rows of data.
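
To make the idea of "one table field, many rows" concrete, here is a minimal sketch in plain Python. It is not UiPath code and does not use any UiPath API. The `pages` structure, the column names, and the helpers `flatten_rows` and `to_csv` are all hypothetical, chosen only to illustrate the logic: loop over every row on every page, keep only the wanted columns, and write one output line per row instead of overwriting the same line.

```python
import csv
import io

# Hypothetical extraction result: one entry per page, each holding the
# rows found for a table-type field on that page (made-up sample values).
pages = [
    {"page": 1, "rows": [{"Item": "A-100", "Qty": "2", "Price": "10.00"},
                         {"Item": "B-200", "Qty": "1", "Price": "5.50"}]},
    {"page": 2, "rows": [{"Item": "C-300", "Qty": "4", "Price": "7.25"}]},
]

# Only the columns we actually want in the spreadsheet (assumed names).
WANTED_COLUMNS = ["Item", "Qty"]


def flatten_rows(pages, columns):
    """Return one record per table row, across all pages."""
    records = []
    for page in pages:
        for row in page["rows"]:
            # Keep only the wanted columns; missing cells become "".
            records.append({col: row.get(col, "") for col in columns})
    return records


def to_csv(records, columns):
    """Serialize the records as CSV text, one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = flatten_rows(pages, WANTED_COLUMNS)
csv_text = to_csv(records, WANTED_COLUMNS)
print(csv_text)
```

In a workflow, the same pattern would be a loop over the rows of the extracted table that appends each row to the output, rather than writing a single value to a fixed cell.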

Richarlei_Reis · March 30, 2022, 7:42pm

@ushu

No, I need to extract all the lines from the PDF that contain these items, and the PDF can have several pages.

I will read the document below.

Richarlei_Reis · April 19, 2022, 2:46pm

@ushu
I couldn’t reach my goal.
