Unstructured PDF

Hey guys,
I’m developing an RPA process to read a PDF and extract some information from it into Excel. I’m using Document Understanding, but I can only extract one line from the PDF. I need to extract all the lines from all the pages of this PDF and add them to an Excel file.

Can anyone help me?

Thanks.

Hi @Richarlei_Reis ,

Could you provide us with some more information about the methods or extractors used?

Is the PDF digital, or is it a scanned PDF?

Hi @Richarlei_Reis,

Just as @supermanPunch said, more information about the extractor you used would be helpful. UiPath should be able to extract multiple lines from the PDF, as your use case requires.

@Richarlei_Reis,

As my colleagues have said, we need more info about what you are doing to get the data.

By the way, have you tried data scraping?

@Richarlei_Reis Which extractor are you using?

If it is the Form Extractor or the Intelligent Form Extractor, remember that you should always train on a PDF with the maximum number of row items you can get in your use case. (Example: if you train for 4 row items and then want to extract 10 line items, it can’t do that; train it for 10 line items instead.)

If you are using the ML Extractor, you should retrain the model with a few more invoices to get it to work as expected.

I’m using Form Extractor

I’m sorry, I’m new to UiPath and I’m still learning. I can send my project if needed.

Screenshot 2022-03-28 135652

@ushu @supermanPunch @OLAOLUWA_DARAMOLA @Angel_Llull

I’m using Form Extractor

It’s duplicating values on the same line, and also in Excel.

I’m sorry, I’m new to UiPath and I’m still learning.

@Richarlei_Reis This is not the way to extract line items. Please refer to the video and doc below.

@ushu

The video below covers a structured PDF. My PDF is unstructured, and I only need to extract a few items.

@Richarlei_Reis Do you want to extract only a few columns or only a few rows from the PDF?

@ushu I need to extract these specific items from all the PDFs.

@Richarlei_Reis If you want to extract data from a table, you have to define the variable with the type Tabledata, since you need to extract more than one row, and then define the columns you want to extract.

Also, please go through the doc below for more info.

Note: For better results, train on the PDF with only 2 rows, since you want to get only the first two rows of data.
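
To make the table idea a bit more concrete, here is a minimal Python sketch. It is not UiPath code and does not use any UiPath API. It assumes the extracted table arrives as one list of row dictionaries per page, which is a hypothetical shape chosen only for illustration; `pages`, `flatten_rows`, and `to_csv` are made-up names. It keeps only the chosen columns and writes every row from every page as CSV text that Excel can open.

```python
import csv
import io

# Hypothetical extraction result: one list of row dicts per PDF page.
# This is NOT the UiPath result format; it only illustrates the flattening step.
pages = [
    [{"Item": "A-100", "Quantity": "2"}, {"Item": "B-200", "Quantity": "5"}],
    [{"Item": "C-300", "Quantity": "1"}],
]


def flatten_rows(pages, columns):
    """Return one flat list of rows, keeping only the requested columns."""
    rows = []
    for page_number, page_rows in enumerate(pages, start=1):
        for row in page_rows:
            # Missing cells become empty strings so every row has the same width.
            rows.append([page_number] + [row.get(col, "") for col in columns])
    return rows


def to_csv(rows, columns):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Page"] + columns)
    writer.writerows(rows)
    return buffer.getvalue()


columns = ["Item", "Quantity"]
rows = flatten_rows(pages, columns)
csv_text = to_csv(rows, columns)
```

The point of the sketch is that a table field yields many rows per document, so the loop runs over every page and every row instead of reading a single value per field.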

@ushu

No, I need to extract all the lines with these items from the PDF, and it can contain several pages.

I will read the document below.

@ushu
I couldn’t reach my goal.