Document Understanding Invoices

I have an invoice that consists of 3 pages. I want to extract the table from this invoice, but the table also spans all 3 pages. How can I get the table values in such a scenario? Is there any solution for this?
aa company 1.pdf (27.5 KB)

I have attached a sample PDF here. Assume the table on each page has more than 15 rows and that the number of rows is dynamic.

Hey @ajnaraya
You can group the table rows and extend the table onto the new page.

Hi,

  1. Digitize Document: Use the Digitize Document activity with a suitable OCR engine to digitize the entire invoice.

  2. Data Extraction Scope: Place the Data Extraction Scope activity in your workflow and add the required extractor, such as the Form Extractor or Machine Learning Extractor.

  3. Present Validation Station: After extraction, use the Present Validation Station activity for human validation. Here, you can correct any mistakes in the extraction.

  4. Group Table Rows: In the Validation Station, manually group the rows of the table that span multiple pages by selecting the continuing fields and pressing the / key to indicate they are part of the same row.

  5. Export Extracted Data: Validate and export the data to a structured format like DataTable or Excel for further processing.

Sample UiPath Sequence:

  • Digitize Document (Output: DigitizedDocument)
  • Data Extraction Scope (Input: DigitizedDocument, Output: ExtractionResults)
  • Present Validation Station (Input: ExtractionResults, Output: HumanCorrectedResults)
  • Export Extraction Results (Input: HumanCorrectedResults)

Note: This method is suitable for tables spanning multiple pages and will work with large volumes of invoices.
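
Once the results are exported, the rows from the three pages still need to end up in one table. Here is a minimal sketch of that merge in plain Python, outside UiPath, just to show the logic. It assumes each page gives you a list of rows and that a continuation page may or may not repeat the header row; the function name, column names, and sample values are all made up for illustration.

```python
from typing import List

Row = List[str]


def merge_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Combine per-page table rows into one table with a single header row."""
    merged: List[Row] = []
    header: Row = []
    for rows in pages:
        if not rows:
            continue  # this page contributed no table rows
        if not header:
            # Assumption: the first non-empty page starts with the header row.
            header = rows[0]
            merged.append(header)
            body = rows[1:]
        elif rows[0] == header:
            body = rows[1:]  # header repeated on a continuation page, skip it
        else:
            body = rows  # continuation page without a repeated header
        merged.extend(body)
    return merged


# Hypothetical sample data standing in for the rows exported from each page.
page1 = [["Item", "Qty", "Amount"], ["Widget", "2", "10.00"]]
page2 = [["Item", "Qty", "Amount"], ["Gadget", "1", "5.50"]]
page3 = [["Cable", "3", "4.20"]]

table = merge_page_tables([page1, page2, page3])
```

Because the row count per page is dynamic, the sketch never relies on a fixed number of rows; it only checks whether the first row of a page matches the header.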

Hi @ajnaraya

-Use the Screen Scraping activity.
-Indicate the table you want to extract by clicking and dragging to select the table area across all pages of the PDF invoice.
-Once the table is selected, a window will appear to configure the scraping options.
-Choose the appropriate scraping method based on the structure of the table. You can choose between “Full Text,” “Native,” or “OCR” depending on the content of the PDF.
-Set the output method to “Data Table” to extract the table values into a DataTable variable.
-Once the table values are extracted, you can process them further as needed.
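
If the scrape comes back as plain text rather than a ready-made table, the further processing could look roughly like the sketch below. It is plain Python, not a UiPath activity, and it assumes the columns in the scraped text are separated by two or more spaces; the helper names and the sample text are hypothetical.

```python
import re
from typing import List


def parse_scraped_table(text: str) -> List[List[str]]:
    """Split scraped text into rows of cells, skipping blank lines."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: cells are separated by runs of two or more spaces.
        rows.append(re.split(r"\s{2,}", line))
    return rows


def column_total(rows: List[List[str]], column: str) -> float:
    """Sum a numeric column, treating the first row as the header."""
    header, *body = rows
    idx = header.index(column)
    return sum(float(row[idx]) for row in body)


# Hypothetical scraped text; a real invoice will look different.
sample = """
Item      Qty   Amount
Widget    2     10.00
Gadget    1     5.50
"""

rows = parse_scraped_table(sample)
total = column_total(rows, "Amount")
```

Cell values that themselves contain double spaces would break this split, so treat it as a starting point only.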

Cheers…!