I have an invoice that consists of 3 pages, and I want to extract the table from it. The problem is that the table also spans all 3 pages. How can I get the table values in such a scenario? Is there any solution for this? aa company 1.pdf (27.5 KB)
I have attached a sample PDF here. Assume the table on each page has more than 15 rows and that the number of rows is dynamic.
Digitize Document: Use the Digitize Document activity with a suitable OCR engine to digitize the entire invoice.
Data Extraction Scope: Place the Data Extraction Scope activity in your workflow and add the required extractor, such as the Form Extractor or Machine Learning Extractor.
Present Validation Station: After extraction, use the Present Validation Station activity for human validation. Here, you can correct any mistakes in the extraction.
Group Table Rows: In the Validation Station, manually group the rows of the table that span multiple pages by selecting the continuing fields and pressing the / key to indicate they are part of the same row.
Export Extracted Data: Validate and export the data to a structured format like DataTable or Excel for further processing.
Sample UiPath Sequence:
Digitize Document (Output: DigitizedDocument)
Data Extraction Scope (Input: DigitizedDocument, Output: ExtractionResults)
Present Validation Station (Input: ExtractionResults, Output: HumanCorrectedResults)
-Use the Screen Scraping activity
-Indicate the table you want to extract by clicking and dragging to select the table area across all pages of the PDF invoice.
-Once the table is selected, a window will appear to configure the scraping options.
-Choose the appropriate scraping method based on the structure of the table. You can choose between “Full Text,” “Native,” or “OCR” depending on the content of the PDF.
-Choose the output method as “Data Table” to extract the table values into a DataTable variable.
-Once the table values are extracted, you can process them further as needed.
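If the table ends up being extracted one page at a time, the per-page results still need to be combined into a single table, and any header row repeated at the top of each page needs to be dropped. The Python sketch below only illustrates that merge logic outside UiPath. It is not a UiPath API. The function name merge_page_tables and the sample rows are hypothetical, and it assumes each page's table is already available as a list of rows whose repeated header matches the first page's header exactly.

```python
from typing import List, Optional

Row = List[str]


def merge_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Combine per-page tables into one table, keeping only the first header row."""
    merged: List[Row] = []
    header: Optional[Row] = None
    for rows in pages:
        for row in rows:
            # Skip rows that are completely empty (common at page breaks).
            if not any(cell.strip() for cell in row):
                continue
            if header is None:
                # The first non-empty row of the first page is treated as the header.
                header = row
                merged.append(row)
                continue
            # Drop header rows that are repeated on later pages.
            if row == header:
                continue
            merged.append(row)
    return merged


# Hypothetical sample data standing in for a table split across 3 pages.
pages = [
    [["Item", "Qty", "Amount"], ["Pen", "2", "4.00"], ["Book", "1", "9.50"]],
    [["Item", "Qty", "Amount"], ["Bag", "1", "20.00"], ["", "", ""]],
    [["Item", "Qty", "Amount"], ["Ink", "3", "6.00"], ["Tape", "5", "7.50"]],
]

invoice_table = merge_page_tables(pages)
for row in invoice_table:
    print(row)
```

Because the loop does not depend on a fixed row count, it handles pages with any number of rows, which matches the dynamic table described in the question. In a workflow, the equivalent step would be to merge the per-page DataTable variables into one before further processing.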