For some standard document types (invoices, receipts, bank statements, etc.), UiPath provides Public ML Endpoints (Public Endpoints for ML Packages) that we can use with the Document Understanding activities to extract information. But for a custom document type, how can we extract tabular data using Document Understanding?
Note: The tabular data can span multiple pages, so we cannot use the Fixed Form Layout option.
Hi @sagar.sonavane ,
You can refer to this…
The FlexiCapture Layout option uses OCR technology to read and analyze the PDF document, recognize the table structure, and extract the data into a DataTable. It works well for tables with varying layouts, and it can handle tables that span multiple pages. You can use the FlexiCapture Layout Options to define the table structure, such as the number of columns, row separators, and table headers.
In case you need the steps, here you go:
- Open UiPath Studio and create a new project.
- Add a Sequence activity to the main workflow.
- Drag and drop the Read PDF Text activity to the Sequence.
- Select the PDF file you want to extract data from.
- Set the output variable to a string variable.
- Add the FlexiCapture Layout activity to the Sequence.
- Set the input variable to the string variable from the Read PDF Text activity.
- Set the output variable to a DataTable variable.
- Set the FlexiCapture Layout Options properties to define the table structure and layout.
- Run the Sequence to extract the tabular data.
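For tables that continue across pages, the per-page results usually have to be stitched into one table, with the repeated header row dropped. Here is a minimal sketch of that idea in plain Python. It is not a UiPath activity or API; the function name `merge_page_tables` and the list-of-rows representation are assumptions made for illustration only.

```python
from typing import List, Optional

Row = List[str]


def merge_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Stitch per-page table fragments into one table.

    Assumes the first row of the first page is the header and that
    continuation pages may repeat that exact header row.
    """
    merged: List[Row] = []
    header: Optional[Row] = None
    for rows in pages:
        for row in rows:
            if header is None:
                # First row seen is treated as the table header.
                header = row
                merged.append(row)
            elif row == header:
                # Skip a header repeated on a continuation page.
                continue
            else:
                merged.append(row)
    return merged
```

In a workflow, the same logic would apply to the DataTable built for each page before the rows are combined.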
Happy Automation
Mark it as a solution if you feel it's helpful.
Regards,
@pratik.maskar
Hello @pratik.maskar , thank you for the detailed reply.
Which package provides the FlexiCapture Layout activity?
Also, if you are aware of a Document Understanding-based approach, could you please share it?
The FlexiCapture Layout activity is available through the “UiPath.IntelligentOCR.Activities” package in UiPath.
Regards,
@pratik.maskar