Query related to PDF extraction through Document

Ram_Shiva_Reddy · January 24, 2023, 9:56am

Hello Team,
I am having an issue extracting a table from a PDF. I have 3 PDFs with the same format, but each one has a different number of table rows: the 1st PDF has 4 rows, the 2nd has 6 rows and the 3rd has 2 rows. Can we make the table extraction dynamic so it handles any number of rows?

supermanPunch · January 24, 2023, 10:21am

Hi @Ram_Shiva_Reddy ,

Could you let us know whether the PDF inputs will only be digital documents, or whether they may be scanned images as well?

Ram_Shiva_Reddy · January 25, 2023, 4:31am

The PDF inputs will be digital documents.

Anil_G · January 25, 2023, 4:49am

@Ram_Shiva_Reddy

You can try the following steps:

Use Read PDF Text to read the data from the PDF.
Use the Split function on the extracted string to isolate only the table data, by identifying constant start and end words that appear before and after the table in the PDF.
Then, if every column has data in every row, use Generate Data Table to convert the text into a DataTable. Otherwise, use regex to extract each row separately.
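The split-then-regex idea above can be sketched outside UiPath as plain string logic. This is a minimal Python sketch, assuming the text has already been extracted (the job Read PDF Text does in Studio); the marker words "Item Description Qty Amount" and "Total", the row layout, and the sample text are all hypothetical, so adjust them to your PDF. Because the regex returns every matching line between the markers, the row count does not need to be known in advance.

```python
import re

def extract_table_block(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text between start_marker and end_marker (both excluded)."""
    _, found, after_start = text.partition(start_marker)
    if not found:
        raise ValueError(f"start marker {start_marker!r} not found")
    block, found, _ = after_start.partition(end_marker)
    if not found:
        raise ValueError(f"end marker {end_marker!r} not found")
    return block

# Hypothetical row layout: item code, description, quantity, amount.
# [ \t]+ is used instead of \s+ so a match can never run across line breaks.
ROW_PATTERN = re.compile(
    r"^(\w+)[ \t]+(.+?)[ \t]+(\d+)[ \t]+([\d.,]+)$", re.MULTILINE
)

def parse_rows(block: str) -> list:
    """Return one (code, description, qty, amount) tuple per table row."""
    return ROW_PATTERN.findall(block)

# Hypothetical extracted text; the number of rows can vary from PDF to PDF.
sample = """Invoice No: 123
Item Description Qty Amount
A1 Steel bolts 4 12.00
B2 Copper wire 6 30.50
Total 42.50
"""

rows = parse_rows(extract_table_block(sample, "Item Description Qty Amount", "Total"))
```

In Studio the same pattern can be applied with the Matches activity or `System.Text.RegularExpressions.Regex.Matches`, adding one DataRow per match to a DataTable.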

Hope this helps

cheers

