Hello Team,
I am having an issue extracting a table from PDFs. Suppose there are 3 PDFs with the same format, but the table in each has a different number of rows: the 1st PDF has 4 rows, the 2nd has 6 rows and the 3rd has 2 rows. Can we make the table extraction dynamic so it handles any number of rows?
Hi @Ram_Shiva_Reddy,
Could you let us know whether the PDF inputs will only be digital documents, or whether they may be scanned images as well?
The PDF inputs will be digital documents.
You can try the steps below:
- Use Read PDF Text to read the document's text into a string.
- Use the Split function on the extracted string to isolate the table data. To do this, identify constant words that always appear just before and just after the table in the PDF and split on them. Because the markers are fixed, this works no matter how many rows the table has.
- If every column has data in every row, use Generate Data Table to convert the table text into a DataTable. Otherwise, use regex to extract each row separately.
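The steps above can be sketched outside of the workflow as well. The Python snippet below is only a rough illustration of the same idea, not the actual activities: it assumes the PDF text has already been read into a string, and the sample text, the marker words ("Item Qty Price" and "Total") and the row layout (name, quantity, price) are all made up for the example. You would replace them with whatever stays constant in your own PDFs.

```python
import re

def extract_table_block(text, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker."""
    start = text.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found")
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        raise ValueError("end marker not found")
    return text[start:end].strip()

# Hypothetical row layout: item name, integer quantity, price with 2 decimals.
ROW_PATTERN = re.compile(
    r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$"
)

def parse_rows(block):
    """Parse each line of the table block; lines that do not match are skipped."""
    rows = []
    for line in block.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows

# Made-up text standing in for the output of reading one PDF.
sample_text = """Invoice 001
Item Qty Price
Widget 2 10.00
Large Gadget 1 25.50
Total 45.50
"""

block = extract_table_block(sample_text, "Item Qty Price", "Total")
for row in parse_rows(block):
    print(row)
```

Since the code only looks for the start and end markers, the same call handles a table with 2, 4 or 6 rows without any change.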
Hope this helps
cheers