Extracting only tables from unstructured PDFs

Hello everyone, I am very new to the world of RPA. I have been trying to extract tables from PDFs, but I am facing issues with extracting only the table, because the PDFs have multiple pages and each PDF has a different number of columns and different information.

Hi @abhishek.singh4

If Document Understanding was parsing your table correctly, have you thought of using a split PDF activity, feeding the pages into the Document Understanding model two pages at a time, and then merging the resulting tables?
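A rough sketch of the split-and-merge idea, written in Python rather than as a UiPath workflow. It assumes each extracted table is a list of rows with the header first, and `extract_table` is a hypothetical callable standing in for the Document Understanding step (it is not a real UiPath API):

```python
from typing import Callable, List, Optional, Sequence

Row = List[str]
Table = List[Row]


def chunk_pages(page_numbers: Sequence[int], size: int = 2) -> List[List[int]]:
    """Group page numbers into consecutive chunks of at most `size` pages."""
    return [list(page_numbers[i:i + size]) for i in range(0, len(page_numbers), size)]


def merge_tables(tables: Sequence[Table]) -> Table:
    """Concatenate tables, keeping the header row only once.

    Assumption: the first row of the first non-empty table is the header.
    A later chunk that repeats that header has the repeat dropped.
    """
    merged: Table = []
    header: Optional[Row] = None
    for table in tables:
        if not table:
            continue  # nothing was extracted from this chunk
        if header is None:
            header = table[0]
            merged.append(header)
            merged.extend(table[1:])
        elif table[0] == header:
            merged.extend(table[1:])  # skip the repeated header
        else:
            merged.extend(table)  # continuation rows without a header
    return merged


def extract_all(
    page_numbers: Sequence[int],
    extract_table: Callable[[List[int]], Table],  # hypothetical extractor
    size: int = 2,
) -> Table:
    """Run the extractor on each chunk of pages and merge the results."""
    return merge_tables([extract_table(chunk) for chunk in chunk_pages(page_numbers, size)])
```

If the column count differs between chunks, the merged rows will have different lengths, so you may still need to pad or map columns afterwards.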

Other options include using regular expressions on the extracted text and manually populating a table row by row, or using a form extractor from the Document Understanding package, which might not have the same limit.
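For the regular-expression route, here is a minimal Python sketch of building the table row by row. The row layout (a description, an integer quantity, a decimal amount) and the names `ROW_PATTERN` and `rows_from_text` are made up for illustration; you would adapt the pattern to whatever your table rows actually look like:

```python
import re
from typing import Dict, List

# Hypothetical row layout: free-text item, integer quantity, amount with 2 decimals.
ROW_PATTERN = re.compile(r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+(?P<amount>\d+\.\d{2})$")


def rows_from_text(text: str) -> List[Dict[str, str]]:
    """Keep only the lines that look like table rows, one dict per row."""
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())  # non-matching lines are ignored
    return rows
```

Lines that do not match the pattern (titles, page numbers, free text) are simply skipped, which is what keeps only the table.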


Hi @abhishek.singh4,

If your footer can be identified uniquely by some value, then try string manipulation to remove the footer text.
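Something like this, sketched in Python. It assumes the footer sits at the bottom of each page and that `footer_marker` is a string you know appears only on the first footer line (both are assumptions, and `strip_footer` is just an illustrative name):

```python
def strip_footer(page_text: str, footer_marker: str) -> str:
    """Return the page text up to, but not including, the first footer line.

    Assumption: everything from the first line containing `footer_marker`
    to the end of the page is footer. If the marker is absent, the text
    is returned unchanged.
    """
    kept = []
    for line in page_text.splitlines():
        if footer_marker in line:
            break  # footer starts here, drop the rest of the page
        kept.append(line)
    else:
        return page_text  # marker never found
    return "\n".join(kept)
```

The same thing can be done in a UiPath Assign with string methods such as `IndexOf` and `Substring`.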

Cheers