Extracting table in PDF document dynamically

Ram_Shiva_Reddy · February 1, 2023, 4:16am

Dear Team,
Can we extract multiple rows of a table from a PDF?
I have a table with 5 rows in the first PDF and a table with 8 rows in the second PDF. Is it possible to extract the tables dynamically?

Srini84 · February 1, 2023, 5:26am

@Ram_Shiva_Reddy

You might need to integrate with Python.
There is a library that can extract tables from PDF files.

Check the video below for your reference.

Hope this may help you

Thanks,
Srini

Ram_Shiva_Reddy · February 1, 2023, 5:53am

Hello sir,
Thank you for your response.
Is it possible to extract tables with a varying number of rows through Document Understanding?

supermanPunch · February 1, 2023, 5:57am

Hi @Ram_Shiva_Reddy ,

Although we could perform the task using Document Understanding, we would first like to understand the quality of the documents at hand: will they be a mix of digital and scanned documents, or only digital documents/PDFs?

Let us know more about the document samples, their types, and the number of templates you would be receiving. Then we should be able to suggest the appropriate steps.

Ram_Shiva_Reddy · February 1, 2023, 5:59am

Please find the attached documents for your reference.
I need to extract the tables from both PDFs.
Test - 1.pdf (45.3 KB)
Test - 2.pdf (50.5 KB)

Srini84 · February 1, 2023, 6:01am

@Ram_Shiva_Reddy

Yes, you can use Document Understanding. As @supermanPunch said, you first have to understand the quality of your PDF documents, and then you can train the extractor to extract the required fields.

Hope this may help you

Thanks,
Srini

Ram_Shiva_Reddy · February 1, 2023, 6:03am

Sir,
It’s a structured PDF format. Every file has the same keys and the same layout; only the number of table rows varies. Once I give the path, the bot has to extract all the files irrespective of the number of table rows.

Srini84 · February 1, 2023, 6:07am

@Ram_Shiva_Reddy

Check the video below for your reference.

Hope this may help you

Thanks,
Srini

Ram_Shiva_Reddy · February 1, 2023, 6:24am

Sir,
This is only partially useful, because the PDF in the video always has the same number of rows. If the number of rows varies between PDFs, the table is not extracted correctly through Document Understanding. If I indicate only 4 rows, then the additional rows in the next PDF, which contains more rows, are not extracted. If I use the PDF with the highest number of rows as the default, then whatever text is present below the table is also extracted as part of the table. So I need to extract only the table, irrespective of the number of rows.
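
The requirement described above (read rows until the table ends, however many there are) can be illustrated with a minimal Python sketch. It is not the Document Understanding approach discussed in this thread, and it assumes the PDF text has already been extracted into a list of lines by some other tool. The header text "Item Qty Price", the row pattern, the function name `extract_table_rows`, and the sample lines are all hypothetical, since the real layout of the attached PDFs is not shown here.

```python
import re

# Hypothetical row shape: a description, an integer quantity, a decimal price.
# The real PDFs would need their own pattern.
ROW_PATTERN = re.compile(r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")


def extract_table_rows(lines, header_keyword):
    """Collect rows after the header line until the first non-row line."""
    rows = []
    in_table = False
    for line in lines:
        text = line.strip()
        if not in_table:
            # Skip everything until the (assumed) header line appears.
            if header_keyword in text:
                in_table = True
            continue
        match = ROW_PATTERN.match(text)
        if match is None:
            # First line that does not look like a row: the table has ended.
            break
        rows.append(match.groupdict())
    return rows


# Invented sample text standing in for two PDFs with different row counts.
pdf_one = [
    "Invoice 001",
    "Item Qty Price",
    "Pen 2 1.50",
    "Notebook 1 3.25",
    "Total due on receipt",
]
pdf_two = pdf_one[:4] + ["Stapler 1 7.00", "Thank you for your order"]

for name, lines in (("pdf_one", pdf_one), ("pdf_two", pdf_two)):
    print(name, extract_table_rows(lines, "Item Qty Price"))
```

Because the loop stops at the first line that does not match the row pattern, the text below the table is never pulled in, and no row count has to be fixed in advance. Note that in this sketch a blank line or a row wrapped across two lines would also end the table early.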

Srini84 · February 1, 2023, 6:26am

@Ram_Shiva_Reddy

I think the post below relates to your question.

Hope this may help you

Thanks,
Srini

supermanPunch · February 1, 2023, 6:38am

@Ram_Shiva_Reddy ,

Since the PDFs shared are digital documents, you could check the workflow provided in the post below:

I tested it with your PDFs, and it does seem to extract the data properly. Do let us know if it is still not working.
