Extraction of table data from pdf

Amit_Kumar_Charde · July 16, 2023, 5:30am

Hello all,
I am trying to extract table data from a PDF and write it to Excel. The “Generate Data Table From Text” activity is not working for me because the text is laid out in a very complex way. Can anyone suggest another method, or an activity that extracts a table from a PDF directly? It's very urgent.

lrtetala · July 16, 2023, 5:37am

Hi @Amit_Kumar_Charde

If it is a scanned document, use the Read PDF with OCR activity; otherwise, use the Read PDF Text activity.

1. Drag and drop the “Read PDF with OCR” activity into your workflow.
2. Configure the activity by specifying the input PDF file path and selecting the OCR engine (e.g., Google OCR, Microsoft OCR, or Abbyy OCR).
3. Use the output variable of the “Read PDF with OCR” activity, let’s call it pdfText, which contains the extracted text from the PDF.
4. Apply text manipulation techniques, such as string splitting or regular expressions, to extract the table data from the pdfText variable.
5. Construct a DataTable to hold the extracted table data.
6. Iterate through the extracted data and populate the DataTable.
7. Use the “Write Range” activity to write the DataTable to an Excel file.
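The parsing part of the steps above (regex on the extracted text, then filling a table, then writing it out) can be sketched outside UiPath as below. This is only an illustration in Python, not UiPath code: the sample text, the column names `Item`/`Qty`/`Price` and the row pattern are all hypothetical, a list of dicts stands in for the DataTable, and CSV output (which Excel opens) stands in for Write Range.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF text-extraction step
# (the pdfText variable in the steps above). The layout is an assumption.
PDF_TEXT = """Invoice Report
Item      Qty   Price
Widget A    2   150.00
Gadget B   10    12.50
Total            425.00
"""

# Hypothetical row pattern: an item name (may contain single spaces), then
# two or more spaces, an integer quantity, whitespace, and a 2-decimal price.
ROW_PATTERN = re.compile(r"^(?P<item>.+?)\s{2,}(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")


def extract_rows(text):
    """Return matching lines as a list of dicts (a stand-in for a DataTable)."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:  # headers, totals and blank lines do not match and are skipped
            rows.append({
                "Item": match.group("item"),
                "Qty": int(match.group("qty")),
                "Price": float(match.group("price")),
            })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text (a stand-in for Write Range to Excel)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Item", "Qty", "Price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    table = extract_rows(PDF_TEXT)
    print(rows_to_csv(table))
```

This approach assumes each table row ends up on its own line in the extracted text; if the columns come out jumbled or overlapping, a line-based pattern like this one will not match reliably.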

I hope it helps!!

Amit_Kumar_Charde · July 16, 2023, 5:39am

My text is laid out in a very complex way, so string manipulation does not work here; I have tried it many times.

lrtetala · July 16, 2023, 5:41am

@Amit_Kumar_Charde

Can you provide a sample PDF showing how it looks? Then we will understand how to approach it.
Alternatively, try Document Understanding.

Anil_G · July 16, 2023, 5:42am

@Amit_Kumar_Charde

You can try using Form Extractor or Document Understanding for this.

Or check whether you are able to open the PDF using the Word activities; if so, the table can be extracted from Word instead.

Cheers

Amit_Kumar_Charde · July 16, 2023, 5:45am

Okay, sure, I will try this.

supermanPunch · July 16, 2023, 2:25pm

Hi @Amit_Kumar_Charde ,

We would not be able to help effectively if the details are vague. Let us know what you mean by complex: Are there going to be different variations in the format/template of the PDF? Is the PDF always going to be digital, scanned, or a mixture of both?

These details would help us provide suggestions that are more specific to your particular case.

Amit_Kumar_Charde · July 17, 2023, 3:23am

The PDF is digital, but after it is converted to text it appears very jumbled, meaning the data from different columns overlaps.

system · July 24, 2023, 5:05am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to extract table data from pdf RPA Discussions general	10	3801	April 23, 2022
Converting Pdf table to excel Activities excel , pdf , activities , studio	23	3626	January 18, 2023
PDF tabular data extraction Studio	3	798	February 24, 2021
PDF table extraction in excel/datatable Studio studio , question , properties_panel	4	2022	June 9, 2021
How to extract a table from pdf to excel Studio excel , activities	18	6598	July 19, 2023
