Hello all,
I am trying to extract table data from a PDF and write it to Excel. The “Generate Data Table From Text” activity is not working for me because the text is laid out in a very complex way. Can anyone suggest another method, or is there an activity that extracts a table directly from a PDF? It's very urgent.
If it is a scanned document, use the Read PDF with OCR activity; otherwise, use the Read PDF Text activity.
- Drag and drop the “Read PDF with OCR” activity into your workflow.
- Configure the activity by specifying the input PDF file path and selecting the OCR engine (e.g., Google OCR, Microsoft OCR, or Abbyy OCR).
- Store the output of the “Read PDF with OCR” activity in a variable, let’s call it pdfText, which contains the text extracted from the PDF.
- Apply text manipulation techniques, such as string splitting or regular expressions, to extract the table data from the pdfText variable.
- Construct a DataTable to hold the extracted table data.
- Iterate through the extracted data and populate the DataTable.
- Use the “Write Range” activity to write the DataTable to an Excel file.
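The splitting, DataTable and Write Range steps above can be sketched outside UiPath as follows. This is a language-neutral illustration in Python, not UiPath code. It assumes that the extracted text keeps one table row per line and that columns are separated by two or more spaces. The function names `parse_table` and `to_csv` are hypothetical, the list of rows stands in for the DataTable, and CSV output stands in for Write Range. Note that this only works when the layout survives extraction. It will not fix text where values from different columns have already run together.

```python
import csv
import io
import re

# Assumption: cells in a row are separated by two or more whitespace characters.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_table(pdf_text: str, expected_columns: int) -> list[list[str]]:
    """Split each line into cells and keep only lines with the expected column count.

    The returned list of rows plays the role of the DataTable in the steps above.
    """
    rows = []
    for line in pdf_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cells = COLUMN_GAP.split(line)
        if len(cells) == expected_columns:
            rows.append(cells)  # lines such as titles or totals are dropped
    return rows


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text (a stand-in for writing to Excel)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample text standing in for pdfText.
sample = """Invoice Report
Item      Qty    Price
Widget    2      9.99
Gadget    10     4.50
Total: 2 items
"""

rows = parse_table(sample, expected_columns=3)
print(to_csv(rows))
```

Filtering on the expected column count is a simple way to discard headers, footers and totals that surround the table. Real documents usually need a more specific regular expression per row type.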
I hope it helps!!
My text is laid out in a very complex way, so string manipulation is not working here; I have tried it many times.
Can you provide a sample PDF showing how it looks? Then we can understand how to approach it.
Try using Document Understanding.
You can try using Form Extractor or Document Understanding for this.
Or check whether you can open the PDF using the Word activities; if so, the table can be extracted from Word instead.
Cheers
Okay, sure. I will try this.
Hi @Amit_Kumar_Charde ,
We would not be able to help effectively if the details are vague. Let us know what you mean by complex: will there be different variations in the format/template of the PDF, and is the PDF always going to be digital, scanned, or a mixture of both?
These details would help us provide suggestions that are more specific to your case.
The PDF is digital, but after it is converted to text it appears very jumbled, meaning the data from different fields overlaps.