Tabular data extraction from PDF to Excel


How can I extract tabular data from a PDF into Excel? I have a single PDF with multiple pages, each containing a table in the format below.

The column names are always:
Case Format Kind. Hsm klp/kam @P

I need to extract this table data into Excel.

  1. If you have ABBYY, it is straightforward, using either:
    a. Table format
    b. Repeating group
  2. Otherwise, check whether the PDF is in a readable format. If it is:
    a. Use Generate Data Table to convert the extracted text.
    b. Or write all the extracted text to a text file and import that file into Excel directly.
  3. If it needs OCR, check the OCR confidence level when you read the data, and then perform the actions mentioned above.

Hi @sravyarao20,

If it's a digitally created PDF, you can use a PDF to text activity, and to extract the table you need to use regex. The basic idea is to use the regex to build a delimited string (CSV); then you can use a data table activity to convert that CSV into a data table and work with it from there.
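To illustrate the regex idea outside UiPath, here is a minimal Python sketch. The sample text and the `text_to_csv` helper are hypothetical, and it assumes that no cell contains spaces or commas, so every run of whitespace can be treated as a column boundary.

```python
import re

# Hypothetical sample of what a PDF to text step might return;
# the real layout of the extracted text may differ.
SAMPLE_TEXT = """Case Format Kind. Hsm klp/kam @P
A1   Box   Std   12   3/4   0.5
A2   Bag   Alt   7    1/2   1.25
"""


def text_to_csv(text: str, delimiter: str = ",") -> str:
    """Turn whitespace-aligned lines into delimiter-separated lines."""
    csv_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        # Collapse each run of spaces or tabs into a single delimiter.
        csv_lines.append(re.sub(r"[ \t]+", delimiter, line))
    return "\n".join(csv_lines)


csv_text = text_to_csv(SAMPLE_TEXT)
print(csv_text)
```

If the real cells can contain spaces, a single-whitespace split like this will break them apart, and the pattern would need to be adapted to the actual column layout.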

It's a readable format, and there is only one PDF file, which has many tables with the same column names. I need to extract all the data under those column names.

Yes, it is a digital PDF, and the column names are fixed. We need to extract the data from a single PDF file.

Hi @sravyarao20,

I am not able to see the PDF file. Kindly share a sample PDF if possible, provided it does not contain any sensitive data.

You can try the Read PDF activity to read the file, then identify the line breaks to split the text into lines, and add each line to a data table. Finally, you can write that data table to an Excel sheet.
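As a rough sketch of that line-by-line approach, written in Python rather than as a UiPath workflow: the page texts below are made up, `collect_rows` and `rows_to_csv` are hypothetical helpers, and the six column names are assumed to be separated by whitespace exactly as in the question. The header line is used to find the start of each table, so the same header repeating on every page is handled.

```python
import csv
import io

# Column names from the question, assumed to be six whitespace-separated names.
HEADER = ["Case", "Format", "Kind.", "Hsm", "klp/kam", "@P"]

# Hypothetical text for two pages, as a PDF reading step might return it.
PAGES = [
    "Case Format Kind. Hsm klp/kam @P\nA1 Box Std 12 3/4 0.5\nA2 Bag Alt 7 1/2 1.25\n",
    "Page 2\nCase Format Kind. Hsm klp/kam @P\nB1 Box Std 9 1/4 2.0\n",
]


def collect_rows(pages, header):
    """Collect the data rows that follow each header line on every page."""
    rows = []
    for page_text in pages:
        in_table = False
        for line in page_text.splitlines():
            fields = line.split()
            if fields == header:
                in_table = True  # a table starts after each header line
                continue
            if in_table and len(fields) == len(header):
                rows.append(fields)
            elif in_table and fields:
                in_table = False  # a non-matching, non-blank line ends the table
    return rows


def rows_to_csv(rows, header):
    """Write one header row plus all data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = collect_rows(PAGES, HEADER)
csv_output = rows_to_csv(rows, HEADER)
print(csv_output)
```

Saving `csv_output` to a `.csv` file gives something Excel can open directly. This only works as written if no cell contains spaces; otherwise the split would need to follow the real column positions.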

Hi @sravyarao20

If possible, please attach and share the PDF.

