Table extraction from PDF

Hi all, I want to extract a data table from a bank statement PDF, but when I checked the PDF properties, the Tagged option is “No”. Please let me know how I can extract the data table.

Hi,
Did you try this custom component?

Cheers @ankit.jain2

Yes, I tried that activity, but it requires installing the SautinSoft package, and that package only handles a limited number of pages. I don't want a paid OCR, so is there any other way to extract the table?

If the PDF is well aligned and native (text-based, not scanned),
then we can try to read the PDF text and convert it to a DataTable.

Is that an option here?
@ankit.jain2

Yes, I tried that, but the Read PDF Text activity does not extract the data table from the PDF.

Hi @ankit.jain2
Try these methods:

  1. Try the data scraping option

    a. Open the PDF.
    b. Then use the data scraping option to scrape the tabular data. If data scraping doesn't work, use screen scraping.

  2. Document Understanding

The Document Understanding feature also helps to extract tables from PDFs very easily, so try that as well.

  3. Try the Python approach

Using Python with some external modules, you can extract the tables from PDFs.
Check this link as well.
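
For the Python approach, here is a minimal sketch using pdfplumber. That library is my choice for illustration, not necessarily the module the link refers to. As far as I know it works from the positions of text and lines on the page, so it should not need a tagged PDF, but it will not help with scanned (image-only) pages. The function names are hypothetical.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def extract_tables_from_pdf(path: str) -> List[List[Row]]:
    """Return every table pdfplumber detects, page by page."""
    # Imported here so the helper below also works without pdfplumber installed.
    import pdfplumber  # third-party: pip install pdfplumber

    tables: List[List[Row]] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Each table is a list of rows; each row is a list of cell strings (or None).
            tables.extend(page.extract_tables())
    return tables


def table_to_records(table: List[Row]) -> List[Dict[str, str]]:
    """Treat the first row as the header and turn the remaining rows into dicts."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip rows where every cell is empty
        records.append(dict(zip(header, cells)))
    return records
```

You would call `extract_tables_from_pdf("statement.pdf")` (the file name is a placeholder) and pass each returned table to `table_to_records`. Bank statements vary a lot in layout, so expect to tweak the header handling for your file.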

Hope this helps you.

Regards,
Nived N

Dear Ankit,
If the PDF is a native one, then read the PDF text, extract the table part of the string, pass it to the Generate Data Table activity, and specify the separators.
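
To show the idea outside UiPath, here is a rough Python sketch of what splitting the table text by separators amounts to. It assumes rows are separated by newlines and columns by two or more spaces, which is only a guess at how the text comes out of a native PDF. The function name and the sample statement lines are made up for illustration.

```python
import re
from typing import List


def text_to_rows(table_text: str, column_sep: str = r"\s{2,}") -> List[List[str]]:
    """Split the table part of the extracted text into rows and columns."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between rows
        # Two or more spaces mark a column break; single spaces stay inside a cell.
        rows.append(re.split(column_sep, line))
    return rows


# Hypothetical sample of the table part of a statement.
sample = (
    "Date        Description      Amount\n"
    "01/02/2021  ATM withdrawal   -50.00\n"
    "02/02/2021  Salary           1200.00\n"
)
for row in text_to_rows(sample):
    print(row)
```

If the columns in your PDF text are separated by single spaces or tabs instead, the separator pattern needs to change, the same way you would adjust the separator settings in the activity.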

Thanks,
Geetishree Rao

Hi @ankit.jain2

I am also facing the same issue. Have you found a solution for it?

If yes, please help me… :pray:

Thanks & Regards,
Pravin.