Hi all, I want to extract a data table from a bank statement PDF, but when I checked the PDF properties, the "Tagged" option shows “No”. Please let me know how I can extract the data table.
Hi
Did you try this custom component?
Cheers @ankit.jain2
Yes, I tried that activity, but it requires installing the SautinSoft package, and that package only handles a limited number of pages. I don't want a paid OCR, so is there any other way to extract the table?
If the PDF is well aligned and native,
then we can try to read the PDF and convert the text to a DataTable.
Do you have that option?
@ankit.jain2
Yes, I tried that, but Read PDF is not extracting the data table from the PDF.
Hi @ankit.jain2
Try these methods:
- Try the data scraping option
  a. Open the PDF.
  b. Then use the data scraping option to scrape the tabular data. If data scraping doesn't work, use screen scraping.
- Document Understanding
  The Document Understanding feature also helps extract tables from PDFs very easily, so try that as well.
- Try the Python approach
  With Python and some external modules, you can extract tables from PDFs.
  Check this link as well.
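For the Python approach, here is a minimal sketch. It assumes the PDF is native (has a text layer, not a scanned image) and that the open-source `pdfplumber` package is installed. The package choice, the helper names, and the file name `statement.pdf` are assumptions for illustration and do not come from this thread.

```python
def table_to_records(table):
    """Turn a table (list of rows, first row = header) into a list of dicts."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        # Skip rows that are completely empty
        if not any(cells):
            continue
        records.append(dict(zip(header, cells)))
    return records


def extract_tables_from_pdf(path):
    """Return every table found in the PDF, one list of records per table."""
    import pdfplumber  # third-party, assumed choice: pip install pdfplumber

    results = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables; each table is a list of rows
            for table in page.extract_tables():
                results.append(table_to_records(table))
    return results


# Example usage ("statement.pdf" is a placeholder file name):
# for records in extract_tables_from_pdf("statement.pdf"):
#     print(records)
```

Whether the tables are detected cleanly depends on how the statement is laid out, so treat this as a starting point rather than a guaranteed solution.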
Hope this helps you
regards,
Nived N
Dear Ankit,
If the PDF is a native one, read the PDF, extract the table part of the resulting string, pass it to the Generate Data Table activity, and specify the separator.
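To illustrate the idea outside UiPath, here is a rough Python sketch (it is not the Generate Data Table activity itself): split the extracted text into lines, then split each line on a column separator. The sample text and the assumption that columns are separated by two or more spaces are hypothetical.

```python
import re


def text_to_rows(text, column_separator=r"\s{2,}"):
    """Split table text into rows and columns using a regex column separator."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rows.append(re.split(column_separator, line))
    return rows


# Hypothetical text, as it might come out of a native PDF
sample = """Date        Description       Amount
01/01/2021  ATM withdrawal    100.00
02/01/2021  Salary credit     2500.00"""

rows = text_to_rows(sample)
header, data = rows[0], rows[1:]
```

Splitting on two or more spaces keeps values such as "ATM withdrawal" in one column. If the real statement uses a different layout, the separator would need to be adjusted.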
Thanks,
Geetishree Rao
Hi @ankit.jain2
I am also facing the same issue. Have you found a solution for it?
If yes, please help me…
Thanks & Regards,
Pravin.