Hello everyone,
I want to convert a PDF file containing tables into Excel. I have tried several approaches and activities, but none of them gave the result I need. Can anyone help with this?
Hi @Amit_Kumar_Charde,
Try using OCR to extract the table data and save it to Excel.
If the PDF is clearly readable, try Data Scraping instead.
…
Thanks
Hi @Amit_Kumar_Charde
Follow the steps below.
- Use a PDF activity with OCR to extract the table from your PDF. You can get the output as a DataTable.
- Use the Write Range activity to save the DataTable to an Excel file.
Regards,
Kaviyarasu N
Hi @Amit_Kumar_Charde,
Could you let us know whether the PDF you will be receiving is a digital document or a scanned document?
It’s a digital document.
The output here is in text format.
In that case, we should be able to read the PDF data using the PDF activities and apply String/Regex manipulation to get the data into a DataTable.
Could you provide the PDF document, or a sample PDF with the same data pattern as the original?
We could then check whether String/Regex manipulation is the easier approach for the pattern of PDF data that you receive.
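For anyone following along, here is a minimal sketch of the String/Regex idea, written in Python rather than as a UiPath workflow. It assumes the PDF text has already been read into a string, and that each table row looks like "ID, name, amount" separated by runs of spaces. The sample text, the row pattern, and the function names are all hypothetical; the real pattern depends on how your PDF text comes out.

```python
import csv
import io
import re

# Hypothetical sample of the text a PDF reader might return (not taken from the real file).
SAMPLE_TEXT = """Sample Report
ID   Name          Amount
101  Alice Smith   250.00
102  Bob Jones     1,075.50
Page 1 of 1
"""

# Assumed row layout: integer ID, free-text name, amount with two decimals.
ROW_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+([\d,]+\.\d{2})$")


def text_to_rows(text):
    """Keep only the lines that match the row pattern and split them into columns."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(list(match.groups()))
    return rows


def rows_to_csv(rows, header):
    """Serialize the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    table = text_to_rows(SAMPLE_TEXT)
    print(rows_to_csv(table, ["ID", "Name", "Amount"]))
```

Lines that do not match the pattern (titles, column headers, page footers) are simply skipped, so the pattern has to be tuned against a real sample of the extracted text.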
Additionally, could you try the workflow below:
PDF_To_Excel_Demo.zip (101.2 KB)
This workflow uses another method: it first converts the PDF to a Word document and then extracts the tables using the Interop.Word methods.
Let us know if this works for your case, or give us more information about your sample PDF documents.
@supermanPunch
I tried the above workflow, but it's not working in my case. Can you suggest a direct activity that can convert PDF to Excel?
You can perform string manipulation in order to convert the PDF text to a DataTable.
Could you let us know what was not working? What errors did you receive? You can check the Output panel for details of the errors raised inside the Invoke Code activity.
I have uploaded the screenshot of the error received.
Could you also send a screenshot of the Output panel when this error occurs?
Have you passed the full file path of the PDF file?
The error says that the directory name is not valid.
Could you double-check the file path you provided? You could also try placing the file in the project folder, providing its full path, and checking whether that works.
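As a quick way to rule out path problems before running the workflow, the check can be sketched like this in Python. This is only an illustration under my own assumptions: the function name and the messages are hypothetical, and it does not reproduce what the Invoke Code activity does internally.

```python
from pathlib import Path


def describe_pdf_path(raw_path):
    """Report the first problem found with a PDF path, or "ok" if none is found."""
    path = Path(raw_path)
    if not path.is_absolute():
        return "relative path: provide the full path"
    if not path.parent.is_dir():
        return "directory does not exist"
    if not path.is_file():
        return "file not found"
    if path.suffix.lower() != ".pdf":
        return "not a .pdf file"
    return "ok"
```

A result of "directory does not exist" would correspond most closely to the invalid directory name error mentioned above.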
Yes, now it works fine up to the conversion to the Word file, but after that the same error occurs again.
@Amit_Kumar_Charde,
Could you also post the Output panel details?
It seems that the table content in your PDF is not in a standard format. To help you further, could you provide a sample of your PDF so we can analyse what can be done to resolve the issue?
Yes, I am sharing the pdf.
Treatment_audit_register.pdf (111.8 KB)
A direct conversion to Excel, without converting to Word first, would be better if one exists. Please suggest a good solution, brother.
Use Read PDF With OCR to read the PDF file and store the output in a variable, then use the Write Range activity and pass that variable to write the PDF data to Excel.
Thanks
Varun