How to extract table-format data from a PDF to Excel
Hi @Suraj_Gaikwad_Nuvama_Grou ,
You can follow these general steps:
- Install the UiPath.PDF.Activities package from the Manage Packages option in UiPath Studio.
- Use the Read PDF Text activity to extract the text from the PDF file.
- Use the Generate Data Table activity to convert the extracted text into a DataTable.
- Use the Filter Data Table activity to remove any unwanted rows or columns from the DataTable.
- Use the Write Range activity to write the filtered DataTable to an Excel file.
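For anyone who wants to see the same flow outside Studio, the steps above can be sketched in plain Python. This is only an illustration under assumptions: it presumes the text was read with its layout preserved, so that columns are separated by two or more spaces. The sample text, the function names, and the column-count filter are hypothetical and are not UiPath behaviour.

```python
import csv
import io
import re

# Hypothetical sample: text as it might look after reading a PDF with layout preserved.
SAMPLE_TEXT = """Invoice summary
Item        Qty    Price
Widget      2      10.50
Gadget      1      99.00
Page 1 of 1
"""


def text_to_rows(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def keep_table_rows(rows, expected_columns):
    """Drop rows without the expected number of columns (titles, footers)."""
    return [row for row in rows if len(row) == expected_columns]


def rows_to_csv(rows):
    """Serialise rows to CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Roughly: Generate Data Table -> Filter Data Table -> Write Range.
table = keep_table_rows(text_to_rows(SAMPLE_TEXT), expected_columns=3)
csv_text = rows_to_csv(table)
```

Filtering on the column count is just one simple way to drop title and footer lines; in a real file you may need a different rule.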
Thanks
I didn't understand the fourth point.
Hi @Suraj_Gaikwad_Nuvama_Grou ,
Remove the unwanted rows or columns and keep only the ones you need. If there is no unwanted data, you can skip this step.
Thanks
Hi
We just had a similar discussion about PDF table extraction.
Have a look at this thread for more details.
Cheers @Suraj_Gaikwad_Nuvama_Grou
Check the post below for reference.
You can also use Python code with the Camelot library.
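A minimal sketch of what that could look like, assuming `camelot-py`, `pandas` and `openpyxl` are installed and the PDF is text-based (not scanned). The function names and sheet naming are hypothetical choices for illustration.

```python
def sheet_name(index):
    """Name the sheets Table_1, Table_2, ... (a hypothetical naming choice)."""
    return f"Table_{index + 1}"


def pdf_tables_to_excel(pdf_path, xlsx_path, pages="1", flavor="lattice"):
    """Write every table Camelot finds in the PDF to its own Excel sheet.

    Returns the number of tables written.
    """
    # Imported here so the module loads even where Camelot is not installed.
    import camelot
    import pandas as pd

    # flavor="lattice" suits tables with ruling lines;
    # flavor="stream" suits tables separated only by whitespace.
    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    if len(tables) == 0:
        return 0  # nothing detected, so do not create an empty workbook

    with pd.ExcelWriter(xlsx_path) as writer:
        for i, table in enumerate(tables):
            # table.df is a pandas DataFrame holding the extracted cells.
            table.df.to_excel(writer, sheet_name=sheet_name(i),
                              index=False, header=False)
    return len(tables)
```

A call might look like `pdf_tables_to_excel("report.pdf", "report.xlsx", pages="1-2", flavor="stream")`, where both file names are placeholders. If no tables are detected with one flavor, trying the other is usually the first thing to check.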
Hope this helps.
Thanks,
Srini
Can you share an example? That would be helpful for me.
Hi @Suraj_Gaikwad_Nuvama_Grou ,
Maybe you could also check whether the post below suits your requirements:
I don't understand.
Please share more information on how to extract the table from the PDF file as it is.
We do not know which points you are unable to understand.
It would be better if you could provide a sample data file along with the expected extracted data, formatted in Excel if possible. That way we can help you better and suggest the proper approach.
The scenario is:
1. We have 5 PDF files.
2. Select some specific text data and a table from the PDF files.
3. Extract the selected data to the mail body.
The highlighted point above is the one we need more information on. Could you provide some sample data (with the original data masked) as it appears after extraction to a text file?
Also keep the PreserveFormatting option checked in the Read PDF Text activity.
We need to know which specific text you want to extract and whether there is an anchor or keyword we can refer to in order to extract it. The same applies to the table data.
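To show what we mean by an anchor, here is a small Python sketch that picks up the value following a known label on the same line. The sample text, the labels, and the `Label: value` layout are assumptions for illustration; your extracted text may need a different pattern.

```python
import re

# Hypothetical extracted text; real PDFs will differ.
SAMPLE_TEXT = """Customer Name: Example Ltd
Invoice No: INV-001
Total Amount: 1,250.00
"""


def value_after_anchor(text, anchor):
    """Return the text following 'anchor:' on the same line, or None if absent."""
    pattern = re.compile(rf"^{re.escape(anchor)}\s*:\s*(.+)$", re.MULTILINE)
    match = pattern.search(text)
    return match.group(1).strip() if match else None


invoice_no = value_after_anchor(SAMPLE_TEXT, "Invoice No")
total = value_after_anchor(SAMPLE_TEXT, "Total Amount")
```

The same idea carries over to a regular expression inside a workflow, provided the label text is stable across all the PDF files.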
We did provide some examples above, which also suggest some form of solution.
Note that, at a generic level, we already have many posts on data extraction from PDF files. For a specific case that has not been encountered before, we would need to check the data format of the inputs.