I have a PDF file that I received by email. The document contains multiple tables, and the pages they appear on vary. I want to read the table with a specific name, for example “Income Statement”. I have no idea how to do this.
Also, how can I write a regex for this?
Can you help with this automation, please?
I would be very grateful for detailed help, either here on the UiPath Forum or through my personal account.
I think you can try the following:
Step 1. Read the PDF file with the Read PDF Text activity and save the output string. Looking at that string, you will spot a pattern.
Step 2. Use some string manipulation to make the output of Step 1 look like CSV (for example, replacing double spaces with “,”).
Step 3. UiPath can now save that string directly as a temporary CSV file, but I see that you want the result in Excel. That takes two more steps.
Step 4. Read the temporary CSV file and save its content to a DataTable, say Extracted. To keep things clean, you can then delete the temporary CSV.
Step 5. Finally, use the Excel activity to write the Extracted DataTable to an Excel file.
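To make Step 2 concrete, here is a minimal sketch of the string manipulation, written in Python only to show the logic. It assumes the columns in the extracted text are separated by two or more spaces, which may not hold for your PDF; the sample text and the function names are made up for illustration. In UiPath the same idea could be expressed with the .NET call System.Text.RegularExpressions.Regex.Replace(str_text, " {2,}", ",").

```python
import csv
import io
import re


def text_to_rows(table_text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r" {2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text; cells that contain a comma get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical text, shaped like what the PDF read step might return.
sample = (
    "Item            2022      2023\n"
    "Revenue         1,200     1,500\n"
    "Net income      300       450\n"
)

rows = text_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Matching two or more spaces in one go avoids the empty columns that a plain replace of each double space would create when columns are padded with many spaces. Also note that amounts such as 1,200 already contain a comma, so a plain replace with “,” would split them into two columns; quoting the cell (as the csv writer does here) or using another delimiter avoids that.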
To go into more detail, can you share your PDF?
Regards,
LNV
Hi @Huseyin_Kizil
=> Use the “Get PDF Page Count” activity and store the output in a variable, say PageCount.
=> Initialize a Count variable to 1.
=> Use a While loop with the condition Count <= PageCount, so that the last page is checked too.
=> Inside the loop, use Read PDF Text or Read PDF with OCR with the Range property set to Count.ToString, so that only the current page is read, and store the output in a variable, say str_text.
=> Use an If activity to check whether str_text contains the table name, with a condition like: str_text.Contains("Income Statement")
=> If the condition is true, extract the table from str_text and exit the loop.
=> If the condition is false, increment Count by 1.
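As a rough sketch of the loop above, and of the regex part of the question, here is the same logic in Python. It assumes you already have the text of each page as a list of strings, and that the table ends at the first blank line after its heading; both are assumptions about your PDF, and the function names and sample pages are made up. The pattern is written for Python's re module and may need small adjustments for .NET regex in UiPath.

```python
import re


def find_table_page(pages, table_name):
    """Return the 1-based number of the first page containing table_name, else None."""
    count = 1
    while count <= len(pages):  # <= so that the last page is checked too
        if table_name in pages[count - 1]:
            return count
        count += 1
    return None


def extract_table_block(page_text, table_name):
    """Return the text after the heading line, up to the first blank line or end of page."""
    pattern = re.escape(table_name) + r"[^\n]*\n(.*?)(?=\n\s*\n|\Z)"
    match = re.search(pattern, page_text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Hypothetical page texts, one string per PDF page.
pages = [
    "Cover page\nAnnual report 2023\n",
    "Balance Sheet\nAssets  900\nLiabilities  400\n",
    "Summary\n\nIncome Statement\nRevenue  1,500\nNet income  450\n\nNotes to the accounts\n",
]

page_no = find_table_page(pages, "Income Statement")
table_text = extract_table_block(pages[page_no - 1], "Income Statement")
print(page_no)
print(table_text)
```

If the table name can also appear in a table of contents or in running text, the first match may not be the table itself, so you may want to anchor the pattern on something more specific to the real heading.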