Extract specific table within PDF Form with RegEx

Hi guys,

I’m looking to extract a specific table from a PDF document with RegEx, as I’m unable to scrape the data with the UiPath Data Scraping activity due to the version of Adobe Reader we use in my organisation.

The PDF I’m working with is a 7-page document with several tables, but I only need to extract one specific table. For confidentiality reasons, I won’t be able to share the PDF itself, but the table I’m looking to extract looks something like this:

image

Any pointers please?

FYI: the values in the table are dynamic; some fields may or may not contain data on a case-by-case basis.

Thank you, and I look forward to your suggestions :slight_smile:

@Ronke
Please try this one.

Hi, thanks for your recommendation. However, it looks like the proposed solution is a third-party package. My company operates a strict no-third-party-package policy, so unfortunately I won’t be able to use it :frowning:

@Ronke OK, got it.
Did you check this one?

Hi @Ronke ,

Check the link below to extract a table from a PDF using regex:

Hope this may help you :slight_smile:
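As a rough illustration of the regex idea, here is a minimal Python sketch (the same pattern should carry over to a .NET regex in UiPath with small changes). It assumes the PDF text has already been read into a string with the layout preserved, that the table starts at a known header line, and that it ends at the first blank line. The sample text and the column names (Item Code, Description, Qty, Amount) are made up for the example.

```python
import re

# Hypothetical text, as a layout-preserving PDF text extraction might return it.
PDF_TEXT = """Section 1 Summary
Some introductory paragraph.

Item Code   Description      Qty   Amount
A100        Widget           2     10.00
A200        Gasket           5     7.50

Section 2 Notes
More text.
"""

# Hypothetical header pattern: the column names of the table you want.
HEADER = r"Item Code\s+Description\s+Qty\s+Amount"


def extract_table_block(text: str, header_pattern: str) -> list[str]:
    """Return the non-blank lines that directly follow the header line."""
    pattern = re.compile(
        # Header line, then every following line that has at least one
        # non-whitespace character; the first blank line ends the body.
        rf"^{header_pattern}[ \t]*\n(?P<body>(?:[^\n]*\S[^\n]*(?:\n|\Z))*)",
        re.MULTILINE,
    )
    match = pattern.search(text)
    if match is None:
        return []
    return match.group("body").splitlines()


# Split each row on runs of two or more spaces to get the cell values.
rows = [
    re.split(r"\s{2,}", line.strip())
    for line in extract_table_block(PDF_TEXT, HEADER)
]
print(rows)
```

Note that splitting on runs of spaces only works when every cell has a value; an empty cell shifts the remaining values to the left.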

Hi @Ronke ,

When table cells may or may not contain data (for example, empty values in some cells), regex or string manipulation may not parse the table reliably. We could try the Generate Data Table activity, testing all of its option combinations. If that is not successful, we could try the Word Interop approach in C#. You could check the post below for an example workflow.

However, even this method does not work for every type of PDF. We would need to test it thoroughly against the types of PDF you receive before confirming that it works in all cases.
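To show why empty cells are the hard part, and one possible way around them, here is a small Python sketch that slices each row at the character offsets of the header's column names instead of splitting on whitespace. It is only a sketch under assumptions: the extracted text must keep columns aligned with spaces, values must be left-aligned under their headers, and the column names and widths below are invented for the example.

```python
import re

# Hypothetical column widths, used only to build aligned sample lines.
WIDTHS = (12, 17, 6, 6)


def fmt(*cells: str) -> str:
    """Build one fixed-width sample line (stands in for extracted PDF text)."""
    return "".join(c.ljust(w) for c, w in zip(cells, WIDTHS)).rstrip()


def column_starts(header_line: str) -> list[tuple[str, int]]:
    """Return (column name, start offset); names may contain single spaces."""
    return [
        (m.group(), m.start())
        for m in re.finditer(r"\S+(?: \S+)*", header_line)
    ]


def parse_fixed_width(header_line: str, data_lines: list[str]) -> list[dict[str, str]]:
    """Slice every data line at the header offsets so empty cells stay empty."""
    columns = column_starts(header_line)
    starts = [start for _, start in columns]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [
        {name: line[s:e].strip() for (name, _), s, e in zip(columns, starts, ends)}
        for line in data_lines
    ]


header = fmt("Item Code", "Description", "Qty", "Amount")
lines = [
    fmt("A100", "Widget", "2", "10.00"),
    fmt("A200", "", "", "7.50"),  # Description and Qty are empty
]
records = parse_fixed_width(header, lines)
print(records)
```

Right-aligned numeric columns or wrapped cell text would break the offset assumption, so this needs checking against real samples of the PDF.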

Hello, thanks for your recommendation.

I tried the PDF-to-Word suggestion, but it does not scrape the row values of the tables in the PDF.

Also, the For Each activity keeps throwing an error.

Could there be another approach to extract the specific table I want based on the column names and export it to an Excel document?
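For the export step, a minimal Python sketch of the idea is below: once the rows are available as name/value records, keep only the wanted columns and write them as CSV, which Excel can open. The records and column names are hypothetical, and in a UiPath workflow the built-in CSV or Excel write activities would presumably take the place of this function.

```python
import csv
import io


def records_to_csv(records: list[dict[str, str]], columns: list[str]) -> str:
    """Serialise the selected columns of each record as CSV text."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops keys that are not in the wanted columns;
    # missing keys are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records, e.g. parsed from the extracted PDF text.
sample = [
    {"Item Code": "A100", "Description": "Widget", "Amount": "10.00"},
    {"Item Code": "A200", "Description": "", "Amount": "7.50"},
]
csv_text = records_to_csv(sample, ["Item Code", "Amount"])
print(csv_text)
```

Saving the returned text to a file with a .csv extension is enough for Excel to open it as a table.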

@Ronke ,

Could you check the Output panel in that case? I believe the exact error message will be logged there.

Based on the error message, we might be able to figure out whether anything in the workflow can be adjusted to get it running.

It says: object reference not set to an instance of an object

I also got this:

“message”: “Microsoft Word Cannot access individual rows in this collection because the table has vertically merged cells. Microsoft.Office.Interop.Word.Row get_First()”,

I have also tried the Get Text activity, but UiPath does not seem to recognize the PDF window.

I use Adobe Acrobat Reader 22.3.2031.0

Any thoughts, please?

Hi @Ronke
Can you check this video? It might help.

Hi @desineediaditya, thanks for sharing. Good tutorial!

Thank you @ABHIMANYU_THITE1