Extract specific table within PDF Form with RegEx

Ronke · March 6, 2023, 7:29pm

Hi guys,

I’m looking to extract a specific table from a PDF document using RegEx, as I’m unable to scrape the data with the UiPath Data Scraping activity due to the version of Adobe Reader we use in my organisation.

The PDF I’m working with is a 7-page document with several tables, but I only need to extract one specific table. For confidentiality reasons I can’t share the PDF itself, but the table I’m looking to extract looks something like this:

Any pointers please?

FYI: the values in the table are dynamic; some fields may or may not contain data, on a case-by-case basis.

Thank you, and I look forward to your suggestions.

raja.arslankhan · March 6, 2023, 7:32pm

@Ronke
Please try this one.

Ronke · March 6, 2023, 7:35pm

Hi, thanks for your recommendation. However, it looks like the proposed solution is a 3rd-party package. Unfortunately, my company operates a strict no-3rd-party-package policy, so I won’t be able to use it.

raja.arslankhan · March 6, 2023, 7:37pm

@Ronke OK, got it.
Did you check this one?

Manish540 · March 7, 2023, 5:04am

Hi @Ronke ,

Check the link below to extract a table from a PDF using regex.

Hope this helps.

supermanPunch · March 7, 2023, 5:22am

Hi @Ronke ,

When some table cells may be empty, regex or string manipulation alone might not get it right. We could try the Generate Data Table activity, testing all its combinations of options. If that is not successful, we can try the Word Interop methods in C#. You could check the post below for an example workflow.

However, even this method does not work for every type of PDF. We would need to test thoroughly against the types of PDF you receive and then confirm whether it works for all cases.
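To illustrate why empty cells break plain whitespace splitting, here is a minimal Python sketch. It is not a UiPath workflow, and it assumes the text extracted from the PDF keeps the table's layout, with each value left-aligned under its header. The column names and sample rows are made up for illustration. The idea is to take column positions from the header line and slice every row at those positions, so an empty cell stays an empty string instead of shifting the remaining values to the left.

```python
import re


def column_spans(header_line):
    """Return (name, start, end) per header; headers may contain single spaces."""
    matches = list(re.finditer(r"\S+(?: \S+)*", header_line))
    spans = []
    for i, m in enumerate(matches):
        # A column runs until the next header starts; the last one runs to end of line.
        end = matches[i + 1].start() if i + 1 < len(matches) else None
        spans.append((m.group(), m.start(), end))
    return spans


def parse_row(line, spans):
    """Slice one row at the header positions, so empty cells become ''."""
    return {name: line[start:end].strip() for name, start, end in spans}


# Hypothetical sample data, built with ljust so the columns line up exactly.
header = "Item".ljust(12) + "Qty".ljust(6) + "Price"
full_row = "Widget".ljust(12) + "2".ljust(6) + "9.99"
sparse_row = "Gadget".ljust(12) + "".ljust(6) + "4.50"  # Qty cell is empty

spans = column_spans(header)

# Naive whitespace splitting loses the empty cell: only two values come back.
naive = sparse_row.split()

# Position-based slicing keeps the empty Qty cell in place.
parsed_full = parse_row(full_row, spans)
parsed_sparse = parse_row(sparse_row, spans)
print(naive)
print(parsed_sparse)
```

This only holds when the extracted text is genuinely column-aligned; right-aligned numbers or wrapped cells would need different handling.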

Ronke · March 7, 2023, 11:23am

Hello, thanks for your recommendation.

I tried the PDF-to-Word suggestion, but it does not scrape the row values of the tables in the PDF.

Also, the For Each activity keeps throwing an error.

Could there be another approach to extract the specific table I want based on the column names and export it to an Excel document?
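One way to think about locating a table by its column names is sketched below in Python, using only the standard library. It assumes the PDF text has already been extracted as plain text, that the header appears on a single line, that columns are separated by two or more spaces, and that the table ends at the first blank line. The column names, sample text, and function names are hypothetical. The output is CSV text, which Excel can open; writing a native .xlsx file would need a different tool.

```python
import csv
import io
import re


def extract_table(text, column_names):
    """Return the rows that follow the header line made of column_names."""
    # The header line is the column names in order, separated by whitespace.
    header_re = re.compile(
        r"^\s*" + r"\s+".join(re.escape(c) for c in column_names) + r"\s*$"
    )
    rows = []
    in_table = False
    for line in text.splitlines():
        if not in_table:
            if header_re.match(line):
                in_table = True
            continue
        if not line.strip():
            break  # assumption: a blank line marks the end of the table
        # Assumption: cells are separated by runs of two or more spaces.
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


def to_csv(column_names, rows):
    """Serialise the header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical extracted text containing two tables.
sample = "\n".join([
    "Section A",
    "Name    Amount",
    "Fee     10",
    "",
    "Invoice No  Date        Total",
    "INV-001     2023-03-01  120.00",
    "INV-002     2023-03-02  75.50",
    "",
    "Page 2 of 7",
])

wanted = ["Invoice No", "Date", "Total"]
rows = extract_table(sample, wanted)
csv_text = to_csv(wanted, rows)
print(csv_text)
```

Splitting on runs of spaces shifts values left when a cell is empty, so rows with missing data would need position-based slicing rather than the split used here.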

supermanPunch · March 7, 2023, 11:30am

@Ronke ,

Could you check the Output panel in that case? I believe the exact error message will be logged there.

Based on the error message, we might be able to figure out whether anything within the workflow can be adjusted to get it running.

Ronke · March 7, 2023, 12:00pm

It says "Object reference not set to an instance of an object".

I also got this:

“message”: “Microsoft Word Cannot access individual rows in this collection because the table has vertically merged cells. Microsoft.Office.Interop.Word.Row get_First()”,

Ronke · March 7, 2023, 12:31pm

I have also tried to use the Get Text activity, but UiPath does not seem to recognize the PDF window.

I use Adobe Acrobat Reader 22.3.2031.0.

Any thoughts, please?

desineediaditya · March 7, 2023, 6:59pm

Hi @Ronke
Can you check this video? It might help.

ABHIMANYU_THITE1 · March 8, 2023, 5:03am

Hi @desineediaditya, Thanks for sharing, Good tutorial!

desineediaditya · March 8, 2023, 7:48pm

Thank you @ABHIMANYU_THITE1
