PDF to Excel - unstructured and structured data - methodologies and workflow

Andre_Pedroso · November 7, 2018, 2:10pm

Hi UiPath Community,

Good afternoon,

I have completed some of the basics of the UiPath software through the UiPath Academy, but I feel more like a noob than a rookie when trying to solve real issues. So please guide me through this marvellous RPA software, and thank you in advance.

The attached file shows the type of PDF data I am mining/scraping. The data sits inside tables, which occasionally contain images and nested tables.

1st issue - by definition, is this considered structured or unstructured data? The nature of this particular data set raised some doubts…

2nd issue - what are the best activities/methods for this particular case? I tried the following: Screen Scraping, Read PDF, PDF to Excel (a suggested package) and Write Range… I could not get the desired outcome.

The next file is my attempt to extract the data and put it in Excel… however, it seems to me that I lack some understanding of both the basic workflows and the advanced features of some of the activities/methods (?!)

Could you suggest documentation, alternative packages and tips?

Yours Sincerely,
André Pedroso

carmen · November 7, 2018, 3:50pm

Hi Andre,

This link could help you: Learn Robotic Process Automation with RPA Tutorials for Beginners

I have a question: how many PDFs do you want to extract information from? Do all of them have the same structure?

Andre_Pedroso · November 7, 2018, 4:02pm

Hi Carmen,

Thank you for your reply. I am grateful for that link and I will read it carefully.

According to a list I have, the documentation is between 150 PDFs and 300 documents from other sources. And yes, the structure is the same, with a header detailing the origin of the information.

Why are you asking about this? Could it affect the desired outcome?

Yours Sincerely,
André Pedroso

carmen · November 7, 2018, 4:10pm

I was just wondering because it looks like a messy structure to me. But if all your documents are the same, you should be able to get all the information without a problem.

Andre_Pedroso · November 7, 2018, 4:15pm

Hi, Carmen,

The PDF is organized; I just cut out some parts for data confidentiality.

I am going to try to implement this following the documentation you gave me… I will give you feedback.

Yours Sincerely,
André Pedroso

ovi · November 8, 2018, 10:38am

Hey Andre,

To add to Carmen’s answer, it seems that your PDF has structured data after all. There is no clear recipe for extracting the data; trial and error works best when you are a rookie (at least this is how I learned).

I think Lesson 3 - Data Manipulation & Lesson 10 - PDF might shed some light, at a high level, on extracting the data.

Try Carmen’s suggestions first; it’s a good idea. Then let us know what specific issues you encounter.
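
To make the general idea concrete (read the PDF text, split each line into columns, write the rows to a spreadsheet), here is a minimal sketch in Python rather than in UiPath activities. Everything in it is an assumption for illustration: the sample text, the column layout, and the helper names `text_to_rows` and `rows_to_csv` are made up and are not taken from the attached PDF or from any UiPath package. It also assumes that columns in the extracted text are separated by two or more spaces, which will not hold for every document.

```python
import csv
import io
import re

# Hypothetical sample of text as a PDF text extractor might return it.
# The real layout of your documents will differ.
SAMPLE_TEXT = """\
Item    Description      Qty
A-01    Steel bracket    12
A-02    Rubber seal      140

B-07    Copper pipe      3
"""


def text_to_rows(text):
    """Split extracted text into rows of cells.

    Assumption: cells are separated by runs of two or more spaces,
    so single spaces inside a cell (e.g. "Steel bracket") are kept.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table blocks
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = text_to_rows(SAMPLE_TEXT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

If the documents really do share one structure, the same splitting rule can be applied to every file in a loop; if the rule breaks on some lines, those lines are the "specific issues" worth posting here.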

Andre_Pedroso · November 28, 2018, 12:31pm

Hi Ovi,

After a few changes and relearning a few things, I am still unsuccessful in scraping the information from PDF to Excel.

Main_1.xaml (7.1 KB)
SADI_MALL.pdf (520.2 KB)

I have attached the PDF file in question (with authorization from my boss), as well as what I was trying to do. I only had two weeks for this, due to other tasks I must attend to at the company.

Update, as I try to understand the errors:

Main_v2.xaml (8.6 KB)

Yours Sincerely,
André Pedroso

Andre_Pedroso · January 16, 2019, 7:28pm

It is partially working… but I still have issues.
