How to extract all the data from a PDF

dumua · June 1, 2017, 7:54pm

Hello everyone,

I’m new to RPA, so sorry if the question sounds ridiculous.

My problem is that I want to extract all the data and its meaning from a very long PDF (almost 200 pages) into a data table, and I only know how to extract it manually with Data Scraping.

Thank you for your help

vvaidya · June 1, 2017, 8:19pm

On what basis do you want to convert your PDF into rows and columns? For instance, should all the paragraphs on a page form one row?

dumua · June 1, 2017, 8:32pm

I want to extract all the data tables (and no paragraphs) present in the PDF file into a single multi-row, multi-column Excel sheet.
But the problem is that some of the data tables can’t be read by the Data Scraping tool.

dumua · June 7, 2017, 2:51pm

Hello everyone,

I still need your help.

I want to extract all the data from numerous PDF files and then reorganize it into one single database.

My loop works pretty well, but I don’t understand why I can’t extract the data.
Inside my loop I’m using: Read PDF Text → Extract Structured Data → Excel Application Scope → Write Range.

Here is my workflow if you want to see it: Demo.xaml (9.5 KB)

Regards
Antoine

vvaidya · June 7, 2017, 8:15pm

Please check this

ExtractMetadata
An XML string that enables you to define what data to extract from the indicated web page.

https://www.uipath.com/activities-guide/extract-structured-data

dumua · June 7, 2017, 8:58pm

Thanks for the help, but I already looked at this and didn’t find the answer.

The problem is that when I enter PDF in the Metadata of Extract Structured Data, I get: “Unable to cast object of type ‘Newtonsoft.Json.JValue’ to type ‘System.String’.”

I don’t know what to do.

dumua · June 8, 2017, 4:30pm

Hi,

I tried something else for my problem: I use Read PDF, then an Assign with Substring, in order to extract the data that I want from the string returned by Read PDF.
But again it doesn’t work.
Can someone look at my workflow?

Specific Data.xaml (13.7 KB)

Regards

ddpadil · June 8, 2017, 4:43pm

Because your Assign activity doesn’t have a new LHS.
You’re assigning to the Read PDF output variable ExtractDT on the LHS, and on the RHS you’re passing the Substring correctly.
Please create a new variable of type String, use it on the LHS, and give it a try.

vvaidya · June 8, 2017, 4:44pm

WF looks fine to me now.

Does Substring give an error or an empty result?

Does Substring give the wrong text?

To troubleshoot, use Write Line to output your PDF text, then perform the Substring manually on that text and see what you get.

dumua · June 8, 2017, 6:09pm

When you say LHS and RHS, do you mean the left and right boxes in the Assign activity?

If that’s it, it doesn’t work.
When I debug the automation (with or without a new variable), an “exception type: ArgumentOutOfRangeException” is raised.

Maybe it’s my first index that doesn’t work?

ddpadil · June 9, 2017, 4:58am

yes.
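
For context, .NET's Substring throws ArgumentOutOfRangeException when the start index or the length falls outside the string, which typically happens when a hard-coded index is used or when IndexOf returns -1 because the marker was not found. The sketch below shows the guard logic in Python for illustration only; `extract_between`, the markers, and the sample text are hypothetical and not taken from the attached workflow. Python slicing does not raise in this situation, so the missing-marker check is written out explicitly.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either marker is missing."""
    start = text.find(start_marker)      # -1 when the marker is absent
    if start == -1:
        return None
    start += len(start_marker)           # move past the start marker itself
    end = text.find(end_marker, start)   # search only after the start marker
    if end == -1:
        return None
    return text[start:end].strip()


# Made-up sample text standing in for the Read PDF output.
pdf_text = "Actif total 1 234,5 $ en 2004"
print(extract_between(pdf_text, "Actif total", "$"))   # 1 234,5
print(extract_between(pdf_text, "Passif total", "$"))  # None
```

The same idea should carry over to an Assign expression: compute the index with IndexOf first, check that it is not -1, and only then call Substring.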

dumua · June 9, 2017, 1:38pm

So, what first index should I use to find a figure preceded by “Active total …” and followed by $, knowing that the position and the structure around the figure are not the same in every PDF?

Regards
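
When the position changes from one PDF to the next, a fixed first index is unlikely to work; a pattern match anchored on the label and the "$" sign may be more robust. Below is a minimal sketch in Python, assuming the figure is the first run of digits, spaces, dots, and commas between the label and the next "$". The pattern, the `find_total` helper, and the sample text are hypothetical and would need adjusting against the real Read PDF output.

```python
import re

# Hypothetical pattern: the label, any non-digit filler, the figure, then "$".
PATTERN = re.compile(r"Acti(?:f|ve)\s+total\D*?(\d[\d\s.,]*?)\s*\$", re.IGNORECASE)


def find_total(text):
    """Return the figure between the label and the next "$", or None if absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


# Made-up sample text standing in for the Read PDF output.
sample = "Rapport annuel\nActif total ..... 1 234,5 $\nPassif total 900,0 $"
print(find_total(sample))           # 1 234,5
print(find_total("no label here"))  # None
```

In UiPath, a similar pattern could presumably be used in an Assign through System.Text.RegularExpressions.Regex.Match, reading the captured group from the result; that variant has not been tested against these PDFs.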

dumua · June 12, 2017, 3:57pm

Hello, it’s me again

I tried to extract the data with Extract Structured Data, specifying the value that I want in the selector, but it doesn’t work.

Here is my workflow: Data extract.xaml (13.5 KB)

And here are some PDFs from which I want to extract the value of “Actif total” for every year: ra2004_rapport_annuel_fr.pdf (1.2 MB)
ra2014_rapport_annuel_fr.pdf (2.4 MB)
ra1989_rapport_annuel_fr.pdf (3.5 MB)

Regards

harjyot123 · June 14, 2018, 1:08am

Does this get checkboxes as well? I am unable to get the radio button information.

Grace_Simmons · February 2, 2022, 11:31pm

PDF to Excel
