I’m new to RPA, so sorry if the question sounds ridiculous.
My problem is that I want to extract all the data and their meaning from a very long PDF (almost 200 pages) into a data table, and I only know how to extract them manually with Data Scraping.
I want to extract all the data tables (and no paragraphs) in the PDF file into a single Excel sheet with multiple rows and columns.
But the problem is that some of the data tables can’t be read by the Data Scraping tool.
I want to extract all the data from numerous PDF files and then reorganize them into a single database.
My loop works pretty well, but I don’t understand why I can’t extract the data.
Inside my loop, I’m using: Read PDF Text → Extract Structured Data → Excel Application Scope → Write Range.
Here is my workflow if you want to see it: Demo.xaml (9.5 KB)
Thanks for the help, but I already looked at this and didn’t find the answer.
The problem is that when I enter PDF in the Metadata of Extract Structured Data, I get this error: “Unable to cast object of type ‘Newtonsoft.Json.JValue’ to type ‘System.String’.”
I tried something else for my problem: I use Read PDF, then an Assign with Substring, to extract the data that I want from the string returned by Read PDF.
But it doesn’t work either.
Can someone look at my workflow?
Because your Assign activity doesn’t have a new variable on the LHS.
You’re assigning the Read PDF output variable ExtractDT on the LHS, and on the RHS you’re passing the substring correctly.
Please create a new variable of type String, pass it on the LHS, and give it a try.
When you say LHS and RHS, do you mean the left and right boxes in the Assign activity?
If that’s it, it doesn’t work.
When I debug the automation (with or without a new variable), an “exception type : ArgumentOutOfRangeException” is thrown.
So, what first index should I use to find a figure preceded by “Active total …” and followed by $, knowing that the position and the structure around the figure are not the same in every PDF?
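Since the position changes from one PDF to the next, a fixed index is unlikely to work. In .NET, Substring throws ArgumentOutOfRangeException when the start index is negative or when start plus length runs past the end of the string, and IndexOf returns -1 when the marker is not found, so a hard-coded index can easily end up out of range. One alternative is to search for the pattern instead of a position. Below is a minimal sketch of that idea in Python. The function name, the pattern, and the sample strings are made up for illustration, and it assumes there are no other digits between “Active total” and the figure. The pattern uses only basic regex syntax, so it should carry over to a .NET regex (for example in a Matches activity), but I have not tested it in UiPath.

```python
import re

# Hypothetical pattern: the words "Active total", then any non-digit filler,
# then a figure (digits with optional spaces, dots, or commas), then "$".
ACTIVE_TOTAL = re.compile(
    r"Active total\D*?(\d(?:[\d\s.,]*\d)?)\s*\$",
    re.IGNORECASE,
)


def find_active_total(pdf_text):
    """Return the figure between 'Active total' and '$' as text, or None."""
    match = ACTIVE_TOTAL.search(pdf_text)
    # Checking for "no match" first avoids the out-of-range error that a
    # fixed index causes when the marker is missing from the text.
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up sample text standing in for the output of Read PDF Text.
    sample = "Summary\nActive total : 1,234.50 $\nEnd of report"
    print(find_active_total(sample))
```

The function returns the figure as a string, so it would still need to be converted to a number according to the separators used in your PDFs.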