Extract number from pdf without elements to click on

Danyel_Ural · November 14, 2018, 4:03pm

Hi guys, I am currently trying to build the following workflow:

1. Download a PDF from Outlook
2. Save the PDF in a folder
3. Open it
4. Get a price (which is different every time)
5. Save it in an Excel sheet

Step 4 is causing problems because I cannot select individual elements (OCR etc. shows me the whole PDF page as one element).

Another idea was to grab the whole PDF as a string, but the PDF has 28 pages and the number changes every month, so I don't know if this is doable.

I also tried double-clicking that part of the page, but UiPath does not seem to click the same spot each time.

Maybe there is an easy way to do it?

Mr_JDavey · November 14, 2018, 4:12pm

Hi there @Danyel_Ural,
Just to clarify, the PDF documents are not digitised, i.e., they are scanned images?

In this case, you’d have to read with OCR, as you’ve noted, then parse the extracted text, either via Regex or Text Manipulation.

Given that there are 28 pages, is there a range of pages the price can appear in, for instance:

The first 3
The last 3
Pages 10-12

This will allow you to refine the extraction to only the relevant text, ultimately speeding up execution, as OCR can take considerable time to complete.

If you can provide some example (dummy) documents, I may be able to provide a rough example.

Danyel_Ural · November 15, 2018, 9:00am

Thank you for your answer.

The data is always on the same page and in the same spot; only the number changes.

For example, here I want the 1961 (spot prices in the lower left corner).
I used the “Click” activity to show that it picks up the whole page as one big element. That's my problem. I tried the “Double Click” activity followed by Send Hotkey with Ctrl+C and Ctrl+V to paste the value into an Excel sheet, but it copied the 2 from 2-Nov-18.

indra · November 15, 2018, 9:02am

@Danyel_Ural Is it possible to share sample PDF files?

Danyel_Ural · November 15, 2018, 9:09am

Cennik_ALUMINIUM_ALUPROF_DE_v3.pdf (2.2 MB)

I cannot share the original PDF from this topic, sorry, but I can share this PDF, which has the same situation.

Here I have the same problem when I try to copy the “49”, for example, on page 9 in the upper left corner.

Mr_JDavey · November 15, 2018, 9:11am

Hi there @Danyel_Ural,
Have you tried the “Read PDF Text” activity?

Failing that, use the “Read PDF With OCR” Activity.

Set the Range to be the relevant page.

Then, once you have the desired page’s text extracted, you can use Regex to retrieve the necessary data (via the “Matches” activity).
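
The regex step can be sketched outside UiPath roughly as follows. This is only a minimal Python illustration, not the UiPath workflow itself: the sample page text, the "Spot price" label, and the function name extract_price are all assumptions made up for the example. In UiPath, a similar pattern would go into the “Matches” activity.

```python
import re

# Hypothetical page text, standing in for the output of "Read PDF Text".
# The label and layout are assumptions; adapt the pattern to the real page.
SAMPLE_PAGE_TEXT = """Report date 2-Nov-18
Spot price 1961
Three-month price 1975
"""

# Capture the digits (optionally with a decimal part) that follow the label.
PRICE_PATTERN = re.compile(r"Spot price\s+(\d+(?:[.,]\d+)?)")


def extract_price(page_text):
    """Return the first price after the label as a string, or None if absent."""
    match = PRICE_PATTERN.search(page_text)
    return match.group(1) if match else None


print(extract_price(SAMPLE_PAGE_TEXT))
```

Anchoring the pattern on a nearby label, rather than on the digits alone, is what keeps it from picking up other numbers on the page, such as the day in a date.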

Danyel_Ural · November 15, 2018, 9:34am

This is a nice idea.
I tried it out, and now I have a text file in which the number I am looking for is always in row 21.

Now I could paste it into an Excel sheet and delete everything except row 21.
Is there an easier way to extract row 21 from a .txt file?

Sadly I don't know all the possible ways, but I imagine there is a way to put the string into a DataTable and write only a certain row from it into an Excel sheet?

Alex_Cross · November 15, 2018, 10:09am

yourTextString.Split(ControlChars.Cr)(20) might work to get the 21st row from the string of data? (The index is zero-based, so 20 is row 21.)
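
For anyone reading later, here is the same idea sketched in Python rather than as a UiPath VB.NET expression. The sample text and the helper name get_line are made up for illustration. It also shows one thing to watch for: if the text uses Windows line endings (CR+LF), splitting on CR alone leaves a leading line-feed character on every row after the first, so trimming the result is a safe habit.

```python
def get_line(text, line_number):
    """Return the 1-based line `line_number` of `text`, trimmed of whitespace."""
    # splitlines() handles "\r\n", "\r", and "\n" endings alike.
    lines = text.splitlines()
    return lines[line_number - 1].strip()


# Hypothetical extracted text: 25 rows with Windows (CR+LF) line endings.
sample_text = "\r\n".join(f"row {n}" for n in range(1, 26))

# Splitting on CR alone, as in Split(ControlChars.Cr)(20), finds row 21
# but keeps the line feed that followed the previous CR.
cr_only = sample_text.split("\r")[20]
print(repr(cr_only))

print(get_line(sample_text, 21))
```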

Danyel_Ural · November 15, 2018, 10:29am

Works perfectly, thank you.

system · November 19, 2018, 2:43pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
