Extract number from pdf without elements to click on

Hi guys, I am currently trying to build the following automation:

  1. download pdf from outlook
  2. save pdf in a folder
  3. open it
  4. get a price (which is different every time)
  5. save it in an excel sheet

Step 4 is causing problems because I cannot use UI elements (OCR etc. shows me the whole PDF page as one element).

Another idea was to grab the whole PDF as a string, but the PDF has 28 pages and the number changes every month, so I don't know if this is doable.

I also tried double-clicking the number, but UiPath does not seem to click the same spot each time.

Maybe there is an easy way to do it?

Hi there @Danyel_Ural,
Just to clarify: the PDF documents are not digitised, i.e. they are scanned images?

In this case, you’d have to read with OCR, as you’ve noted, then parse the extracted text, either via Regex or Text Manipulation.

Given that there are 28 pages, is there a range of pages the price can appear in, for instance:

  • The first 3
  • The last 3
  • Pages 10-12

This will allow you to refine the extraction to only the relevant text, ultimately speeding up execution, as OCR can take considerable time to complete.

If you can provide some example (dummy) documents, I may be able to provide a rough example.

Thank you for your answer.

The data is always on the same page and in the same spot, but the number is different each time.

For example here I want the 1961 (Spot prices in the lower left corner).
I used the “Click” activity to show that it gets the whole page as one big element window. That's my problem. I tried the “Double Click” activity and then Send Hotkey with Ctrl+C and Ctrl+V to insert the value into an Excel sheet, but it copied the 2 from 2-Nov-18.

@Danyel_Ural Is it possible to share sample PDF files?

Cennik_ALUMINIUM_ALUPROF_DE_v3.pdf (2.2 MB)

Sorry, I cannot share the original PDF from this topic, but I can share this PDF, which shows the same situation :grin:

Here I have the same problem when I try to copy, for example, the “49” on page 9 in the upper left corner.

Hi there @Danyel_Ural,
Have you tried the “Read PDF Text” Activity?

Failing that, use the “Read PDF With OCR” Activity.

Set the Range to be the relevant page.

Then, once you have the desired page’s text extracted, you can leverage Regex to retrieve the necessary data (via the “Matches” Activity).
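As a rough illustration of the Regex step, here is a minimal sketch in Python rather than UiPath. The sample text, the labels ("Aluminium", "Copper") and the helper name `extract_price` are all made up for the example; the real output of “Read PDF Text” will look different, so the pattern would need adjusting.

```python
import re

# Hypothetical page text; the real extracted text will differ.
page_text = "Spot prices\r\n2-Nov-18\r\nAluminium 1961\r\nCopper 6,120.50\r\n"


def extract_price(text, label):
    """Return the first number that follows `label`, or None if there is none.

    Assumption: the price sits right after a fixed label, separated by
    whitespace, and may contain thousands separators and decimals.
    """
    pattern = re.compile(re.escape(label) + r"\s+(\d[\d,]*(?:\.\d+)?)")
    match = pattern.search(text)
    return match.group(1) if match else None


price = extract_price(page_text, "Aluminium")
print(price)
```

Anchoring the pattern on a label next to the number, instead of matching any digits, is what keeps it from picking up other numbers on the page, such as the day in a date. The same pattern text should be usable in the “Matches” Activity, but that is untested here.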

This is a nice idea.
I tried it out and now I have a text file and I see that the number I am looking for is always in row 21.

Now I could paste it into an excel sheet and delete all except row 21.
Is there a way to extract row 21 from a .txt file in an easier way?

Sadly, I don't know all the possible ways, but I could imagine putting the string into a datatable and then writing only a certain row from the datatable into an Excel sheet?

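yourTextString.Split(ControlChars.Cr)(20) might work to get row 21 from the string of data (the index is zero-based, so 20 is the 21st row). If the lines end in CRLF, the result will start with a leftover line feed, so you may need to add .Trim() at the end.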
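The same idea, sketched in Python for illustration only (this is not a UiPath expression). The helper name `get_line` and the sample text are hypothetical; the point is the zero-based index and the handling of different line endings.

```python
def get_line(text, line_number):
    """Return line `line_number` (1-based) of `text`, or None if out of range."""
    # splitlines() treats \n, \r\n and \r (among others) as line boundaries,
    # so no stray line-feed characters are left on the pieces.
    lines = text.splitlines()
    if 1 <= line_number <= len(lines):
        return lines[line_number - 1].strip()
    return None


# Hypothetical 28-line text with Windows (CRLF) line endings.
sample = "\r\n".join(f"row {i}" for i in range(1, 29))
row_21 = get_line(sample, 21)
print(row_21)
```

Returning None for a missing row, instead of raising an error, makes it easier to spot a month where the PDF layout has changed and row 21 no longer exists.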

Works perfectly, thank you.
