Recognition of the individual items on a PDF Document

Hello

When I work with the PDF in Studio, it does not recognize the individual items when I click “Indicate on screen”.


I have made all the settings in the PDF “Change the reading options”.

Why does the program not recognize the individual numbers, but only the entire number block? I also pressed F4 to switch between the UI frameworks, and it does not work with any of the variants.

Does Data Scraping work with PDF documents? I get the following error message: “This control does not support data extraction. Please select a table cell”.

Thanks for the support!


Hi @morseil ,

We have also observed that Data Scraping does not work for some PDF documents.

Could you let us know whether the PDF is a digital document or a scanned one? If it is a digital document, we could instead extract the data using the PDF activities together with String/Regex manipulation.

Let us know whether a different approach like this is possible in your case. A sample of the document, or of the data extracted from the PDF, would also help us analyse the problem.

Hi @supermanPunch

Thanks for your reply! It is a digital document.

I need the following:

Give me for Client01 the net transfer of 343’376.75
Give me for Client02 the net transfer of 137’894.22
and so on.

I then need to save these amounts in an Excel file.

How would you proceed?

@morseil,

Could you try the steps below:

  1. Use the Read PDF Text activity with PreserveFormat set to True. The output is a String, say stored in the variable pdfText.

  2. We can now use Regex operations on this text to get the data you need. We first identify the pattern that is present in the data. The observable pattern is that each item in the table is separated by at least two spaces. Hence, we can use this pattern to capture the values separately, as shown below:

image

(.*?)\s{2,}(.*?)\s{2,}(.*?)\s{2,}(.*)

You could try it out yourself on regex101 with the text you get from reading the PDF.

  3. Next, we could use a Build Data Table activity and prepare the columns needed:
    image

Also, assign its output to a variable, say OutputDT.

  4. Next, we use the Matches activity to get the captured results from the regex pattern. Here, the variable mc is created as the output of the Matches activity.

  5. Next, we use a For Each activity to iterate through the matches and add each one as a row to OutputDT, as shown below:
    image

Visuals from Debug :
image

The above approach is based on an assumption about your data pattern; we haven’t yet properly analysed your data.

However, you could try the approach described and let us know if you’re not able to get the required output. Please also mention any error you receive.
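To try the pattern outside Studio first, here is a minimal Python sketch of the same idea. It is not the UiPath workflow itself. The sample text is hypothetical: only the two net transfer amounts come from this thread, and the four-column layout, the other values, and the helper names `extract_rows` and `parse_amount` are assumptions made for illustration.

```python
import re

# Hypothetical sample of what Read PDF Text might return with PreserveFormat
# set to True. The column layout and every value except the two net transfer
# amounts are invented for this example.
pdf_text = (
    "Client01    500’000.00    156’623.25    343’376.75\n"
    "Client02    200’000.00    62’105.78    137’894.22\n"
)

# Same pattern as in the post: four values separated by two or more
# whitespace characters.
ROW_PATTERN = re.compile(r"(.*?)\s{2,}(.*?)\s{2,}(.*?)\s{2,}(.*)")


def parse_amount(text):
    """Convert an amount such as 343’376.75 to a float by dropping the
    apostrophe thousands separators (typographic or straight)."""
    return float(text.replace("’", "").replace("'", ""))


def extract_rows(text):
    """Return one tuple of four captured values per matching line."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append(match.groups())
    return rows


rows = extract_rows(pdf_text)

# Assumes the net transfer is the last column, which may not hold for
# the real document.
net_transfers = {row[0]: parse_amount(row[3]) for row in rows}

for client, amount in net_transfers.items():
    print(client, amount)
```

The sketch matches line by line because `\s` also matches line breaks, so running the pattern over the whole text at once could let a match run across two rows. If your real data has a different number of columns, the number of capture groups would need to change accordingly.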

It worked!! Thanks a lot!
