How to extract and validate data from PDF files

How can I extract a specific portion of data from a PDF that is in text format?

Hi @NikhilRPA,
You can use the Read PDF Text activity and then apply string manipulation to the output variable to extract the required data.
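
To illustrate the idea outside of Studio, here is a minimal Python sketch of that kind of string manipulation. It assumes the PDF text has already been read into a single string; the sample text, the label names, and the helper `value_after_label` are all made up for illustration and are not part of any UiPath activity.

```python
# Hypothetical sample standing in for the string returned by a PDF text-reading step.
pdf_text = """Invoice No: 1042
Date: 12/03/2019
Address: 221B Baker Street, London
Total: 150.00"""


def value_after_label(text, label):
    """Return the rest of the first line that starts with `label`, or None if absent."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(label):
            # Drop the label itself and any surrounding whitespace.
            return stripped[len(label):].strip()
    return None


address = value_after_label(pdf_text, "Address:")
print(address)
```

This only works when the label and its value sit on the same line, so treat it as a starting point rather than a general solution.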

So, without text or data manipulation, is there any other way to get specific data that is in text format? Is there any chance of using activities such as Get Text, copy-text activities, etc.?

Try using Abbyy OCR.

@NikhilRPA

Do you want to read a particular piece of text?

If yes, use the Read PDF Text activity to read the file; it gives you the output as a String. You can then use string manipulation functions or regex to get the particular text from it.
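
As a rough sketch of the regex route, the Python snippet below pulls every `Label: value` pair out of a block of text. The sample text, the pattern `FIELD_PATTERN`, and the function `extract_fields` are assumptions made for this example; a real PDF will likely need a pattern tuned to its own layout.

```python
import re

# Hypothetical text, standing in for the String output of a PDF text-reading step.
sample_text = "Name: Jane Doe\nAddress:   42 Example Road, Springfield\nDate: 2019-05-14\n"

# One "Label: value" pair per line; label is letters/spaces, value is the rest of the line.
FIELD_PATTERN = re.compile(
    r"^[ \t]*(?P<label>[A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(?P<value>.+?)[ \t]*$",
    re.MULTILINE,
)


def extract_fields(text):
    """Return a dict mapping each label found in `text` to its value."""
    return {m.group("label"): m.group("value") for m in FIELD_PATTERN.finditer(text)}


fields = extract_fields(sample_text)
print(fields.get("Address"))
```

The same pattern idea should carry over to a regex step in a workflow, although the exact syntax for named groups may differ between regex engines.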

Hello @NikhilRPA

I think you can use the Read PDF Text activity, just as lakshman mentioned, to get your task done.

However, this works well only if the structure of the PDF files is always the same or has very few variations. If the structure differs from file to file and the location of the value you want to extract keeps changing, then you have to consider something more complex that can handle such scenarios efficiently.

In such cases, I would suggest going for the Intelligent OCR activities, which will help you achieve the objective. There are many free OCR activities as well. If accuracy is still a problem and you need to support different templates, then go for either Abbyy or Rossum Elis, which use AI capabilities as well. However, these two will incur additional costs.

When I try to use Abbyy OCR, it shows me the error “Abbyy was not installed”.
How can I install it?

Yeah, I can use string manipulation after I have captured the text with the Read PDF Text activity. But I am looking for another way of getting a particular text such as “Address:xxxxxx” from a PDF file.

@NikhilRPA

Do you have a license or a trial version of Abbyy?

It's a trial community edition…

I have another question: when I try to indicate an element in the PDF using Find Element inside Anchor Base, it selects the whole PDF file rather than a particular element such as the “Date” field. Do I need to add an extension to be able to indicate a particular element in a PDF?

@NikhilRPA

@Lahiru.Fernando is not asking about UiPath Studio.

If you want to work with Abbyy OCR, you first need to install it on your machine, and it requires a separate license. I believe they offer a one-month free trial license; to get it, you have to raise a request in their portal, and they will then send you the installation files and an installation guide.

Cool @lakshman thanks… I will raise a request for it

You might also want to have a look at the OmniPage OCR activity - it does not require installation or licensing and has decent results… Might cover your use case!

Ioana

When I search for OmniPage in the Activities panel, I get no results.

You need to install the UiPath.OmniPage.Activities package from the Official feed first.
