Extract variable data from .PDF files

Hi,

I have multiple .pdf documents with the elements in the same location. How can I extract the data without using Anchor Base? I have tried Get Full Text via screen scraping, but without success, because the 'name' attribute in the selector is always different (variable).

Would be grateful for any help.

Thanks!

Hi Carwyn,

Welcome to the Forum!!

You can add the UiPath PDF activities by installing UiPath.PDF.Activities from Manage Packages.
Then use the Read PDF Text activity to extract the entire PDF content, and run the Matches activity on the result to extract the name with a regular expression.
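
To illustrate the regex step, here is a rough sketch in Python. It only stands in for what the Matches activity would do, it is not UiPath code. The `Name:` label, the sample text and the `extract_name` helper are all assumptions, since no sample document has been shared yet; the pattern would need adjusting to the real layout.

```python
import re

# Hypothetical text, roughly as it might come back from Read PDF Text.
sample_text = """Invoice 1001
Name: AND
Date: 01/02/2019
"""

# Assumed layout: the value follows a literal "Name:" label on the same line.
NAME_PATTERN = re.compile(r"^Name:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def extract_name(text):
    """Return the value after 'Name:', or None if the label is not found."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_name(sample_text))
```

Because the pattern keys on the label rather than on the value, it should not matter whether the document contains AND, ALB, TXP or anything else.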

Or, if you only want to extract a single field from the PDF, use the Get Text activity.

Thanks Kaybot,

I was thinking more of the coding in the selector.

I want the name to be variable across different .pdf documents, with values such as AND, ALB, TXP etc. I have tried using name='*', but then it loses the element location and does not pick up the text from each document, as the name is always the element name.

I would appreciate any help with this urgent matter!

Thanks,

Carwyn

Sorry for the delayed response.
I am not clear on the requirement. Can you give me a sample PDF and tell me exactly what to extract? I will try to develop a workflow for you.

@Carwynf -

  1. Convert the PDF to text using the Read PDF Text activity, with Preserve Formatting set to True.
  2. Use the Matches activity with a regular expression to extract the name from the docs…

Once this works for one file, we can loop the same steps over all the files in the directory.
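
The loop could be sketched like this in Python (again only an illustration of the logic, not a UiPath workflow). It assumes the PDFs have already been converted to .txt files in one folder and that the name follows a `Name:` label; the label and the `extract_names` helper are hypothetical.

```python
import re
from pathlib import Path

# Assumed layout: the value follows a literal "Name:" label on the same line.
NAME_PATTERN = re.compile(r"^Name:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def extract_names(folder):
    """Map each .txt file name in the folder to its extracted name (or None)."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        match = NAME_PATTERN.search(text)
        # Keep files without a match in the result so failures are visible.
        results[path.name] = match.group(1) if match else None
    return results
```

Files where nothing matches are kept with `None`, so it is easy to see which documents need a different pattern.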

Is it possible to share a sample? If it contains any PII, you can replace it with dummy data (after converting the file to text) and then share it.