PDF help

Hi all

I’m having a bit of an issue with PDFs and was hoping somebody could help.

In my process, I need to obtain bits of information from a PDF file, e.g. whether there are any “No” values in the “Safety Device Correct” columns. This is what the table containing the information looks like:

When I use Get Text and highlight the area I want, text from further down the document appears, as shown below:

If I try the same highlighting method with Get Full Text, the whole document comes up. I’ve also tried Get Visible Text and Get OCR Text, and only parts of the document come up.

I could use Regex, but the text in the table constantly changes, so there’s no permanent anchor. For example, the first location in one document could be “Kitchen” and in another it could be “Downstairs hall cupboard”, so I can’t look for the nth word because the number of words varies. The answers will also be any of N/A, Yes, No, Pass or Fail, so again I can’t tell it which word to look after.
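
One possible way around the missing anchor is to anchor on the answers rather than the location: the location text varies, but the answer vocabulary (N/A, Yes, No, Pass, Fail) is fixed. Below is a minimal sketch in plain Python, not a UiPath activity. It assumes each table row comes out of the PDF as a single line of text, with the location first and the answers at the end. The function names `parse_rows` and `locations_with_no` are made up for illustration.

```python
import re

# Answer vocabulary taken from the post; everything before it on a line
# is treated as the free-text location.
ANSWER = r"(?:N/A|Yes|No|Pass|Fail)"

# Assumed row layout: "<location> <answer> [<answer> ...]" on one line.
ROW = re.compile(rf"^(.+?)\s+({ANSWER}(?:\s+{ANSWER})*)\s*$")


def parse_rows(text):
    """Return (location, [answers]) for every line that ends in answers."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match.group(1), match.group(2).split()))
    return rows


def locations_with_no(text):
    """Return the locations whose row contains at least one 'No'."""
    return [loc for loc, answers in parse_rows(text) if "No" in answers]
```

Lines that do not end in one of the known answers, such as a header row, are skipped. A location that itself ends in the word “No” would be misread, so this only holds if the extracted rows really follow the assumed layout.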

Does anyone have any solutions please? Or are there any packages I could download to help?

Thanks :slight_smile:

Hi

In that case, you can try converting the PDF to Excel first, then reading that Excel file and saving it as a DataTable.
With that DataTable, I hope you will be able to manipulate the data easily.
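
To illustrate the idea, here is a rough sketch of the filtering step once the table is in tabular form. It is plain Python rather than a UiPath workflow, and it assumes the converted table has been saved as CSV text with a header row. The column name “Safety Device Correct” comes from the original post; the `Location` column and the function name `rows_with_no` are hypothetical.

```python
import csv
import io


def rows_with_no(csv_text, column="Safety Device Correct"):
    """Return the rows whose value in `column` is 'No' (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        # A missing or empty cell is treated as "not No".
        if (row.get(column) or "").strip().lower() == "no"
    ]


# Example with made-up data:
example = (
    "Location,Safety Device Correct\n"
    "Kitchen,Yes\n"
    "Downstairs hall cupboard,No\n"
)
flagged = rows_with_no(example)
```

Because the check is done per column, the varying length of the location text no longer matters.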

For pdf to excel conversion

Cheers @Short

Hello @Short,

Have you tried using the “Extract Table Data” activity? I’ve just tested it with a simple table in a PDF and it worked. :slight_smile:

Best regards,

Artur

Hi @artur.stepniak

Tried this and unfortunately got this error :frowning:

Hello,

Can you share the PDF with me? Or would that be a security violation? :slight_smile:

Best,

Artur

Hi @artur.stepniak

I would if I could, but it would be a security violation :frowning:

Thanks