Read PDF in background?

I’m studying for the Associate certification using a practice test. Here’s the question that confuses me:

According to best practices, which activity should be used to retrieve individual pieces of data from a digital PDF in the background using UI Automation?

  • Get OCR Text
  • Read PDF with OCR
  • Get Text
  • Get Text Exists

According to the creator of the tutorial, the answer is Get Text. But I thought the 2 activities to read a PDF file are: Read PDF Text and Read PDF Text with OCR.

Why would Get Text be better than the others?

Official trainings present both as valid options, but most Solution Architects will choose the background activities (Read PDF Text or Read PDF Text with OCR).

Hope it helps

Hi,

Because the question says “using UI Automation”.

ReadPDFwithOCR and ReadPDFText are included not in the UiAutomation package but in the PDF activities package.

Note: In this case, “background” does not mean a background process; it means background automation, i.e. UI automation that works without the target window being in the foreground.

Regards,

Hi @Terry_Marr

To read a PDF in the background and extract specific data, “Get Text” is recommended when using UI Automation for PDFs that aren’t directly accessible with “Read PDF Text” or “Read PDF Text with OCR”.

Explanation of activities:

  • Get OCR Text: extracts text using OCR (Optical Character Recognition), ideal for scanned PDFs, but not recommended for background execution.
  • Read PDF with OCR: also uses OCR, and it belongs to the PDF activities package rather than UI Automation. OCR makes it slower and is unnecessary when the PDF already contains digital text.
  • Get Text: allows extracting specific data and works well in the background when the PDF contains digital text, enabling focus on specific UI elements.
  • Get Text Exists: only checks if text is present without extracting content.

Summary: “Get Text” is suggested as it’s faster and suitable for background use when the PDF allows UI-based interaction.
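The reasoning above can be written down as a small decision helper. This is only an illustrative Python sketch, not UiPath code: the `Requirement` class and `pick_activity` function are hypothetical names, and the activity names are simply the ones used in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical description of what the exam question asks for."""
    via_ui_automation: bool  # must the activity come from the UI Automation package?
    digital_text: bool       # True for a digital PDF, False for a scanned image
    need_value: bool         # True to extract the text, False to only check it is there


def pick_activity(req: Requirement) -> str:
    """Return the activity name (as written in this thread) that fits the requirement."""
    if not req.via_ui_automation:
        # These two live in the PDF activities package, not UI Automation.
        return "Read PDF Text" if req.digital_text else "Read PDF with OCR"
    if not req.digital_text:
        # Scanned content has no selectable text, so OCR is the only UI-based option.
        return "Get OCR Text"
    return "Get Text" if req.need_value else "Get Text Exists"


# The exam question: UI Automation, digital PDF, individual pieces of data.
print(pick_activity(Requirement(via_ui_automation=True, digital_text=True, need_value=True)))
# prints: Get Text
```

Dropping the “using UI Automation” condition from the question switches the result to Read PDF Text, which is why that wording decides the answer.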

Thank you! Your detailed explanation is very helpful. So, just so I am clear: to use the Get Text activity, you first point the browser at the PDF path, whether local or over the internet, correct? Then Get Text will “read” just the text in the file, not the scanned image text, correct? Since the page is a PDF and not a website, I am curious how you would set a selector on a PDF page, say for a specific paragraph, because it’s not structured like HTML. Am I missing something?