Read PDF in background?

Terry_Marr · November 2, 2024, 8:33pm

I’m studying for Assoc. cert using a practice test. Here’s the question that confuses me:

According to best practices, which activity should be used to retrieve individual pieces of data from a digital PDF in the background using UI Automation?

Get OCR Text
Read PDF with OCR
Get Text
Get Text Exists

According to the creator of the tutorial, the answer is Get Text. But I thought the 2 activities to read a PDF file are: Read PDF Text and Read PDF Text with OCR.

Why would Get Text be better than the others?

dokumentor · November 2, 2024, 9:02pm

Official trainings present both as valid options, but most Solution Architects will decide on the background activities (Read PDF Text or Read PDF Text with OCR).

Hope it helps

Yoichi · November 2, 2024, 11:37pm

Hi,

Because the question says “using UI Automation”.

ReadPDFwithOCR and ReadPDFText are included not in the UiAutomation package but in the PDF activities package.

Note: In this case, “background” means not a background process but the following background automation.

Regards,

Erimateia · November 3, 2024, 5:37pm

Hi @Terry_Marr

To read a PDF in the background and extract specific data, “Get Text” is recommended when using UI Automation for PDFs that aren’t directly accessible with “Read PDF Text” or “Read PDF Text with OCR”.

Explanation of activities:

Get OCR Text: extracts text using OCR (Optical Character Recognition), ideal for scanned PDFs, but not recommended for background execution.
Read PDF with OCR: also uses OCR and requires Adobe Acrobat installation. It’s not ideal for background use due to OCR dependence and slower performance.
Get Text: allows extracting specific data and works well in the background when the PDF contains digital text, enabling focus on specific UI elements.
Get Text Exists: only checks if text is present without extracting content.

Summary: “Get Text” is suggested as it’s faster and suitable for background use when the PDF allows UI-based interaction.
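
For anyone who finds it easier to see the elimination as logic, here is a rough sketch in plain Python. It is not UiPath code and does not call any UiPath API; the `Activity` class, its fields, and `pick_activity` are hypothetical names made up for illustration. The package and OCR flags simply restate what was said in this thread about the four answer options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    package: str         # "UIAutomation" or "PDF", as described in this thread
    uses_ocr: bool       # True if the activity relies on OCR
    extracts_text: bool  # False if it only checks that text is present


# The four answer options from the practice question (illustrative data only).
ACTIVITIES = [
    Activity("Get OCR Text", "UIAutomation", uses_ocr=True, extracts_text=True),
    Activity("Read PDF with OCR", "PDF", uses_ocr=True, extracts_text=True),
    Activity("Get Text", "UIAutomation", uses_ocr=False, extracts_text=True),
    Activity("Get Text Exists", "UIAutomation", uses_ocr=False, extracts_text=False),
]


def pick_activity(package, digital_pdf):
    """Return the first option that is in the given package, extracts text,
    and uses OCR only when the PDF is not digital. Returns None if no option fits."""
    for activity in ACTIVITIES:
        needs_ocr = not digital_pdf
        if (
            activity.package == package
            and activity.extracts_text
            and activity.uses_ocr == needs_ocr
        ):
            return activity.name
    return None


# The question asks for UI Automation on a digital PDF.
print(pick_activity("UIAutomation", digital_pdf=True))
```

With the question's two constraints (UI Automation, digital PDF), only one of the four options survives the filter, which matches the tutorial's answer.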

Terry_Marr · November 3, 2024, 5:53pm

Thank you! Your detailed explanation is very helpful. So, just so I am clear: to use the Get Text activity, you first point the browser at the PDF path, whether local or over the internet, correct? Then Get Text will “read” just the digital text in the file, not scanned image text, correct? Since the page is a PDF and not a website, I am curious how you would set a selector on a PDF page, say for a specific paragraph, because it’s not structured like HTML. Am I missing something?
