Text from image

Klang · April 30, 2026, 1:48pm

Hello everyone,

I have a list of PDF files from which I would like to extract a specific piece of information: a number that appears somewhere on the page and always starts with "4423…".

Since I am not able to process or read the PDFs directly as PDF files, my current approach is as follows:

* For Each File in Folder
* Open the PDF in Word (Word Scope)
* Take a screenshot
* Run OCR with UiPath Screen OCR

However, I am facing an issue:
With OCR, I can only access clearly defined UI elements, because the selector requires an unambiguous reference to one particular document. This does not work in my case: when the second document is opened, the selector no longer finds a match.

If there is a better or more sensible solution, I would be grateful for any help.

JarrydScott · April 30, 2026, 1:55pm

Hello @Klang

May I ask why you can't extract text from the PDF files directly? If that were possible, you could use a "Read PDF Text" activity and then a regular expression to extract the number you are looking for.
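As a rough illustration of the regular-expression step, here is a minimal Python sketch (UiPath itself would use its Matches activity instead). It assumes the number is an unbroken run of digits starting with 4423; the sample text and the function name `find_numbers` are made up for this example.

```python
import re

# Assumption: the number is a plain run of digits that starts with 4423.
# The lookbehind stops a match from starting in the middle of a longer number.
NUMBER_PATTERN = re.compile(r"(?<!\d)4423\d+")


def find_numbers(text: str) -> list:
    """Return every digit run in `text` that starts with 4423."""
    return NUMBER_PATTERN.findall(text)


# Made-up sample text standing in for the output of a PDF text read.
sample = "Invoice 2026-04\nReference no.: 442300815\nCustomer 9944231"
print(find_numbers(sample))
```

If the real numbers contain spaces or separators, the pattern would need to be adjusted accordingly.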

Alternatively, I would suggest the Generative AI activities: take a screenshot of the PDF file, pass it to the GenAI activity, and define a prompt that describes what you are looking for. It should then extract the data you need.

Let me know if this helps.

shrikrushna.bhoi · April 30, 2026, 4:27pm

Hey @Klang,

Best solution: use UiPath PDF text extraction.

For Each file in folder → Read PDF Text activity (from the UiPath.PDF.Activities package) → Matches activity with the regex "4423\d+" → you get your number.

Why:
* No OCR needed
* No selector issues
* Works on every PDF that has a text layer
* Fast and reliable
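The flow above can be sketched outside UiPath roughly as follows. This is only an illustration in Python: `extract_from_folder` and the `read_text` callable are hypothetical names, and `read_text` stands in for whatever actually reads the PDF text layer. The demo uses empty placeholder files and a fake reader so that it runs without any PDF library.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional

PATTERN = re.compile(r"4423\d+")  # the regex quoted in the post above


def extract_from_folder(
    folder: Path, read_text: Callable[[Path], str]
) -> Dict[str, Optional[str]]:
    """Map each *.pdf file name in `folder` to its first match, or None."""
    results: Dict[str, Optional[str]] = {}
    for pdf in sorted(folder.glob("*.pdf")):
        match = PATTERN.search(read_text(pdf))
        results[pdf.name] = match.group(0) if match else None
    return results


# Demo with placeholder files and a fake reader (no real PDFs involved).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("a.pdf", "b.pdf"):
        (folder / name).touch()
    fake_text = {"a.pdf": "Order no. 4423000123", "b.pdf": "no number here"}
    results = extract_from_folder(folder, lambda p: fake_text[p.name])
    print(results)
```

Files that yield `None` are the ones worth inspecting by hand, since they may have no text layer at all.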

Please let me know if it works

Klang · May 4, 2026, 9:51am

The problem is that some parts of the PDF are not recognized as text, and therefore the PDF text extraction does not always work.
Is there also a solution for this?

shrikrushna.bhoi · May 5, 2026, 3:34pm

Yes @Klang, there are multiple solutions.

Problem analysis:
Your PDFs contain scanned content or images mixed with text. Text extraction only works on actual text layers, not on embedded images or scanned pages.

Solution: use OCR on the PDFs (best for mixed content)
* For Each File in Folder
* Use the "Read PDF With OCR" activity (instead of plain text extraction)
* This converts the scanned areas to readable text
* Apply the regex "4423\d+"
* Get your number

Why this works: OCR reads both the text and the scanned images.
Package needed: UiPath.PDF.Activities, with an OCR engine activity placed inside Read PDF With OCR.
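A possible variation, not something described in this thread, is to try the text layer first and run OCR only when no number is found, since OCR is usually the slower step. The Python sketch below shows that fallback logic only; `extract_number`, `read_text`, and `read_ocr` are hypothetical names, and the two readers are faked with dictionaries so the example runs on its own.

```python
import re
from typing import Callable, Optional

PATTERN = re.compile(r"4423\d+")


def extract_number(
    pdf_path: str,
    read_text: Callable[[str], str],
    read_ocr: Callable[[str], str],
) -> Optional[str]:
    """Try the text layer first; call the OCR reader only if nothing matched."""
    for reader in (read_text, read_ocr):
        match = PATTERN.search(reader(pdf_path))
        if match:
            return match.group(0)
    return None


# Fake readers: "scan.pdf" has no text layer, so only OCR sees its number.
text_layer = {"digital.pdf": "Ref 44231111", "scan.pdf": ""}
ocr_layer = {"digital.pdf": "Ref 44231111", "scan.pdf": "Ref 44232222"}
ocr_calls = []


def fake_ocr(path: str) -> str:
    ocr_calls.append(path)  # record which files actually needed OCR
    return ocr_layer[path]


print(extract_number("digital.pdf", text_layer.get, fake_ocr))
print(extract_number("scan.pdf", text_layer.get, fake_ocr))
print(ocr_calls)
```

In this demo only the scanned file reaches the OCR reader, which is the point of ordering the readers this way.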

Klang · May 7, 2026, 9:33am

Thank you, everything works fine now (Y)

shrikrushna.bhoi · May 7, 2026, 11:19am

Hey @Klang,
Could you please mark it as the solution?
