Hi everyone,
I’m working on a PoC where I need to extract specific text (like names or codes) from PDFs containing technical drawings.
The challenge is that the text appears in random positions and varies in format, so standard activities like Read PDF Text aren’t working well.
Has anyone faced a similar case? Any tools, packages, or approaches you’d recommend?
Thanks in advance!
arivu96
(Arivazhagan A)
March 31, 2025, 2:44am
Hi @Gonza_Espindola ,
Use OCR (if the text is embedded as an image)
If the text is part of an image (as in scanned PDFs or many technical drawings), you need to use OCR:
Use the “Read PDF with OCR” activity.
Choose an OCR engine such as:
Tesseract OCR (default, but lower accuracy)
Microsoft OCR (better for structured text)
Google Cloud OCR / ABBYY OCR (best for complex layouts)
See the UiPath Documentation Portal for more details.
Regards,
Arivu
Hi @Gonza_Espindola
First, use the “Read PDF with OCR” activity in UiPath and select Tesseract OCR to extract text from the PDF. Next, store the extracted text in a variable. Then, use the “Matches” activity with Regex to find specific names or codes. Finally, process the extracted data as needed.
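The regex step can be prototyped outside UiPath before you build the workflow. Below is a minimal Python sketch that assumes the OCR text is already stored in a variable. The sample text and the `DWG-1234-A` code format are hypothetical, so adjust the pattern to match your own drawings. Python's `re` and the .NET regex engine behind the Matches activity handle the constructs used here (`\b`, `\d`, character classes) the same way.

```python
import re

# Hypothetical OCR output from a technical drawing; your real text will differ.
ocr_text = """
TITLE: PUMP HOUSING ASSY
DWG NO: DWG-4521-B   REV: C
DRAWN BY: J. SMITH   CHECKED: A. LOPEZ
SEE DETAIL DWG-4522-A FOR FLANGE
"""

# Assumed code format: "DWG-", four digits, "-", one capital letter.
# \b keeps the match from starting or ending in the middle of a longer token.
CODE_PATTERN = re.compile(r"\bDWG-\d{4}-[A-Z]\b")


def find_codes(text: str) -> list[str]:
    """Return every match in order, similar to the Matches activity's result collection."""
    return [m.group(0) for m in CODE_PATTERN.finditer(text)]


print(find_codes(ocr_text))  # ['DWG-4521-B', 'DWG-4522-A']
```

Because the codes are found by their shape rather than by their position on the page, this approach still works when the codes appear in different places on each drawing.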
If the OCR accuracy is low, consider converting the PDF to an image before applying OCR extraction.
If you found this helpful, please mark it as the solution.
Happy Automation
lrtetala
(Lakshman Reddy)
March 31, 2025, 5:04am
Hi @Gonza_Espindola
Can you try the Extract Document Data activity?
Regards,
@Gonza_Espindola
Use Read PDF with OCR; it can extract the text from the PDF.
After that, use regex to extract the required data from the output of the Read PDF with OCR activity.
If you need help writing the regex, send some sample input data and I will provide one.
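To illustrate the kind of regex meant here, the sketch below (plain Python, not a UiPath activity) pulls out names that follow a label in a drawing's title block. The labels `DRAWN BY` / `CHECKED BY`, the "initial. SURNAME" name format, and the sample text are all assumptions. OCR output often varies in spacing and letter case, so the pattern allows optional whitespace and ignores case. If you port it to the Matches activity, write the named groups in the .NET form `(?<label>...)` and set the IgnoreCase option.

```python
import re

# Hypothetical Read PDF with OCR output: irregular spacing and mixed case are typical.
ocr_output = "drawn  by :J. SMITH    CHECKED BY: A. LOPEZ\nSCALE 1:50"

# Assumed labels and name format ("J. SMITH"); replace them with what your title block uses.
FIELD_PATTERN = re.compile(
    r"(?P<label>DRAWN\s*BY|CHECKED\s*BY)\s*:\s*(?P<value>[A-Z]\.\s?[A-Z]+)",
    re.IGNORECASE,
)


def extract_fields(text: str) -> dict[str, str]:
    """Map each normalised label to the name found after it."""
    fields = {}
    for m in FIELD_PATTERN.finditer(text):
        # Collapse whitespace and upper-case the label so "drawn  by" and "DRAWN BY" share a key.
        label = re.sub(r"\s+", " ", m.group("label")).upper()
        fields[label] = m.group("value")
    return fields


print(extract_fields(ocr_output))  # {'DRAWN BY': 'J. SMITH', 'CHECKED BY': 'A. LOPEZ'}
```

Anchoring on the label rather than on a fixed position is one way to cope with text that moves around from drawing to drawing, as long as the labels themselves are read correctly by OCR.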
Happy Automation!!