How to extract text from technical drawings (PDF)?

Hi everyone,

I’m working on a PoC where I need to extract specific text (like names or codes) from PDFs containing technical drawings.

The challenge is that the text appears in random positions and varies in format, so standard activities like Read PDF Text aren’t working well.

Has anyone faced a similar case? Any tools, packages, or approaches you’d recommend?

Thanks in advance!

Hi @Gonza_Espindola,
Use OCR (If Text is in an Image Format)

If the text is part of an image (like scanned PDFs or technical drawings), you need to use OCR:

Use “Read PDF with OCR”.

Choose an OCR engine such as:

Tesseract OCR (default, but lower accuracy)

Microsoft OCR (better for structured text)

Google Cloud Vision OCR / ABBYY OCR (best for complex layouts)
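
To decide whether OCR is needed at all, one option is to try the normal text layer first and fall back to OCR only when it comes back empty or nearly empty. Below is a minimal Python sketch of that decision, not UiPath code. The two reader functions and the `min_chars` threshold are hypothetical placeholders for whatever plain-text reader and OCR engine you end up using.

```python
from typing import Callable, Tuple


def extract_text(
    pdf_path: str,
    read_text_layer: Callable[[str], str],  # hypothetical: plain text reader
    read_with_ocr: Callable[[str], str],    # hypothetical: OCR-based reader
    min_chars: int = 20,                    # assumed threshold, tune per document set
) -> Tuple[str, str]:
    """Return (text, source), where source is 'text-layer' or 'ocr'."""
    text = read_text_layer(pdf_path)
    # A drawing saved as an image usually yields an empty or tiny text layer.
    if len(text.strip()) >= min_chars:
        return text, "text-layer"
    # Otherwise assume the text is part of an image and run OCR instead.
    return read_with_ocr(pdf_path), "ocr"
```

In a workflow this corresponds to running Read PDF Text, checking the length of its output, and calling Read PDF with OCR only when the result is too short.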

Regards,
Arivu

Hi @Gonza_Espindola

First, use the “Read PDF with OCR” activity in UiPath and select Tesseract OCR to extract text from the PDF. Next, store the extracted text in a variable. Then, use the “Matches” activity with Regex to find specific names or codes. Finally, process the extracted data as needed.
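
As a rough illustration of the regex step, here is a small Python sketch. The sample text, the code format (2-3 uppercase letters, a hyphen, 4-6 digits, optional letter suffix) and the "Drawn by" label are all assumptions made up for the example; adjust the patterns to your real drawings. The same patterns should carry over to the "Matches" activity with little or no change.

```python
import re
from typing import List, Optional

# Hypothetical OCR output; in UiPath this would be the string variable
# returned by "Read PDF with OCR".
ocr_text = """
DRAWING NO: DWG-10234-A   REV B
Drawn by: J. Smith
PART  PN-55871   MATERIAL: AL 6061
"""

# Assumed code format: 2-3 uppercase letters, hyphen, 4-6 digits,
# optional "-X" single-letter suffix.
CODE_PATTERN = re.compile(r"\b[A-Z]{2,3}-\d{4,6}(?:-[A-Z])?\b")


def find_codes(text: str) -> List[str]:
    """Return every code-like token, wherever it appears in the text."""
    return CODE_PATTERN.findall(text)


def find_labeled_value(text: str, label: str) -> Optional[str]:
    """Return the rest of the line after 'label:', or None if the label is absent."""
    match = re.search(rf"{re.escape(label)}\s*:\s*(.+)", text, re.IGNORECASE)
    return match.group(1).strip() if match else None


print(find_codes(ocr_text))
print(find_labeled_value(ocr_text, "Drawn by"))
```

Because the patterns match on content rather than position, it does not matter where on the drawing the text was found, as long as OCR reads it correctly.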

If the OCR accuracy is low, consider converting the PDF to an image before applying OCR extraction.

If you found this helpful, please mark it as the solution.
Happy Automation

Hi @Gonza_Espindola

Can you try the Extract Document Data activity?

Regards,

@Gonza_Espindola

Use Read PDF with OCR to extract the text from the PDF.
After that, use regex to pull the required data out of the activity's output.

If you need help writing the regex, send some sample input data and I will provide one.

Happy Automation!!