OCR for extracting Arabic text from scanned PDF documents

Hello UiPath community Fam!

I am looking for an OCR engine that works well with scanned Arabic documents, so that I can extract the Arabic words using UiPath.

@Ranganathan_M

Tesseract is an open-source OCR engine that can be used with UiPath. It supports Arabic, and you can integrate it using custom activities or scripts in UiPath.
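For the "scripts" route, here is a minimal sketch of calling the Tesseract command-line tool for Arabic from Python. It assumes the `tesseract` executable is on PATH, that the Arabic language data (`ara`) is installed, and that each scanned PDF page has already been exported as an image, since the Tesseract CLI reads images rather than PDFs. The function names and the page segmentation mode (`--psm 6`) are illustrative choices, not something from this thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="ara", psm=6):
    """Build the argument list for one Tesseract CLI call.

    lang="ara" selects the Arabic language data; psm=6 (assume a single
    uniform block of text) is only an assumed starting point.
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-l", lang,
        "--psm", str(psm),
    ]


def ocr_image(image_path, output_base, lang="ara"):
    """Run Tesseract on one page image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,
    )
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

A script like this could then be called from the workflow, and the returned string handled like any other text variable. The file names you pass in (for example `page1.png`) are up to you.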

UiPath also provides OCR engine activities, such as “Google OCR” and “Microsoft OCR”, which support various languages, including Arabic. You can use these OCR engines inside activities like “Read PDF with OCR”.

Leave the Profile property empty.

@rlgandu
Can you share sample code for this?

It would be very helpful for us.

@Ranganathan_M

If the PDF data is not being extracted properly, set the "Scale" property to 1, then adjust the scale until the data is extracted accurately.

Hi,

You might also be able to use OmniPage OCR. Can you check the following document?

https://docs.uipath.com/activities/other/latest/document-understanding/omnipage-ocr

And if you want to use Tesseract OCR for a non-English language, the following will help you.

https://docs.uipath.com/studio/standalone/2023.4/user-guide/installing-an-ocr-engine-and-changing-the-language#tesseract-ocr
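As a quick sanity check before running OCR, the sketch below tests whether the Arabic language file is present. It relies on Tesseract's convention of storing language data as `<lang>.traineddata` files in a `tessdata` folder. The folder location differs per installation, so the path you pass in is an assumption you need to adjust, and `has_language` is just an illustrative helper name.

```python
from pathlib import Path


def has_language(tessdata_dir, lang="ara"):
    """Return True if <lang>.traineddata exists in the given tessdata folder."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()
```

If this returns False for `ara`, the Arabic language file has not been installed into that folder yet. Running `tesseract --list-langs` from a command prompt is another way to see which languages Tesseract can find.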

Regards,

Hi there - you can try Sanad.ai (also available on UiPath Marketplace as "Sanad AI OCR and DU"), as it works pretty well for us.