What steps can be taken to get better extraction results when documents come in different orientations?
The rotate option does not always work, so the recommended approach is to feed documents in the correct orientation (horizontal).
Recognition quality depends on the quality of the hardcopy document and on scanning settings. Low-quality images result in lower recognition accuracy, so it is important to take the characteristics of the source document into account and specify the appropriate scanning settings.
- When scanning, align the edges of the page with the edges of the scanning surface as closely as possible. If text on the resulting image is skewed, it may be recognized incorrectly.
- A document from a printer should be scanned in grayscale mode with a resolution of 300 dpi. Documents with small fonts should be scanned with a resolution of 600 dpi.
- Make sure to specify the correct brightness. If the brightness setting is too high, characters will be too bright, thin, and disjointed. If it is too low, characters will be too thick and may blend together.
Recognition quality is affected by the resolution of the source image. Low-resolution images may produce poorly recognized text.
The recommended resolution setting is 300 dpi.
Important! The vertical and horizontal resolutions of the image must be the same.
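The resolution guidance above can be checked before a file is sent for recognition. The following is a minimal sketch only: the class and method names are hypothetical and not part of ABBYY or UiPath, the 0.5 dpi tolerance is an assumption, and the dpi values are expected to come from wherever the image metadata is read (for example, the HorizontalResolution and VerticalResolution properties of System.Drawing.Image).

```csharp
using System;

// Hypothetical helper, not part of ABBYY or UiPath: compares scan resolution
// values with the guidance above.
public static class ScanResolutionCheck
{
    // Recommended resolution from the text above.
    public const float RecommendedDpi = 300f;

    // Returns true when both resolutions match and reach the recommended value.
    // When it returns false, 'problem' holds a short description of what is wrong.
    public static bool IsAcceptable(float horizontalDpi, float verticalDpi, out string problem)
    {
        // The vertical and horizontal resolutions must be the same.
        // Assumption: a 0.5 dpi tolerance absorbs rounding in stored metadata.
        if (Math.Abs(horizontalDpi - verticalDpi) > 0.5f)
        {
            problem = $"Resolution mismatch: {horizontalDpi} x {verticalDpi} dpi";
            return false;
        }

        if (horizontalDpi < RecommendedDpi)
        {
            problem = $"Resolution {horizontalDpi} dpi is below the recommended {RecommendedDpi} dpi";
            return false;
        }

        problem = string.Empty;
        return true;
    }
}
```

A scan reported as 300 x 300 dpi or 600 x 600 dpi passes this check, while 300 x 200 dpi or 150 x 150 dpi is rejected with a message explaining why.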
More information about scanning can also be found on the ABBYY page.
Also, the fcdot file does not keep all the properties from the Administrator Station, which is why the results extracted in the Administrator Station may differ from those extracted using UiPath. This is a known limitation.
To rule out the rotation issue, use the custom code below, which can be run with the "Invoke Code" activity with its language set to C#. Make sure that the iText 7 Community package is installed and that the required namespace (iText.Kernel.Pdf) is imported.
// Requires the iText.Kernel.Pdf namespace (iText 7 Community package).
string ORIG = @"C:\Users\xxxx\Downloads\Abby Data\Abby Data\Not Working.pdf"; // path of the source PDF
string OUTPUT_FOLDER = @"C:\Users\xxxx\Downloads\Abby Data\Abby Data\"; // output folder
int ROTATION_DEGREES = 90; // rotation to add, in multiples of 90 (for example 90 or 180)
// The output file name is hardcoded here; pass a dynamic name if needed.
PdfDocument pdfDocument = new PdfDocument(new PdfReader(ORIG), new PdfWriter(OUTPUT_FOLDER + "Rotated.pdf"));
for (int p = 1; p <= pdfDocument.GetNumberOfPages(); p++)
{
    PdfPage page = pdfDocument.GetPage(p);
    int rotate = page.GetRotation(); // rotation flag currently stored on the page
    if (rotate != 0)
    {
        // Only pages that already carry a non-zero rotation flag are adjusted.
        page.SetRotation((rotate + ROTATION_DEGREES) % 360);
    }
}
pdfDocument.Close(); // flushes the changes and writes Rotated.pdf
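The rotation update above relies on modular arithmetic: page rotation in a PDF is a multiple of 90, and adding degrees wraps around at 360. The sketch below isolates that arithmetic so it can be reasoned about without iText. The class and method names are hypothetical, and the ToUpright helper is an illustrative addition that is not in the original code.

```csharp
using System;

// Hypothetical helper illustrating the rotation arithmetic; independent of iText.
public static class PageRotationMath
{
    // Maps any multiple of 90 (including negative values) to 0, 90, 180, or 270.
    public static int Normalize(int degrees)
    {
        if (degrees % 90 != 0)
        {
            throw new ArgumentException("Page rotation must be a multiple of 90.", nameof(degrees));
        }
        // The extra "+ 360" keeps the result non-negative for negative input.
        return ((degrees % 360) + 360) % 360;
    }

    // Rotation stored on the page after turning it by 'delta' degrees.
    public static int Add(int current, int delta)
    {
        return Normalize(current + delta);
    }

    // Degrees to add so that the page ends up with a rotation of 0.
    public static int ToUpright(int current)
    {
        return Normalize(-current);
    }
}
```

For example, PageRotationMath.Add(270, 90) gives 0, and PageRotationMath.ToUpright(90) gives 270, which could be passed as the rotation value when the goal is to reset a page rotated by 90 degrees rather than to add a fixed angle.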