Detect the orientation of a PDF file

Hello everyone,

I am working on a project in which I need to read a series of documents scanned in PDF format. The problem is that a file can sometimes arrive in the wrong orientation (either sideways or upside down).
Is there a way to determine its orientation so that the robot can read it correctly with OCR?

Thanks :slight_smile:

Hello @askPWC

Thank you for your inquiry.

Sure thing, just let me know how you are reading the PDF and whether there are any words that can be identified.

I’ll be more than happy to help.


Happy automation.

hi @beesheep,
I read the file with the Read PDF with OCR activity.
Certain words, such as "name:", can be identified, but the file is not always the same: there are two types of file, and a file that matches neither type is discarded. All of them are scanned.
I have to check whether the file is one of the two types and extract certain data such as name, company, etc.
If there is any other way to identify the file type (1 or 2), determine the orientation, and extract the data according to the type, it would be very helpful.

Thanks in advance :slight_smile:

Hi @askPWC, you can use the PDFSharp library to accomplish that. I used the code below to convert a .TIF to a .PDF. You can easily modify it to determine the page orientation.

' Requires the PDFsharp package (GDI+ build) and these namespace imports:
' PdfSharp, PdfSharp.Pdf, PdfSharp.Drawing, System.Drawing.Imaging
Dim destination As String = in_PDF_Destination
Dim MyImage As System.Drawing.Image = System.Drawing.Image.FromFile(in_TIF_Location)
Dim doc As PdfDocument = New PdfDocument()

' A multi-page TIF stores each page as a frame; copy each frame to its own PDF page.
For PageIndex As Integer = 0 To MyImage.GetFrameCount(FrameDimension.Page) - 1
    MyImage.SelectActiveFrame(FrameDimension.Page, PageIndex)
    Dim img As PdfSharp.Drawing.XImage = PdfSharp.Drawing.XImage.FromGdiPlusImage(MyImage)
    Dim page As PdfPage = New PdfPage()

    ' A frame that is wider than it is tall is treated as landscape.
    If img.Width > img.Height Then
        page.Orientation = PageOrientation.Landscape
    Else
        page.Orientation = PageOrientation.Portrait
    End If

    doc.Pages.Add(page)
    Dim xgr As XGraphics = XGraphics.FromPdfPage(doc.Pages(PageIndex))
    xgr.DrawImage(img, 0, 0)
Next

' Write the finished PDF once all pages have been added.
doc.Save(destination)
doc.Close()
MyImage.Dispose()

Hello @bradsterling

But the orientation has to be defined by the user, right? If so, we are in the same situation; if not, could you please elaborate?

regards

hi @bradsterling
What does this code do?
Sorry, I am fairly new to programming.
Thanks :slight_smile:

Hi @beesheep
The file, which is scanned by a third-party provider, is downloaded directly from the company’s website.
The robot must access the site, download the file, and perform the corresponding validations and extractions depending on the file type. Because the file comes scanned from a third party, it can arrive upside down or sideways, and that is the problem.

Thanks :slight_smile:

The code determines the orientation without the user defining it, which makes it more dynamic. The snippet below determines the orientation.

If img.Width > img.Height Then
    page.Orientation = PageOrientation.Landscape
Else
    page.Orientation = PageOrientation.Portrait
End If

@askPWC The code will read a .TIF file (you can replace it with a .pdf) and get the number of pages in the file. Then, for each page, it will determine the page orientation. Once the orientation is determined, it will take the .TIF page and draw the image onto a .PDF page. This is helpful for you because you are dealing with scanned images, not text-readable PDFs. After every page has been added, it saves the PDF to the path in the destination variable and ends.

Is it possible to upload examples of what you need done?

@bradsterling

Because it is a company project, I do not have permission to publish the files, but it is simply a scanned PDF file, which can contain several pages.
The problem is that in some cases it may come, for example, in vertical format but inverted, that is, with the text upside down. I just need to detect whether it is inverted, sideways, or correct.

Thanks :slight_smile:
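
One possible way to approach the inverted case, sketched under assumptions: comparing width and height can separate landscape from portrait, but an upright page and an upside-down page have the same dimensions, so the page content has to be examined. The sketch below reads the page at 0, 90, 180 and 270 degrees and keeps the rotation whose OCR text contains the most expected keywords (such as the "name" and "company" fields mentioned earlier in the thread). `FindBestRotation`, `CountKeywordHits` and the `readRotated` callback are hypothetical names, not part of UiPath or PDFSharp; `readRotated` stands for whatever step rotates the page image and returns its OCR text, and the sample text in `Main` is made up.

```vbnet
Imports System
Imports System.Collections.Generic

Module OrientationByKeywords

    ' Counts how many of the expected keywords occur in the OCR text (case-insensitive).
    Function CountKeywordHits(text As String, keywords As IEnumerable(Of String)) As Integer
        If String.IsNullOrEmpty(text) Then Return 0
        Dim hits As Integer = 0
        For Each keyword As String In keywords
            If text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0 Then
                hits += 1
            End If
        Next
        Return hits
    End Function

    ' Returns the rotation (0, 90, 180 or 270) whose OCR text contains the most
    ' keywords, or -1 if no keyword is found at any rotation.
    ' readRotated is a hypothetical callback supplied by the caller: it must rotate
    ' the page image by the given angle and return the OCR text of the result.
    Function FindBestRotation(readRotated As Func(Of Integer, String),
                              keywords As IEnumerable(Of String)) As Integer
        Dim bestAngle As Integer = -1
        Dim bestHits As Integer = 0
        For Each angle As Integer In New Integer() {0, 90, 180, 270}
            Dim hits As Integer = CountKeywordHits(readRotated(angle), keywords)
            If hits > bestHits Then
                bestHits = hits
                bestAngle = angle
            End If
        Next
        Return bestAngle
    End Function

    Sub Main()
        ' Stand-in for real OCR: pretend the scan only reads correctly after a 180-degree turn.
        Dim fakeOcr As Func(Of Integer, String) =
            Function(angle As Integer) If(angle = 180, "Name: Jane Doe  Company: Acme", "")
        Console.WriteLine(FindBestRotation(fakeOcr, New String() {"name", "company"}))  ' prints 180
    End Sub

End Module
```

A result of -1 means none of the keywords were read at any rotation, which could also serve as the signal to discard a file that is neither of the two expected types. Running OCR four times per page is slow, so this is a trade-off of simplicity against speed rather than a recommended final design.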

Hi Bradsterling,
I came across your code and tried using it, but I am getting some errors such as "PDFDocument not defined". Can you please let me know which libraries I need to import?

Thanks