Detect the orientation of a pdf file

askPWC · October 1, 2019, 2:33pm

Hello everyone,

I am working on a project in which I need to read a series of scanned documents in PDF format. Sometimes a file arrives in the wrong orientation (either sideways or upside down).
Is there a way to determine its orientation so that the robot can read it correctly with OCR?

Thanks

beesheep · October 1, 2019, 4:13pm

Hello @askPWC

Thank you for your inquiry.

Sure thing, just let me know how you are reading the PDF and whether there are words that can be identified.

I’ll be more than happy to help.

Happy automation.

askPWC · October 1, 2019, 4:27pm

Hi @beesheep,
I read the file with the Read PDF with OCR activity.
Certain words, such as "name:", can be identified, but the file is not always the same: there are two types of file, and a file that is neither of the two is discarded. All of them are scanned.
I have to verify whether the file is one of the two types and extract certain data such as name, company, etc.
If there is any other way to identify the file type (1 or 2), determine the orientation, and extract the data according to the type, it would be very helpful.

Thanks in advance
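
The idea of telling the two file types apart by the words found in the OCR text could be sketched as below. This is only an illustration: the type names and keywords are hypothetical placeholders, and ClassifyDocument is an invented helper, not part of UiPath or any library.

```vbnet
Imports System
Imports System.Collections.Generic

Module DocumentClassifier
    ' Returns the name of the first document type whose keywords all occur
    ' in the OCR text, or Nothing when no type matches (discard the file).
    Function ClassifyDocument(ocrText As String,
                              typeKeywords As Dictionary(Of String, String())) As String
        If String.IsNullOrEmpty(ocrText) Then Return Nothing
        For Each entry As KeyValuePair(Of String, String()) In typeKeywords
            Dim allFound As Boolean = True
            For Each keyword As String In entry.Value
                ' Case-insensitive search, because OCR output often varies in casing.
                If ocrText.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) < 0 Then
                    allFound = False
                    Exit For
                End If
            Next
            If allFound Then Return entry.Key
        Next
        Return Nothing
    End Function

    Sub Main()
        ' Hypothetical keywords; replace them with labels that really occur in each form.
        Dim types As New Dictionary(Of String, String()) From {
            {"Type1", New String() {"name:", "company:"}},
            {"Type2", New String() {"name:", "invoice number"}}
        }
        Dim result As String = ClassifyDocument("Name: Jane Doe Company: Acme", types)
        Console.WriteLine(If(result, "unknown"))
    End Sub
End Module
```

Requiring every keyword of a type to be present keeps a stray match from misclassifying a file; a looser rule (for example, most keywords present) may suit noisy scans better.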

bradsterling · October 1, 2019, 4:27pm

Hi @askPWC, you can use the PDFsharp library to accomplish that. I used the code below to convert a .TIF to a .PDF. You can easily modify it to determine the page orientation.

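' Requires the PDFsharp library. Namespaces used: System.Drawing.Imaging (FrameDimension),
' PdfSharp (PageOrientation), PdfSharp.Pdf (PdfDocument, PdfPage), PdfSharp.Drawing (XGraphics).
Dim destination As String = in_PDF_Destination
Dim MyImage As System.Drawing.Image = System.Drawing.Image.FromFile(in_TIF_Location)
Dim doc As PdfDocument = New PdfDocument()

' A multi-page TIF stores each page as a separate frame.
For PageIndex As Integer = 0 To MyImage.GetFrameCount(FrameDimension.Page) - 1
    MyImage.SelectActiveFrame(FrameDimension.Page, PageIndex)
    Dim img As PdfSharp.Drawing.XImage = PdfSharp.Drawing.XImage.FromGdiPlusImage(MyImage)
    Dim page As PdfPage = New PdfPage()

    ' A frame that is wider than it is tall becomes a landscape page.
    If img.Width > img.Height Then
        page.Orientation = PageOrientation.Landscape
    Else
        page.Orientation = PageOrientation.Portrait
    End If

    ' Add the page and draw the current frame at its top-left corner.
    doc.Pages.Add(page)
    Dim xgr As XGraphics = XGraphics.FromPdfPage(doc.Pages(PageIndex))
    xgr.DrawImage(img, 0, 0)
    xgr.Dispose()
Next

doc.Save(destination)
doc.Close()
MyImage.Dispose()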

beesheep · October 1, 2019, 4:36pm

Hello @bradsterling

But the orientation has to be defined by the user, right? If so, we are in the same situation; if not, can you please elaborate?

Regards

askPWC · October 1, 2019, 4:41pm

Hi @bradsterling,
What does this code do?
Sorry, I am somewhat new to programming.
Thanks

askPWC · October 1, 2019, 4:45pm

Hi @beesheep
The file, which is scanned by a third-party provider, is downloaded directly from the company’s website.
The robot must access the site, download the file, and perform the corresponding validations and extractions depending on the type of file. Because the file is scanned by a third party, it can arrive upside down or sideways, and that is the problem.

Thanks

bradsterling · October 1, 2019, 5:34pm

The code determines the orientation without the user defining it, making it more dynamic. The snippet below determines the orientation.

If img.Width > img.Height Then
    page.Orientation = PageOrientation.Landscape
Else
    page.Orientation = PageOrientation.Portrait
End If

bradsterling · October 1, 2019, 5:38pm

@askPWC The code reads a .TIF file (you can replace it with a .pdf) and gets the number of pages in the file. Then, for each page, it determines the page orientation. Once the orientation is determined, it takes the .TIF page and draws the image onto a .PDF page. This is helpful for you because you are dealing with scanned images rather than readable PDFs. After all pages have been added, it saves the document to the destination path and ends.

Is it possible to upload examples of what you need done?
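
One caveat when starting from a PDF rather than a TIF: System.Drawing.Image.FromFile reads raster formats only, not PDF files, so the input would have to be opened another way. A minimal sketch, assuming PDFsharp's PdfReader is used, could list each page's shape and its stored /Rotate value. How Width and Height are reported for rotated pages may differ between PDFsharp versions, so the output should be treated as a hint, not a verdict.

```vbnet
Imports System
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.IO

Module PageReport
    Sub Main(args As String())
        ' args(0) is assumed to be the path of an existing PDF file.
        Using doc As PdfDocument = PdfReader.Open(args(0), PdfDocumentOpenMode.ReadOnly)
            For i As Integer = 0 To doc.PageCount - 1
                Dim page As PdfPage = doc.Pages(i)
                ' Compare the page dimensions in points to classify the page shape.
                Dim shape As String = If(page.Width.Point > page.Height.Point, "landscape", "portrait")
                Console.WriteLine("Page {0}: {1}, /Rotate = {2}", i + 1, shape, page.Rotate)
            Next
        End Using
    End Sub
End Module
```

For scanned documents this only describes the page container. A scanner can store an upside-down image on a perfectly ordinary portrait page, so these values alone cannot show whether the text itself is inverted.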

askPWC · October 2, 2019, 4:18pm

@bradsterling

Because this is a company project, I do not have permission to publish the files, but each one is simply a scanned PDF file, which can contain several pages.
In some cases, for example, a page may be in vertical (portrait) format but inverted, that is, the text is upside down. I just need to detect whether a page is inverted, sideways, or correct.

Thanks
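
Comparing width and height separates landscape from portrait, but it cannot tell an upright page from an upside-down one. One possible approach, sketched here under stated assumptions, is to OCR the page image at 0, 90, 180 and 270 degrees and keep the rotation whose text contains the most expected keywords. The runOcr delegate is a placeholder for whatever OCR engine is available, the keywords are hypothetical, and rendering a PDF page to a Bitmap is outside this sketch.

```vbnet
Imports System
Imports System.Drawing

Module OrientationProbe
    ' Counts how many of the expected keywords occur in the text (case-insensitive).
    Function CountHits(text As String, keywords As String()) As Integer
        Dim hits As Integer = 0
        If String.IsNullOrEmpty(text) Then Return hits
        For Each keyword As String In keywords
            If text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0 Then hits += 1
        Next
        Return hits
    End Function

    ' Tries 0, 90, 180 and 270 degrees clockwise and returns the rotation whose
    ' OCR text contains the most keywords, or -1 when no rotation yields a hit.
    ' runOcr is a placeholder (assumption) for the OCR engine in use.
    Function DetectRotation(pageImage As Bitmap,
                            runOcr As Func(Of Bitmap, String),
                            keywords As String()) As Integer
        Dim bestAngle As Integer = -1
        Dim bestHits As Integer = 0
        ' Work on a copy so the caller's image is left untouched.
        Using work As Bitmap = CType(pageImage.Clone(), Bitmap)
            For angle As Integer = 0 To 270 Step 90
                Dim hits As Integer = CountHits(runOcr(work), keywords)
                If hits > bestHits Then
                    bestHits = hits
                    bestAngle = angle
                End If
                ' Turn the working copy a further 90 degrees for the next attempt.
                work.RotateFlip(RotateFlipType.Rotate90FlipNone)
            Next
        End Using
        Return bestAngle
    End Function
End Module
```

A result of 0 means the page reads correctly as scanned, 180 means it is inverted, and 90 or 270 means it is sideways. Running OCR four times per page is slow, so it may be worth trying the unrotated page first and probing the other angles only when no keyword is found.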

Rudrava_Som · February 8, 2020, 1:33am

Hi bradsterling,
I came across your code and tried using it, but I am getting errors such as "PDFDocument not defined". Can you please let me know which libraries I need to import?

Thanks
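
The "not defined" errors suggest that the PDFsharp package is not referenced or that its namespaces are not imported. As a sketch, these are the namespaces the types used in the TIF-to-PDF snippet above belong to. It assumes a plain VB.NET project; in UiPath Studio the same namespaces would presumably be added through the Imports panel rather than typed as statements. Depending on the PDFsharp build, XImage.FromGdiPlusImage may only be available in the GDI+ variant of the library.

```vbnet
Imports System
Imports System.Drawing              ' Image
Imports System.Drawing.Imaging      ' FrameDimension
Imports PdfSharp                    ' PageOrientation
Imports PdfSharp.Pdf                ' PdfDocument, PdfPage
Imports PdfSharp.Drawing            ' XGraphics, XImage

Module ImportCheck
    Sub Main()
        ' If this compiles and runs, the PDFsharp reference and imports are in place.
        Dim doc As New PdfDocument()
        Dim page As PdfPage = doc.AddPage()
        page.Orientation = PageOrientation.Landscape
        Console.WriteLine("Pages in document: {0}", doc.PageCount)
    End Sub
End Module
```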
