Reading multiple pdf files and detecting landscape or portrait

sashatheitguy · October 12, 2020, 9:25pm

I am trying to read multiple (hundreds of) PDF files.
But there is a serious problem. Some files might be in portrait mode and some in landscape mode. I must extract all the data from each PDF file.
I think Intelligent OCR is not a solution. How can I detect whether a PDF is portrait or landscape? Or should I even do this?
If I try to read a landscape PDF with Omni OCR, the characters might come out corrupted.

Can anybody help?

RPAForEveryone · October 13, 2020, 2:20am

Hi @sashatheitguy

There have been similar posts on the forum earlier, asking the same question. Here is a comment by the Lead Developer of the Document Understanding module:

The Document Model should give you the information whether it’s a landscape or portrait.
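
For anyone who wants to see the idea outside UiPath, here is a minimal Python sketch of the orientation check itself. It is not the Document Understanding API. It assumes you have already obtained each page's width, height, and rotation angle from some PDF library (in a PDF these typically come from the page's MediaBox and /Rotate entries); the function name `page_orientation` is made up for illustration.

```python
def page_orientation(width: float, height: float, rotation: int = 0) -> str:
    """Classify a page as 'portrait', 'landscape', or 'square'.

    width, height: page size in any unit (e.g. PDF points).
    rotation: display rotation in degrees; must be a multiple of 90.
    """
    # Normalise the angle so that e.g. -90 and 270 are treated the same.
    rotation = rotation % 360
    if rotation % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")

    # A quarter turn swaps the displayed width and height.
    if rotation in (90, 270):
        width, height = height, width

    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


# Example with A4 dimensions in points (595 x 842).
print(page_orientation(595, 842))      # upright A4 page
print(page_orientation(595, 842, 90))  # same page displayed rotated by 90 degrees
```

Note that this only reflects the geometry recorded in the file. A scanned page whose image is sideways inside an upright page box would still be reported as portrait, so OCR-based orientation detection may still be needed for scans.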

sashatheitguy · October 13, 2020, 7:53am

Thank you. I could not check all the other questions.
Sorry for the duplicate question. I will search for it. I have also found something about ABBYY, and I will look into that too.

RPAForEveryone · October 13, 2020, 1:00pm

Great. ABBYY is quite advanced and very flexible in configuring these things (hence the name FlexiCapture).
The big plus for IntelligentOCR is that you can configure and use it for free.

Wish you luck with your project.
Cheers.

sashatheitguy · October 16, 2020, 8:07pm

Thank you for your answer.
Yes, I have tried Intelligent OCR with Omni. But I had a problem because of landscape and portrait pages in the same PDF file.

I have another question here. I have contacted the ABBYY team. They can provide me with a trial version, but they are asking me to choose between two OCR packages:

FineReader and FlexiCapture.

Which one should I choose?
In my case, there are PDF files that contain both portrait and landscape pages. I need to extract all the text data to plain text without any character errors (like saving a .txt file automatically, because I will have hundreds of documents).

I read something about a connector, but I'm not sure about it. How can I integrate ABBYY into UiPath? (Can I integrate FineReader, or can I only use FlexiCapture in UiPath?)

Sorry for asking so many questions. I'm new here.

Stay safe!

RPAForEveryone · October 17, 2020, 11:59am

Hi @sashatheitguy
A brief intro (and a non-technical differentiation) on FlexiCapture and FineReader:
ABBYY FlexiCapture - Smart data capture from structured or semi-structured documents like invoices, purchase orders or even application forms (banks, credit cards, you name it).
ABBYY FineReader - High-volume document conversion.

While FineReader focuses on just converting hand-printed/scanned documents into editable text for Word or PDF files, FlexiCapture focuses on highly customisable and trainable document templates to extract structured data out of various documents.

So, if you deal with a lot of long, free-flowing text documents, FineReader is the way to go.
FlexiCapture suits scenarios where you need specific information from a document rather than the entire text transcribed for you (for example, finding the label ‘Invoice Number’ and getting the data to the right of or below that label, or extracting invoice line items whether or not they are separated by clearly marked lines).

I haven’t researched if UiPath integrates with FineReader. UiPath FlexiCapture activities are available and work alright.

sashatheitguy · October 17, 2020, 12:54pm

Thank you for your reply.

M_L1 · December 17, 2023, 6:54am

That UI solution might be great, but how do we do this with Adobe Professional? Thank you.
