Handling landscape view in a pdf and extracting data from a pdf

lissynikkytha · August 9, 2017, 6:23am

Hi,

It would be great if you could help me with the scenarios below.

1) Is there a way to check whether a PDF page is in portrait or landscape orientation? In my PDF, some pages are portrait and some are landscape, and I need to read the text using OCR. Any suggestions?
2) I tried to extract text from a structured PDF document. I need the text from all the pages, both tabular and non-tabular. Below are the options I tried, but none of them helped. Let me know if this can be achieved another way.
2a) “Read PDF With OCR” (tried both with and without the inverted option): returns an empty result.
2b) “Read PDF Text”: the output is empty.
2c) Scraping works, but how do I find the number of pages, and how do I extract the text from all of them?
3) I am trying to extract text from a PDF and then move the file to another folder, but I get “The process cannot access the file because it is being used by another process”. How can I resolve this? The document is not open anywhere else.
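For question 1, a PDF library can usually read each page's MediaBox and its /Rotate entry, and in the PDF format the displayed orientation follows from those two values (a /Rotate of 90 or 270 turns the page on its side). Below is a minimal Python sketch, assuming you have already read those numbers for every page; `PageBox`, `orientation` and `classify_pages` are hypothetical names, not a UiPath API. Iterating over the page list also gives you the page count from 2c. Note that it only sees page geometry: a scanned landscape image placed on a portrait page with no /Rotate still reports "portrait", which is where OCR-side rotation handling comes in.

```python
from dataclasses import dataclass


@dataclass
class PageBox:
    """Raw page geometry as stored in the PDF (hypothetical container)."""
    width: float      # MediaBox width in PDF points
    height: float     # MediaBox height in PDF points
    rotate: int = 0   # /Rotate entry; the PDF spec requires a multiple of 90


def orientation(page: PageBox) -> str:
    """Return 'portrait' or 'landscape' for the page as it is displayed."""
    width, height = page.width, page.height
    # A /Rotate of 90 or 270 (or -90) turns the page sideways, so swap the axes.
    if page.rotate % 180 == 90:
        width, height = height, width
    # Square pages are treated as portrait.
    return "landscape" if width > height else "portrait"


def classify_pages(pages: list[PageBox]) -> dict[int, str]:
    """Map each 1-based page number to its displayed orientation."""
    return {number: orientation(p) for number, p in enumerate(pages, start=1)}


if __name__ == "__main__":
    # US Letter portrait, a landscape box, and a portrait box with /Rotate 90.
    pages = [PageBox(612, 792), PageBox(792, 612), PageBox(612, 792, rotate=90)]
    print(classify_pages(pages))  # → {1: 'portrait', 2: 'landscape', 3: 'landscape'}
```

With the per-page result in hand, landscape pages can be routed to a rotate-then-OCR step and portrait pages OCRed directly.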

Disha_Jain · February 21, 2018, 12:22am

Hi,

I am looking for answers to this as well. Were you able to figure something out?
Thanks

lissynikkytha · March 2, 2018, 1:01pm

Below are the solutions that I used:

1) UiPath does not have a built-in way to check whether a page is in portrait or landscape mode. However, some OCR engines auto-rotate the pages before extracting the data.
2) Read PDF With OCR finally worked.
3) I copied the file to the destination folder and then deleted the original after processing.
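The copy-then-delete workaround, with a short retry loop around the delete, might look like this in Python. It is a sketch of the idea, not what UiPath does internally; `copy_then_delete` is a hypothetical helper, and the retry count and delay are arbitrary assumptions. The retry only helps if whatever holds the file (for example, the step that just read it) releases its handle shortly afterwards.

```python
import os
import shutil
import tempfile
import time


def copy_then_delete(src: str, dest_dir: str, attempts: int = 5, delay: float = 0.5) -> str:
    """Copy src into dest_dir, then delete the original, retrying while it is locked.

    Returns the path of the copy.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    os.makedirs(dest_dir, exist_ok=True)
    # Copying needs only read access, so it often succeeds even while
    # another handle on the file is still open for reading.
    target = shutil.copy2(src, dest_dir)
    for attempt in range(attempts):
        try:
            os.remove(src)
            break
        except PermissionError:
            # On Windows, a file still held open by another handle raises
            # PermissionError (WinError 32, "being used by another process").
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "report.pdf")
        with open(src, "wb") as fh:
            fh.write(b"%PDF-1.4\n")
        copy = copy_then_delete(src, os.path.join(work, "processed"))
        print(os.path.basename(copy), os.path.exists(src))  # → report.pdf False
```

Copying first means the processed file lands in the destination even if the delete keeps failing, so a lock never blocks the rest of the workflow.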

arvind8pandey · April 16, 2019, 2:37pm

Can you please share which OCR engine supports auto-rotation?

Priya_Pandey · August 5, 2019, 1:19pm

Hello,
Can you please explain which OCR you used to extract data from the landscape pages?
I am stuck in the same situation; can you please help me out, @lissynikkytha?
I also used Read PDF With OCR, and it gives correct results for all pages except the rotated ones.

lissynikkytha · August 6, 2019, 5:52am

Try the ABBYY OCR engine.

irahmat · August 8, 2019, 4:39am

I assume you are extracting the PDF with OCR page by page. I suggest adding extra logic that rotates each page automatically until the extracted text is readable.

Hope it works.
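The rotate-until-readable idea could be sketched in Python as below. Everything here is an assumption for illustration: `ocr_at` stands in for whatever rotates the page image by a given angle and runs OCR on it (for example, a rotate step followed by Read PDF With OCR), and the word-ratio score with a 0.6 threshold is an arbitrary heuristic. A dictionary lookup or the OCR engine's own confidence value would be a more robust measure.

```python
from typing import Callable

ROTATIONS = (0, 90, 180, 270)


def readability(text: str) -> float:
    """Crude score: share of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(1 for t in tokens if len(t) > 1 and t.strip(".,;:()").isalpha())
    return wordlike / len(tokens)


def best_rotation(ocr_at: Callable[[int], str], threshold: float = 0.6) -> tuple[int, str]:
    """Try each rotation in turn; stop early once the text looks readable.

    ocr_at is a hypothetical callback: rotate the page by the given angle,
    run OCR, and return the extracted text. If no rotation reaches the
    threshold, the highest-scoring one is returned.
    """
    best_angle, best_text, best_score = 0, "", -1.0
    for angle in ROTATIONS:
        text = ocr_at(angle)
        score = readability(text)
        if score >= threshold:
            return angle, text
        if score > best_score:
            best_angle, best_text, best_score = angle, text, score
    return best_angle, best_text


if __name__ == "__main__":
    # Made-up OCR output per angle, standing in for a real OCR call.
    fake = {0: "l1 |n /o ;; 7Z", 90: "Invoice number 12345 total due", 180: "", 270: "x x x"}
    print(best_rotation(lambda a: fake[a]))  # → (90, 'Invoice number 12345 total due')
```

Stopping at the first readable angle keeps the number of OCR calls low for pages that are already upright, which matters because each OCR pass is the expensive step.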

Ioana_Gligan · September 18, 2019, 7:37am

Hello @lissynikkytha, @Disha_Jain, @arvind8pandey, @Priya_Pandey, and @irahmat,

To get the page rotation and skew angle, use the Digitize Document activity from the IntelligentOCR 3 activity package. It exposes this information page by page in its DocumentObjectModel output. You can browse that output (for example, directly with the newest debugging features in Studio) to see where to read that information.
Data extraction: I recommend building your own custom activity for data extraction or trying the newly released Regex Based Extractor, which applies the regular expressions you configure for each field to the text version of the document fed into the Data Extraction Scope.
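The behavior described for the Regex Based Extractor (one configured regex per field, applied to the document's text) can be illustrated with a minimal Python sketch. This is not the UiPath activity or its configuration format; `extract_fields`, the field names, the sample text, and the rule of taking the first capturing group when one exists are all assumptions for illustration.

```python
import re
from typing import Optional


def extract_fields(text: str, patterns: dict[str, str]) -> dict[str, Optional[str]]:
    """Apply one regex per field to the document text.

    If a pattern has a capturing group, the first group is the value;
    otherwise the whole match is used. Fields with no match map to None.
    """
    results: dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        if match is None:
            results[field] = None
        elif match.re.groups:
            results[field] = match.group(1)
        else:
            results[field] = match.group(0)
    return results


if __name__ == "__main__":
    # Made-up text, standing in for the digitized document's text version.
    doc = "Invoice No: INV-0042\nDate: 2019-09-18\nTotal: 1,250.00 USD"
    patterns = {
        "invoice_number": r"Invoice No:\s*(\S+)",
        "date": r"\d{4}-\d{2}-\d{2}",
        "po_number": r"PO:\s*(\S+)",
    }
    print(extract_fields(doc, patterns))
    # → {'invoice_number': 'INV-0042', 'date': '2019-09-18', 'po_number': None}
```

Because the regexes run on plain text, this approach depends on the digitization step producing correctly oriented text; a rotated page that OCRs to noise will simply yield None for every field.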

To get started, you might want to check out this post: How to use the IntelligentOCR Package

sashatheitguy · October 17, 2020, 1:25pm

Your answer was very helpful. Thank you.
I found some information about ABBYY FineReader and FlexiCapture, but I have not yet figured out which product suits me best.
I have hundreds of PDF files, which may be portrait or landscape (in some, the first page is portrait and the other pages are landscape). If you know ABBYY, which product can be integrated with UiPath successfully?

I found a connector plugin for connecting ABBYY and UiPath, but I think it works only with ABBYY FlexiCapture, not with FineReader. At this point, I need to read every PDF file and load all of its text into the DOM. FlexiCapture works with fields, whereas FineReader works with the entire PDF. Do you know which one I should use?

Thanks.

ernesto.limon · March 4, 2021, 5:23am

It is not working. I tested a few page-rotation scenarios, and it shows the following:

Rotation: None
SkewAngle=0

This happens even though the page is obviously rotated.
Is it possible to change these values in the object model and adjust the page values?
