Handling landscape view in a PDF and extracting data from a PDF


#1

Hi,

It would be great if you could help me out with the scenarios below.

  1. Is there a way to check whether a PDF page is in portrait or landscape view? In my PDF, some pages are in portrait view and some are in landscape view. I need to read the text in that PDF using OCR. Any suggestions?
  2. I tried to extract text from a structured PDF document. I need the text from all the pages, both tabular and non-tabular. Below are the options I tried, but none of them helped. Let me know if this can be achieved some other way.
    2a) “Read PDF with OCR” (tried both with and without the inverted option): returns an empty result.
    2b) “Read PDF Text”: the output is empty.
    2c) Scraping helps. But how do we find out the number of pages, and how do we extract text from all of them?
  3. I am trying to extract text from a PDF and then move the file to another folder, but it says “The process cannot access the file because it is being used by another process”. How do we resolve this? The document is not open anywhere else.

#2

Hi,

I am looking for answers on this as well. Were you able to figure anything out?
Thanks


#3

Below are the solutions that I used:

  1. UiPath does not have built-in intelligence to check whether a page is in portrait or landscape mode. However, some of the OCR engines auto-rotate the pages when extracting the data.
  2. “Read PDF with OCR” finally worked.
  3. I copied the file to the destination folder and then deleted the source file after processing.
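
For anyone who wants to script points 1 and 3 outside UiPath, here is a rough Python sketch using only the standard library. It is not a UiPath activity, and the names `page_orientation` and `copy_then_delete` are made up for illustration. The first helper assumes you already have a page's width, height and rotation value (for example from a PDF library's page box and `/Rotate` entry); it does not read the PDF itself. The second helper is my reading of the copy-then-delete workaround: copy the file to the destination, work on the copy, and only then remove the source.

```python
import shutil
from pathlib import Path


def page_orientation(width, height, rotate=0):
    """Classify a page as 'portrait' or 'landscape'.

    width/height are the page box dimensions (e.g. in points) and
    rotate is the page rotation in degrees (a multiple of 90).
    """
    # A 90 or 270 degree rotation swaps the displayed width and height.
    if rotate % 180 == 90:
        width, height = height, width
    # Square pages are treated as portrait here (an arbitrary choice).
    return "landscape" if width > height else "portrait"


def copy_then_delete(source, dest_dir, process):
    """Copy source into dest_dir, run process on the copy, then delete source.

    process is any callable that takes the copied file's path; it stands in
    for whatever text extraction step you use.
    """
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    copy_path = dest_dir / source.name
    shutil.copy2(source, copy_path)   # copy instead of move

    result = process(copy_path)       # work on the copy

    source.unlink()                   # remove the original once processing is done
    return copy_path, result
```

For example, `page_orientation(612, 792, rotate=90)` classifies a US Letter page stored upright but displayed rotated as landscape. If deleting the source still fails with the "being used by another process" error, something is probably still holding a handle on the file, so the delete may need to happen later or be retried.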