Hi, I'm trying to scrape data with OCR from multiple PDF files. All of the files have the same content but not the same layout. They are scanned PDFs.
Here is my idea:
- Find the anchor image.
- Wherever the image is found, use OCR to scrape the data from a fixed-size region next to the anchor image in that PDF.
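To make the idea above concrete, here is a rough Python sketch of the geometry only, not a tested solution for my files. It assumes each PDF page has already been rendered to a grayscale image stored as a list of pixel rows, and that the anchor appears at the same scale on every scan. The names `find_anchor`, `region_right_of`, `crop` and the `max_mean_diff` threshold are made up for illustration. The search covers the whole page, so it should not matter whether the anchor sits at the top, middle, or bottom.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, row-major, values 0..255
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def find_anchor(page: Image, anchor: Image,
                max_mean_diff: float = 10.0) -> Optional[Tuple[int, int]]:
    """Slide the anchor over the page and return the (x, y) of the best match.

    The score is the mean absolute pixel difference; None is returned when
    even the best position differs by more than max_mean_diff (assumed value).
    """
    ph, pw = len(page), len(page[0])
    ah, aw = len(anchor), len(anchor[0])
    best_pos, best_score = None, float("inf")
    for y in range(ph - ah + 1):
        for x in range(pw - aw + 1):
            diff = sum(
                abs(page[y + dy][x + dx] - anchor[dy][dx])
                for dy in range(ah)
                for dx in range(aw)
            )
            score = diff / (ah * aw)
            if score < best_score:
                best_pos, best_score = (x, y), score
    return best_pos if best_score <= max_mean_diff else None


def region_right_of(anchor_pos: Tuple[int, int], anchor_size: Tuple[int, int],
                    region_size: Tuple[int, int], page_size: Tuple[int, int],
                    gap: int = 0) -> Box:
    """Rectangle of region_size immediately right of the anchor, clamped to the page."""
    x, y = anchor_pos
    aw, _ah = anchor_size
    rw, rh = region_size
    pw, ph = page_size
    left = min(x + aw + gap, pw)
    top = y
    right = min(left + rw, pw)
    bottom = min(top + rh, ph)
    return left, top, right, bottom


def crop(page: Image, box: Box) -> Image:
    """Cut the box out of the page; this is the part that would go to the OCR engine."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]
```

The intended use would be: call `find_anchor` on each page, pass the result to `region_right_of` together with the anchor size and the size of the area to read, then hand the output of `crop` to whatever OCR engine is available. The brute-force matching here is only meant to show the steps; on real scans a proper template matcher (for example OpenCV's `matchTemplate`) would probably be needed, and differences in scan resolution or skew would still have to be handled.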
Please help me with the steps of the workflow. Any other suggestions would be appreciated.
Here is what I've tried so far:
- Tried to make a standard selector, but OCR returns other data when it runs on the other PDF files.
- Tried the Anchor Base activity, but it didn't work.
Sometimes what I'm trying to scrape is at the top, middle, or bottom of the PDF, so even when I try to read the PDF with OCR, it doesn't work.