Error when looping through PDF files of different formats to scrape data

Hi, I’m trying to scrape OCR data from multiple PDF files. All of the files have the same content but not the same layout. They are scanned PDFs.
Here is my idea:

  1. Find the anchor image.
  2. Wherever the image is found in a PDF, scrape the data with OCR from a region of a fixed size next to the anchor image.
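
To make step 2 concrete, here is a rough Python sketch of the geometry only: given the anchor's bounding box, it computes a fixed-size region to its right and clamps it to the page. The names `Box` and `region_right_of_anchor` are made up for illustration and are not UiPath activities; finding the anchor and running OCR on the region are assumed to happen elsewhere.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Rectangle in page pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def region_right_of_anchor(anchor: Box, page_width: int, page_height: int,
                           region_width: int, region_height: int,
                           gap: int = 0) -> Box:
    """Return the region to the right of the anchor, clamped to the page."""
    # Start just past the anchor's right edge, plus an optional gap.
    left = min(anchor.left + anchor.width + gap, page_width)
    # Align the top of the region with the top of the anchor.
    top = min(max(anchor.top, 0), page_height)
    # Clamp the far edges so the region never leaves the page.
    right = min(left + region_width, page_width)
    bottom = min(top + region_height, page_height)
    return Box(left, top, right - left, bottom - top)


# Example: the anchor was found at (100, 50) and is 40x20 px on a 600x800 page.
example = region_right_of_anchor(Box(100, 50, 40, 20), 600, 800, 200, 30, gap=10)
```

Because the region is computed relative to the anchor, it should not matter whether the anchor sits at the top, middle, or bottom of the page.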

Please help me with the workflow steps. Any other suggestions would also be appreciated.
Here is what I’ve tried so far:

  1. Tried to build a standard selector, but the OCR returns other data when it runs on the other PDF files.
  2. Tried the Anchor Base activity, but it didn’t work.

Sometimes what I’m trying to scrape may be at the top, middle, or bottom of the PDF, so even if I try to read the PDF with OCR, it won’t work.

Did you try the CV activities?

Cheers @Chaitanya_podilapu

Hey @Palaniyappan. How’s your weekend? :smiley: I’ve never tried the CV activities before, and I didn’t see any activity that looks useful for scraping data from a PDF. I installed the IntelligentOCR and MachineLearningExtractor packages, but I don’t understand how to use them.
Inside the CV scope I got this error:
response from this server not valid [404]
I copied my API key in double quotes
and set the URL as [domainname].
I went through some topics, but the error is still not resolved.

Thank you!

Kindly have a look at this thread


Yeah! But I’m already on the latest version, 2019.11.0 beta, and I recently updated and installed those packages. I tried with the stable version as well, but couldn’t clear that error.