How to scrape [email protected]

I'm working on a project to scrape some data and found this:

<li class="object-inline"> <i class="fa fa-envelope-o"></i> <a href="/cdn-cgi/l/email-protection#771e19111837161b15120510181607181b1b18591e03"><span class="__cf_email__" data-cfemail="563f38303916373a34332431393726393a3a39783f22">[email&#160;protected]</span></a> </li>

Source: www.apolloboutiquehotel.com

Is there any way to scrape an email address like this, which is “protected” (by Cloudflare, etc.)?
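
As a possible alternative to reading the address off the rendered page, the `data-cfemail` attribute in the snippet above can usually be decoded directly. The sketch below assumes the commonly observed Cloudflare scheme, where the hex string's first byte is an XOR key applied to every following byte. The names `decode_cfemail`, `CfEmailCollector` and `extract_emails` are hypothetical helpers written for this illustration, not part of UiPath or any Cloudflare API, and the scheme itself could change.

```python
from html.parser import HTMLParser


def decode_cfemail(encoded):
    """Decode a data-cfemail hex string, assuming byte 0 is the XOR key."""
    data = bytes.fromhex(encoded)
    key = data[0]
    # XOR every remaining byte with the key to recover the original characters.
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


class CfEmailCollector(HTMLParser):
    """Collects the value of every data-cfemail attribute in the markup."""

    def __init__(self):
        super().__init__()
        self.encoded = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-cfemail" and value:
                self.encoded.append(value)


def extract_emails(html):
    """Return the decoded address for each protected element found in html."""
    collector = CfEmailCollector()
    collector.feed(html)
    return [decode_cfemail(value) for value in collector.encoded]


# The span from the snippet quoted above.
snippet = (
    '<span class="__cf_email__" '
    'data-cfemail="563f38303916373a34332431393726393a3a39783f22">'
    "[email&#160;protected]</span>"
)
emails = extract_emails(snippet)
print(emails)
```

If the assumption holds, the decoded value starts with `info@albergoa`, which matches the text that appears in the OCR error message later in this thread. In a workflow, the same XOR logic could be applied to the attribute value after scraping it as plain text.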

I am able to see the email ID directly; I'm not getting the “protected” email ID that you mentioned.
image


Is the email ID visible on your screen when you open the webpage? If you can't get it the normal way, you can also try Get OCR Text to extract it.

Yes, it is visible in browser screen. I’ll try Get OCR.
But why can you see it in the code and I cannot?! Hmmm…

UPDATED: I cannot use Get OCR Text, I guess because I only have a Community license.

Get OCR Text 'A  mailto:info@albergoa...': Error performing OCR: Invalid API key specified UiPathOCRInvalidApiKey

Hi,

Use Tesseract OCR. It is free and doesn't require an API key; all the other OCR engines require an API key or a license. Thanks.

Hi @IPIX

You can make use of this.

The only limitation is that this activity does not support rotated documents, so results on such documents are unpredictable.

Hope this will be helpful. Thank you.

Hmm, I believe Get OCR Text doesn't require any API key, as it uses the Google OCR engine by default.

There are many free OCR engines, such as:
- Microsoft OCR Engine
- OmniPage OCR Engine

I would suggest using the OmniPage OCR engine, as it is pretty accurate.

To do that, go to the Design tab → Manage Packages → All Packages, search for UiPath.OmniPage.Activities, and install it.

For more details:
https://docs.uipath.com/activities/docs/about-the-omnipage-activities-pack

Cheers @IPIX

Yes, Tesseract OCR is a very interesting solution.
But I faced this issue:

image

For the picture above, I get “holel” instead of “hotel” and “nel” instead of “net”. So it's confusing the letter “t” with “l”.
I've also tried to improve it by changing the Profile to Legacy and setting Invert to True. No success!
That's why I would also like to try scaling the image…

What does the code for that specific function (Scale) look like?
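
As a rough illustration of what scaling an image before OCR means, here is a minimal sketch in plain Python. It is not the code behind the Scale option in the Tesseract activity, whose internals are not shown in this thread; `scale_nearest` and the list-of-lists image are hypothetical stand-ins. It enlarges each pixel into a block of `factor` × `factor` pixels (nearest-neighbour upscaling), so thin strokes such as the crossbar of a “t” cover more pixels.

```python
def scale_nearest(pixels, factor):
    """Upscale a 2D grid of pixel values by an integer factor (nearest neighbour)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat every pixel horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        scaled.extend(list(wide_row) for _ in range(factor))
    return scaled


# A tiny 3x3 grayscale "glyph": 0 is background, 255 is ink.
glyph = [
    [0, 255, 0],
    [255, 255, 255],
    [0, 255, 0],
]
bigger = scale_nearest(glyph, 2)
print(len(bigger), len(bigger[0]))  # 6 6
```

Real imaging tools usually apply a smoother interpolation than this, but the idea is the same: give the OCR engine more pixels per character before recognition.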