Problem reading a scanned PDF

Hello Guys

I have a small problem with my scanned PDF. I'm just learning UiPath, so I made up some random scanned PDFs to try OCR on (UiPath Screen OCR). I want the OCR to read the receiver, Amazon (as shown in the screenshot), but it just can't read it. I'm using an Assign with the regex System.Text.RegularExpressions.Regex.Match(pdfText, "Receiver\s+n..\n([^\n])").Groups(1).Value.Trim, and I have a data row in which the receiver should be entered, but after every try it's blank. Is there any way this can be fixed?

Hello @michalhyza12

Can you try the regex below?

System.Text.RegularExpressions.Regex.Match(pdfText, "receiver n\. \(\d+\) (.+)").Groups(1).Value.Trim

It will give the output "Amazon.com, Inc."
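
For anyone who wants to check this pattern outside UiPath, here is a minimal Python sketch. The sample OCR text is made up (the real scan was not posted in this thread), and it assumes the OCR output puts the receiver on one line in the form "receiver n. (number) name". Python's re module is only a stand-in for .NET's Regex; for a pattern this simple the two should behave the same, but treat that as an assumption.

```python
import re

# Hypothetical OCR output; the real scanned PDF text was not shared in the thread.
pdf_text = "Invoice 2024-001\nreceiver n. (1) Amazon.com, Inc.\nTotal: 100 USD"

# Same pattern as the UiPath expression above.
pattern = r"receiver n\. \(\d+\) (.+)"

match = re.search(pattern, pdf_text)
# Mirror Groups(1).Value.Trim: fall back to an empty string when nothing matches.
receiver = match.group(1).strip() if match else ""
print(receiver)
```

The pattern is case-sensitive, so it will not match "Receiver" with a capital R unless RegexOptions.IgnoreCase is passed as a third argument to Regex.Match.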

Hi @michalhyza12

You can try a different OCR engine, or please try the expression below:

System.Text.RegularExpressions.Regex.Match("Amazon.com, Inc.", "([^\s,]+)").Groups(1).Value.Trim
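
Note that this expression runs on a literal string rather than on the OCR text, so it only shows how to cut the name at the first comma or whitespace. A quick Python sketch of the same pattern (Python's re standing in for .NET Regex, which is an assumption that should hold for a simple character class like this one):

```python
import re

name = "Amazon.com, Inc."
# [^\s,]+ grabs the first run of characters that are neither whitespace nor a comma.
first_token = re.search(r"([^\s,]+)", name).group(1)
print(first_token)  # Amazon.com
```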

Happy Automation!

Hey @michalhyza12, try a different OCR engine such as Google OCR, Microsoft OCR, or Tesseract OCR, and fine-tune the properties of the OCR engine for better output.
You can also try this regex pattern:

System.Text.RegularExpressions.Regex.Match(yourvariable, "Receiver\s+([^\n]+)").Groups(1).Value.Trim
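
To see what this pattern captures, here is a small Python sketch run against made-up OCR text (the actual scan was not shared, so both layouts below are hypothetical, and extract_receiver is just an illustrative helper name). Python's re is used in place of .NET Regex on the assumption that the two agree for this pattern.

```python
import re

def extract_receiver(text: str) -> str:
    """Return the text following 'Receiver' up to the end of that line, or '' if absent."""
    match = re.search(r"Receiver\s+([^\n]+)", text)
    return match.group(1).strip() if match else ""

# Hypothetical OCR outputs.
same_line = "Receiver Amazon.com, Inc.\nTotal: 100 USD"
next_line = "Receiver\nAmazon.com, Inc.\nTotal: 100 USD"
missing = "Sender Example Ltd."

print(extract_receiver(same_line))
print(extract_receiver(next_line))
print(repr(extract_receiver(missing)))
```

Because \s+ also matches line breaks, the pattern works whether the name sits on the same line as "Receiver" or on the line below it. If the OCR text contains something between the label and the name (for example "n. (1)"), that part will be captured as well.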

cheers

When I tried the code you all suggested, this window appeared on the Assign where I put the code.

@michalhyza12

Please share a screenshot of the workflow so we can help better.

Hi @michalhyza12

I've tried with the same OCR engine, and it's working correctly.
Could you please share the input scanned PDF so I can try with that?

@michalhyza12 Can you delete the double quotes (") and type them again, then try to run it?

Sometimes copy and paste changes the format of the double quotes.
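
As an illustration of what can happen to pasted text, here is a small Python sketch that swaps typographic (curly) double quotes for plain ones before the expression is pasted into Studio. The helper name straighten_quotes is made up for this example, and whether curly quotes are the actual cause of the error in this thread is not confirmed.

```python
def straighten_quotes(expression: str) -> str:
    """Replace typographic double quotes (U+201C, U+201D) with the plain ASCII quote."""
    return expression.replace("\u201c", '"').replace("\u201d", '"')

# An expression as it might look after being copied from a web page.
pasted = r'System.Text.RegularExpressions.Regex.Match(pdfText, “Receiver\s+([^\n]+)”).Groups(1).Value.Trim'

fixed = straighten_quotes(pasted)
print(fixed)
```

Retyping the quotes by hand in the expression editor achieves the same thing.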

Why would you use UiPath Screen OCR on a scanned document? Screen OCR is for…things on the screen. You should be using UiPath Document OCR. To go a step further, I would use Digitize Document with UiPath Document OCR.
