Get text from OCR scrape - string manipulation

rachelfonseca · February 15, 2019, 5:18pm

Hi,

I am trying to extract a few items from scanned images. The Google OCR engine extracts the following:

I need to extract the items highlighted in red. They will always be in exactly the same location and have the same character length.
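Because the items are said to sit at the same position with the same length every time, plain fixed-position slicing is one possible approach. Below is a minimal Python sketch. The sample `ocr_text`, the `FIELDS` offsets and the `extract_fixed` helper are all hypothetical, since the real OCR output isn't shown. This approach is fragile: if OCR misreads, drops or merges a single character, every later offset shifts.

```python
# Hypothetical OCR output; in practice this is the text returned by the OCR step.
ocr_text = "Ref: AB1234 Date: 15/02/2019 Amount: 0001250.00"

# Hypothetical (start index, length) for each field, counted from 0.
FIELDS = {
    "reference": (5, 6),
    "date": (18, 10),
    "amount": (37, 10),
}


def extract_fixed(text, fields):
    """Slice each field out of text using its fixed start index and length."""
    return {name: text[start:start + length] for name, (start, length) in fields.items()}


print(extract_fixed(ocr_text, FIELDS))
# {'reference': 'AB1234', 'date': '15/02/2019', 'amount': '0001250.00'}
```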

Any help you can provide would be much appreciated!

Thanks

Lahiru.Fernando · February 15, 2019, 6:35pm

HI @rachelfonseca,

May I know what exactly you are scanning with OCR scrape? Is it some sort of PDF file that contains paragraphs of data, or does it have a standard structure? A bit more explanation about your source would help pinpoint a solution…

In your source, does the value you need to extract have a unique identifier just before it? For example, if you want to extract the name “Lahiru Fernando”, with Lahiru being the first name and Fernando the last, is there a label like ‘First Name’ or ‘Last Name’ in front of each part?
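A label like this matters because it gives a regex a fixed anchor, so the value can be found even if its position in the text moves. Here is a minimal Python sketch, assuming hypothetical OCR text where each label is followed by a colon and its value on the same line. The `value_after_label` helper name is illustrative only.

```python
import re

# Hypothetical OCR text with a label in front of each value.
ocr_text = "First Name: Lahiru\nLast Name: Fernando\n"


def value_after_label(text, label):
    """Return the rest of the line after 'label:', or None if the label is absent."""
    # re.escape keeps labels with special characters literal; [ \t]* skips spaces
    # after the colon without crossing onto the next line; (.+) stops at a newline.
    match = re.search(re.escape(label) + r":[ \t]*(.+)", text)
    return match.group(1).strip() if match else None


print(value_after_label(ocr_text, "First Name"))  # Lahiru
print(value_after_label(ocr_text, "Last Name"))   # Fernando
```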

Thanks
Lahiru

nimin · February 15, 2019, 7:21pm

Hi @rachelfonseca,

If the surrounding strings are static and only the highlighted values change, you can capture them with regex.

Please create String variables and assign them as follows:

StringVar1 = System.Text.RegularExpressions.Regex.Match(your_text_variable, "(?<=Amount\s\(\d\))[\s\S]+(?=DSA)").ToString.Trim

StringVar2 = System.Text.RegularExpressions.Regex.Match(your_text_variable, "(?<=helpers)[\s\S]+(?=payment)").ToString.Trim

StringVar3 = System.Text.RegularExpressions.Regex.Match(your_text_variable, "(?<=reference)[\s\S]+(?=Sort)").ToString.Trim
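To show how these patterns behave, here is a rough Python translation of the three Assign expressions, using one input variable for all of them. The sample text is made up because the real OCR output was only shared as a screenshot. `match_between` is a hypothetical helper that mimics `Regex.Match(...).ToString.Trim` by returning an empty string when nothing matches. The lookbehind `(?<=...)` and lookahead `(?=...)` check the surrounding text without including it in the match. Python's `re` only allows fixed-width lookbehind, which these patterns satisfy.

```python
import re

# Made-up sample resembling the OCR output; real text will differ.
your_text_variable = (
    "Amount (1) 250.00 DSA\n"
    "Number of helpers 3 payment due\n"
    "Payment reference XY123456 Sort code 00-00-00"
)


def match_between(text, pattern):
    """Mimic .NET Regex.Match(text, pattern).ToString.Trim: '' when no match."""
    m = re.search(pattern, text)
    return m.group(0).strip() if m else ""


string_var1 = match_between(your_text_variable, r"(?<=Amount\s\(\d\))[\s\S]+(?=DSA)")
string_var2 = match_between(your_text_variable, r"(?<=helpers)[\s\S]+(?=payment)")
string_var3 = match_between(your_text_variable, r"(?<=reference)[\s\S]+(?=Sort)")
print(string_var1, string_var2, string_var3)  # 250.00 3 XY123456

# [\s\S]+ is greedy, so it runs to the LAST occurrence of the end marker;
# the lazy form [\s\S]+? stops at the first one instead.
repeated = "Amount (1) 250.00 DSA 99 DSA"
greedy = match_between(repeated, r"(?<=Amount\s\(\d\))[\s\S]+(?=DSA)")   # '250.00 DSA 99'
lazy = match_between(repeated, r"(?<=Amount\s\(\d\))[\s\S]+?(?=DSA)")    # '250.00'
```

If an end marker such as "DSA" can appear more than once in the scanned text, the lazy `+?` variant is probably the safer choice.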

If it doesn’t work in any case, please share your scraped output as text, not as a screenshot.

Warm regards,
Nimin

rachelfonseca · February 18, 2019, 9:43am

Hi Nimin,

Thanks very much, that has solved it, and it's a very handy piece we can reuse.

system · February 21, 2019, 9:43am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
