Get text from OCR scrape - string manipulation


I am trying to extract a few items from scanned images. The Google OCR engine extracts the following:

I need to extract the items highlighted in red. The location of the items will be exactly the same each time and will be the same character length.

Any help you can provide would be much appreciated!



Hi @rachelfonseca,

May I know what exactly you are scanning with OCR scrape? Is it a PDF file that contains paragraphs of data, or does it have a standard structure? A bit more explanation about your source would help pinpoint a solution.

In your source, does a unique identifier appear just before each value you need to extract? For example, if you want to extract the name “Lahiru Fernando”, where Lahiru is the first name and Fernando is the last, is it preceded by unique labels such as ‘First Name’ and ‘Last Name’?



Hi @rachelfonseca,

If the surrounding strings are static and only the highlighted values change, you can capture them with regex. Each pattern uses a lookbehind for the label before the value and a lookahead for the label after it.

Please assign each result to a String variable:

StringVar1 = System.Text.RegularExpressions.Regex.Match(your_text_variable,"(?<=Amount\s\(\d\))[\s\S]+(?=DSA)").ToString.Trim

StringVar2 = System.Text.RegularExpressions.Regex.Match(your_text_variable,"(?<=helpers)[\s\S]+(?=payment)").ToString.Trim

StringVar3 = System.Text.RegularExpressions.Regex.Match(your_text_variable,"(?<=reference)[\s\S]+(?=Sort)").ToString.Trim
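
To show how these lookbehind/lookahead patterns behave, here is a minimal standalone VB.NET sketch that runs the same three label pairs against a made-up OCR string. The sample text, the `Between` helper and the module name are assumptions for illustration only, since the real scraped output was not posted. The sketch also uses a lazy `+?` instead of the greedy `+`, so each match stops at the first closing label rather than the last one.

```vb
Imports System
Imports System.Text.RegularExpressions

Module OcrExtractDemo

    ' Hypothetical helper: returns the trimmed text between two regex markers,
    ' or an empty string when no match is found.
    Function Between(source As String, startPattern As String, endPattern As String) As String
        ' Lookbehind anchors on the leading label, lookahead on the trailing one;
        ' [\s\S] also matches line breaks, which OCR output usually contains.
        Dim pattern As String = "(?<=" & startPattern & ")[\s\S]+?(?=" & endPattern & ")"
        Dim m As Match = Regex.Match(source, pattern)
        Return If(m.Success, m.Value.Trim(), String.Empty)
    End Function

    Sub Main()
        ' Made-up OCR text; only the labels are taken from the patterns in this thread.
        Dim ocrText As String = String.Join(vbLf, New String() {
            "Amount (1)", "125.00", "DSA helpers", "AB123456",
            "payment reference", "REF0001", "Sort code"})

        Console.WriteLine(Between(ocrText, "Amount\s\(\d\)", "DSA"))   ' prints 125.00
        Console.WriteLine(Between(ocrText, "helpers", "payment"))      ' prints AB123456
        Console.WriteLine(Between(ocrText, "reference", "Sort"))       ' prints REF0001
    End Sub

End Module
```

Checking `m.Success` before reading the value makes a missing label easy to detect: the helper returns an empty string, which you can test for before using the result.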

If it doesn’t work in some case, please share your scraped output as a string, not a screenshot. 🙂

Warm regards,


Hi Nimin,

Thanks very much, that has solved it, and it is a very handy piece we can reuse.

