Get text from OCR scrape - string manipulation

Hi,

I am trying to extract a few items from scanned images. The Google OCR engine extracts the following:

I need to extract the items highlighted in red. The location of the items will be exactly the same each time and will be the same character length.

Any help you can provide would be much appreciated!

Thanks


Hi @rachelfonseca,

May I know what exactly you are scanning with OCR scrape? Is it a PDF file that contains paragraphs of data, or does it have a standard structure? A bit more explanation about your source would help pinpoint a solution…

In your source, does a unique identifier appear just before the value you need to extract? For example, if you want to extract the name “Lahiru Fernando”, with Lahiru as the first name and Fernando as the last, is it preceded by a label such as ‘First Name’ or ‘Last Name’?

Thanks
Lahiru


Hi @rachelfonseca,

If the surrounding strings are static and only the highlighted values change, you can capture them with regex.

Please assign the following to String variables:

StringVar1 = System.Text.RegularExpressions.Regex.Match(your_text_variable,"(?<=Amount\s\(\d\))[\s\S]+(?=DSA)").ToString.Trim

StringVar2 = System.Text.RegularExpressions.Regex.Match(your_text_variable,"(?<=helpers)[\s\S]+(?=payment)").ToString.Trim

StringVar3 = System.Text.RegularExpressions.Regex.Match(your_text_variable,"(?<=reference)[\s\S]+(?=Sort)").ToString.Trim
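
To see how the lookbehind/lookahead pair behaves, here is a minimal sketch of the same idea in Python rather than VB.NET, so it can be tried outside the workflow. The sample text, the helper name extract_between, and the extracted values are all made up for illustration, since the real scraped output was not shared. It also swaps the greedy [\s\S]+ for the lazy [\s\S]+?, so the match stops at the first end marker instead of the last one.

```python
import re

# Hypothetical OCR output; the real scraped text was not shared in the thread.
sample_text = (
    "Amount (1)\n"
    "1,250.00\n"
    "DSA helpers\n"
    "ACME LTD\n"
    "payment reference\n"
    "AB123456\n"
    "Sort code"
)


def extract_between(text, start_pattern, end_pattern):
    """Return the trimmed text between two marker patterns, or '' if not found."""
    # The lookbehind and lookahead keep both markers out of the match.
    # Python needs a fixed-width lookbehind, which all three markers here are.
    match = re.search(rf"(?<={start_pattern})[\s\S]+?(?={end_pattern})", text)
    return match.group(0).strip() if match else ""


amount = extract_between(sample_text, r"Amount\s\(\d\)", "DSA")
payee = extract_between(sample_text, "helpers", "payment")
reference = extract_between(sample_text, "reference", "Sort")

print(amount, payee, reference, sep=" | ")
```

If an end marker such as "payment" can appear more than once in the scraped text, the greedy form would run on to its last occurrence, which is why the lazy quantifier may be the safer choice.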

If this doesn’t work for any of the values, please share your scraped output as text, not a screenshot.

Warm regards,
Nimin


Hi Nimin,

Thanks very much, that has solved it, and it is a very handy piece we can re-use.

