Extracting the text (data) from images

Hi Team,

I am stuck after extracting the text from the image using Tesseract OCR.

This is the output I am able to get, but I want to pick out the individual values and write them to Excel.

I want to write the values into the Excel template below:

Input images:

Thanks in advance.


I hope the steps below help you resolve this.

Use a Build Data Table activity to create a table with the column names you need, and store the output in a variable named dt.

We can split the extracted text on new lines and then add the values to the datatable with an Add Data Row activity.

To split the input string:

arr_split = Split(Strinput.ToString,Environment.NewLine.ToArray())

  1. Then use a For Each activity, pass the above array variable as input, and change the TypeArgument to String.

  2. Now use an If activity with a condition like this:

NOT item.ToString.Contains("any keyword you feel separates each data")

If the condition is true, the flow goes to the Then block. Use an Add To Collection activity there.
In that activity:

Set the Collection property to finallist, which is of type System.Collections.Generic.List(Of String) with a default value of New List(Of String) defined in the Variables panel.
Set the Item property to item.ToString.

  3. Then use an Add Data Row activity and set ArrayRow to finallist.ToArray.
    Set DataTable to dt.

  4. Now, after the For Each, use a Write Range activity with dt as input.
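For readers who want to see the logic outside UiPath, the steps above can be sketched in Python. This is only an illustration under stated assumptions: the sample text, the "|" keyword, the column names, and the function name build_row are all hypothetical, and a CSV written to memory stands in for the Write Range step.

```python
import csv
import io


def build_row(ocr_text, skip_keyword):
    """Split OCR text into lines and keep those that do not contain skip_keyword."""
    final_list = []
    for item in ocr_text.splitlines():  # loop over the split lines (For Each)
        item = item.strip()
        # The If condition: keep the line only when the keyword is absent.
        # Skipping empty lines is an extra assumption, not part of the steps above.
        if item and skip_keyword not in item:
            final_list.append(item)  # Add To Collection
    return final_list


# Hypothetical sample text, not real OCR output
sample = "Jane Doe\r\nAnalyst\r\nHome | Agenda\r\nExampleCorp"
row = build_row(sample, "|")

# Add the row under a header and write the table out (CSV stands in for Excel)
table = [["Name", "Role", "Company"], row]
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```

The keyword test only removes lines you can describe by a marker they contain, so it works best when the unwanted lines share something recognizable, such as a menu separator.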

Cheers @sushmithaelluru

Thanks for replying, @Palaniyappan.

“9‘24 .nuU '3‘ 6:1)»\r\n/:\r\n\\r\n(CAM)\r\nAdfitfi Kumar\r\nSr. RPA Analyst\r\nHearst\r\nUnited States\r\ni? Star\r\nC)\r\nHome | Agenda |Speakers | Network | o o o”,

The above is the output I am getting.

Could you please tell me what logic I can use in the If activity?


Splitting the text on new lines will give you an array (say arrayStr), and then you can assign the required values to the corresponding variables.

arrayStr(4) will be name
arrayStr(6) will be company
arrayStr(5) will be role
and so on
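As a rough illustration of the index-based approach, here is a minimal Python sketch (outside UiPath). The function name extract_fields and the dictionary keys are hypothetical, and the first four entries of the sample are simplified stand-ins for the garbled header lines in the posted output.

```python
def extract_fields(ocr_text):
    """Pick name, role and company by fixed position after splitting on CRLF."""
    array_str = ocr_text.split("\r\n")
    if len(array_str) < 7:
        # Guard against output that is shorter than the expected layout
        raise ValueError("OCR output has fewer lines than expected")
    return {
        "name": array_str[4],
        "role": array_str[5],
        "company": array_str[6],
    }


# Simplified sample: four noise lines, then the readable lines from the post
ocr_text = "\r\n".join([
    "9'24 .nuU",
    "/:",
    "\\",
    "(CAM)",
    "Adfitfi Kumar",
    "Sr. RPA Analyst",
    "Hearst",
    "United States",
])

fields = extract_fields(ocr_text)
print(fields)
```

Fixed indices only hold while every image produces the same number of noise lines before the name, so it is worth checking the array length, as the sketch does, before reading the values.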