To Extract the text (data) from Images

Hi Team,

I am stuck at the point where I have extracted the text from the image using Tesseract OCR.

This is the output I am able to get, but I want to extract the individual values and write them back to Excel.

I want to write the values into the Excel template below.

Input images:

Thanks in advance.

Hi

Hope the below steps would help you resolve this

Use a Build Data Table activity to create a table with the column names you need, and store the output as dt.

We can split the extracted text on newlines and then add the values to the datatable with an Add Data Row activity.

To split the string input

arr_split = Strinput.ToString.Split(Environment.NewLine.ToArray(), StringSplitOptions.RemoveEmptyEntries)

  1. Then use a FOR EACH activity, pass the above array variable as input, and change the type argument to String

  2. Now use an IF activity with a condition like this

Not item.ToString.Contains("any keyword that you feel separates each piece of data")

If the condition is true, the flow goes to the THEN block. There, use an ADD TO COLLECTION activity.
In that activity:

Set the Collection property to finallist, a variable of type System.Collections.Generic.List(Of String) with the default value New List(Of String) defined in the Variables panel.
And
Set the Item property to item.ToString

  3. Then use an Add Data Row activity and set ArrayRow to finallist.ToArray
    and DataTable to dt

  4. Now, after the FOR EACH, use a Write Range activity with dt as input
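For anyone who wants to try the split-and-filter logic outside UiPath first, it can be sketched roughly in Python. This is only an illustration: `extract_row`, `sample_text` and the keyword `"|"` are made-up names and values, not part of the workflow above, and the Excel write step is left out.

```python
def extract_row(text, skip_keyword):
    """Split OCR text into lines and keep the ones without skip_keyword."""
    # splitlines() breaks on \r\n, \n and \r, so empty in-between entries do not appear
    lines = [line.strip() for line in text.splitlines()]
    # Drop blank lines and any line that contains the separator keyword
    return [line for line in lines if line and skip_keyword not in line]


# Hypothetical sample resembling OCR output of a speaker card
sample_text = "Jane Doe\r\nRPA Developer\r\n\r\nExample Corp\r\nHome | Agenda | Speakers"
row = extract_row(sample_text, "|")
print(row)
```

The resulting list plays the role of finallist: one row of values that would be passed to Add Data Row.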

Cheers @sushmithaelluru

Thanks for replying, @Palaniyappan. 🙂

“9‘24 .nuU '3‘ 6:1)»\r\n/:\r\n\\r\n(CAM)\r\nAdfitfi Kumar\r\nSr. RPA Analyst\r\nHearst\r\nUnited States\r\ni? Star\r\nC)\r\nHome | Agenda |Speakers | Network | o o o”,

I am getting the output shown above.

Could you please tell me what logic I can use in the IF activity?

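Split(strExtractedText, Environment.NewLine)

This will give you an array (say arrayStr), and then you can assign the required values to the corresponding variables. Note that the \r\n shown in the output is only how the line break is displayed; in a VB.NET expression, "\r\n" is a literal four-character string, so split on Environment.NewLine (or vbCrLf).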

arrayStr(4) will be name
arrayStr(6) will be company
arrayStr(5) will be role
and so on
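To double-check the indices, here is a small Python sketch using the lines from the OCR output quoted above. It assumes the `\\r\n` in that output stands for a single backslash followed by a line break; the variable names are only for illustration.

```python
# Lines as they appear in the OCR output, joined with CRLF line breaks
ocr_lines = [
    "“9‘24 .nuU '3‘ 6:1)»",
    "/:",
    "\\",  # assumed: a lone backslash on its own line
    "(CAM)",
    "Adfitfi Kumar",
    "Sr. RPA Analyst",
    "Hearst",
    "United States",
    "i? Star",
    "C)",
    "Home | Agenda |Speakers | Network | o o o",
]
extracted_text = "\r\n".join(ocr_lines)

# Split on the line break and pick the fields by position
array_str = extracted_text.split("\r\n")
name = array_str[4]
role = array_str[5]
company = array_str[6]
print(name, role, company, sep=" / ")
```

The positions only hold while the OCR returns the same number of junk lines before the name, so they are worth re-checking for each input image.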

Great, thanks.

Hi @rahulsharma, I am getting the error below: "Index was outside the bounds of the array." Could you please tell me how to resolve it?

Thanks.

That means your target data can have a different number of lines.

"Array index out of bounds" means that the index you are trying to use does not exist in the array.

This happens because the target you are scraping the data from does not have all the fields you want.

Please check the content of the text variable after the robot scrapes the data. You will see that the extracted text does not have the expected number of lines.

Since we are splitting the data on newlines, you can avoid this by first getting the count of the lines and then accessing only the lines that exist.

Split(strYourExtractedText, Environment.NewLine).Length → this will give you the count of the lines

Then you can compare it with the standard number of lines (the ideal case). If it is the same, all values can be assigned. If it is lower, you need to analyze which field is missing in those cases and skip the assignment of those fields. This can be done using an IF condition.
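The count check can be sketched in Python as follows. This is a rough illustration under assumptions: `EXPECTED_LINES`, `field_at` and `parse_card` are hypothetical names, and 11 is simply the line count of the sample output in this thread.

```python
EXPECTED_LINES = 11  # assumed line count of a complete card; adjust to your data


def field_at(lines, index, default=""):
    """Return lines[index], or default when that line does not exist."""
    return lines[index] if 0 <= index < len(lines) else default


def parse_card(text):
    lines = text.split("\r\n")
    return {
        # True only when the text has the ideal number of lines
        "complete": len(lines) == EXPECTED_LINES,
        "name": field_at(lines, 4),
        "role": field_at(lines, 5),
        "company": field_at(lines, 6),
    }


# A shortened text with only five lines: role and company are missing
short = parse_card("a\r\nb\r\nc\r\nd\r\nSome Name")
print(short)
```

This only prevents the exception. If a line is missing in the middle of the text, the later lines shift up, so the incomplete cases still need to be reviewed to see which field each line really holds.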

Hope this helps to understand the scenario a bit more

Yes, I agree: "Array index out of bounds" means that the index you are trying to use does not exist in the array.

OK, I will check the line count as suggested.
