Generate Data Table from Get OCR Activity

I have a PDF of a scanned document and have used Get OCR Text to extract the words and the words info (this works fine). I want to use Generate Data Table, but I cannot work out how to use the WordsInfo output from Get OCR Text in the Positions input field of Generate Data Table.

Use Get OCR Text without the word info, then use Generate Data Table.


Thanks for the answer. I was really trying to work out how to use the “Positions” field of the Generate Data Table activity with the WordsInfo output. A lot of image-based PDFs don’t allow you to generate a table from the screen scraping activity, so I am trying to understand how you can generate the table from the WordsInfo.

If you have text as the output of your OCR activity:
Input: your OCR text output. The column separator may be ‘,’ or a tab, or whatever character you want to split columns on.
The new line separator may be Environment.NewLine.

This way you can generate the data table using the text as input.
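
To make the separator idea concrete, here is a minimal VB.NET sketch of splitting text into rows and columns. It is not how the Generate Data Table activity is implemented; BuildTable, the module name, and the sample text are hypothetical names made up for illustration.

```vbnet
Imports System
Imports System.Data

Module TextToTableSketch
    ' Hypothetical helper: splits text into rows on rowSeparator
    ' and each row into cells on columnSeparator.
    Function BuildTable(text As String, columnSeparator As String, rowSeparator As String) As DataTable
        Dim table As New DataTable()
        Dim rows = text.Split(New String() {rowSeparator}, StringSplitOptions.RemoveEmptyEntries)
        For Each line As String In rows
            Dim cells = line.Split(New String() {columnSeparator}, StringSplitOptions.None)
            ' Add columns when a row has more cells than the table has columns.
            While table.Columns.Count < cells.Length
                table.Columns.Add("Column" & table.Columns.Count.ToString(), GetType(String))
            End While
            Dim row As DataRow = table.NewRow()
            For i As Integer = 0 To cells.Length - 1
                row(i) = cells(i)
            Next
            table.Rows.Add(row)
        Next
        Return table
    End Function

    Sub Main()
        ' Sample text standing in for the OCR text output.
        Dim ocrText As String = "Name,Qty" & Environment.NewLine & "Widget,3"
        Dim table = BuildTable(ocrText, ",", Environment.NewLine)
        Console.WriteLine(table.Rows.Count)
    End Sub
End Module
```

This only works when the OCR text really contains a consistent separator between columns, which is why it does not answer the question about the Positions field.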

OK, thanks. It still doesn’t explain the use of the “Positions” field, but I get the method you are suggesting.

I’m also looking for some info / guidance on the positions parameter if anyone has any.

The WordsInfo outputs of ‘Get OCR Text’ and ‘Get Visible Text’ are both of type IEnumerable<TextInfo>, but the Positions input of the ‘Generate Data Table’ activity is of type IEnumerable<KeyValuePair<Rectangle,String>>.

Putting aside for a moment the question of why the data types differ, when the tooltip seems to suggest that the whole point of the Positions input is to take the output of the Get OCR Text activity: can anyone tell me how to convert IEnumerable<TextInfo> to IEnumerable<KeyValuePair<Rectangle,String>>?

Okay, so several more hours down the drain… I tried to create a dictionary of Rectangle/String key/value pairs that I could pass into the Positions parameter of Generate Data Table.

By looping through each item in the TextInfo IEnumerable, I was able to use Item.Region.Rectangle and Item.Text.ToString to get the rectangle and string info I need, but when I tried to assign the Rectangle to the Rectangle key in the dictionary, I got this error:

Option Strict On disallows implicit conversions from ‘System.Drawing.Rectangle?’ to ‘System.Drawing.Rectangle’.

Why is the Rectangle property of the Get Visible Text output of type ‘System.Drawing.Rectangle?’, with a question mark, and how is that different from System.Drawing.Rectangle (without a question mark)?

I feel like I’m disappearing deep into a rabbit hole here… @ClaytonM Do you know how I can get this to work?
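
For context on the question mark: in VB.NET, `Rectangle?` is shorthand for `Nullable(Of Rectangle)`, a wrapper that can hold either a Rectangle or no value at all. Option Strict On refuses to unwrap it implicitly because the value might be missing. The sketch below shows the explicit ways to unwrap it; the module and function names are hypothetical and chosen only for illustration.

```vbnet
Imports System
Imports System.Drawing

Module NullableRectangleSketch
    ' Hypothetical helper: returns the wrapped Rectangle,
    ' or Rectangle.Empty when no value is present.
    Function Unwrap(maybeRect As Rectangle?) As Rectangle
        If maybeRect.HasValue Then
            Return maybeRect.Value
        End If
        Return Rectangle.Empty
    End Function

    Sub Main()
        Dim present As Rectangle? = New Rectangle(10, 20, 50, 12)
        Dim missing As Rectangle? = Nothing

        ' .Value is safe only after checking .HasValue;
        ' calling it on an empty nullable throws InvalidOperationException.
        Console.WriteLine(Unwrap(present).Width)
        Console.WriteLine(Unwrap(missing).IsEmpty)

        ' GetValueOrDefault() never throws; it returns an all-zero Rectangle when empty.
        Console.WriteLine(missing.GetValueOrDefault().Width)
    End Sub
End Module
```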

Hey. I’ve never seen the .Rectangle? in my life, lol. However, you can get the X, Y, Width, and Height by using .Value.X, .Value.Y, .Value.Width, and .Value.Height.

So, you can use that to create a new Rectangle of the correct type when you add it to your dictionary key.
Here is what it would look like:

New System.Drawing.Rectangle(item.Region.Rectangle.Value.X, item.Region.Rectangle.Value.Y, item.Region.Rectangle.Value.Width, item.Region.Rectangle.Value.Height)

Hopefully, that helps you out.

Regards.
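
Putting the pieces of this thread together, the conversion from the word info to Rectangle/String pairs could be sketched as follows. This is untested against UiPath itself: StubTextInfo and StubRegion are hypothetical stand-ins for the UiPath types, mirroring only the members named in this thread (Text and Region.Rectangle), and ToPositions is an illustrative name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Linq

' Hypothetical stand-ins for the UiPath types discussed in this thread.
Class StubRegion
    Public Property Rectangle As Nullable(Of System.Drawing.Rectangle)
End Class

Class StubTextInfo
    Public Property Text As String
    Public Property Region As StubRegion
End Class

Module PositionsSketch
    ' Converts word info into Rectangle/String pairs,
    ' skipping any word whose rectangle is missing.
    Function ToPositions(words As IEnumerable(Of StubTextInfo)) As IEnumerable(Of KeyValuePair(Of Rectangle, String))
        Dim located = words.Where(Function(w) w.Region IsNot Nothing AndAlso w.Region.Rectangle.HasValue)
        Dim pairs = located.Select(Function(w) New KeyValuePair(Of Rectangle, String)(w.Region.Rectangle.Value, w.Text))
        Return pairs.ToList()
    End Function

    Sub Main()
        Dim words As New List(Of StubTextInfo) From {
            New StubTextInfo With {.Text = "Total", .Region = New StubRegion With {.Rectangle = New Rectangle(10, 20, 50, 12)}},
            New StubTextInfo With {.Text = "NoBox", .Region = New StubRegion()}
        }
        For Each pair In ToPositions(words)
            Console.WriteLine(pair.Value & " at X=" & pair.Key.X.ToString())
        Next
    End Sub
End Module
```

Returning a list of KeyValuePair items rather than a Dictionary still satisfies IEnumerable(Of KeyValuePair(Of Rectangle, String)) and does not require every rectangle to be unique, whereas Dictionary.Add throws on a duplicate key. Since `.Value` already is a Rectangle, it can be used directly instead of rebuilding one from X, Y, Width, and Height.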

Hey Clayton, thanks for jumping in on this. I had thought that might be a way to tackle it; I was just really hoping not to have to go down that road.

Thanks for giving the code example, that already saves me hours of head-scratching right there. I’ll give it a go and see if it works!

Hi Foehl, did ClaytonM’s suggestion work? If so, do you mind posting the solution?

Thanks