I am dealing with an application where I cannot get the text of a table and am having to rely on the GetOCRText activity to be able to read the contents. The table itself, the thing that I am able to select, has a header included. The header is getting read and causing issues.
The obvious solution to me is to trim out the header so that I am just reading the table contents. However, when I do so, I am getting an error. It appears that the activity cannot handle it.
As a workaround, I am using the WordsInfo enumerable to get the x/y coordinates and using some logic to figure out which text appears on what line of the table, but this is considerably more work than just trimming out the header and getting text back.
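For anyone following along, the workaround described above can be sketched roughly as follows. This is a minimal Python illustration, not UiPath code: the `(text, x, y)` word tuples, the `group_into_lines` helper, and the 5-pixel tolerance are all assumptions made for the example, standing in for whatever the OCR activity's word-level output actually provides.

```python
from typing import List, Tuple

# Hypothetical word record: (text, x, y) in pixels, relative to the element.
# In the real workflow these values would come from the OCR word-level output.
Word = Tuple[str, int, int]


def group_into_lines(words: List[Word], y_tolerance: int = 5) -> List[str]:
    """Group OCR words into text lines by their y coordinate.

    Words whose y value is within `y_tolerance` pixels of the first word
    of the current line are treated as belonging to that line.
    """
    lines = []  # each entry: (anchor_y, [words on that line])
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        if lines and abs(word[2] - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append((word[2], [word]))
    # Within each line, order the words left to right before joining.
    return [
        " ".join(w[0] for w in sorted(line_words, key=lambda w: w[1]))
        for _, line_words in lines
    ]


sample = [("World", 60, 3), ("Hello", 2, 2), ("Second", 2, 18), ("line", 70, 19)]
all_lines = group_into_lines(sample)
body_lines = all_lines[1:]  # drop the first line, i.e. the header
```

The tolerance matters because OCR rarely reports exactly the same y value for every word on a line; it would need tuning to the font size in the real application.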
I would also like to note that I have done everything possible to reduce the selector down to just the table without the header, but the application was created in such a way that they are inseparable.
Can you provide more details (screenshots, app name, any other relevant info) so we can reproduce the issue? The short story, if I understand correctly, is that the GetOCRText activity is outputting an error when using Clipping Region - what error is this?
As an alternative, until we figure out what’s wrong with the GetOCRText approach: since I understand you’re not able to use the Extract Table activity (it seems you only have access to image-based OCR extraction for the table), maybe you can try CV Extract Table. It’s also OCR-based (and uses the CV AI model to detect the table), and it outputs a DataTable from which you can then just trim the header off.
Computer Vision activities are included in the UIAutomation package and the CV license is free (Community is limited to 30 MP/min, while Enterprise is limited to 240 MP/min).
I am going to use Notepad here because I cannot give out information about my company’s CRM. Note that this scenario does not make much sense, but, in my actual context, it does.
Let’s say that we have a text document open in Notepad as below:
We want to skip the first line and then grab the text of the second and later lines if they are present. I have used a selector that will grab the entire editable text area.
I used the Get OCR Text activity and tried to skip the first line by setting the top value of the clipping region to 16; I counted the pixels and found that to be adequate.
When doing so, I get the following exception:
Changing the clipping region back to 0 makes the activity work as expected, but it then grabs the first line as well.
The expected behavior is that I can use the clipping region to exclude that first line without receiving an exception.
The issue is that setting ClippingRegion to (0,16,0,0) defines a zero-width region, since the format is (left, top, right, bottom). If you want to extract the “Hello World” text in your example, you should use something similar to this:
I’ve filed an internal ticket to improve our error messaging (to guide you towards the real issue; in this case, width = right - left = 0), but also to document this more clearly.
Let us know, please, if the suggestion above works and thank you for posting this.
That’s not how I would expect it to work, and I have used clipping regions before. I would expect each value to be the amount clipped from that side, not a coordinate that I have to calculate myself to arrive at the region I want to use.
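To make the two readings concrete, here is a minimal Python sketch of the arithmetic, assuming the (left, top, right, bottom) rectangle format described in the reply above. The helper names `region_size` and `insets_to_region` and the 400x200 element size are hypothetical, chosen only for illustration; nothing here is UiPath API.

```python
from typing import Tuple

Region = Tuple[int, int, int, int]  # (left, top, right, bottom), as described in the reply


def region_size(region: Region) -> Tuple[int, int]:
    """Return (width, height) of a region given as rectangle edges."""
    left, top, right, bottom = region
    return right - left, bottom - top


def insets_to_region(width: int, height: int,
                     left: int = 0, top: int = 0,
                     right: int = 0, bottom: int = 0) -> Region:
    """Convert 'amount clipped from each side' into rectangle edges.

    `width` and `height` are the element's size in pixels (assumed known).
    """
    region = (left, top, width - right, height - bottom)
    region_width, region_height = region_size(region)
    if region_width <= 0 or region_height <= 0:
        raise ValueError(f"empty clipping region: {region}")
    return region


# Read as rectangle edges, (0, 16, 0, 0) has no width at all.
degenerate = region_size((0, 16, 0, 0))

# Clipping 16 px off the top of a hypothetical 400x200 element.
clipped = insets_to_region(400, 200, top=16)
```

Under the inset reading, "16 off the top" is a single value; under the edge reading, the same intent needs the element's full width and height filled in for the right and bottom values.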