Scraping different kinds of images


#1

Does anyone have experience with scraping different invoices (images) and then setting rules to read variables from them? The problem is that there are 15 different kinds of invoices, and the variables are in different areas on each one. Is there a way for the robot to recognize which kind of invoice it is looking at, say, based on what it looks like? Or is the only way to identify keywords?


#2

I may be wrong, but give this a try.

  1. For every type of PDF that is visually similar, pick a unique identifier (this can be the same image at different locations, completely different images in each type, or text).

  2. For example, suppose we have two types: one has the logo at the top left and the other at the top right. Here, find image -> imagefound.GetAbsolutePosition() can be used as the parameter to differentiate between the two PDF types.
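
To illustrate the idea in step 2, here is a minimal Python sketch. It assumes you already have the bounding box of the found logo and the page width from whatever image-search tool you use. The names `Box` and `classify_by_logo_position` and the type labels are made up for this example, not part of any library.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box of the found logo, in page pixels."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def classify_by_logo_position(logo: Box, page_width: float) -> str:
    """Return a type label depending on which half of the page holds the logo."""
    # Compare the horizontal center of the logo with the middle of the page.
    center_x = logo.x + logo.width / 2
    if center_x < page_width / 2:
        return "type_left_logo"
    return "type_right_logo"


# Example: a logo near the left edge of a 600 px wide page.
print(classify_by_logo_position(Box(20, 15, 100, 40), page_width=600))
```

With more than two types you would probably compare against several regions (or several different logos) instead of a single left/right split.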


#3

Thank you very much.

The problem is that there is only text on them, but the layout is very different on each one.


#4

Did you try using a unique form element as an identifier?


#5

No, I have not. What is that?


#6

I’m pretty sure there are parts of the invoices that are similar enough to identify the “type”, but it’s very hard to tell without examples. Could you upload 2-3 examples of a single type of invoice?


#7

A unique form element can be anything that is unique to a particular form.
For example, the first question in one type might be Name, and that can serve as its unique identifier.
In another type, the first question could be Age.

So you will have to analyse the PDFs and find an identifier for each type before trying to automate.
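
As a rough sketch of that approach in Python: assuming the text has already been extracted from the document (by OCR or otherwise), you could look at the first non-empty line and match it against a table of identifiers. The table `IDENTIFIERS`, the type labels, and both function names are hypothetical and only follow the Name/Age example above.

```python
from typing import Optional

# Hypothetical mapping: form type -> labels that can open that form.
IDENTIFIERS = {
    "type_a": ["name"],
    "type_b": ["age"],
}


def first_label(text: str) -> str:
    """Return the first non-empty line of the extracted text, lowercased."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.lower()
    return ""


def classify_form(text: str, identifiers=IDENTIFIERS) -> Optional[str]:
    """Return the form type whose identifier opens the text, or None."""
    label = first_label(text)
    for form_type, keywords in identifiers.items():
        if any(label.startswith(keyword) for keyword in keywords):
            return form_type
    return None  # unknown layout: handle manually or add a new identifier


print(classify_form("Name: John Smith\nAddress: ..."))
print(classify_form("\nAge: 42\nName: Jane"))
```

Once the type is known, you can apply the extraction rules written for that specific layout. Real OCR output is usually noisier than this, so some fuzzy matching may be needed.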