Best practice for Data Collection & Validation

I’m working on an automation that will encounter Invoices and Credit Notes. It will throw out the credit notes as a business exception and then process the invoices (for which it will need to extract information from the invoice file). To know whether the file is a credit note or an invoice it will need to open the file and check.

My question is: In a scenario like this, is it best practice to

  • A) Open the file, check if it’s a credit note or not, and only proceed to extract the information and process the document if it is an invoice, so as to not waste ‘effort’ extracting data from files that are to be ignored.

or

  • B) Open the file, extract all information that might be needed and then run any validation checks required since the robot doesn’t really care about ‘effort’ as such.

I realise this is probably subjective: A is ‘faster’ in the event of an exception, while B is probably neater and easier to read/maintain from a development perspective. Is there an established best approach for this within the RPA community?

I guess it depends on how you are extracting the data. You probably don’t need to open the file just to verify what kind of file it is, because Read PDF Text can store the text in a String variable and you can check whether it contains certain keywords.
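
As a rough illustration of that keyword check, here is a minimal sketch in Python rather than as a UiPath activity. It assumes the PDF text has already been read into a string; the function name and the pattern are hypothetical, and the real keywords depend on what your credit notes actually say.

```python
import re

# Hypothetical pattern: adjust to the wording used on your own documents.
# Written in upper case because the text is upper-cased before matching.
CREDIT_NOTE_PATTERN = re.compile(r"\bCREDIT\s+(NOTE|MEMO)\b")


def is_credit_note(pdf_text: str) -> bool:
    """Return True if the extracted text looks like a credit note."""
    return CREDIT_NOTE_PATTERN.search(pdf_text.upper()) is not None
```

Bear in mind that a plain keyword match would also flag an invoice that merely refers to a credit note, so the pattern may need to be anchored to a title or header line.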

Ideally, you would create a workflow whose only job is to determine whether the file is an invoice or a credit note. You can call it using Invoke Workflow File or the new Library system. This component would return a Boolean indicating whether the file is a credit note.

Then, using that value, you can either process the file or skip it. I would say processing also includes data extraction and formatting: the robot takes the extracted data and builds a data table if needed before processing it. That step can be another workflow file that just organizes the data into something that can be output and processed.

Those are my thoughts.

In summary, it would look like this:

Invoke ProjectName_IsCreditNote
    IN argument: in_PdfFilepath As String
    OUT argument: out_IsCreditNote As Boolean (mapped to the variable isCreditNote)

    Read PDF Text (in_PdfFilepath into pdfText)
    // cnPattern: regex for the credit note keywords, in upper case to match pdfText.ToUpper
    Assign out_IsCreditNote = System.Text.RegularExpressions.Regex.Match(pdfText.ToUpper, cnPattern).Success


If Not isCreditNote
    Invoke ProjectName_ExtractInvoice
        IN argument: in_PdfFilepath As String
        OUT argument: out_InvData As DataTable

        // formulate data table


    Process Invoice Sequence
        // process invoice items

There are various approaches similar to this.
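
One such variation, sketched in Python purely to show the control flow, raises a business exception for credit notes instead of returning a flag to an If. Everything here is an assumption made for illustration: `BusinessException`, `Invoice` and `handle_document` are hypothetical names, and the classifier and extractor are passed in as functions so each stays a separate component, as with the separate workflow files above.

```python
from dataclasses import dataclass
from typing import Callable


class BusinessException(Exception):
    """Raised for documents the process skips by business rule."""


@dataclass
class Invoice:
    # Hypothetical fields; a real invoice would carry more data.
    number: str
    total: float


def handle_document(
    text: str,
    is_credit_note: Callable[[str], bool],
    extract_invoice: Callable[[str], Invoice],
) -> Invoice:
    """Classify first, then extract only if the document is an invoice."""
    if is_credit_note(text):
        # Option A from the question: no extraction work for ignored files.
        raise BusinessException("Credit note: skipped")
    return extract_invoice(text)
```

The caller catches `BusinessException`, logs the file as a business exception and moves on to the next one, so extraction never runs for credit notes.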

Regards.


That works fine @ClaytonM :+1: