Build Log "T-Shirt size M+": AI pair programming

Starting a new RPA project at work. Want to try something different: abstract the work to remove all internal references, and publish about it.

The year is 2025 and development must happen >70% in pair programming with LLMs. My showcase in this build log: documenting design decisions, prompting techniques and even UiPath Studio code.

Environment: UiPath Studio 2023.10.z and UiPath Orchestrator + Unattended Robot. The lowest common denominator. Let's see how to push the envelope.
M365 CoPilot, a GPT5.x web frontend and maybe a bit more.


[plan]
~2h from screenshots to a v0.1 of the data model

Got a requirement clickpath from business SMEs, with screenshots of the applications containing the data. Plus a rudimentary testing plan and some data structure in a .docx.
Comprehensive. I am happy.

Pasted a screenshot(!) of input data into M365 CoPilot.
Then prompted as follows:

abbreviated transcript of LLM interactions

extract the txt

transform into a table, ready to paste into excel

Yes, create a downloadable Excel file

I am looking for a way to pseudonymize it, for use in external publications

…

let us take a step back, and discuss the taxonomy

  • generate a list of systems and their original and public names

…

(~10 times back and forth to come up with generic system names)

give me this as excel sheet

(Microsoft CoPilot acted up, and over the course of ~15 Q&A turns did not deliver a downloadable file. GPT5 came to the rescue.)

Outcome:

initial data model of the process

Now that will need minor refinement, but I feel that I achieved a lot in little time.
Will not show the requirement document, but the business view was:

And I abstracted the hardcoded "n items" into a future "array of items".

I need a table data naming convention, because I develop in a test-driven way.
Therefore I urgently need a corpus of test data, and the naming decisions will inevitably carry forward.

[plan]
freeform "clickpath" to pseudocode
1h

I love me some "pseudocode". It is the lingua franca of software engineering.

And LLMs are sufficiently trained on pseudocode. Which, as of 2025, is the #1 factor when deciding on a technology: "in-distribution", yes or no. Pseudocode is.

I uploaded to M365 CoPilot

  • the taxonomy mapping from the last step
  • a freestyle "clickpath" PDF

Here are my instructions to the LLM:

OneShot instructions

attached is a process description, and a taxonomy with a mapping from German process-specific terminology to generic data entities

extract from the process description in PSEUDOCODE the validation logic,
then apply the mapping to the generic terminology.

Finally, generate and output pseudocode about the validation. Strip all GUI interactions from the pseudocode and return only validation logic

Here is the result:

pseudocode
###############################################################################
# VALIDATION LOGIC (GUI interactions removed)
# Inputs follow the generic data model (datamodel.xlsx) and array semantics.
# - AccountNumber: string  (customer/account reference)
# - ServicePoints: array of ServicePoint items (0..n), each with:
#     Enabled?: boolean
#     PostalCode: string
#     Identifier?: string
#     Type?: string
#     City?: string
#
# External validation sources:
# - Address Directory (read-only): provides existence checks and classifications.
###############################################################################

CONSTANT VALID_CLASSIFICATIONS = {"FOO", "BAR"}

FUNCTION ValidateServicePoints(AccountNumber, ServicePoints):
    results = []

    FOR EACH sp IN ServicePoints:
        # Skip items explicitly disabled or not provided
        IF sp.Enabled IS DEFINED AND sp.Enabled == FALSE:
            CONTINUE

        # --- Rule: PostalCode is mandatory for any validation ---
        IF NOT Present(sp.PostalCode):
            results.APPEND(Result(sp, status="ESCALATE", reason="MissingPostalCode"))
            CONTINUE

        # --- Case A: PostalCode + Identifier provided ---
        IF Present(sp.Identifier):
            # Must exist as a valid combination for the customer
            IF ExistsCombination(AccountNumber, sp.PostalCode, sp.Identifier):   # from Address Directory
                results.APPEND(Result(sp, status="VALID"))
            ELSE:
                results.APPEND(Result(sp, status="ESCALATE", reason="CombinationNotFound"))
            CONTINUE

        # --- Case B: Only PostalCode provided ---
        # 1) PostalCode must exist for the customer context
        IF NOT ExistsPostalCode(AccountNumber, sp.PostalCode):                  # from Address Directory
            results.APPEND(Result(sp, status="ESCALATE", reason="PostalCodeNotFound"))
            CONTINUE

        # 2) PostalCode must have a valid classification (FOO or BAR)
        cls = GetClassification(sp.PostalCode)                                   # from Address Directory
        IF cls IN VALID_CLASSIFICATIONS:
            results.APPEND(Result(sp, status="VALID"))
        ELSE:
            results.APPEND(Result(sp, status="ESCALATE", reason="InvalidClassification"))

    RETURN results


# -----------------------------
# Helper types (for clarity)
# -----------------------------
TYPE ServicePoint:
    Enabled?: boolean
    PostalCode: string                 # maps to ServicePoint.n.PostalCode
    Identifier?: string                # maps to ServicePoint.n.Identifier
    Type?: string                      # maps to ServicePoint.n.Type
    City?: string                      # maps to ServicePoint.n.City

TYPE Result:
    item: ServicePoint
    status: "VALID" | "ESCALATE"
    reason?: string

# -----------------------------
# External validation contracts
# -----------------------------
FUNCTION ExistsCombination(AccountNumber, PostalCode, Identifier) -> boolean
FUNCTION ExistsPostalCode(AccountNumber, PostalCode) -> boolean
FUNCTION GetClassification(PostalCode) -> string

This is only one step in the process, and data already needed to be fetched during processing for the validation to work.
But that structured, concise code will be indispensable for future LLM pair programming instructions.

[plan]
1h

Instructions to the LLM

based on the pseudocode, generate a mermaid diagram of the decision logic.
Embed that mermaid diagram in a Markdown file

… some iterations with invalid syntax until the LLM gets the Mermaid syntax right. No model seems to manage this in one shot, especially when non-ASCII characters are involved.

Given the unstructured input from business, I now have an interim deliverable: a diagram with the core decision logic.

Cross-referencing the requirements doc, this looks correct in the sense that it contains all gates and data attributes mentioned in the document.

And my notes from the earlier meetings also align.

BUT!
I do not trust the requirements.

screenshot of the diagram

The VS Code extension
Markdown Preview Mermaid Support
by Matt Bierner

renders Mermaid diagrams when they are embedded in a Markdown file.

Mermaid can also style the diagram elements. LLMs do this well, and I use the corporate colors.

I achieved a lot in half a day. Now I run the diagram past the business SMEs for the first time.

But I will still not trust that review response :slight_smile:
That's why the next steps will

  • begin to prepare test driven development
  • keep flexibility in the decision process

Follow me on LinkedIn for a miniseries about The Missing Contract

Traditional RPA will fail in the new "Maestro" era. You MUST rethink your solution design or risk getting stuck and left behind.

Proof: Look at the Maestro templates and PoC publications, and inspect what happens "at the end of a BPMN edge when data enters the node".

That's where I have yet to see a correct example. And it has to do with the lack of conceptual clarity about "functional programming".

This build log will also set a few things straight.

I consider this dishonesty at best, and to be frank: I have even more precise words for what is published by the vendor:

NONE of the "templates" seem to have ANY minimum dataflow.
No input arguments, no output arguments.

Maybe that is why all these "safe harbor" slide-deck openers exist: The emperor has no data flow.

That is worrisome.

Even in my traditional RPA build log I will showcase:

  • the FUNDAMENTAL shift that "orchestration" requires
  • how traditional RPA can achieve it
  • and become a 1st class player in the agentic era

In case you did not pay attention: Agents are nothing without data.

For the developer this means: We better learn to manage nested JSON. And learn fast.
Did you do plenty of API automation in the past years? No? You are in a bad position.

This build log will PUSH THE LIMITS of what is possible in traditional RPA. Because that is what we know. That is the foundation from where we have to pivot.

[plan]

Back to TDD, test-driven development.

There is a recording from a UiPath Community Event of the Stuttgart, Germany chapter from last year:
Zukunftssicheres RPA: Testing-Methoden und Praxisbeispiele ("Future-proof RPA: testing methods and practical examples")

This shows the mechanics of test-driven-development in UiPath Studio.

I assume that the recording is auto-translated or at least subtitled.

Core slide:

No: despite RPA having "selectors" slurped in, the RPA process in no way owns anything about the GUI.
The GUI is external. And we need to treat it as such, not just in testing, but already before.

And typically there is more than one GUI in a traditional RPA automation. All the more reason to handle them properly.

Test-driven development, at its core, first defines the test cases, then implements.
This is how I will proceed in this build log.

[plan]
2 hours

Never trust the clickpath to give a robust datamodel.

Went through the datamodel and asked these questions:

  • during processing, will I need to enrich the record from another system?
    If yes, I add an "ExternalReference"
  • do I have to reference a bunch of data several times? ("bunch of data" as in "would be a table in a normalized data schema")
    If yes, then I add an Id that I will autogenerate
  • are there properties implicitly assumed that I should make explicit, so that I can better target an automated system?
    Always candidates: timezone, localized address schemas that should be internationalized, implicit "ENUMs"
  • do I have an initial idea about the datatype?
  • I always add some metadata; a "created_at" and "updated_at" almost always come in handy
updated dataschema

Why is that important?

When the process automation industry moves

  • from scheduling (done in e.g. UiPath Orchestrator)
  • to orchestration (done in e.g. UiPath "Maestro"),

data will need to be exposed at the outer surface; in UiPath Studio terms, these are the input/output arguments of the entry points (commonly Main.xaml, but any workflow can be an entrypoint).

[plan]
1h

Transform the data model (LLM-extracted from an unstructured clickpath, and manually amended) into something that LLM coding agents can work with.

Instructions and answers

Instructions to Sonnet4.5

generate a python script in ./tools/ that (hardcoded) takes the @dat/data-model/incoming/data-model.xlsx and creates a JSON schema for it.

Done. Created tools/xlsx_to_json_schema.py which:

  1. Reads dat/data-model/incoming/data-model.xlsx
  2. Converts each row to a JSON Schema property
  3. Maps datatypes (string, integer, boolean, enum → JSON Schema types)
  4. Handles multiplicity notation (1 = required, 0..1 = optional, 0..n = array)
  5. Outputs to dat/data-model/incoming/data-model.schema.json
    Run it with:
    source .venv/bin/activate && python tools/xlsx_to_json_schema.py
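
The generated script lives in the repo; for readers who only want the gist, a minimal sketch of such a converter could look like this (assuming Name / Datatype / Multiplicity columns in the sheet; the real script's column layout and type mapping are richer):

# Hedged sketch of an xlsx-to-JSON-Schema converter (column layout assumed).
import json
from openpyxl import load_workbook

TYPE_MAP = {"string": "string", "integer": "integer", "boolean": "boolean"}

def xlsx_to_schema(xlsx_path: str) -> dict:
    ws = load_workbook(xlsx_path, read_only=True).active
    properties, required = {}, []
    for row in ws.iter_rows(min_row=2, values_only=True):  # skip header row
        name, datatype, multiplicity = row[:3]
        if not name:
            continue
        prop = {"type": TYPE_MAP.get(str(datatype).lower(), "string")}
        if str(multiplicity) == "0..n":
            prop = {"type": "array", "items": prop}  # 0..n becomes an array
        elif str(multiplicity) == "1":
            required.append(name)                    # 1 means mandatory
        properties[name] = prop
    return {"$schema": "https://json-schema.org/draft/2020-12/schema",
            "type": "object", "properties": properties, "required": required}

if __name__ == "__main__":
    schema = xlsx_to_schema("dat/data-model/incoming/data-model.xlsx")
    with open("dat/data-model/incoming/data-model.schema.json", "w") as f:
        json.dump(schema, f, indent=2)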

Instructions to GPT5:

read @plan/dat/data-model/incoming/data-model.schema.json and critically revise the schema.
it is autogenerated from a script

I then pasted these findings into a chat with Sonnet4.5

GPT5 seems useless for writing good code; the Anthropic models as of 2025-Q4 are way superior.
But GPT5 excels at critiquing other models' code!

Preliminary JSON schema:

The target system to automate handles data in an object-relational mapper, so I move from the dot notation to a proper nested object representation.

xlsx: Account.Id
json: account { id, … }
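
The mechanical part of that move is easy to script; a sketch (hypothetical helper, field names from the example above):

# Sketch: fold dotted xlsx names into nested objects, e.g. "Account.Id" -> {"account": {"id": ...}}
def nest(flat: dict) -> dict:
    nested: dict = {}
    for dotted, value in flat.items():
        parts = [p.lower() for p in dotted.split(".")]
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate objects on demand
        node[parts[-1]] = value
    return nested

print(nest({"Account.Id": "A-1", "Account.Number": "4711"}))
# {'account': {'id': 'A-1', 'number': '4711'}}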

plan/dat/data-model/incoming/data-model.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/data-model/v1.0.0",
  "title": "Data Model Record",
  "description": "Schema for a single data model record representing customer/account data with contacts, addresses, payment information, and service points.",
  "type": "object",
  "properties": {
    "record": {
      "type": "object",
      "properties": {
        "sourcesystem": {
          "type": "string",
          "default": "RPA-Process",
          "x-tier": "defaulted",
          "description": "Upstream system code from which the record originates; used for routing and audit."
        },
        "createdat": {
          "type": "string",
          "format": "date-time",
          "x-tier": "generated",
          "description": "ISOโ€‘8601 timestamp when the record was created in the system of record."
        },
        "updatedat": {
          "type": "string",
          "format": "date-time",
          "x-tier": "generated",
          "description": "ISOโ€‘8601 timestamp of the last material update; supports incremental sync."
        },
        "version": {
          "type": "integer",
          "x-tier": "generated",
          "description": "Monotonically increasing revision counter for optimistic concurrency control."
        }
      },
      "additionalProperties": false
    },
    "account": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string",
          "x-tier": "generated",
          "description": "Opaque internal unique identifier of the account."
        },
        "number": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Human-readable account/customer reference used for billing and reconciliation."
        },
        "externalreference": {
          "type": "string",
          "x-tier": "enriched",
          "description": "Optional partner-side mapping reference used for cross-system correlation."
        }
      },
      "additionalProperties": false,
      "required": [
        "number"
      ]
    },
    "contact": {
      "type": "object",
      "properties": {
        "email": {
          "type": "string",
          "format": "email",
          "x-tier": "trigger",
          "description": "RFC-compliant email address; validated and normalized."
        },
        "fullname": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Display name composed from salutation/title/given/family name."
        },
        "phone": {
          "allOf": [
            {
              "$ref": "#/$defs/phoneNumber"
            }
          ],
          "x-tier": "trigger",
          "description": "Primary fixed-line phone number; canonical format recommended."
        },
        "callbackwindow": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Free-text or structured callback availability window; stored as opaque payload."
        },
        "preferredchannel": {
          "type": "string",
          "enum": [
            "PHONE",
            "EMAIL",
            "MAIL"
          ],
          "default": "PHONE",
          "x-tier": "defaulted",
          "description": "Preferred communication channel (PHONE, EMAIL, MAIL)."
        },
        "timezone": {
          "allOf": [
            {
              "$ref": "#/$defs/timezone"
            }
          ],
          "default": "Europe/Berlin",
          "x-tier": "defaulted",
          "description": "IANA time zone identifier for local-time interpretation."
        }
      },
      "additionalProperties": false,
      "required": [
        "email",
        "fullname",
        "phone",
        "callbackwindow"
      ]
    },
    "organization": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Registered legal name of the organization."
        },
        "legalform": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Legal entity type (e.g., GmbH, AG)."
        },
        "ownername": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Owner/shareholder name(s), including special cases."
        },
        "vatid": {
          "type": "string",
          "x-tier": "enriched",
          "description": "VAT identification number; validated where possible."
        },
        "taxid": {
          "type": "string",
          "x-tier": "enriched",
          "description": "Local tax identification number."
        },
        "registrationnumber": {
          "type": "string",
          "x-tier": "enriched",
          "description": "Official commercial register number."
        },
        "peppolid": {
          "type": "string",
          "x-tier": "enriched",
          "description": "PEPPOL participant identifier for e-invoicing."
        }
      },
      "additionalProperties": false,
      "required": [
        "name",
        "legalform",
        "ownername"
      ]
    },
    "address": {
      "type": "object",
      "properties": {
        "addressline1": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Primary address line including street and building details."
        },
        "addressline2": {
          "type": "string",
          "default": "",
          "x-tier": "defaulted",
          "description": "Optional secondary line (suite, floor, c/o)."
        },
        "housenumber": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Building/house number including suffixes."
        },
        "additional": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Additional address information not captured elsewhere."
        },
        "postalcode": {
          "allOf": [
            {
              "$ref": "#/$defs/postalCode"
            }
          ],
          "x-tier": "trigger",
          "description": "Postal/ZIP code; stored as string to preserve formatting."
        },
        "city": {
          "type": "string",
          "x-tier": "trigger",
          "description": "City or locality name."
        },
        "region": {
          "type": "string",
          "default": "",
          "x-tier": "defaulted",
          "description": "Administrative region (state/province)."
        },
        "country": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Display country/region label."
        },
        "countrycode": {
          "allOf": [
            {
              "$ref": "#/$defs/countryCode"
            }
          ],
          "x-tier": "enriched",
          "description": "ISOโ€‘3166โ€‘1 alphaโ€‘2 country code."
        }
      },
      "additionalProperties": false,
      "required": [
        "addressline1",
        "housenumber",
        "additional",
        "postalcode",
        "city",
        "country"
      ]
    },
    "payment": {
      "type": "object",
      "properties": {
        "iban": {
          "allOf": [
            {
              "$ref": "#/$defs/iban"
            }
          ],
          "x-tier": "trigger",
          "description": "International Bank Account Number; validated via checksum rules."
        },
        "bic": {
          "allOf": [
            {
              "$ref": "#/$defs/bic"
            }
          ],
          "x-tier": "trigger",
          "description": "Bank Identifier Code (SWIFT)."
        },
        "bookingtext": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Booking/reference text included with payment transactions."
        },
        "schedule": {
          "type": "string",
          "enum": [
            "TEN_DAYS",
            "MONTHLY",
            "QUARTERLY",
            "SEMI_ANNUALLY",
            "ANNUALLY"
          ],
          "x-tier": "trigger",
          "description": "Billing frequency (MONTHLY, QUARTERLY, etc.)."
        },
        "currency": {
          "allOf": [
            {
              "$ref": "#/$defs/currencyCode"
            }
          ],
          "default": "EUR",
          "x-tier": "defaulted",
          "description": "ISOโ€‘4217 threeโ€‘letter currency code."
        },
        "method": {
          "type": "string",
          "enum": [
            "SEPA_DIRECT_DEBIT",
            "BANK_TRANSFER",
            "CREDIT_CARD",
            "INVOICE"
          ],
          "default": "SEPA_DIRECT_DEBIT",
          "x-tier": "defaulted",
          "description": "Payment method (SEPA_DIRECT_DEBIT, BANK_TRANSFER, etc.)."
        },
        "mandatereference": {
          "type": "string",
          "x-tier": "trigger",
          "description": "SEPA mandate reference; unique per creditor."
        },
        "mandatesignaturedate": {
          "type": "string",
          "format": "date",
          "x-tier": "defaulted",
          "description": "ISOโ€‘8601 date when the SEPA mandate was signed."
        }
      },
      "additionalProperties": false,
      "required": [
        "iban",
        "bic",
        "bookingtext",
        "schedule",
        "mandatereference"
      ]
    },
    "payer": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string",
          "x-tier": "generated",
          "description": "Internal identifier of the payer entity."
        },
        "type": {
          "type": "string",
          "enum": [
            "PERSON",
            "ORGANIZATION"
          ],
          "x-tier": "trigger",
          "description": "Payer classification (PERSON, ORGANIZATION)."
        },
        "firstname": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Payer given name."
        },
        "lastname": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Payer family name."
        },
        "dateofbirth": {
          "type": "string",
          "format": "date",
          "x-tier": "trigger",
          "description": "Payer date of birth in ISOโ€‘8601 format."
        }
      },
      "additionalProperties": false,
      "required": [
        "type",
        "firstname",
        "lastname",
        "dateofbirth"
      ]
    },
    "servicepoint": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string",
            "x-tier": "generated",
            "description": "Internal identifier of a service point."
          },
          "externalreference": {
            "type": "string",
            "x-tier": "enriched",
            "description": "External mapping reference for the service point."
          },
          "enabled": {
            "type": "boolean",
            "x-tier": "trigger",
            "description": "Flag indicating if the service point is active."
          },
          "type": {
            "type": "string",
            "enum": [
              "POSTBOX",
              "ADDRESS",
              "PICKUP_POINT",
              "LOCKER"
            ],
            "x-tier": "trigger",
            "description": "Type/category of service point (e.g., postbox, address)."
          },
          "identifier": {
            "type": "string",
            "x-tier": "trigger",
            "description": "Identifier/number of the service point."
          },
          "postalcode": {
            "allOf": [
              {
                "$ref": "#/$defs/postalCode"
              }
            ],
            "x-tier": "trigger",
            "description": "Postal code associated with the service point."
          },
          "city": {
            "type": "string",
            "x-tier": "trigger",
            "description": "City/town for the service point."
          }
        },
        "required": [
          "enabled",
          "type",
          "identifier",
          "postalcode",
          "city"
        ],
        "additionalProperties": false
      },
      "description": "Array of service points associated with the account.",
      "minItems": 1,
      "x-tier": "trigger"
    }
  },
  "additionalProperties": false,
  "required": [
    "account",
    "address",
    "contact",
    "organization",
    "payer",
    "payment",
    "servicepoint"
  ],
  "$defs": {
    "iban": {
      "type": "string",
      "pattern": "^[A-Z]{2}[0-9]{2}[A-Z0-9]{4,30}$",
      "description": "International Bank Account Number (IBAN)"
    },
    "bic": {
      "type": "string",
      "pattern": "^[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}([A-Z0-9]{3})?$",
      "description": "Bank Identifier Code (BIC/SWIFT)"
    },
    "countryCode": {
      "type": "string",
      "pattern": "^[A-Z]{2}$",
      "description": "ISO 3166-1 alpha-2 country code"
    },
    "currencyCode": {
      "type": "string",
      "pattern": "^[A-Z]{3}$",
      "description": "ISO 4217 three-letter currency code"
    },
    "phoneNumber": {
      "type": "string",
      "pattern": "^\\+?[0-9\\s\\-\\(\\)]{6,20}$",
      "description": "Phone number in international or local format"
    },
    "postalCode": {
      "type": "string",
      "pattern": "^[A-Z0-9\\-\\s]{3,10}$",
      "description": "Postal/ZIP code"
    },
    "timezone": {
      "type": "string",
      "pattern": "^[A-Za-z]+(?:_[A-Za-z]+)?/[A-Za-z0-9_+-]+$",
      "description": "IANA timezone identifier (e.g., Europe/Berlin)"
    }
  },
  "allOf": [
    {
      "if": {
        "properties": {
          "payment": {
            "properties": {
              "method": {
                "const": "SEPA_DIRECT_DEBIT"
              }
            },
            "required": [
              "method"
            ]
          }
        }
      },
      "then": {
        "properties": {
          "payment": {
            "required": [
              "mandatereference",
              "mandatesignaturedate"
            ]
          }
        }
      }
    }
  ]
}
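
The allOf/if/then block at the bottom (SEPA direct debit requires the mandate fields) is exactly the kind of rule worth probing early. A minimal check with the Python jsonschema package, assuming the schema path from above (the sample IBAN/BIC values are placeholders):

# Probe the SEPA conditional; the other top-level required branches are omitted,
# so expect those errors too -- the interesting one is mandatesignaturedate.
import json
from jsonschema import Draft202012Validator

with open("plan/dat/data-model/incoming/data-model.schema.json") as f:
    schema = json.load(f)

instance = {"payment": {
    "iban": "DE02120300000000202051", "bic": "BYLADEM1001",
    "bookingtext": "x", "schedule": "MONTHLY",
    "method": "SEPA_DIRECT_DEBIT", "mandatereference": "M-1",
    # mandatesignaturedate is missing on purpose
}}

for error in Draft202012Validator(schema).iter_errors(instance):
    print(error.message)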

The script is in source control, so the development environment can easily be re-created on different developer machines.

This is all still initial "plan" solution design. Any data structure is subject to change. But I never hesitate to put it in code, even this early.

[plan]
4h

Data, data, data – tata!

I have yet to open UiPath Studio even once in this project.

But I spent the next half day generating comfortable, repeatable, high-quality test data files.

  1. based on the JSON schema and the datamodel .xlsx
  2. leveraging common Python testdata libraries
  3. a project-specific CLI tool now generates template-based output
    a) into a .xlsx corpus file
    b) into individual text files resembling the multiline input that the process works on

I am not re-creating the LLM instructions here, but Python scripting again worked well with Anthropic Sonnet 4.5.
As of 2025-Q3 I am loosely following the BMAD method, and diligently have the pair-programming coding agent write out a PLAN.md file.
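
The heart of the generator is small; a condensed sketch of the idea (the real tool adds profiles, a corpus writer and the Jinja2 rendering):

# Condensed sketch: reproducible, localized synthetic records with faker.
from faker import Faker

fake = Faker("de_DE")  # localized test data

def make_record(seed: int) -> dict:
    Faker.seed(seed)  # same seed, same record: repeatable corpora
    return {
        "account": {"number": fake.bothify("ACC-########")},
        "contact": {"email": fake.email(), "fullname": fake.name(),
                    "phone": fake.phone_number()},
        "address": {"addressline1": fake.street_name(),
                    "housenumber": fake.building_number(),
                    "postalcode": fake.postcode(), "city": fake.city(),
                    "country": "Deutschland"},
        "servicepoint": [{"enabled": True, "type": "ADDRESS",
                          "identifier": fake.bothify("SP-####"),
                          "postalcode": fake.postcode(), "city": fake.city()}],
    }

corpus = [make_record(i) for i in range(20)]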

cli commands
(pivotalfluxion) me@home:/mnt/d/github.com/rpapub/Pivotal-Fluxion-worktrees/plan$ .venv/bin/python tools/generate_testdata.py --help

 Usage: generate_testdata.py [OPTIONS]

 Generate synthetic test data for the data model.

Options:
  --count               -n      INTEGER  Number of test cases to generate [default: 20]
  --append              -a               Append to existing corpus instead of overwriting
  --output              -o      PATH     Output file path
  --dry-run                              Show what would be generated without writing files
  --install-completion                   Install completion for the current shell.
  --show-completion                      Show completion for the current shell, to copy it or customize the installation.
  --help                                 Show this message and exit.


(pivotalfluxion) me@home:/mnt/d/github.com/rpapub/Pivotal-Fluxion-worktrees/plan$ .venv/bin/python tools/render_testdata.py --help

 Usage: render_testdata.py [OPTIONS] COMMAND [ARGS]...

 Render test data corpus using Jinja2 templates.

Options:
  --install-completion          Install completion for the current shell.
  --show-completion             Show completion for the current shell, to copy it or customize the installation.
  --help                        Show this message and exit.
Commands:
  list     List available profiles.
  render   Render test data corpus using a profile.

(pivotalfluxion) me@home:/mnt/d/github.com/rpapub/Pivotal-Fluxion-worktrees/plan$ .venv/bin/python tools/render_testdata.py render --help

 Usage: render_testdata.py render [OPTIONS]

 Render test data corpus using a profile.

Options:
  * --profile       -p      TEXT  Profile name (without .yaml extension) [required]
    --rows          -r      TEXT  Row range (e.g., '1', '1-5', 'all')
    --sheet         -s      TEXT  Sheet name to read from [default: Synthetic]
    --corpus        -c      PATH  Corpus file path
    --profiles-dir  -d      PATH  Profiles directory
    --dry-run                     Show what would be rendered without writing files
    --help                        Show this message and exit.

A sizeable amount of time was spent on the repo organization: I am pushing git worktree further than I ever did before. And subtle issues with parallel use on Windows and in WSL (Windows Subsystem for Linux) took some time to figure out.

The Python package faker is great for generating even localized test data. I now have hundreds of pseudo-random test data files available, and the data is rendered through templates for maximum flexibility.

Because I still do not trust any requirement unless I really fetch it from source systems. Documentation is best treated with caution, and every little bit of flexibility to react to changes is wisely invested time.

Over the weekend I should find some time to polish a few rough edges of the repository, then I can publish a link and reference e.g. the scripts or BMAD-inspired "PLAN.md" files directly.

With the data model, data validation and test data sufficiently out of the way, the focus will now shift to the logic.

Anticipating the era of orchestration, the emphasis on data is the right decision: orchestration is based on passing data around, and even AIs cannot do much without access to, and understanding of, data.

[plan]
0h (because fun stuff in the free time on a weekend)

The power of pseudocode: LLMs understand it

I am pushing hard into the logic territory now.
But guess what? Logic deals with data.

The next milestone, and it will take a couple of posts/steps to achieve, is about a bunch of booleans.

Google "guard clauses". Long story short: they keep your "expand all" UiPath Studio view horizontally lean.
If you go the extra mile: research how the programming language Go / Golang deals with err.

"Guard clauses". Schedule the next 3 weekends to grok them. It will transform your coding style.
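
For the non-UiPath readers, the shape of it in a small Python sketch (hypothetical field names); the same structure maps to nested If activities versus early exits in a Studio sequence:

# Nested style: every extra rule pushes the happy path one level deeper.
def validate_nested(sp: dict) -> str:
    if sp.get("enabled"):
        if sp.get("postalcode"):
            if sp.get("identifier"):
                return "VALID"
            else:
                return "ESCALATE"
        else:
            return "ESCALATE"
    else:
        return "SKIPPED"

# Guard-clause style: reject early, keep the happy path flat and readable.
def validate_guarded(sp: dict) -> str:
    if not sp.get("enabled"):
        return "SKIPPED"
    if not sp.get("postalcode"):
        return "ESCALATE"
    if not sp.get("identifier"):
        return "ESCALATE"
    return "VALID"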

Now that the v0.1 data model is established, I will focus on the logic.
And with logic I do not mean Mickey Mouse flowcharts, but hard facts:

  1. I got payload data
  2. data is enriched and looked-up in systems
  3. which decisions am I going to make on the basis of looked-up data
  4. long story short: I am aiming for a bunch of (aggregate) booleans:
    • isAutomatable
    • isValid

If you read the above carefully, then you noticed the "array of something".
An "aggregate isAutomatable" is in the making.
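
In code terms the aggregate is just a fold over per-item results; a minimal sketch (field names assumed):

# Sketch: per-item booleans rolled up into aggregate gates.
items = [
    {"isValid": True, "isAutomatable": True},
    {"isValid": True, "isAutomatable": False},  # one manual item ...
]

is_valid = all(item["isValid"] for item in items)              # aggregate isValid
is_automatable = all(item["isAutomatable"] for item in items)  # ... flips the aggregate to False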

Stay tuned!

[plan]

Provenance Tracking: Understanding the Challenge

The Core Question

For each piece of data in a record, we need to answer:

  1. Where did this value come from? (Payload, Address Directory, Config DB)
  2. Was it validated? (Against what? With what result?)
  3. Was it overridden? (By which authoritative source?)
  4. Does it contribute to the go/no-go decision?

How to take the foundation of the data model conceptually to the next level?

Option A

Inline Metadata (minimal)

Each field carries its own provenance:

ServicePoint {
  PostalCode: "12345"
  PostalCode_Source: "PAYLOAD"
  PostalCode_Validated: true
  PostalCode_ValidationResult: "EXISTS"

  Classification: "FOO"
  Classification_Source: "ADDRESS_DIRECTORY"
}

Pros: Self-contained, easy to serialize
Cons: Verbose, duplicates structure

Option B

Parallel Provenance Object

Separate object mirrors the data structure:

Record {
  servicePoint: [{ postalCode: "12345", … }]
}

Provenance {
  servicePoint: [{
    postalCode: { source: "PAYLOAD", validated: true, validator: "ADDRESS_DIRECTORY" }
    classification: { source: "ADDRESS_DIRECTORY", enriched: true }
  }]
  organization: {
    name: { source: "CONFIG_DB", overridden: true, originalValue: "Payload Name" }
  }
}

Pros: Clean separation, no field name pollution
Cons: Must keep structures in sync

Option C

Validation Result Object (event-sourcing style)

Track the sequence of validations/enrichments:

ValidationTrace {
  steps: [
    { step: "PARSE_PAYLOAD", timestamp: T1, fields: ["*"] }
    { step: "VALIDATE_SP_0", timestamp: T2,
      input: { postalCode: "12345", identifier: "ABC" },
      lookup: "ExistsCombination",
      result: "VALID" }
    { step: "LOOKUP_CONFIGDB", timestamp: T3,
      input: { accountNumber: "ACC-123" },
      result: "EXISTS",
      overrides: { "organization.name": { from: "Payload", to: "Master" } } }
  ]
  finalDecision: "ALL_VALID"
}

Pros: Full audit trail, can replay decisions
Cons: More complex, larger payload

And then there is a common challenge: one transaction item, many subordinate things to validate, and proceed only if the aggregate is valid. And a handover to a human-in-the-loop with actionable findings.

 Aggregate Decision

 FOR ALL enabled ServicePoints:
   ALL must be VALID
 THEN: Proceed to write configuration
 ELSE: Escalate (with per-item reasons)

Here is a generalized view on the logic (I still have to publish a diagram or other documentation, bear with me for now).

The requirements present a multi-source, conflict-tolerant decision model:

  • The RPA process receives an initial payload, then enriches it with data from independent external systems.
  • Each system may provide overlapping or contradictory values.
  • Decisions depend not only on the final merged state but on which system asserted which fact and whether values across sources are consistent, missing, or conflicting.

The core challenge is designing an application model that

  • represents parallel views of the same fields,
  • preserves provenance and supports comparisons,
  • and drives deterministic routing logic without embedding unscalable metadata into the core schema.

I come across such non-linear logic all the time.
Maybe because thatโ€™s when I am called in.

DO +++ NOT +++ LET +++ CLICKPATHS +++ CONFUSE +++ YOU

The clickpath typically contains, further down the path, some decisions and which data to use.
It is tempting to "deal with it later".
But when you come across, late in the clickpath, requirements like "if a record exists with the value of … (input data xyz), then update via this path, else create from scratch":

THEN +++ YOU +++ DEAL +++ WITH THIS +++ FROM THE START

"Dealing with this" is two-layered:

  • you precisely collect data
  • you design an extensible decision logic around that

Do not trust that the requirements are final. Design for adaptability.

The following is only for the software-engineering-intellectually-curious personality, and just "context", not necessarily the implementation detail.

Here are ten modeling options, each with distinct trade-offs.
I have already made up my mind that two or three of them are candidates for modeling this process.

Option A: Separate Per-Source Payloads

Maintain distinct objects for each data source.

{
  "request": { "address": { "postalCode": "12345" } },
  "addressDirectory": { "classification": "FOO" },
  "configDatabase": { "address": { "postalCode": "99999" } },
  "decisions": { }
}

Pros

  • Provenance is trivial: the value's location indicates its source
  • Each source view remains immutable and auditable
  • Easy to store full snapshots for replay and debugging

Cons

  • Multiple copies of overlapping structures increase payload size
  • Decision logic must explicitly compare across sections
  • Adding a new source system widens the structure

Good when

  • Clarity and auditability outweigh compactness concerns
  • A comparison layer can be built to evaluate cross-source consistency

Option B: Provenance Embedded into Data Model

Attach provenance metadata directly to each field in a single canonical object.

{
  "address": {
    "postalCode": "12345",
    "postalCode_provenance": {
      "source": "REQUEST",
      "enrichedFrom": "CONFIG_DB",
      "status": "OVERRIDDEN"
    }
  }
}

Pros

  • Single canonical object simplifies downstream consumers
  • Schema can encode tier hints (trigger, enriched, generated, defaulted)
  • Works well for simple provenance (who wrote the current value)

Cons

  • Does not scale when all competing values must be retained
  • Complex for fields with many sources or update histories
  • Tooling must handle { value, metadata } instead of simple types

Good when

  • Only the final chosen value and its source matter
  • Equality decisions on raw competing values are not required downstream

Option C: Parallel Views with Explicit Comparison Layer

Combine per-source views with a dedicated comparison section.

{
  "request": { },
  "addressDirectory": { },
  "configDatabase": { },
  "comparisons": {
    "address.postalCode": {
      "request": "12345",
      "configDatabase": "99999",
      "status": "CONFLICT",
      "preferred": "configDatabase"
    }
  }
}

Pros

  • Provenance and equality logic are explicit and localized
  • Core per-source models remain simple
  • Decisions can attach directly to comparison results

Cons

  • Requires an additional step to build and maintain comparisons
  • A stable field addressing scheme is necessary

Good when

  • Multiple decisions depend on nuanced consistency across systems
  • Human-readable logs and analysis are important
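
To make Option C concrete, a sketch of how one comparisons entry could be derived from the per-source views (helper name and precedence list are hypothetical):

# Sketch: build a comparison entry for one dotted field path (Option C).
def compare(path: str, views: dict, precedence: list) -> dict:
    values = {}
    for source, view in views.items():
        node = view
        for part in path.split("."):
            node = node.get(part, {}) if isinstance(node, dict) else {}
        if node != {}:
            values[source] = node  # this source asserts a value for the path
    status = "CONSISTENT" if len(set(values.values())) <= 1 else "CONFLICT"
    preferred = next((s for s in precedence if s in values), None)
    return {**values, "status": status, "preferred": preferred}

views = {"request": {"address": {"postalCode": "12345"}},
         "configDatabase": {"address": {"postalCode": "99999"}}}
print(compare("address.postalCode", views, ["configDatabase", "request"]))
# {'request': '12345', 'configDatabase': '99999', 'status': 'CONFLICT', 'preferred': 'configDatabase'}
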
Option D: Fact Table (Evidence List)

Model each value as an independent fact with source and path.

{
  "facts": [
    { "path": "address.postalCode", "source": "REQUEST", "value": "12345", "ts": "..." },
    { "path": "address.postalCode", "source": "CONFIG_DB", "value": "99999", "ts": "..." }
  ]
}

Application logic queries facts by path, checks consistency, and resolves by precedence.

Pros

  • Scales with more sources and temporal changes
  • Generic equality checks and conflict detection
  • Easy to audit and replay

Cons

  • Higher conceptual overhead
  • Requires helpers to materialize a canonical view for downstream consumers
  • Path and filtering logic must be implemented carefully

Good when

  • Provenance and conflict analysis are first-class requirements
  • The system is expected to grow with additional sources and rules
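
The helpers that make a fact table usable stay small; a sketch (the precedence order is an assumption):

# Sketch: resolve a canonical value from the fact table (Option D).
PRECEDENCE = ["CONFIG_DB", "ADDRESS_DIRECTORY", "REQUEST"]  # highest priority first

def resolve(facts: list, path: str):
    candidates = [f for f in facts if f["path"] == path]
    conflict = len({f["value"] for f in candidates}) > 1  # generic conflict detection
    for source in PRECEDENCE:
        for fact in candidates:
            if fact["source"] == source:
                return fact["value"], fact["source"], conflict
    return None, None, conflict

facts = [
    {"path": "address.postalCode", "source": "REQUEST", "value": "12345"},
    {"path": "address.postalCode", "source": "CONFIG_DB", "value": "99999"},
]
print(resolve(facts, "address.postalCode"))  # ('99999', 'CONFIG_DB', True)
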
Option E: Event Log of Enrichment Steps

Model each enrichment as an event capturing input, output, and result.

{
  "events": [
    { "step": "VALID_010", "system": "RPA", "input": { }, "output": { }, "result": "OK" },
    { "step": "ADDR_LOOKUP", "system": "ADDR", "input": { "postalCode": "12345" }, "output": { "classification": "FOO" }, "result": "OK" }
  ]
}

Pros

  • Excellent for audit, debugging, and re-simulation
  • Every decision links to an explicit event
  • State at any step can be reconstructed

Cons

  • Not ideal as the primary query model for decisions
  • Requires aggregation into another pattern for final state
  • Can become verbose

Good when

  • Traceability and replay are primary concerns
  • Another pattern (A, C, or D) provides the main decision substrate

Option F: Layered Override Model

Model data as a stack of layers with explicit precedence.

base:     request       (lowest priority)
layer 1:  addressDirectory
layer 2:  configDatabase (highest priority)
final:    computed by collapsing layers

Each layer contains only deltas (changed fields). The final view is computed by merging layers in order.

Pros

  • Simple, deterministic precedence
  • Provenance is implicit in layer origin
  • Familiar pattern (CSS cascading, Terraform merge)

Cons

  • Hides conflicts: last writer wins by design
  • Cannot express โ€œconflict detectedโ€ as a distinct state
  • Less suitable when conflict is a business-relevant signal

Good when

  • Precedence rules are fixed and unambiguous
  • Conflicts should be resolved silently rather than surfaced
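
The collapse itself is a plain ordered merge; a sketch with the three layers named above:

# Sketch: collapse delta layers in precedence order (Option F); last writer wins.
def collapse(*layers: dict) -> dict:
    final: dict = {}
    for layer in layers:  # lowest priority first
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(final.get(key), dict):
                final[key] = collapse(final[key], value)  # merge nested objects
            else:
                final[key] = value
    return final

request = {"address": {"postalCode": "12345", "city": "Berlin"}}
config_db = {"address": {"postalCode": "99999"}}  # delta only
print(collapse(request, config_db))
# {'address': {'postalCode': '99999', 'city': 'Berlin'}} -- conflict resolved silently
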
Option G: Field-Level State Machine

Come on!?! You are NOT actually reading this, are you? You care that much, and go down each rabbit hole? Let's connect: https://www.linkedin.com/in/cprima/

Each field transitions through a lifecycle.

PENDING -> VALIDATED -> ENRICHED -> OVERRIDDEN -> FINAL
              \-> CONFLICT ->/

Provenance is implicit in the transition history.

Pros

  • Explicit lifecycle stages per field
  • Conflicts and overrides are distinct states
  • Good for workflows with discrete processing phases

Cons

  • Complex to implement and maintain
  • Overkill if only before/after states matter
  • Requires state management infrastructure

Good when

  • Fields move through well-defined processing stages
  • Lifecycle visibility is a requirement

Option H: Dual-Object Pattern (Input vs Resolved)

Maintain two distinct objects: raw input and resolved output.

{
  "input": { },
  "resolved": { }
}

Pros

  • Simple and clear separation of concerns
  • Input remains immutable for audit
  • Downstream consumers use only the resolved view

Cons

  • Loses intermediate provenance (which system contributed to resolved)
  • Cannot distinguish between sources that agreed vs single-source resolution

Good when

  • Only raw input and final output matter
  • Intermediate provenance is not required

Option I: Schema-Defined Source Binding

Declare in the schema which source is authoritative for each field.

address.postalCode:
  authoritative: CONFIG_DB
  fallback: [ADDRESS_DIRECTORY, REQUEST]

organization.name:
  authoritative: CONFIG_DB
  fallback: [REQUEST]

servicePoint[].classification:
  authoritative: ADDRESS_DIRECTORY
  fallback: []

Resolution logic becomes generic and configuration-driven.

Pros

  • Declarative and easy to reason about
  • Resolution logic is consistent and centralized
  • Schema serves as documentation

Cons

  • Less flexible for conflict-as-signal scenarios
  • Requires schema extensions or external configuration
  • Dynamic precedence rules are harder to express

Good when

  • Source authority is stable and well-defined
  • A declarative, configuration-driven approach is preferred

Option J: Graph or Triple Store

Model data as RDF-style triples or a property graph.

(address.postalCode, assertedBy, REQUEST)
(address.postalCode, hasValue, "12345")
(address.postalCode, assertedBy, CONFIG_DB)
(address.postalCode, hasValue, "99999")
(address.postalCode, conflictsWith, address.postalCode)

Pros

  • Maximum flexibility for querying relationships
  • Can express arbitrary provenance and conflict relationships
  • Extensible to complex scenarios

Cons

  • Heavy infrastructure requirements
  • Rarely justified for typical business automation
  • Steeper learning curve for teams

Good when

  • Building a knowledge graph or semantic data platform
  • Query flexibility outweighs implementation complexity

I prefer to read systems and collect facts,
and to decide late, only once I have all my facts collected.

I even open and close GUIs just to read, and conditionally re-open them in case I need to write later.

In any non-trivial, requirements-volatile process, NEVER go down the quagmire of opening multiple systems, in nested logic, haphazardly passing data. That is amateurish and you MUST strive for a better solution design.

Why is that important? Agents.

Undoubtedly you have not been living under a rock, so you know future automations will make autonomous decisions. Even traditional RPA needs to adapt, to remain relevant, at least as a "tool" (no pun intended).

An enlightening use of LLMs in pair programming:
Current reasoning LLMs rarely get a shout-out for their educational value:

  1. you prompt
  2. you observe its reasoning
  3. you ask follow-up questions

Found a highly promising way to make inroads into pro-code development for the UiPath ecosystem:

Polyglot Notebooks with CSharp and Python

I expect to rapidly prototype the coded source files for UiPath Studio, and in parallel work with the UiPath Python SDK and CLI.

Will attempt to implement the (mocked) logic on top of the test data, with a focus on a .cs file for use in UiPath Studio.

This process requires 50–100 records to be processed, enriched and validated. In the past I have experimented with

  • adding to the TransactionItem.SpecificContent dictionary (it is mutable!)
  • using a (nested) Dictionary(Of String, Object) as input argument "Request" or output argument "Response"
  • using a CSharp class as a "coded source file" with properties, getters and setters

This time I want to explore

  • passing in JSON and returning JSON
  • improving on the coded source file approach

In case I decide to use a CSharp class, borrowing heavily from the dreaded DTO (data transfer object) approach, this class might look like the screenshot below.

LLMs handle a conversion from the JSON schema to such a class with ease.
I then refined my whole data model to avoid any potential CSharp issues like shadowing built-in types. Therefore, in the data model I renamed Payer.Type to Payer.Kind (to prevent accidental shadowing of System.Type).

Again, I might or might not use such a CSharp coded source file, but a bunch of records with a few setters and getters is a powerful approach.

Criterion #1 is that everything remains serializable.
Not only for job suspension: if this data ends up at the outer surface, such code only remains usable in BPMN processes with a human-in-the-loop if the output is serializable.
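
Whichever container wins, the cheap insurance is a round-trip test; the idea, sketched here in Python against the project's test data (hypothetical file path):

# Sketch: criterion #1 as a test -- a record must survive a serialize/deserialize round trip.
import json

with open("dat/testdata/record_0001.json") as f:  # hypothetical test data file
    record = json.load(f)

assert json.loads(json.dumps(record)) == record, "record is not cleanly serializable"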