Build Log "T-Shirt size M+": AI pair programming

Starting a new RPA project at work. Want to try something different: abstract the work to remove all internal references, and publish about it.

The year is 2025 and development must happen >70% in pair programming with LLMs. My showcase in this build log: documenting design decisions, prompting techniques and even UiPath Studio code.

Environment: UiPath Studio 2023.10.z and UiPath Orchestrator + Unattended Robot. The lowest common denominator. Let's see how to push the envelope.
M365 CoPilot, a GPT5.x web frontend and maybe a bit more.


[plan]
~2h from screenshots to a v0.1 of the data model

Got a requirement clickpath from business SMEs, with screenshots of the applications containing the data. Plus a rudimentary testing plan and some data structure in a .docx.
Comprehensive. I am happy.

Pasted a screenshot(!) of input data into M365 CoPilot.
Then prompted as follows:

abbreviated transcript of LLM interactions

extract the txt

transform into a table, ready to paste into excel

Yes, create a downloadable Excel file

I am looking for a way to pseudonymize it, for use in external publications

…

let us take a step back, and discuss the taxonomy

  • generate a list of systems and their original and public names

…

(~10 times back and forth to come up with generic system names)

give me this as excel sheet

(Microsoft CoPilot acted up, and over the course of ~15 Q&A turns did not deliver a downloadable file. GPT5 came to the rescue.)

Outcome:

initial data model of the process

Now that will need minor refinement, but I feel that I achieved a lot in little time.
Will not show the requirement document, but the business view was:

And I abstracted the hardcoded "n items" into a future "array of items".

I need a table data naming convention, because I develop in a test-driven way.
Therefore I urgently need a corpus of test data, and the naming decisions will inevitably carry forward.

[plan]
freeform "clickpath" to pseudocode
1h

I love me some "pseudocode". It is the lingua franca of software engineering.

And LLMs are sufficiently trained on pseudocode. Which, as of 2025, is the #1 factor when deciding on a technology: "in-distribution", yes or no. Pseudocode is.

I uploaded to M365 CoPilot

  • the taxonomy mapping from the last step
  • a freestyle "clickpath" PDF

Here are my instructions to the LLM:

OneShot instructions

attached is a process description, and a taxonomy with a mapping from German process-specific terminology to generic data entities

extract from the process description in PSEUDOCODE the validation logic,
then apply the mapping to the generic terminology.

Finally, generate and output pseudocode about the validation. Strip all GUI interactions from the pseudocode and return only validation logic

Here is the result:

pseudocode
###############################################################################
# VALIDATION LOGIC (GUI interactions removed)
# Inputs follow the generic data model (datamodel.xlsx) and array semantics.
# - AccountNumber: string  (customer/account reference)
# - ServicePoints: array of ServicePoint items (0..n), each with:
#     Enabled?: boolean
#     PostalCode: string
#     Identifier?: string
#     Type?: string
#     City?: string
#
# External validation sources:
# - Address Directory (read-only): provides existence checks and classifications.
###############################################################################

CONSTANT VALID_CLASSIFICATIONS = {"FOO", "BAR"}

FUNCTION ValidateServicePoints(AccountNumber, ServicePoints):
    results = []

    FOR EACH sp IN ServicePoints:
        # Skip items explicitly disabled or not provided
        IF sp.Enabled IS DEFINED AND sp.Enabled == FALSE:
            CONTINUE

        # --- Rule: PostalCode is mandatory for any validation ---
        IF NOT Present(sp.PostalCode):
            results.APPEND(Result(sp, status="ESCALATE", reason="MissingPostalCode"))
            CONTINUE

        # --- Case A: PostalCode + Identifier provided ---
        IF Present(sp.Identifier):
            # Must exist as a valid combination for the customer
            IF ExistsCombination(AccountNumber, sp.PostalCode, sp.Identifier):   # from Address Directory
                results.APPEND(Result(sp, status="VALID"))
            ELSE:
                results.APPEND(Result(sp, status="ESCALATE", reason="CombinationNotFound"))
            CONTINUE

        # --- Case B: Only PostalCode provided ---
        # 1) PostalCode must exist for the customer context
        IF NOT ExistsPostalCode(AccountNumber, sp.PostalCode):                  # from Address Directory
            results.APPEND(Result(sp, status="ESCALATE", reason="PostalCodeNotFound"))
            CONTINUE

        # 2) PostalCode must have a valid classification (FOO or BAR)
        cls = GetClassification(sp.PostalCode)                                   # from Address Directory
        IF cls IN VALID_CLASSIFICATIONS:
            results.APPEND(Result(sp, status="VALID"))
        ELSE:
            results.APPEND(Result(sp, status="ESCALATE", reason="InvalidClassification"))

    RETURN results


# -----------------------------
# Helper types (for clarity)
# -----------------------------
TYPE ServicePoint:
    Enabled?: boolean
    PostalCode: string                 # maps to ServicePoint.n.PostalCode
    Identifier?: string                # maps to ServicePoint.n.Identifier
    Type?: string                      # maps to ServicePoint.n.Type
    City?: string                      # maps to ServicePoint.n.City

TYPE Result:
    item: ServicePoint
    status: "VALID" | "ESCALATE"
    reason?: string

# -----------------------------
# External validation contracts
# -----------------------------
FUNCTION ExistsCombination(AccountNumber, PostalCode, Identifier) -> boolean
FUNCTION ExistsPostalCode(AccountNumber, PostalCode) -> boolean
FUNCTION GetClassification(PostalCode) -> string

This is only one step in the process, and data already needed to be fetched during processing for the validation to work.
But that structured, concise code will be indispensable for future LLM pair programming instructions.

[plan]
1h

Instructions to the LLM

based on the pseudocode, generate a mermaid diagram of the decision logic.
Embed that mermaid diagram in a Markdown file

… some iterations with invalid syntax until the LLM gets the Mermaid syntax right. No model seems to manage this in one shot, especially when non-ASCII characters are involved.

Given the unstructured input from business, I now have an interim deliverable: a diagram with the core decision logic.

Cross-referencing the requirements doc, this looks correct in the sense that it contains all gates and data attributes mentioned in the document.

And my notes from the earlier meetings also align.

BUT!
I do not trust the requirements.

screenshot of the diagram

The VS Code extension
Markdown Preview Mermaid Support
by Matt Bierner

renders Mermaid diagrams when they are embedded in a Markdown file.

Mermaid can also style the diagram elements. LLMs do this well, and I use the corporate colors.

I achieved a lot in half a day. Now I run the diagram past the business SMEs for the first time.

But I will still not trust that review response :slight_smile:
That's why the next steps will

  • begin to prepare test driven development
  • keep flexibility in the decision process

Follow me on LinkedIn for a miniseries about The Missing Contract

Traditional RPA will fail in the new "Maestro" era. You MUST rethink your solution design or risk getting stuck and left behind.

Proof: Look at the Maestro templates and PoC publications, and inspect what happens "at the end of a BPMN edge when data enters the node".

That's where I have yet to see a correct example. And it has to do with the lack of conceptual clarity about "functional programming".

This build log will also set a few things straight.

I consider this dishonesty at best, and to be frank: I have even more precise words for what is published by the vendor:

NONE of the "templates" seem to have ANY minimum dataflow.
No input arguments, no output arguments.

Maybe that is why all these "safe harbor" slide-deck openers exist: The emperor has no data flow.

That is worrisome.

Even in my traditional RPA build log I will showcase:

  • the FUNDAMENTAL shift that "orchestration" requires
  • how traditional RPA can achieve it
  • and become a 1st class player in the agentic era

In case you did not pay attention: Agents are nothing without data.

For the developer this means: We better learn to manage nested JSON. And learn fast.
Did you do plenty of API automation in the past years? No? You are in a bad position.

This build log will PUSH THE LIMITS of what is possible in traditional RPA. Because that is what we know. That is the foundation from where we have to pivot.

[plan]

Back to TDD, test-driven development.

There is a recording from a UiPath Community Event of the Stuttgart, Germany chapter from last year:
Zukunftssicheres RPA: Testing-Methoden und Praxisbeispiele ("Future-proof RPA: testing methods and practical examples")

This shows the mechanics of test-driven-development in UiPath Studio.

I assume that the recording is auto-translated or at least subtitled.

Core slide:

No: despite RPA having "selectors" slurped in, the RPA process in no way owns anything about the GUI.
The GUI is external. And we need to treat it as such, not just in testing, but already before.

And typically there is more than one GUI in a traditional RPA automation. All the more reason to handle them properly.

Test-driven development, at its core, first defines the test cases, then implements.
This is how I will proceed in this build log.

[plan]
2 hours

Never trust the clickpath to give a robust datamodel.

Went through the datamodel and asked these questions:

  • during processing, will I need to enrich the record from another system?
    If yes, I add an "ExternalReference"
  • do I have to reference a bunch of data several times? ("bunch of data" as in "would be a table in a normalized data schema")
    If yes, then I add an Id that I will autogenerate
  • are there properties implicitly assumed that I should make explicit, so that I can better target an automated system?
    Always candidates: timezone, localized address schemas that should be internationalized, implicit "ENUMs"
  • do I have an initial idea about the datatype?
  • I always add some metadata; a "created_at" and "updated_at" almost always come in handy
updated dataschema

Why is that important?

When the process automation industry moves

  • from scheduling (done in e.g. UiPath Orchestrator)
  • to orchestration (done in e.g. UiPath "Maestro"),

data will need to be exposed at the outer surface; in UiPath Studio terms, these are the input/output arguments of the entry points (commonly Main.xaml, but any workflow can be an entrypoint).

[plan]
1h

Transform the data model (LLM-extracted from an unstructured clickpath, and manually amended) into something that LLM coding agents can work with.

Instructions and answers

Instructions to Sonnet4.5

generate a python script in ./tools/ that (hardcoded) takes the @dat/data-model/incoming/data-model.xlsx and creates a JSON schema for it.

Done. Created tools/xlsx_to_json_schema.py which:

  1. Reads dat/data-model/incoming/data-model.xlsx
  2. Converts each row to a JSON Schema property
  3. Maps datatypes (string, integer, boolean, enum → JSON Schema types)
  4. Handles multiplicity notation (1 = required, 0..1 = optional, 0..n = array)
  5. Outputs to dat/data-model/incoming/data-model.schema.json
    Run it with:
    source .venv/bin/activate && python tools/xlsx_to_json_schema.py
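
The generated script lives in the repo; for readers who only want the gist, a minimal sketch of such a converter could look like this (assuming Name / Datatype / Multiplicity columns in the sheet; the real script's column layout and type mapping are richer):

# Hedged sketch of an xlsx-to-JSON-Schema converter (column layout assumed).
import json
from openpyxl import load_workbook

TYPE_MAP = {"string": "string", "integer": "integer", "boolean": "boolean"}

def xlsx_to_schema(xlsx_path: str) -> dict:
    ws = load_workbook(xlsx_path, read_only=True).active
    properties, required = {}, []
    for row in ws.iter_rows(min_row=2, values_only=True):  # skip header row
        name, datatype, multiplicity = row[:3]
        if not name:
            continue
        prop = {"type": TYPE_MAP.get(str(datatype).lower(), "string")}
        if str(multiplicity) == "0..n":
            prop = {"type": "array", "items": prop}  # 0..n becomes an array
        elif str(multiplicity) == "1":
            required.append(name)                    # 1 means mandatory
        properties[name] = prop
    return {"$schema": "https://json-schema.org/draft/2020-12/schema",
            "type": "object", "properties": properties, "required": required}

if __name__ == "__main__":
    schema = xlsx_to_schema("dat/data-model/incoming/data-model.xlsx")
    with open("dat/data-model/incoming/data-model.schema.json", "w") as f:
        json.dump(schema, f, indent=2)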

Instructions to GPT5:

read @plan/dat/data-model/incoming/data-model.schema.json and critically revise the schema.
it is autogenerated from a script

I then pasted these findings into a chat with Sonnet4.5

GPT5 seems useless for writing good code; the Anthropic models as of 2025-Q4 are way superior.
But GPT5 excels at critiquing other models' code!

Preliminary JSON schema:

The target system to automate handles data in an object-relational mapper, so I move from the dot notation to a proper nested object representation.

xlsx: Account.Id
json: account { id, … }
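
The mechanical part of that move is easy to script; a sketch (hypothetical helper, field names from the example above):

# Sketch: fold dotted xlsx names into nested objects, e.g. "Account.Id" -> {"account": {"id": ...}}
def nest(flat: dict) -> dict:
    nested: dict = {}
    for dotted, value in flat.items():
        parts = [p.lower() for p in dotted.split(".")]
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate objects on demand
        node[parts[-1]] = value
    return nested

print(nest({"Account.Id": "A-1", "Account.Number": "4711"}))
# {'account': {'id': 'A-1', 'number': '4711'}}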

plan/dat/data-model/incoming/data-model.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/data-model/v1.0.0",
  "title": "Data Model Record",
  "description": "Schema for a single data model record representing customer/account data with contacts, addresses, payment information, and service points.",
  "type": "object",
  "properties": {
    "record": {
      "type": "object",
      "properties": {
        "sourcesystem": {
          "type": "string",
          "default": "RPA-Process",
          "x-tier": "defaulted",
          "description": "Upstream system code from which the record originates; used for routing and audit."
        },
        "createdat": {
          "type": "string",
          "format": "date-time",
          "x-tier": "generated",
          "description": "ISOโ€‘8601 timestamp when the record was created in the system of record."
        },
        "updatedat": {
          "type": "string",
          "format": "date-time",
          "x-tier": "generated",
          "description": "ISOโ€‘8601 timestamp of the last material update; supports incremental sync."
        },
        "version": {
          "type": "integer",
          "x-tier": "generated",
          "description": "Monotonically increasing revision counter for optimistic concurrency control."
        }
      },
      "additionalProperties": false
    },
    "account": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string",
          "x-tier": "generated",
          "description": "Opaque internal unique identifier of the account."
        },
        "number": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Human-readable account/customer reference used for billing and reconciliation."
        },
        "externalreference": {
          "type": "string",
          "x-tier": "enriched",
          "description": "Optional partner-side mapping reference used for cross-system correlation."
        }
      },
      "additionalProperties": false,
      "required": [
        "number"
      ]
    },
    "contact": {
      "type": "object",
      "properties": {
        "email": {
          "type": "string",
          "format": "email",
          "x-tier": "trigger",
          "description": "RFC-compliant email address; validated and normalized."
        },
        "fullname": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Display name composed from salutation/title/given/family name."
        },
        "phone": {
          "allOf": [
            {
              "$ref": "#/$defs/phoneNumber"
            }
          ],
          "x-tier": "trigger",
          "description": "Primary fixed-line phone number; canonical format recommended."
        },
        "callbackwindow": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Free-text or structured callback availability window; stored as opaque payload."
        },
        "preferredchannel": {
          "type": "string",
          "enum": [
            "PHONE",
            "EMAIL",
            "MAIL"
          ],
          "default": "PHONE",
          "x-tier": "defaulted",
          "description": "Preferred communication channel (PHONE, EMAIL, MAIL)."
        },
        "timezone": {
          "allOf": [
            {
              "$ref": "#/$defs/timezone"
            }
          ],
          "default": "Europe/Berlin",
          "x-tier": "defaulted",
          "description": "IANA time zone identifier for local-time interpretation."
        }
      },
      "additionalProperties": false,
      "required": [
        "email",
        "fullname",
        "phone",
        "callbackwindow"
      ]
    },
    "organization": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Registered legal name of the organization."
        },
        "legalform": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Legal entity type (e.g., GmbH, AG)."
        },
        "ownername": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Owner/shareholder name(s), including special cases."
        },
        "vatid": {
          "type": "string",
          "x-tier": "enriched",
          "description": "VAT identification number; validated where possible."
        },
        "taxid": {
          "type": "string",
          "x-tier": "enriched",
          "description": "Local tax identification number."
        },
        "registrationnumber": {
          "type": "string",
          "x-tier": "enriched",
          "description": "Official commercial register number."
        },
        "peppolid": {
          "type": "string",
          "x-tier": "enriched",
          "description": "PEPPOL participant identifier for e-invoicing."
        }
      },
      "additionalProperties": false,
      "required": [
        "name",
        "legalform",
        "ownername"
      ]
    },
    "address": {
      "type": "object",
      "properties": {
        "addressline1": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Primary address line including street and building details."
        },
        "addressline2": {
          "type": "string",
          "default": "",
          "x-tier": "defaulted",
          "description": "Optional secondary line (suite, floor, c/o)."
        },
        "housenumber": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Building/house number including suffixes."
        },
        "additional": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Additional address information not captured elsewhere."
        },
        "postalcode": {
          "allOf": [
            {
              "$ref": "#/$defs/postalCode"
            }
          ],
          "x-tier": "trigger",
          "description": "Postal/ZIP code; stored as string to preserve formatting."
        },
        "city": {
          "type": "string",
          "x-tier": "trigger",
          "description": "City or locality name."
        },
        "region": {
          "type": "string",
          "default": "",
          "x-tier": "defaulted",
          "description": "Administrative region (state/province)."
        },
        "country": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Display country/region label."
        },
        "countrycode": {
          "allOf": [
            {
              "$ref": "#/$defs/countryCode"
            }
          ],
          "x-tier": "enriched",
          "description": "ISOโ€‘3166โ€‘1 alphaโ€‘2 country code."
        }
      },
      "additionalProperties": false,
      "required": [
        "addressline1",
        "housenumber",
        "additional",
        "postalcode",
        "city",
        "country"
      ]
    },
    "payment": {
      "type": "object",
      "properties": {
        "iban": {
          "allOf": [
            {
              "$ref": "#/$defs/iban"
            }
          ],
          "x-tier": "trigger",
          "description": "International Bank Account Number; validated via checksum rules."
        },
        "bic": {
          "allOf": [
            {
              "$ref": "#/$defs/bic"
            }
          ],
          "x-tier": "trigger",
          "description": "Bank Identifier Code (SWIFT)."
        },
        "bookingtext": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Booking/reference text included with payment transactions."
        },
        "schedule": {
          "type": "string",
          "enum": [
            "TEN_DAYS",
            "MONTHLY",
            "QUARTERLY",
            "SEMI_ANNUALLY",
            "ANNUALLY"
          ],
          "x-tier": "trigger",
          "description": "Billing frequency (MONTHLY, QUARTERLY, etc.)."
        },
        "currency": {
          "allOf": [
            {
              "$ref": "#/$defs/currencyCode"
            }
          ],
          "default": "EUR",
          "x-tier": "defaulted",
          "description": "ISOโ€‘4217 threeโ€‘letter currency code."
        },
        "method": {
          "type": "string",
          "enum": [
            "SEPA_DIRECT_DEBIT",
            "BANK_TRANSFER",
            "CREDIT_CARD",
            "INVOICE"
          ],
          "default": "SEPA_DIRECT_DEBIT",
          "x-tier": "defaulted",
          "description": "Payment method (SEPA_DIRECT_DEBIT, BANK_TRANSFER, etc.)."
        },
        "mandatereference": {
          "type": "string",
          "x-tier": "trigger",
          "description": "SEPA mandate reference; unique per creditor."
        },
        "mandatesignaturedate": {
          "type": "string",
          "format": "date",
          "x-tier": "defaulted",
          "description": "ISOโ€‘8601 date when the SEPA mandate was signed."
        }
      },
      "additionalProperties": false,
      "required": [
        "iban",
        "bic",
        "bookingtext",
        "schedule",
        "mandatereference"
      ]
    },
    "payer": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string",
          "x-tier": "generated",
          "description": "Internal identifier of the payer entity."
        },
        "type": {
          "type": "string",
          "enum": [
            "PERSON",
            "ORGANIZATION"
          ],
          "x-tier": "trigger",
          "description": "Payer classification (PERSON, ORGANIZATION)."
        },
        "firstname": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Payer given name."
        },
        "lastname": {
          "type": "string",
          "x-tier": "trigger",
          "description": "Payer family name."
        },
        "dateofbirth": {
          "type": "string",
          "format": "date",
          "x-tier": "trigger",
          "description": "Payer date of birth in ISOโ€‘8601 format."
        }
      },
      "additionalProperties": false,
      "required": [
        "type",
        "firstname",
        "lastname",
        "dateofbirth"
      ]
    },
    "servicepoint": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string",
            "x-tier": "generated",
            "description": "Internal identifier of a service point."
          },
          "externalreference": {
            "type": "string",
            "x-tier": "enriched",
            "description": "External mapping reference for the service point."
          },
          "enabled": {
            "type": "boolean",
            "x-tier": "trigger",
            "description": "Flag indicating if the service point is active."
          },
          "type": {
            "type": "string",
            "enum": [
              "POSTBOX",
              "ADDRESS",
              "PICKUP_POINT",
              "LOCKER"
            ],
            "x-tier": "trigger",
            "description": "Type/category of service point (e.g., postbox, address)."
          },
          "identifier": {
            "type": "string",
            "x-tier": "trigger",
            "description": "Identifier/number of the service point."
          },
          "postalcode": {
            "allOf": [
              {
                "$ref": "#/$defs/postalCode"
              }
            ],
            "x-tier": "trigger",
            "description": "Postal code associated with the service point."
          },
          "city": {
            "type": "string",
            "x-tier": "trigger",
            "description": "City/town for the service point."
          }
        },
        "required": [
          "enabled",
          "type",
          "identifier",
          "postalcode",
          "city"
        ],
        "additionalProperties": false
      },
      "description": "Array of service points associated with the account.",
      "minItems": 1,
      "x-tier": "trigger"
    }
  },
  "additionalProperties": false,
  "required": [
    "account",
    "address",
    "contact",
    "organization",
    "payer",
    "payment",
    "servicepoint"
  ],
  "$defs": {
    "iban": {
      "type": "string",
      "pattern": "^[A-Z]{2}[0-9]{2}[A-Z0-9]{4,30}$",
      "description": "International Bank Account Number (IBAN)"
    },
    "bic": {
      "type": "string",
      "pattern": "^[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}([A-Z0-9]{3})?$",
      "description": "Bank Identifier Code (BIC/SWIFT)"
    },
    "countryCode": {
      "type": "string",
      "pattern": "^[A-Z]{2}$",
      "description": "ISO 3166-1 alpha-2 country code"
    },
    "currencyCode": {
      "type": "string",
      "pattern": "^[A-Z]{3}$",
      "description": "ISO 4217 three-letter currency code"
    },
    "phoneNumber": {
      "type": "string",
      "pattern": "^\\+?[0-9\\s\\-\\(\\)]{6,20}$",
      "description": "Phone number in international or local format"
    },
    "postalCode": {
      "type": "string",
      "pattern": "^[A-Z0-9\\-\\s]{3,10}$",
      "description": "Postal/ZIP code"
    },
    "timezone": {
      "type": "string",
      "pattern": "^[A-Za-z]+(?:_[A-Za-z]+)?/[A-Za-z0-9_+-]+$",
      "description": "IANA timezone identifier (e.g., Europe/Berlin)"
    }
  },
  "allOf": [
    {
      "if": {
        "properties": {
          "payment": {
            "properties": {
              "method": {
                "const": "SEPA_DIRECT_DEBIT"
              }
            },
            "required": [
              "method"
            ]
          }
        }
      },
      "then": {
        "properties": {
          "payment": {
            "required": [
              "mandatereference",
              "mandatesignaturedate"
            ]
          }
        }
      }
    }
  ]
}
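
The allOf/if/then block at the bottom (SEPA direct debit requires the mandate fields) is exactly the kind of rule worth probing early. A minimal check with the Python jsonschema package, assuming the schema path from above (the sample IBAN/BIC values are placeholders):

# Probe the SEPA conditional; the other top-level required branches are omitted,
# so expect those errors too -- the interesting one is mandatesignaturedate.
import json
from jsonschema import Draft202012Validator

with open("plan/dat/data-model/incoming/data-model.schema.json") as f:
    schema = json.load(f)

instance = {"payment": {
    "iban": "DE02120300000000202051", "bic": "BYLADEM1001",
    "bookingtext": "x", "schedule": "MONTHLY",
    "method": "SEPA_DIRECT_DEBIT", "mandatereference": "M-1",
    # mandatesignaturedate is missing on purpose
}}

for error in Draft202012Validator(schema).iter_errors(instance):
    print(error.message)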

The script is in source control, so the development environment can easily be re-created on different developer machines.

This is all still initial "plan" solution design. Any data structure is subject to change. But I never hesitate to put it in code, even this early.

[plan]
4h

Data, data, data – tata!

I have yet to open UiPath Studio even once in this project.

But I spent the next half day generating comfortable, repeatable, high-quality test data files.

  1. based on the JSON schema and the datamodel .xlsx
  2. leveraging common Python testdata libraries
  3. a project-specific CLI tool now generates template-based output
    a) into a .xlsx corpus file
    b) into individual text files resembling the multiline input that the process works on

I am not re-creating the LLM instructions here, but Python scripting again worked well with Anthropic Sonnet 4.5.
As of 2025-Q3 I am loosely following the BMAD method, and diligently have the pair-programming coding agent write out a PLAN.md file.
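
The heart of the generator is small; a condensed sketch of the idea (the real tool adds profiles, a corpus writer and the Jinja2 rendering):

# Condensed sketch: reproducible, localized synthetic records with faker.
from faker import Faker

fake = Faker("de_DE")  # localized test data

def make_record(seed: int) -> dict:
    Faker.seed(seed)  # same seed, same record: repeatable corpora
    return {
        "account": {"number": fake.bothify("ACC-########")},
        "contact": {"email": fake.email(), "fullname": fake.name(),
                    "phone": fake.phone_number()},
        "address": {"addressline1": fake.street_name(),
                    "housenumber": fake.building_number(),
                    "postalcode": fake.postcode(), "city": fake.city(),
                    "country": "Deutschland"},
        "servicepoint": [{"enabled": True, "type": "ADDRESS",
                          "identifier": fake.bothify("SP-####"),
                          "postalcode": fake.postcode(), "city": fake.city()}],
    }

corpus = [make_record(i) for i in range(20)]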

cli commands
(pivotalfluxion) me@home:/mnt/d/github.com/rpapub/Pivotal-Fluxion-worktrees/plan$ .venv/bin/python tools/generate_testdata.py --help

 Usage: generate_testdata.py [OPTIONS]

 Generate synthetic test data for the data model.

Options:
  --count               -n      INTEGER  Number of test cases to generate [default: 20]
  --append              -a               Append to existing corpus instead of overwriting
  --output              -o      PATH     Output file path
  --dry-run                              Show what would be generated without writing files
  --install-completion                   Install completion for the current shell.
  --show-completion                      Show completion for the current shell, to copy it or customize the installation.
  --help                                 Show this message and exit.


(pivotalfluxion) me@home:/mnt/d/github.com/rpapub/Pivotal-Fluxion-worktrees/plan$ .venv/bin/python tools/render_testdata.py --help

 Usage: render_testdata.py [OPTIONS] COMMAND [ARGS]...

 Render test data corpus using Jinja2 templates.

Options:
  --install-completion          Install completion for the current shell.
  --show-completion             Show completion for the current shell, to copy it or customize the installation.
  --help                        Show this message and exit.
Commands:
  list     List available profiles.
  render   Render test data corpus using a profile.

(pivotalfluxion) me@home:/mnt/d/github.com/rpapub/Pivotal-Fluxion-worktrees/plan$ .venv/bin/python tools/render_testdata.py render --help

 Usage: render_testdata.py render [OPTIONS]

 Render test data corpus using a profile.

Options:
  * --profile       -p      TEXT  Profile name (without .yaml extension) [required]
    --rows          -r      TEXT  Row range (e.g., '1', '1-5', 'all')
    --sheet         -s      TEXT  Sheet name to read from [default: Synthetic]
    --corpus        -c      PATH  Corpus file path
    --profiles-dir  -d      PATH  Profiles directory
    --dry-run                     Show what would be rendered without writing files
    --help                        Show this message and exit.

A sizeable amount of time was spent on the repo organization: I am pushing git worktree further than I ever did before. And subtle issues with parallel use on Windows and in WSL (Windows Subsystem for Linux) took some time to figure out.

The Python package faker is great for generating even localized test data. I now have hundreds of pseudo-random test data files available, and the data is rendered through templates for maximum flexibility.

Because I still do not trust any requirement unless I really fetch it from source systems. Documentation is best treated with caution, and every little bit of flexibility to react to changes is wisely invested time.

Over the weekend I should find some time to polish a few rough edges of the repository, then I can publish a link and reference e.g. the scripts or BMAD-inspired "PLAN.md" files directly.

With the data model, data validation and test data sufficiently out of the way, the focus will now shift to the logic.

Anticipating the era of orchestration, the emphasis on data is the right decision: orchestration is based on passing data around, and even AIs cannot do much without access to, and understanding of, data.

[plan]
0h (because fun stuff in the free time on a weekend)

The power of pseudocode: LLMs understand it

I am pushing hard into the logic territory now.
But guess what? Logic deals with data.

The next milestone, and it will take a couple of posts/steps to achieve, is about a bunch of booleans.

Google "guard clauses". Long story short: they keep your "expand all" UiPath Studio view horizontally lean.
If you go the extra mile: research how the programming language Go / Golang deals with err.

"Guard clauses". Schedule the next 3 weekends to grok them. It will transform your coding style.
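
For the non-UiPath readers, the shape of it in a small Python sketch (hypothetical field names); the same structure maps to nested If activities versus early exits in a Studio sequence:

# Nested style: every extra rule pushes the happy path one level deeper.
def validate_nested(sp: dict) -> str:
    if sp.get("enabled"):
        if sp.get("postalcode"):
            if sp.get("identifier"):
                return "VALID"
            else:
                return "ESCALATE"
        else:
            return "ESCALATE"
    else:
        return "SKIPPED"

# Guard-clause style: reject early, keep the happy path flat and readable.
def validate_guarded(sp: dict) -> str:
    if not sp.get("enabled"):
        return "SKIPPED"
    if not sp.get("postalcode"):
        return "ESCALATE"
    if not sp.get("identifier"):
        return "ESCALATE"
    return "VALID"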

Now that the v0.1 data model is established, I will focus on the logic.
And with logic I do not mean Mickey Mouse flowcharts, but hard facts:

  1. I got payload data
  2. data is enriched and looked-up in systems
  3. which decisions am I going to make on the basis of looked-up data
  4. long story short: I am aiming for a bunch of (aggregate) booleans:
    • isAutomatable
    • isValid

If you read the above carefully, then you noticed the "array of something".
An "aggregate isAutomatable" is in the making.
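
In code terms the aggregate is just a fold over per-item results; a minimal sketch (field names assumed):

# Sketch: per-item booleans rolled up into aggregate gates.
items = [
    {"isValid": True, "isAutomatable": True},
    {"isValid": True, "isAutomatable": False},  # one manual item ...
]

is_valid = all(item["isValid"] for item in items)              # aggregate isValid
is_automatable = all(item["isAutomatable"] for item in items)  # ... flips the aggregate to False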

Stay tuned!

[plan]

Provenance Tracking: Understanding the Challenge

The Core Question

For each piece of data in a record, we need to answer:

  1. Where did this value come from? (Payload, Address Directory, Config DB)
  2. Was it validated? (Against what? With what result?)
  3. Was it overridden? (By which authoritative source?)
  4. Does it contribute to the go/no-go decision?

How to take the foundation of the data model conceptually to the next level?

Option A

Inline Metadata (minimal)

Each field carries its own provenance:

ServicePoint {
  PostalCode: "12345"
  PostalCode_Source: "PAYLOAD"
  PostalCode_Validated: true
  PostalCode_ValidationResult: "EXISTS"

  Classification: "FOO"
  Classification_Source: "ADDRESS_DIRECTORY"
}

Pros: Self-contained, easy to serialize
Cons: Verbose, duplicates structure

Option B

Parallel Provenance Object

Separate object mirrors the data structure:

Record {
  servicePoint: [{ postalCode: "12345", … }]
}

Provenance {
  servicePoint: [{
    postalCode: { source: "PAYLOAD", validated: true, validator: "ADDRESS_DIRECTORY" }
    classification: { source: "ADDRESS_DIRECTORY", enriched: true }
  }]
  organization: {
    name: { source: "CONFIG_DB", overridden: true, originalValue: "Payload Name" }
  }
}

Pros: Clean separation, no field name pollution
Cons: Must keep structures in sync

Option C

Validation Result Object (event-sourcing style)

Track the sequence of validations/enrichments:

ValidationTrace {
  steps: [
    { step: "PARSE_PAYLOAD", timestamp: T1, fields: ["*"] }
    { step: "VALIDATE_SP_0", timestamp: T2,
      input: { postalCode: "12345", identifier: "ABC" },
      lookup: "ExistsCombination",
      result: "VALID" }
    { step: "LOOKUP_CONFIGDB", timestamp: T3,
      input: { accountNumber: "ACC-123" },
      result: "EXISTS",
      overrides: { "organization.name": { from: "Payload", to: "Master" } } }
  ]
  finalDecision: "ALL_VALID"
}

Pros: Full audit trail, can replay decisions
Cons: More complex, larger payload

And then there is a common challenge: one transaction item, many subordinate things to validate, and proceed only if the aggregate is valid. And a handover to a human-in-the-loop with actionable findings.

 Aggregate Decision

 FOR ALL enabled ServicePoints:
   ALL must be VALID
 THEN: Proceed to write configuration
 ELSE: Escalate (with per-item reasons)

Here is a generalized view on the logic (I still have to publish a diagram or other documentation, bear with me for now).

The requirements present a multi-source, conflict-tolerant decision model:

  • The RPA process receives an initial payload, then enriches it with data from independent external systems.
  • Each system may provide overlapping or contradictory values.
  • Decisions depend not only on the final merged state but on which system asserted which fact and whether values across sources are consistent, missing, or conflicting.

The core challenge is designing an application model that

  • represents parallel views of the same fields,
  • preserves provenance and supports comparisons,
  • and drives deterministic routing logic without embedding unscalable metadata into the core schema.

I come across such non-linear logic all the time.
Maybe because thatโ€™s when I am called in.

DO +++ NOT +++ LET +++ CLICKPATHS +++ CONFUSE +++ YOU

The clickpath typically contains, further down the path, some decisions and which data to use.
It is tempting to "deal with it later".
But when you come across, late in the clickpath, requirements like "if a record exists with the value of … (input data xyz), then update via this path, else create from scratch":

THEN +++ YOU +++ DEAL +++ WITH THIS +++ FROM THE START

"Dealing with this" is two-layered:

  • you precisely collect data
  • you design an extensible decision logic around that

Do not trust that the requirements are final. Design for adaptability.

The following is only for the software-engineering-intellectually-curious personality, and just "context", not necessarily the implementation detail.

Here are ten modeling options, each with distinct trade-offs.
I have already made up my mind that two or three of them are candidates for modeling this process.

Option A: Separate Per-Source Payloads

Maintain distinct objects for each data source.

{
  "request": { "address": { "postalCode": "12345" } },
  "addressDirectory": { "classification": "FOO" },
  "configDatabase": { "address": { "postalCode": "99999" } },
  "decisions": { }
}

Pros

  • Provenance is trivial: the value's location indicates its source
  • Each source view remains immutable and auditable
  • Easy to store full snapshots for replay and debugging

Cons

  • Multiple copies of overlapping structures increase payload size
  • Decision logic must explicitly compare across sections
  • Adding a new source system widens the structure

Good when

  • Clarity and auditability outweigh compactness concerns
  • A comparison layer can be built to evaluate cross-source consistency

Option B: Provenance Embedded into Data Model

Attach provenance metadata directly to each field in a single canonical object.

{
  "address": {
    "postalCode": "12345",
    "postalCode_provenance": {
      "source": "REQUEST",
      "enrichedFrom": "CONFIG_DB",
      "status": "OVERRIDDEN"
    }
  }
}

Pros

  • Single canonical object simplifies downstream consumers
  • Schema can encode tier hints (trigger, enriched, generated, defaulted)
  • Works well for simple provenance (who wrote the current value)

Cons

  • Does not scale when all competing values must be retained
  • Complex for fields with many sources or update histories
  • Tooling must handle { value, metadata } instead of simple types

Good when

  • Only the final chosen value and its source matter
  • Equality decisions on raw competing values are not required downstream

Option C: Parallel Views with Explicit Comparison Layer

Combine per-source views with a dedicated comparison section.

{
  "request": { },
  "addressDirectory": { },
  "configDatabase": { },
  "comparisons": {
    "address.postalCode": {
      "request": "12345",
      "configDatabase": "99999",
      "status": "CONFLICT",
      "preferred": "configDatabase"
    }
  }
}

Pros

  • Provenance and equality logic are explicit and localized
  • Core per-source models remain simple
  • Decisions can attach directly to comparison results

Cons

  • Requires an additional step to build and maintain comparisons
  • A stable field addressing scheme is necessary

Good when

  • Multiple decisions depend on nuanced consistency across systems
  • Human-readable logs and analysis are important
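
To make Option C concrete, a sketch of how one comparisons entry could be derived from the per-source views (helper name and precedence list are hypothetical):

# Sketch: build a comparison entry for one dotted field path (Option C).
def compare(path: str, views: dict, precedence: list) -> dict:
    values = {}
    for source, view in views.items():
        node = view
        for part in path.split("."):
            node = node.get(part, {}) if isinstance(node, dict) else {}
        if node != {}:
            values[source] = node  # this source asserts a value for the path
    status = "CONSISTENT" if len(set(values.values())) <= 1 else "CONFLICT"
    preferred = next((s for s in precedence if s in values), None)
    return {**values, "status": status, "preferred": preferred}

views = {"request": {"address": {"postalCode": "12345"}},
         "configDatabase": {"address": {"postalCode": "99999"}}}
print(compare("address.postalCode", views, ["configDatabase", "request"]))
# {'request': '12345', 'configDatabase': '99999', 'status': 'CONFLICT', 'preferred': 'configDatabase'}
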
Option D: Fact Table (Evidence List)

Model each value as an independent fact with source and path.

{
  "facts": [
    { "path": "address.postalCode", "source": "REQUEST", "value": "12345", "ts": "..." },
    { "path": "address.postalCode", "source": "CONFIG_DB", "value": "99999", "ts": "..." }
  ]
}

Application logic queries facts by path, checks consistency, and resolves by precedence.

Pros

  • Scales with more sources and temporal changes
  • Generic equality checks and conflict detection
  • Easy to audit and replay

Cons

  • Higher conceptual overhead
  • Requires helpers to materialize a canonical view for downstream consumers
  • Path and filtering logic must be implemented carefully

Good when

  • Provenance and conflict analysis are first-class requirements
  • The system is expected to grow with additional sources and rules
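
The helpers that make a fact table usable stay small; a sketch (the precedence order is an assumption):

# Sketch: resolve a canonical value from the fact table (Option D).
PRECEDENCE = ["CONFIG_DB", "ADDRESS_DIRECTORY", "REQUEST"]  # highest priority first

def resolve(facts: list, path: str):
    candidates = [f for f in facts if f["path"] == path]
    conflict = len({f["value"] for f in candidates}) > 1  # generic conflict detection
    for source in PRECEDENCE:
        for fact in candidates:
            if fact["source"] == source:
                return fact["value"], fact["source"], conflict
    return None, None, conflict

facts = [
    {"path": "address.postalCode", "source": "REQUEST", "value": "12345"},
    {"path": "address.postalCode", "source": "CONFIG_DB", "value": "99999"},
]
print(resolve(facts, "address.postalCode"))  # ('99999', 'CONFIG_DB', True)
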
Option E: Event Log of Enrichment Steps

Model each enrichment as an event capturing input, output, and result.

{
  "events": [
    { "step": "VALID_010", "system": "RPA", "input": { }, "output": { }, "result": "OK" },
    { "step": "ADDR_LOOKUP", "system": "ADDR", "input": { "postalCode": "12345" }, "output": { "classification": "FOO" }, "result": "OK" }
  ]
}

Pros

  • Excellent for audit, debugging, and re-simulation
  • Every decision links to an explicit event
  • State at any step can be reconstructed

Cons

  • Not ideal as the primary query model for decisions
  • Requires aggregation into another pattern for final state
  • Can become verbose

Good when

  • Traceability and replay are primary concerns
  • Another pattern (A, C, or D) provides the main decision substrate

Option F: Layered Override Model

Model data as a stack of layers with explicit precedence.

base:     request       (lowest priority)
layer 1:  addressDirectory
layer 2:  configDatabase (highest priority)
final:    computed by collapsing layers

Each layer contains only deltas (changed fields). The final view is computed by merging layers in order.

Pros

  • Simple, deterministic precedence
  • Provenance is implicit in layer origin
  • Familiar pattern (CSS cascading, Terraform merge)

Cons

  • Hides conflicts: last writer wins by design
  • Cannot express โ€œconflict detectedโ€ as a distinct state
  • Less suitable when conflict is a business-relevant signal

Good when

  • Precedence rules are fixed and unambiguous
  • Conflicts should be resolved silently rather than surfaced
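
The collapse itself is a plain ordered merge; a sketch with the three layers named above:

# Sketch: collapse delta layers in precedence order (Option F); last writer wins.
def collapse(*layers: dict) -> dict:
    final: dict = {}
    for layer in layers:  # lowest priority first
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(final.get(key), dict):
                final[key] = collapse(final[key], value)  # merge nested objects
            else:
                final[key] = value
    return final

request = {"address": {"postalCode": "12345", "city": "Berlin"}}
config_db = {"address": {"postalCode": "99999"}}  # delta only
print(collapse(request, config_db))
# {'address': {'postalCode': '99999', 'city': 'Berlin'}} -- conflict resolved silently
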
Option G: Field-Level State Machine

Come on!?! You are NOT actually reading this, are you? You care that much, and go down each rabbit hole? Let's connect: https://www.linkedin.com/in/cprima/

Each field transitions through a lifecycle.

PENDING -> VALIDATED -> ENRICHED -> OVERRIDDEN -> FINAL
              \-> CONFLICT ->/

Provenance is implicit in the transition history.

Pros

  • Explicit lifecycle stages per field
  • Conflicts and overrides are distinct states
  • Good for workflows with discrete processing phases

Cons

  • Complex to implement and maintain
  • Overkill if only before/after states matter
  • Requires state management infrastructure

Good when

  • Fields move through well-defined processing stages
  • Lifecycle visibility is a requirement

Option H: Dual-Object Pattern (Input vs Resolved)

Maintain two distinct objects: raw input and resolved output.

{
  "input": { },
  "resolved": { }
}

Pros

  • Simple and clear separation of concerns
  • Input remains immutable for audit
  • Downstream consumers use only the resolved view

Cons

  • Loses intermediate provenance (which system contributed to resolved)
  • Cannot distinguish between sources that agreed vs single-source resolution

Good when

  • Only raw input and final output matter
  • Intermediate provenance is not required

Option I: Schema-Defined Source Binding

Declare in the schema which source is authoritative for each field.

address.postalCode:
  authoritative: CONFIG_DB
  fallback: [ADDRESS_DIRECTORY, REQUEST]

organization.name:
  authoritative: CONFIG_DB
  fallback: [REQUEST]

servicePoint[].classification:
  authoritative: ADDRESS_DIRECTORY
  fallback: []

Resolution logic becomes generic and configuration-driven.

Pros

  • Declarative and easy to reason about
  • Resolution logic is consistent and centralized
  • Schema serves as documentation

Cons

  • Less flexible for conflict-as-signal scenarios
  • Requires schema extensions or external configuration
  • Dynamic precedence rules are harder to express

Good when

  • Source authority is stable and well-defined
  • A declarative, configuration-driven approach is preferred

Option J: Graph or Triple Store

Model data as RDF-style triples or a property graph.

(address.postalCode, assertedBy, REQUEST)
(address.postalCode, hasValue, "12345")
(address.postalCode, assertedBy, CONFIG_DB)
(address.postalCode, hasValue, "99999")
(address.postalCode, conflictsWith, address.postalCode)

Pros

  • Maximum flexibility for querying relationships
  • Can express arbitrary provenance and conflict relationships
  • Extensible to complex scenarios

Cons

  • Heavy infrastructure requirements
  • Rarely justified for typical business automation
  • Steeper learning curve for teams

Good when

  • Building a knowledge graph or semantic data platform
  • Query flexibility outweighs implementation complexity

I prefer to read systems and collect facts,
and to decide late, only once I have all my facts collected.

I even open and close GUIs just to read, and conditionally re-open them in case I need to write later.

In any non-trivial, requirements-volatile process, NEVER go down the quagmire of opening multiple systems, in nested logic, haphazardly passing data. That is amateurish and you MUST strive for a better solution design.

Why is that important? Agents.

Undoubtedly you have not been living under a rock, so you know future automations will make autonomous decisions. Even traditional RPA needs to adapt, to remain relevant, at least as a "tool" (no pun intended).

An enlightening use of LLMs in pair programming:
Current reasoning LLMs rarely get a shout-out for their educational value:

  1. you prompt
  2. you observe its reasoning
  3. you ask follow-up questions

Found a highly promising way to make inroads into pro-code development for the UiPath ecosystem:

Polyglot Notebooks with CSharp and Python

I expect to rapidly prototype the coded source files for UiPath Studio, and in parallel work with the UiPath Python SDK and CLI.

Will attempt to implement the (mocked) logic on top of the test data, with a focus on a .cs file for use in UiPath Studio.

This process requires 50–100 records to be processed, enriched and validated. In the past I have experimented with

  • adding to the TransactionItem.SpecificContent dictionary (it is mutable!)
  • using a (nested) Dictionary(Of String, Object) as input argument "Request" or output argument "Response"
  • using a CSharp class as a "coded source file" with properties, getters and setters

This time I want to explore

  • passing in JSON and returning JSON
  • improving on the coded source file approach

In case I decide to use a CSharp class, borrowing heavily from the dreaded DTO (data transfer object) approach, this class might look like the screenshot below.

LLMs handle a conversion from the JSON schema to such a class with ease.
I then refined my whole data model to avoid any potential CSharp issues like shadowing built-in types. Therefore, in the data model I renamed Payer.Type to Payer.Kind (to prevent accidental shadowing of System.Type).

Again, I might or might not use such a CSharp coded source file, but a bunch of records with a few setters and getters is a powerful approach.

Criterion #1 is that everything remains serializable.
Not only for job suspension: if this data ends up at the outer surface, such code only remains usable in BPMN processes with a human-in-the-loop if the output is serializable.
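
Whichever container wins, the cheap insurance is a round-trip test; the idea, sketched here in Python against the project's test data (hypothetical file path):

# Sketch: criterion #1 as a test -- a record must survive a serialize/deserialize round trip.
import json

with open("dat/testdata/record_0001.json") as f:  # hypothetical test data file
    record = json.load(f)

assert json.loads(json.dumps(record)) == record, "record is not cleanly serializable"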