Submission type
Coded Agent with UiPath SDK
Name
CPM
Industry category in which use case would best fit in (Select up to 2 industries)
Finance
Marketing/Sales
Complexity level
Beginner
Summary (abstract)
**peppol-participant-discovery** is a specialist coded agent that autonomously finds PEPPOL Directory entries for companies extracted from unstructured text signatures.
It uses a 2-attempt strategy (naive → refined) combined with LLM-based company-name normalization to deliver reliable, auditable results — and stops gracefully when uncertain.
Designed for repeatability, it saves all reasoning steps and raw PEPPOL responses for human-in-the-loop review.
Detailed problem statement
Organizations transitioning to electronic invoicing through the PEPPOL network face a critical challenge in identifying which business
partners are already PEPPOL-enabled. Contact information exists in scattered, unstructured formats—primarily email signatures—making it
time-consuming and error-prone to manually extract company details and validate PEPPOL registration status. Traditional automation
approaches fail when company names in signatures don’t exactly match PEPPOL Directory entries due to legal suffixes (GmbH, Ltd, Inc),
informal naming, or formatting variations. This creates a bottleneck in e-invoicing onboarding workflows, as finance teams cannot
efficiently discover and connect with PEPPOL-capable suppliers and customers, delaying adoption and requiring manual intervention to
reconcile company identities across systems. An agent that can parse unstructured signatures, refine company names, and automatically
validate them against the PEPPOL Directory would eliminate this friction point and accelerate network adoption.
Detailed solution
The PEPPOL Participant Discovery agent employs a sophisticated multi-stage approach combining natural language processing, API
integration, and intelligent workflow orchestration to automate the discovery and validation of PEPPOL-enabled business partners from
unstructured email signatures.
Core Architecture:
The solution is built on a LangGraph state machine with seven orchestrated nodes that process data through an intelligent pipeline. At
its foundation, the agent uses Pydantic models for strict type validation and LangChain for LLM integration, ensuring reliable data
handling throughout the workflow.
Stage 1: Data Extraction
The agent begins by parsing unstructured email signature text using a combination of regex patterns and heuristic extraction rules. The
extraction module (lib/extractors/) identifies key components:
- Company Name: Removes common titles, roles, and personal names to isolate the business entity
- Country Code: Detects ISO 2-letter codes using position-aware patterns (end-of-line, after postal codes)
- Email & Domain: Extracts contact information and infers company domain for validation
- Address Components: Captures location data for context enrichment
This extraction prioritizes precision over recall, focusing on clearly identifiable patterns common to European business signatures
(PEPPOL’s primary region).
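A minimal sketch of the position-aware extraction described above, using only regular expressions. The function names, patterns, and the short country allowlist are illustrative assumptions and are not taken from lib/extractors/.

```python
import re

# Hypothetical patterns; the real rules in lib/extractors/ may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")
# A two-letter code directly after a 4-5 digit postal code ("1000 BE").
COUNTRY_AFTER_POSTAL_RE = re.compile(r"\b\d{4,5}\s+([A-Z]{2})\b")
# A two-letter code at the end of any line ("..., Berlin, DE").
COUNTRY_EOL_RE = re.compile(r"\b([A-Z]{2})\s*$", re.MULTILINE)
# Illustrative subset of ISO codes, used to reject tokens such as "VP".
KNOWN_CODES = {"AT", "BE", "DE", "DK", "FR", "IT", "NL", "NO", "SE"}


def extract_email_and_domain(signature):
    """Return (email, domain), or (None, None) when no address is present."""
    match = EMAIL_RE.search(signature)
    if match is None:
        return None, None
    return match.group(0), match.group(1).lower()


def extract_country_code(signature):
    """Prefer a code after a postal code, then fall back to an end-of-line code."""
    for pattern in (COUNTRY_AFTER_POSTAL_RE, COUNTRY_EOL_RE):
        for match in pattern.finditer(signature):
            if match.group(1) in KNOWN_CODES:
                return match.group(1)
    return None
```

Filtering candidates against a known-code list is one way to favor precision over recall: a missed country code only widens the later directory search, while a wrong one can hide a valid match.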
Stage 2: Naive PEPPOL Search
The extracted company name is used for a direct lookup against the PEPPOL Directory API. The search combines the company name with the
country code to query for exact or near-exact matches. If a participant is found at this stage, the system assigns a high confidence
score (1.0) since the match required no additional processing. This “naive” approach succeeds in approximately 40-60% of cases, typically
those where signatures contain formal, complete company names.
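The lookup can be sketched as a URL builder. This assumes the public directory exposes a REST search endpoint at https://directory.peppol.eu/search/1.0/json that accepts `name` and `country` query parameters; both the endpoint and the helper name `build_search_url` are assumptions to verify against the current directory documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public PEPPOL Directory search API; confirm before use.
PEPPOL_SEARCH_URL = "https://directory.peppol.eu/search/1.0/json"


def build_search_url(company_name, country_code=None):
    """Build a directory query for a company name, optionally narrowed by country."""
    params = {"name": company_name.strip()}
    if country_code:
        params["country"] = country_code.upper()
    return f"{PEPPOL_SEARCH_URL}?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client, for example `httpx.get(url, timeout=30)`.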
Stage 3: Intelligent Refinement
When the naive search fails, the agent invokes an LLM-powered refinement node. This is the solution’s key differentiator. The LLM
(Claude Haiku or similar via OpenRouter) receives:
- The original company name
- Context about common PEPPOL naming conventions
- Instructions to remove legal suffixes (GmbH, Ltd, Inc, AG, etc.)
- Guidance to standardize formatting and remove extraneous characters
The LLM prompt is carefully engineered to handle edge cases:
- Legal Entity Removal: “Acme Corp GmbH” → “Acme Corp”
- Informal to Formal: “Acme” → “Acme Corporation” (if context suggests)
- Special Characters: Remove or standardize punctuation and spacing
- Multi-language Support: Handle European naming conventions
The refinement step is configured for repeatability: the LLM runs with temperature=0.0 so that identical inputs yield consistent output, as far as the provider allows.
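A sketch of how the refinement call and its fallback might fit together. The prompt wording is illustrative, and `llm` is a hypothetical callable (prompt string in, reply string out) standing in for the LangChain model, so the logic can be exercised without network access.

```python
LEGAL_SUFFIXES = ("GmbH", "Ltd", "Inc", "AG")  # from the list above; not exhaustive


def build_refinement_prompt(company_name):
    """Assemble the instruction text sent to the model (wording is illustrative)."""
    suffixes = ", ".join(LEGAL_SUFFIXES)
    return (
        "Normalize this company name for a PEPPOL Directory search.\n"
        f"Remove legal suffixes such as {suffixes}.\n"
        "Standardize spacing and punctuation. Reply with the name only.\n"
        f"Company name: {company_name}"
    )


def refine_company_name(company_name, llm):
    """Call `llm` and fall back to the original name on failure or empty output."""
    try:
        refined = llm(build_refinement_prompt(company_name)).strip()
    except Exception:
        return company_name  # LLM failure: keep the original name
    return refined or company_name
```

Returning the original name on failure keeps the pipeline moving; the caller can then detect that nothing changed and skip the second lookup.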
Stage 4: Refined PEPPOL Search
The LLM-refined company name is used for a second PEPPOL Directory lookup. This retry captures companies whose signatures used informal
or legally-qualified names. Matches at this stage receive a medium confidence score (0.7) to indicate that refinement was required.
This two-attempt strategy increases the overall success rate to 70-85%.
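The two-attempt control flow can be sketched in plain Python. Here `search` and `refine` are hypothetical injected callables standing in for the directory client and the LLM node, and the method labels are assumptions; the actual agent wires these steps as LangGraph nodes.

```python
def discover(company_name, country_code, search, refine):
    """Try the raw name first, then the refined name; stop after two attempts.

    `search(name, country)` returns a list of matches and `refine(name)` returns
    a normalized name. Returns (search_method, matches).
    """
    matches = search(company_name, country_code)
    if matches:
        return "naive", matches

    refined_name = refine(company_name)
    # Skip the second lookup when refinement produced nothing new.
    if refined_name and refined_name != company_name:
        matches = search(refined_name, country_code)
        if matches:
            return "refined", matches

    return None, []
```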
Stage 5: Participant Details Retrieval
When either search succeeds, the agent fetches comprehensive participant details including:
- Full PEPPOL participant ID (e.g., iso6523-actorid-upis::9915:bounce)
- Business entity metadata (legal name, registration identifiers)
- Supported document types and processes
- Registered endpoints and capabilities
This detailed data is preserved in JSON format for downstream systems to consume.
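Downstream consumers often need the participant ID split into its parts. A small sketch, assuming the `scheme::agency:value` layout of the example ID above; the type and function names are hypothetical.

```python
from typing import NamedTuple


class ParticipantId(NamedTuple):
    scheme: str  # e.g. "iso6523-actorid-upis"
    agency: str  # issuing agency code, e.g. "9915"
    value: str   # identifier within that agency


def parse_participant_id(raw):
    """Split 'scheme::agency:value' into parts; raise ValueError if malformed."""
    scheme, sep, identifier = raw.partition("::")
    agency, colon, value = identifier.partition(":")
    if not (sep and colon and scheme and agency and value):
        raise ValueError(f"not a PEPPOL participant ID: {raw!r}")
    return ParticipantId(scheme, agency, value)
```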
Stage 6: Result Finalization
The finalization node aggregates all extracted and discovered data into a structured output conforming to the predefined Pydantic
schema. It calculates:
- Validation Status: valid (PEPPOL found), not_found (no match), or error (pipeline failure)
- Confidence Score: 1.0 (naive), 0.7 (refined), or 0.0 (not found)
- Search Method: Documents which approach succeeded for audit purposes
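The mapping from search outcome to status and confidence can be sketched as follows. A standard-library dataclass is used here as a stand-in for the agent's Pydantic schema, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Confidence per successful search method, as listed above.
CONFIDENCE = {"naive": 1.0, "refined": 0.7}


@dataclass(frozen=True)
class DiscoveryResult:
    validation_status: str  # "valid", "not_found", or "error"
    confidence: float
    search_method: Optional[str]
    error: Optional[str] = None


def finalize(search_method=None, error=None):
    """Derive status and confidence from the method that succeeded, if any."""
    if error is not None:
        return DiscoveryResult("error", 0.0, search_method, error)
    if search_method in CONFIDENCE:
        return DiscoveryResult("valid", CONFIDENCE[search_method], search_method)
    return DiscoveryResult("not_found", 0.0, None)
```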
Stage 7: Queue Integration
Successfully discovered participants are automatically written to a UiPath Orchestrator queue for downstream processing. Each queue
item includes:
- Slugified Reference: URL-friendly identifier (e.g., bounce-gmbh-de) for easy tracking
- Original Signature Payload: Full unstructured text for human verification
- Extracted Data: Structured company information
- PEPPOL Details: Complete participant metadata as JSON
- Provenance: Source system, search method, and refinement details
This queue integration enables seamless handoff to:
- CRM/ERP enrichment workflows
- E-invoicing onboarding automation
- Master data update processes
- Compliance validation routines
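The slugified reference used for queue items can be produced with the standard library alone. This is a sketch of one possible rule (company name plus country code, lowercased and hyphenated); the function name is hypothetical and the agent's actual slug rule may differ.

```python
import re
import unicodedata


def slugify_reference(company_name, country_code):
    """Build a lowercase, hyphen-separated reference such as 'bounce-gmbh-de'."""
    text = f"{company_name} {country_code}"
    # Decompose accented characters and drop the combining marks (ü -> u).
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```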
API Integration Strategy
The solution maintains clean separation between data sources:
- Company Data Hub API: Optional Cloudflare Worker endpoint for fetching test signatures (configurable via COMPANYDATAHUB_API_KEY)
- PEPPOL Directory API: Public directory for participant lookup (no authentication required)
- UiPath Orchestrator API: Queue operations via UiPath Python SDK (authenticated via access token)
- OpenRouter/OpenAI API: LLM refinement via LangChain integration (authenticated via API key)
All API calls use httpx for modern async/sync HTTP with automatic retry logic and connection pooling.
Configuration Management
Environment-based configuration via pydantic-settings allows deployment flexibility:
- Local Development: .env file for rapid iteration
- UiPath Cloud: Environment variables and Assets for credential management
- Validation: Pydantic automatically validates all config at startup, failing fast on misconfiguration
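The fail-fast behavior can be illustrated with a standard-library stand-in for pydantic-settings. Only COMPANYDATAHUB_API_KEY and the 30-second default come from this submission; the other variable names and fields are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_api_key: str
    request_timeout_seconds: float = 30.0
    companydatahub_api_key: Optional[str] = None  # optional test-data source


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings once at startup and raise immediately on bad values."""
    llm_api_key = env.get("LLM_API_KEY", "")  # hypothetical variable name
    if not llm_api_key:
        raise ValueError("LLM_API_KEY is required")
    timeout = float(env.get("REQUEST_TIMEOUT_SECONDS", "30"))  # hypothetical name
    if timeout <= 0:
        raise ValueError("REQUEST_TIMEOUT_SECONDS must be positive")
    return Settings(llm_api_key, timeout, env.get("COMPANYDATAHUB_API_KEY"))
```

Passing the environment as a mapping keeps the loader testable: a plain dict can replace `os.environ` in unit tests.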
Error Handling & Resilience
The agent implements comprehensive error handling:
- Extraction Failures: Return partial data with error field populated
- API Timeouts: Configurable timeouts (default: 30s) with clear error messages
- LLM Failures: Fallback to original company name if refinement fails
- Queue Failures: Log warning but don’t fail the entire pipeline (queue ops are fire-and-forget)
- Validation Errors: Pydantic catches schema violations before they reach the pipeline
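The fire-and-forget queue behavior might look like the sketch below. `enqueue` is a hypothetical callable standing in for the UiPath SDK queue call, and the logger name is an assumption.

```python
import logging

logger = logging.getLogger("peppol_discovery")  # hypothetical logger name


def enqueue_safely(enqueue, item):
    """Call `enqueue(item)`; on any failure log a warning and return False."""
    try:
        enqueue(item)
    except Exception as exc:
        logger.warning("Queue write failed for %s: %s", item.get("reference"), exc)
        return False
    return True
```

Returning a boolean rather than raising lets the finalization step record whether the handoff happened without aborting the discovery result itself.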
Performance & Scalability
The agent is designed for batch processing:
- Synchronous Execution: Simple, predictable execution model for UiPath robot hosting
- Stateless Design: Each invocation is independent, enabling parallel execution
- Efficient API Usage: Minimal API calls (1-2 PEPPOL lookups, 0-1 LLM calls per signature)
- Resource-Conscious: Uses lightweight Haiku model to minimize cost and latency
Observability
Execution transparency is provided through:
- Structured Logging: All stages log key decisions and data transformations
- Provenance Tracking: Every output includes search_method and confidence scores
- Error Context: Failed operations return detailed error messages for debugging
- UiPath Integration: Seamless logging to Orchestrator for centralized monitoring
Extensibility
The modular architecture supports future enhancements:
- Additional Extractors: Pluggable extraction modules for new signature formats
- Custom Refinement: LLM prompt templates can be adjusted per use case
- Alternative Directories: Abstract API client pattern allows swapping PEPPOL for other networks
- Multi-Provider LLMs: LangChain abstraction enables switching between Claude, GPT, or local models
This solution transforms a manual, error-prone process into an automated, intelligent workflow that scales efficiently while
maintaining high accuracy through its two-stage search strategy and LLM-powered company name refinement.
Narrated video link (sample: https://bit.ly/4pvuNEL)
Expected impact of this automation
Saves manual work and improves data quality.
UiPath products used (select up to 4 items)
UiPath Coded Agents
Automation Applications
none
Integration with external technologies
OpenAI, OpenRouter, PEPPOL
