Submission type
Coded Agent with UiPath SDK
Name
Abhishek Jagtap
Industry category in which use case would best fit in (Select up to 2 industries)
Finance
Complexity level
Advanced
Summary (abstract)
Overview: The Redaction Agent is an AI-powered automation bot built with LangChain and the UiPath Python SDK. It scans unstructured documents, such as invoices, forms, and reports, to identify Personally Identifiable Information (PII), including:
Names
Emails
Phone Numbers
Account or ID Details, Bank Details
Once detected, the agent automatically redacts the PII from the document (PDF or text) by replacing it with masked values such as ██████ or [REDACTED].
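For plain-text documents, the masking step described above can be illustrated with a minimal sketch. It assumes the PII values have already been detected and are passed in as strings; the function name `mask_text` and the sample data are illustrative only and are not taken from the agent's code.

```python
from typing import Iterable


def mask_text(text: str, pii_values: Iterable[str], mask: str = "[REDACTED]") -> str:
    """Replace every occurrence of each detected PII value with a mask."""
    # Mask longer values first so a full name is replaced before any
    # shorter value that happens to be contained in it.
    for value in sorted(set(pii_values), key=len, reverse=True):
        if value:  # skip empty strings, which would match everywhere
            text = text.replace(value, mask)
    return text


sample = "Contact Jane Doe at jane.doe@example.com or 555-0100."
print(mask_text(sample, ["Jane Doe", "jane.doe@example.com", "555-0100"]))
# Contact [REDACTED] at [REDACTED] or [REDACTED].
```

Passing `mask="██████"` produces the block-style mask instead.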
Detailed problem statement
Problem Statement
Enterprises across legal, financial, and healthcare sectors manage thousands of documents containing Personally Identifiable Information (PII) every month.
The manual redaction process — reading, identifying, and masking sensitive data — is inefficient, inconsistent, and risky.
Key Challenges
Time-Consuming & Expensive: Legal teams spend hours manually reviewing each document, costing hundreds of dollars per file and millions annually.
High Error Rates: Human reviewers can miss up to 30% of PII, exposing organizations to GDPR, HIPAA, and CCPA violations.
Policy Inconsistency: Different reviewers interpret redaction policies differently, leading to compliance gaps and audit failures.
Scalability Limitations: Manual processes cannot handle large document volumes during audits, investigations, or discovery requests.
Format Complexity: Documents appear in various formats — scanned PDFs, tables, forms, and unstructured text — making detection difficult.
Lack of True Redaction: Simple masking is often reversible; metadata and hidden layers may still expose sensitive data.
Poor Auditability: Manual logs and spreadsheets offer no real-time tracking or compliance visibility.
Integration Gaps: Existing document management systems lack intelligent redaction or policy automation capabilities.
Detailed solution
How It Works — Solution Architecture
Document Ingestion
• Documents are securely uploaded to the UiPath Storage Bucket, enabling centralized access, version control, and audit-ready storage.
• This ensures traceability and controlled access for all incoming files.
Policy Management (Context Grounding via RAG)
• UiPath Context Grounding dynamically loads organization-specific redaction policies using Retrieval-Augmented Generation (RAG).
• Supports “must redact” lists, safe-value whitelists, and real-time policy updates — ensuring every detection follows the latest compliance standards.
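The effect of a "must redact" list and a safe-value whitelist can be sketched as a small post-filter over detected values. This is only an illustration of the list semantics: the `RedactionPolicy` class and `apply_policy` function are hypothetical names, and the sketch does not call UiPath Context Grounding or perform any retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class RedactionPolicy:
    must_redact: set[str] = field(default_factory=set)  # values always masked
    safe_values: set[str] = field(default_factory=set)  # values never masked


def apply_policy(detected: list[str], text: str, policy: RedactionPolicy) -> list[str]:
    """Return the values to redact after applying the policy lists."""
    safe = {v.lower() for v in policy.safe_values}
    # Drop detections that the policy marks as safe (case-insensitive).
    selected = [v for v in detected if v.lower() not in safe]
    # Add must-redact values that occur in the text but were not detected.
    lowered = text.lower()
    for value in sorted(policy.must_redact):
        if value.lower() in lowered and value not in selected:
            selected.append(value)
    return selected


policy = RedactionPolicy(must_redact={"Project Falcon"}, safe_values={"Acme Corp"})
doc_text = "Acme Corp invoice for Jane Doe, ref Project Falcon."
print(apply_policy(["Acme Corp", "Jane Doe"], doc_text, policy))
# ['Jane Doe', 'Project Falcon']
```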
AI Processing Engine (LangChain Framework)
• LangChain Orchestration manages document parsing, chunking, and prompt construction.
Components include:
• PyPDFLoader → text extraction
• RecursiveCharacterTextSplitter → intelligent text segmentation
• ChatPromptTemplate → policy-aware prompt generation
• LLM Chain Composition → robust and contextual PII analysis
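The chunking and prompt-construction steps can be approximated without any dependencies. The sketch below is a simplified stand-in, not the LangChain code: it uses a fixed-size sliding window with overlap, whereas RecursiveCharacterTextSplitter splits on a hierarchy of separators, and it uses plain `str.format` in place of ChatPromptTemplate. The prompt wording, chunk sizes, and function names are assumptions.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so PII that straddles a boundary
        # appears whole in at least one chunk.
        start = end - overlap
    return chunks


# Hypothetical prompt wording; the real template is not shown in the submission.
PROMPT = (
    "You are a PII detection assistant.\n"
    "Redaction policy:\n{policy}\n\n"
    "Document excerpt:\n{chunk}\n\n"
    "Return a JSON list of objects with keys 'type' and 'value'."
)


def build_prompts(text: str, policy: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Build one policy-aware prompt per chunk."""
    return [
        PROMPT.format(policy=policy, chunk=chunk)
        for chunk in split_with_overlap(text, chunk_size, overlap)
    ]


print(split_with_overlap("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```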
AI Detection (Google Gemini 2.5 Flash)
• Performs context-aware PII detection across 10+ entity types (names, SSNs, emails, bank details, etc.).
• Outputs structured JSON results, aligned with organizational policies and compliance formats.
• Aims for high precision and recall, keeping both false positives and false negatives low.
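Because the detection step returns structured JSON, the agent needs to parse and validate that output before redacting. The sketch below shows one defensive way to do this. The schema (a list of objects with `type` and `value` keys) and the names `PiiEntity` and `parse_detections` are assumptions; the submission does not specify the actual format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiEntity:
    entity_type: str
    value: str


def parse_detections(raw: str) -> list[PiiEntity]:
    """Parse a model response into entities, skipping malformed items."""
    # Models may wrap the JSON in extra prose, so take the outermost [...] span.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        entity_type, value = item.get("type"), item.get("value")
        # Keep only items where both fields are strings and the value is non-empty.
        if isinstance(entity_type, str) and isinstance(value, str) and value.strip():
            entities.append(PiiEntity(entity_type.upper(), value.strip()))
    return entities


response = 'Here you go: [{"type": "email", "value": "a@b.com"}, {"type": "name"}, "junk"]'
print(parse_detections(response))
# [PiiEntity(entity_type='EMAIL', value='a@b.com')]
```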
Permanent Redaction (PyMuPDF Engine)
• Employs a ping-pong redaction algorithm to apply irreversible black-box masking.
• Cleans embedded metadata and hidden layers, ensuring complete and permanent redaction.
• Supports compliance with regulations such as GDPR, HIPAA, and CCPA.
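A basic PyMuPDF redaction pass, under stated assumptions, looks roughly like the sketch below. It does not reproduce the "ping-pong" algorithm mentioned above, whose details are not given in this submission; it only shows the standard sequence of searching for each term, adding redaction annotations, applying them so the underlying text is removed, and clearing metadata before saving. The function names are illustrative, and `dst_path` is assumed to differ from `src_path`.

```python
def normalize_terms(terms: list[str]) -> list[str]:
    """Strip whitespace, drop empty strings, and de-duplicate, keeping order."""
    seen: list[str] = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.append(term)
    return seen


def redact_pdf(src_path: str, dst_path: str, terms: list[str]) -> int:
    """Black out every occurrence of each term; return the number of hits."""
    import fitz  # PyMuPDF; imported lazily so normalize_terms works without it

    hits = 0
    with fitz.open(src_path) as doc:
        for page in doc:
            for term in normalize_terms(terms):
                for rect in page.search_for(term):
                    # Mark the area; fill=(0, 0, 0) draws a black box.
                    page.add_redact_annot(rect, fill=(0, 0, 0))
                    hits += 1
            # Remove the text under the marked areas from the page content.
            page.apply_redactions()
        # Clear document info and XML metadata, which may also hold PII.
        doc.set_metadata({})
        doc.del_xml_metadata()
        # garbage=4 drops unreferenced objects left behind by the redaction.
        doc.save(dst_path, garbage=4, deflate=True)
    return hits
```

Note that `search_for` matches text as it is laid out on the page, so values that wrap across lines or exist only in scanned images would need additional handling such as OCR.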
Output & Notification Automation
• Redacted files are uploaded back to the output bucket, along with a compliance summary report.
• Automated email notifications (via UiPath MCP) alert stakeholders of completion.
• Every step is logged in a tamper-proof audit trail for verification and traceability.
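The submission does not say how the audit trail is protected. One common technique for making a log tamper-evident is hash chaining, where each entry stores the hash of the previous one, so that editing any earlier entry invalidates everything after it. The sketch below illustrates that idea only; the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Serialize with sorted keys so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)})


def verify_log(log: list[dict]) -> bool:
    """Recompute the chain and report whether every link still matches."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"step": "ingest", "file": "invoice.pdf"})
append_entry(audit_log, {"step": "redact", "file": "invoice.pdf", "hits": 3})
print(verify_log(audit_log))  # True
```

A chain like this detects changes but does not prevent them; preventing deletion of the whole log would additionally require storing the latest hash somewhere the writer cannot modify.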
Narrated video link (sample: https://bit.ly/4pvuNEL)
Expected impact of this automation
90% reduction in redaction time per document
AI-driven accuracy with contextual PII detection
Automated policy enforcement and centralized control
Irreversible, audit-compliant redaction process
Seamless UiPath integration for scalable, enterprise-grade automation
Real-time compliance reporting and notifications
Supports multiple document types and formats
Minimizes human error and operational costs
UiPath products used (select up to 4 items)
UiPath Automation Cloud™
UiPath Coded Agents
UiPath Integration Service
UiPath Orchestrator
UiPath Studio Web
Integration with external technologies
Google Gemini
