ClauseMark
Verified Regulatory Evidence Engine - UN Digital Trade (RDTII)

Problem
UN digital-trade benchmarking (RDTII) requires mapping real statutes to policy indicators with defensible evidence. Legal RAG systems routinely hallucinate citations; isolated clause retrieval misses exceptions and definitions in other sections; a confident "0" without measured recall is epistemically unsafe for regulators.
Approach
Regulatory Intelligence Engine (RIE): parent-document hybrid retrieval over a legal structure graph; constrained decoding for indicator IDs; four verification gates (deterministic span checks + NLI + second LLM + self-consistency). Layer 1 is verifiable extraction; Layer 2 is a recommended score band that always requires human confirmation. Pillars and indicators live in YAML, not Python, so all 12 RDTII pillars can be added through configuration.
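As an illustration of the first gate, a deterministic span check can be sketched as below. This is a minimal sketch under assumptions, not the project's code: the `EvidenceSpan` fields, the `span_gate` name, and the whitespace/case normalization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSpan:
    """Hypothetical record for one extracted piece of evidence."""
    doc_id: str
    start: int   # inclusive character offset into the source text
    end: int     # exclusive character offset
    quote: str   # text the extractor claims sits at [start, end)


def normalize(text: str) -> str:
    # Collapse whitespace and ignore case so layout noise does not fail the gate.
    return " ".join(text.split()).casefold()


def span_gate(span: EvidenceSpan, corpus: dict[str, str]) -> bool:
    """Pass only if the quoted text really occurs at the claimed offsets."""
    source = corpus.get(span.doc_id)
    if source is None:
        return False  # unknown document: the reference cannot be verified
    if not (0 <= span.start < span.end <= len(source)):
        return False  # offsets fall outside the document
    return normalize(source[span.start:span.end]) == normalize(span.quote)
```

Because the check is plain string comparison, it needs no model call and gives the same answer on every run, which is what makes it suitable as the cheapest gate before NLI or a second LLM.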
At a glance
Event: UN Digital Trade · 2026
Pillars: 12 (6 & 7 to depth)
Packages: 15 (uv monorepo)
Verification: 4 gates + HITL
Retrieval: BGE-M3 + Qdrant + RRF + rerank
Eval: RAGAS + ablations
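For readers unfamiliar with the RRF step in the retrieval stack, reciprocal rank fusion can be sketched as follows. The function name, the chunk IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration; the source does not say which rankings are fused or which constant is used.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per item, with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep output stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical dense and sparse result lists for the same query.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

RRF uses only rank positions, so it can combine dense and sparse results without calibrating their raw scores against each other.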
Tech decisions
ID-replacement citations
The model never authors citation strings. It emits only span IDs, which deterministic code resolves to citations, blocking ghost references.
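A minimal sketch of the idea, assuming the model marks evidence with placeholders such as `[[S1]]`. The tag syntax, the `resolve_citations` name, and the registry contents (including the statute name) are hypothetical.

```python
import re

# Hypothetical placeholder syntax: the model may only emit tags like [[S12]].
SPAN_TAG = re.compile(r"\[\[(S\d+)\]\]")


class UnknownSpanError(ValueError):
    """Raised when the draft references a span ID that was never retrieved."""


def resolve_citations(draft: str, registry: dict[str, str]) -> str:
    """Replace span-ID tags with citation strings taken from the registry."""

    def replace(match: re.Match) -> str:
        span_id = match.group(1)
        if span_id not in registry:
            # A ghost reference fails loudly instead of reaching the reader.
            raise UnknownSpanError(span_id)
        return f"({registry[span_id]})"

    return SPAN_TAG.sub(replace, draft)


# The registry is built by retrieval code, never by the model.
registry = {"S1": "Example Data Act, s. 4(2)"}  # hypothetical statute
resolved = resolve_citations("Transfers need consent [[S1]].", registry)
```

Since every citation string comes from the registry, the worst a model error can do is reference an unknown ID, which the resolver rejects.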
Two-layer output
Extraction is automatable and gate-checked; scoring is legal judgment and stays a reviewer-confirmed recommendation.
Parent-document retrieval
Child-chunk search with parent and neighbourhood context recovers the cross-section exceptions that standard chunking severs.
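The expansion step can be sketched as below. This is a simplification under assumptions: it treats "neighbourhood" as adjacent sections in document order, whereas the engine works over a legal structure graph, and the names `Section` and `expand_to_parents` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    section_id: str
    text: str


def expand_to_parents(
    child_hits: list[str],
    child_to_parent: dict[str, str],
    sections: list[Section],
    window: int = 1,
) -> list[Section]:
    """Map matched child chunks to their parent sections plus neighbours.

    `sections` is assumed to be in document order; `window` is how many
    sections on each side of a parent are pulled in as context.
    """
    position = {s.section_id: i for i, s in enumerate(sections)}
    selected: set[int] = set()
    for child_id in child_hits:
        pos = position[child_to_parent[child_id]]
        lo = max(0, pos - window)
        hi = min(len(sections), pos + window + 1)
        selected.update(range(lo, hi))
    # Return context in document order, without duplicates.
    return [sections[i] for i in sorted(selected)]
```

Searching small chunks keeps matching precise, while returning the parent and its neighbours gives the extractor a chance to see an exception or definition stated one section away.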
Pillar-as-data
The engine stays generic; legal expertise ships in validated YAML and gold sets, so a pillar can be added without refactoring core code.
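A sketch of what validating a pillar definition might look like once the YAML has been parsed into a dictionary. The schema (`number`, `name`, `indicators`, `id`, `question`), the indicator IDs, and the placeholder names are all hypothetical; the real configuration format is not shown in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    indicator_id: str
    question: str


@dataclass(frozen=True)
class Pillar:
    number: int
    name: str
    indicators: tuple[Indicator, ...]


def load_pillar(raw: dict) -> Pillar:
    """Validate a parsed pillar definition and return a typed object."""
    number = raw["number"]
    if not isinstance(number, int) or not 1 <= number <= 12:
        raise ValueError(f"pillar number must be 1..12, got {number!r}")
    indicators = tuple(
        Indicator(item["id"], item["question"]) for item in raw["indicators"]
    )
    if not indicators:
        raise ValueError("a pillar needs at least one indicator")
    ids = [ind.indicator_id for ind in indicators]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate indicator IDs")
    return Pillar(number, raw["name"], indicators)


# Stand-in for the result of parsing one YAML file (placeholder content).
raw_pillar = {
    "number": 6,
    "name": "Example pillar (placeholder)",
    "indicators": [
        {"id": "P6.1", "question": "Placeholder question one?"},
        {"id": "P6.2", "question": "Placeholder question two?"},
    ],
}
pillar = load_pillar(raw_pillar)
```

Validating at load time means a malformed pillar file is rejected before any retrieval or extraction runs, and the engine code never needs to know which pillar it is serving.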
LangGraph + Postgres checkpointer
Durable state and interrupt-based human review for flagged claims and absence cases.