ADR-0084: Auto-Agent as a Universal, Persistent, Root-Cause HITL Gate¶
- Status: Proposed
- Date: 2026-06-12
- Supersedes: none
- Superseded by: none
- Amends: ADR-0050 (Auto-Agent HITL Pre-Flight Loop) — keeps its architecture, tightens its escalation contract.
- Related: ADR-0002 (label state machine); ADR-0029 (caretaker loop pattern); ADR-0044 (principles audit, recursion safety); ADR-0045 (
hitl-escalation); ADR-0049 (enabled_cbkill-switch); ADR-0032 (wiki for learned playbooks). - Enforced by (planned):
tests/test_auto_agent_preflight_loop.py;tests/test_escalation_mixin.py;tests/test_pipeline_human_required_guard.py;tests/scenarios/test_auto_agent_convergence_scenario.py;tests/test_loop_wiring_completeness.py.
Context¶
ADR-0050 introduced the AutoAgentPreflightLoop to honour the dark-factory contract: intercept every hitl-escalation issue, let an auto-agent ("emulated Travis") attempt a fix, and page a human only for genuinely novel failures. The architecture is sound and shipped. In practice the gate is too timid, so the meta-pattern it was meant to kill keeps recurring — this very session resolved four instances of it by hand:
- ADR-drift false positives (#9304–#9309), fake-coverage false positives (#9400/#9403), trust anomalies (#9222/#9223/#9275), and the retrospective stale-insight HITL dead-end (#9227, fixed in #9431). Every one is the factory escalating to a human something it should have auto-resolved or never escalated.
- #9275 is the smoking gun: the auto-agent fired, diagnosed "all background tasks confirmed clean state", returned
needs_humanon attempt 1 with no fix and no retry, and the issue then sat until a human closed it.
Five concrete gaps (mapped against the live code) explain the timidity:
- One-shot bail. The 3-attempt budget is inter-issue, not used to retry a single weak/transient bail.
parse_agent_responsedefaults toneeds_humanon any malformed output. There is noretryoutcome and no confidence/blocked-reason signal, so a transient "context gather failed" is indistinguishable from a genuine human-only blocker. - Playbook-limited → cannot fix novel issues. Only five phase specialists (plan/implement/review/discover/triage) plus a generic
_default. The ~20*-stuckcaretaker domains (fake-coverage, adr-drift, flaky, wiki-rot, shadow-drift, …) fall to_defaultwith generic guidance — a weak attempt that converts straight toneeds_human. - Issues cycle.
human-requiredis invisible to the core phases and absent fromall_pipeline_labels, so it survivesswap_pipeline_labels. An exhausted issue keeps its origin label, re-enters plan→diagnose→hitl→auto-agent, and repeats. There is no global escalation cap. - Not everything routes through the gate. Routing keys off
hitl-escalation; the deny-list and any future loop that forgets to pairhitl-escalationwith its*-stucklabel skip the agent entirely. The escalation pattern is copy-pasted across 10+ loops with no shared helper, so drift is inevitable. - No root-cause convergence or learning. The agent fixes the symptom issue, never confirms the root cause won't recur, and discards successful fixes (no learned-playbook cache).
The operating principle we want: almost nothing reaches a human. A human is paged only for a true "on-fire" condition — never for routine, mechanically- or agentically-resolvable toil, and never because the gate gave up early.
Decision¶
Make the Auto-Agent a universal, persistent, root-cause gate. Five pillars.
A. Universal interception — every escalation hits the auto-agent first¶
- Extract a shared
BaseEscalationMixinused by all caretaker loops; it always fileshitl-escalation+ exactly one domain*-stucksub-label, and owns the reconcile-on-close lifecycle. This guarantees the single routing chokepoint and removes the copy-paste drift. - Add a central stuck-label registry in config (name → policy:
auto_agent_allowed,domain_context_pack,description). A pre-flight validator rejects/flags anyhitl-escalationissue that lacks a registered sub-label, so a new producer cannot silently bypass the gate. - Shrink the bypass deny-list to the recursion-critical minimum (self-judgment:
principles-stuck,cultural-check). The auto-agent's own code is already protected by tool-layer path restrictions, so it need not be on a label deny-list.
B. Convergence loop — retry to root cause, escalate only when truly blocked¶
- Add a
retryoutcome toPreflightResult.status, distinct fromneeds_human. - The agent's
<diagnosis>gains aconfidence(high|medium|low) and ablocked_reasonenum:transient|insufficient_context|needs_human_decision|needs_credentials|needs_permissions|unsafe|none. apply_decisionescalates tohuman-requiredonly whenblocked_reason ∈ {needs_human_decision, needs_credentials, needs_permissions, unsafe}athighconfidence. Everything else (transient,insufficient_context, low-confidence) becomesretry: the loop keeps the issue, broadens context (more git/Sentry/wiki/test history) and a different approach on the next cycle.- Replace the flat 3-strike cap with a graduated escalation budget (probe → specialist → broadened-context → adversarial self-review), each attempt distinct.
- Parse robustness: a structured fallback (scan the diagnosis for blocker keywords) instead of defaulting to
needs_human.
C. Anti-cycle guardrails — bounded, never infinite¶
- Add
human-requiredtoall_pipeline_labelsso a successful HITL correction's swap clears it. - The core phases (plan/implement/review/triage) skip any task tagged
human-required. - Add a global per-issue escalation counter in
StateTracker, incremented on every escalation regardless of source/phase. When an issue exceeds the cap (default 6), it is forced to true-HITL with a "genuinely stuck after N diverse attempts" comment — this is the legitimate true-HITL trigger, and the cycle stops.
D. Novel-issue capability — a generic root-cause resolver¶
- Replace the weak
_defaultwith a first-class root-cause resolver persona: form a hypothesis → make the change → verify (tests/repro) → iterate. It is not limited to known shapes. - Feed it lightweight, per-domain context packs (what each caretaker loop checks and where its code lives) keyed off the stuck-label registry, so it is grounded for any of the ~20 domains without a hand-written playbook each.
E. Learning + observability¶
- On a confirmed resolution, cache the successful pattern (conditions + fix shape) to the per-repo wiki (ADR-0032); probe the cache on the first attempt for a fast path on routine recurrences.
- Always post a structured escalation-reason comment ("what was tried, what was ruled out, why a human is needed") before any issue reaches a human.
True-HITL is reserved for exactly three "on-fire" conditions¶
An issue reaches human-required only when:
1. Safety tripwire — the fix requires editing recursion-critical/principles/CI/secrets code, or an irreversible/destructive action.
2. Human-only blocker — a product/policy decision, missing credentials, or repo permissions the agent cannot obtain (high confidence).
3. Global cap tripped — genuinely stuck after many diverse attempts.
Everything else loops inside the auto-agent until resolved.
Consequences¶
Positive: - The dark-factory contract is actually honoured: the human queue collapses to genuine fires. - Recurring false-positive/dead-end escalations (this session's whole theme) are absorbed automatically. - Root-cause focus + learned playbooks reduce recurrence over time, not just per-issue.
Negative: - More auto-agent cycles per issue → more LLM spend. Mitigated by the graduated budget, learned-playbook fast path, and the existing daily-budget cap + audit/dashboard visibility. - Time-to-human for genuine fires grows by the convergence budget. Acceptable: true fires are rare and the global cap bounds the delay.
Risks (this subsystem fixes the factory — recursion sensitivity is paramount):
- Auto-agent loops forever / burns budget. Mitigations: global escalation cap (C), graduated budget (B), daily-budget gate, mid-run cost watchdog.
- Auto-agent "fixes" something wrong with more autonomy. Mitigations: unchanged tool-layer restrictions, human review of every resulting PR before merge, the safety tripwire, and the recursion-critical deny-list.
- Retry masks a real human-needed issue. Mitigation: the blocked_reason taxonomy routes true human-only blockers straight out; only transient/insufficient_context/low-confidence retry.
Alternatives Considered¶
- Leave ADR-0050 as-is, fix false positives at each producer (what we did by hand this session) — rejected: treats symptoms, not the gate; the meta-pattern recurs with every new loop.
- More hand-written playbooks per domain — rejected: doesn't scale to novel issues and re-creates the playbook-limited gap; the generic resolver + context packs subsume it.
- Raise the flat attempt cap to N — rejected: more weak
_defaultpasses is not convergence; without theretry/blocked_reasondistinction it just delays the same bail and burns budget. - Make true-HITL impossible (no human path) — rejected: the three on-fire conditions genuinely need a human; the goal is rare, not never.
Rollout (staged; each stage is independently shippable)¶
- PR-1 — Cycle safety (re-entry break):
human-requiredfiltering in core phases (skip inIssueStore._take_from_queue) + added toall_pipeline_labelsso it clears on a successful HITL correction. This alone eliminates the unbounded autonomous cycle (a corrected issue re-enters clean; a blocked one is never re-pulled). (Highest value, lowest risk.) - PR-2 — Convergence:
retryoutcome +confidence/blocked_reason+ intra-issue retry; parse robustness. - PR-3 — Universal routing + global cap:
BaseEscalationMixin+ central stuck-label registry + bypass-validator; deny-list shrink. The shared mixin is the single escalation chokepoint, so the global per-issue escalation counter → forced true-HITL at cap lands here (where it can count every escalation cleanly) rather than being bolted onto the ~7 current escalation sites. - PR-4 — Novel resolver + learning: generic root-cause persona + domain context packs + learned-playbook wiki cache.
Each stage ships the full test pyramid (unit + MockWorld scenario + sandbox e2e) per docs/standards/testing/.
Source-file citations¶
The following files carry this ADR's decisions and must be kept in sync with any supersession:
src/auto_agent_preflight_loop.py:AutoAgentPreflightLoop— the gate loop; gains graduated budget,retryhandling, and the global-cap → true-HITL transition.src/preflight/decision.py:apply_decision— escalates tohuman-requiredonly on the three on-fire conditions; maps the newretrystatus via_LABEL_MAP.src/preflight/agent.py:run_preflight— emitsconfidence+blocked_reason; parse-robust fallback instead of defaulting toneeds_human.src/preflight/runner.py:parse_agent_response— structured fallback parsing.src/preflight/context.py:PreflightContext— domain context packs + context-sufficiency gating.src/models.py:StateData— newescalation_attempts: dict[str, int](global per-issue counter).src/config.py:HydraFlowConfig—auto_agent_skip_sublabels(shrunk), stuck-label registry, global escalation cap field.src/issue_store.py:IssueStore—human-requiredmade visible so core phases skip it.src/pr_manager.py:swap_pipeline_labels— clearshuman-requiredviaall_pipeline_labels.src/base_background_loop.py:BaseBackgroundLoop— host for the sharedBaseEscalationMixin.