ADR-0044: HydraFlow Principles — the audit contract for new and existing repos

Status: Proposed
Date: 2026-04-22
Supersedes: none
Superseded by: none
Enforced by: scripts/hydraflow_audit/* (structural/behavioural checks), tests/test_planner.py::test_build_prompt_includes_principles_checklist, tests/test_reviewer.py::test_build_review_prompt_includes_hydraflow_principles_checks (prompt-level enforcement in plan + review phases)

Context

HydraFlow's value is not any single agent loop or script — it is the shape of the repository: the documentation contract, the hexagonal split, the scenario testing harness with MockWorld, the quality gates, the CI workflow, the five concurrent async loops, the label state machine, the Sentry discipline, and the superpowers skill workflow. Today these live in a scattered set of files: CLAUDE.md, docs/wiki/*, and 40+ existing ADRs. A new project that wants to adopt "the HydraFlow way" has to reverse-engineer the shape from those documents. An existing project trying to evolve toward the shape has no way to measure how far along it is.

We want one declarative source of truth that (a) enumerates the principles, (b) cites the existing documentation for each one, and (c) is machine-parseable so make audit can score a repository against the principles without duplicating the rules in Python.

Decision

Define ten principles — P1 through P10 — each corresponding to a load-bearing facet of HydraFlow. Each principle section contains:

A one-line Rule stating the intent.
A Why paragraph naming the failure mode the principle prevents.
A How to apply paragraph for the greenfield and adoption cases.
A Check table with columns check_id | type | source | what | remediation, parsed directly by scripts/hydraflow_audit/ at runtime.

The type column takes one of three values:

STRUCTURAL — a file or directory must exist with a specified shape. Audit fails loudly when missing.
BEHAVIORAL — a tool must run clean, or a target must exist and succeed. Audit runs the tool and fails on non-zero exit.
CULTURAL — a human workflow rule that the audit cannot verify reliably (e.g. branch protection on the remote, "never commit to main"). Audit emits a WARN with the ADR citation and a remediation hint; humans confirm in the make init prompt.

The audit tool (scripts/hydraflow_audit/) parses these tables at startup and dispatches each check_id to a Python check function of the same name. If a table row exists without a matching check function, the audit fails with "check not implemented" — this keeps the ADR and the script in lockstep.
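
A minimal sketch of the parse-and-dispatch step, assuming the ADR source stores the check tables as Markdown pipe tables. `CheckRow`, `parse_check_rows`, and `run_audit` are hypothetical names, not the real contents of scripts/hydraflow_audit/. Because an id such as `P1.1` is not a valid Python identifier, the sketch dispatches through a dict keyed by check_id rather than a literal function name.

```python
from dataclasses import dataclass
from typing import Callable

HEADER = ["check_id", "type", "source", "what", "remediation"]
VALID_TYPES = {"STRUCTURAL", "BEHAVIORAL", "CULTURAL"}


@dataclass(frozen=True)
class CheckRow:
    check_id: str
    type: str
    source: str
    what: str
    remediation: str


def parse_check_rows(markdown: str) -> list[CheckRow]:
    """Collect rows from every pipe table whose header matches HEADER."""
    rows: list[CheckRow] = []
    in_table = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            in_table = False  # any non-table line ends the current table
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if cells == HEADER:
            in_table = True
            continue
        if not in_table or all(set(cell) <= set("-: ") for cell in cells):
            continue  # a table we do not own, or the |---|---| separator row
        if len(cells) != len(HEADER) or cells[1] not in VALID_TYPES:
            raise ValueError(f"malformed check row: {line!r}")
        rows.append(CheckRow(*cells))
    return rows


def run_audit(rows: list[CheckRow], checks: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Dispatch each row to its check; a row without a check is a failure."""
    results: dict[str, str] = {}
    for row in rows:
        check = checks.get(row.check_id)
        if check is None:
            results[row.check_id] = "FAIL: check not implemented"
        elif check():
            results[row.check_id] = "PASS"
        else:
            # CULTURAL rows downgrade to a warning, as described above.
            results[row.check_id] = "WARN" if row.type == "CULTURAL" else "FAIL"
    return results
```

The sketch splits naively on `|`, so a cell that contains a literal pipe would need escaping or a stricter parser.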

The prompt tool (scripts/hydraflow_init/) reads the audit's JSON output and templates a superpowers-chained plan (brainstorming → writing-plans → TDD → verification) scoped to the failing principles only.
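
The scoping step could look roughly like the sketch below. The JSON shape (a top-level `results` list whose entries carry `check_id`, `status`, `source`, and `remediation`) is an assumption made for illustration; the ADR does not specify the report schema, and `failing_by_principle` and `render_plan` are hypothetical names.

```python
import json

# Skill chain as named in the paragraph above.
SKILL_CHAIN = ["brainstorming", "writing-plans", "TDD", "verification"]


def failing_by_principle(report_json: str) -> dict[str, list[dict]]:
    """Group FAIL results by principle id ("P3.6" -> "P3")."""
    grouped: dict[str, list[dict]] = {}
    for result in json.loads(report_json)["results"]:
        if result["status"] != "FAIL":
            continue  # passing and warning checks stay out of the plan
        principle = result["check_id"].split(".")[0]
        grouped.setdefault(principle, []).append(result)
    return grouped


def render_plan(grouped: dict[str, list[dict]]) -> str:
    """Emit a plain-text plan that lists only the failing principles."""
    lines = ["Workflow: " + " -> ".join(SKILL_CHAIN)]
    for principle in sorted(grouped, key=lambda p: int(p[1:])):
        lines.append(f"{principle}:")
        for result in grouped[principle]:
            lines.append(
                f"  - {result['check_id']} (source: {result['source']}): "
                f"{result['remediation']}"
            )
    return "\n".join(lines)
```

Carrying `source` through to the plan is what lets the remediation text cite the ADR or wiki page rather than paraphrase it.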

Self-documenting by construction. Every check row cites either an ADR or a docs/wiki/ file as its source. The audit report echoes those citations, and make init injects them into the remediation plan. A reader who sees a FAIL can follow the citation to the decision that motivated the rule — not a paraphrase, the real thing. The ADR and wiki layers are the documentation; this ADR is the index that makes them executable. When a principle changes, you edit the ADR; when a check changes, you edit the table row; the script re-reads both on the next run. There is no out-of-band spec for what "HydraFlow-shaped" means — this file is the spec.

Self-observing by construction. The audit and init tools are themselves instrumented with the Sentry filter defined in P7, so runtime failures in the tooling surface as real signal; an audit that silently swallows a check exception is worse than no audit. Unhandled tool exceptions become Sentry events (real bugs only; transient errors log at warning). Because observability sits behind a port (P7.7), future backends such as OpenTelemetry traces, structured JSONL to a SIEM, or a local observability sidecar can plug in without changing call sites. The project follows the patterns it enforces.

The ten principles

P1. Documentation Contract

Rule. A HydraFlow repo has a machine-navigable documentation spine: CLAUDE.md at the root as a table of contents, docs/wiki/ as the topic guides, docs/adr/ as the decision log.

Why. Agents and new humans need a stable entry point. Without it, every session starts by re-reading scattered READMEs, context windows burn on re-discovery, and ad-hoc docs drift silently out of sync with code.

How to apply. Greenfield: scaffold all three locations with the topic stubs listed below. Adoption: copy the structure from HydraFlow, then fill in project-specific content — the shape matters more than the wording.

check_id	type	source	what	remediation
P1.1	STRUCTURAL	CLAUDE.md	`CLAUDE.md` exists at repo root	`touch CLAUDE.md` and populate from the template
P1.2	STRUCTURAL	docs/wiki/index.md	`docs/wiki/index.md` exists	Copy the topic-index layout from the HydraFlow repo
P1.3	STRUCTURAL	docs/wiki/architecture.md	`docs/wiki/architecture.md` exists	Describe the major components and their boundaries
P1.4	STRUCTURAL	docs/wiki/gotchas.md	`docs/wiki/gotchas.md` exists	Document the branch-protection and worktree workflow
P1.5	STRUCTURAL	docs/wiki/testing.md	`docs/wiki/testing.md` exists	Document coverage floor and test layering
P1.6	STRUCTURAL	docs/wiki/gotchas.md	`docs/wiki/gotchas.md` exists	Seed with Pydantic, test-import, and mocking pitfalls
P1.7	STRUCTURAL	docs/wiki/patterns.md	`docs/wiki/patterns.md` exists	Document the `make quality` sequence
P1.8	STRUCTURAL	docs/wiki/architecture.md	`docs/wiki/architecture.md` exists	Only required if the project has background loops
P1.9	STRUCTURAL	docs/wiki/patterns.md	`docs/wiki/patterns.md` exists	Document the bug-types filter and logging levels
P1.10	STRUCTURAL	docs/wiki/patterns.md	`docs/wiki/patterns.md` exists	List the Makefile targets
P1.11	STRUCTURAL	docs/adr/README.md	`docs/adr/README.md` exists with an index table	Start the ADR index, even if there is only one ADR
P1.12	BEHAVIORAL	CLAUDE.md	`CLAUDE.md` contains a "Quick rules" section	Add the five non-negotiables (no main commits, no `--no-verify`, run `make quality`, write tests, read avoided-patterns)
P1.13	BEHAVIORAL	CLAUDE.md	`CLAUDE.md` contains a knowledge lookup table	List docs/wiki, docs/adr, and any repo-wiki location
P1.14	STRUCTURAL	docs/adr/README.md	Load-bearing ADRs are present and marked Accepted (or project equivalents exist)	For orchestration repos: ADR-0001 (loops), 0002 (labels), 0003 (worktrees), 0021 (persistence), 0022 (MockWorld), 0029 (caretakers), 0032 (wiki). Non-orchestration repos mark N/A with justification in `docs/adr/README.md`
P1.15	BEHAVIORAL	docs/wiki/gotchas.md	File has ≥5 pattern sections with example code blocks	Seed from HydraFlow's 13-section file; an empty stub does not count
P1.16	BEHAVIORAL	docs/adr/README.md	ADR source citations omit line numbers (use `module:function_or_class`)	Grep for `:\d+` in ADR prose and strip; line numbers drift as code evolves
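
As an illustration of P1.16, the sketch below finds and strips line-number suffixes on Python file citations. The pattern is deliberately narrower than the bare `:\d+` grep in the table (it only matches after a `.py` path) to avoid hitting times or ratios in prose; that narrowing, and both function names, are assumptions.

```python
import re

# Matches "path/to/file.py:123" and "path/to/file.py:12-40".
LINE_CITATION = re.compile(r"(?P<path>[\w./-]+\.py):\d+(?:-\d+)?")


def find_line_citations(text: str) -> list[str]:
    """Return every citation that pins a line number."""
    return [match.group(0) for match in LINE_CITATION.finditer(text)]


def strip_line_numbers(text: str) -> str:
    """Drop the numeric suffix but keep the file path."""
    return LINE_CITATION.sub(lambda match: match.group("path"), text)
```

A citation in the `module:function_or_class` form is left untouched because the suffix is not numeric.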

P2. Domain-Driven Design, Ports & Adapters, Clean Architecture

Rule. Source is organised into four layers — domain, application, runners, infrastructure — with imports flowing inward only (clean architecture). Cross-layer coupling is expressed through Protocols in a single ports module (ports & adapters / hexagonal), never direct imports. The domain layer speaks a ubiquitous language: the names in code match the names in docs and in conversation (DDD). Domain types carry behaviour, not just data — anaemic Pydantic models that only hold fields belong in DTOs, not the domain.

Why. Without enforced direction, agent code grows into a ball of mud where a GitHub change breaks the domain model. Protocol boundaries make the system testable with stateful fakes (see P3) and make each adapter replaceable (swap Anthropic for Codex, GitHub for Gitea, without touching the domain). Ubiquitous language collapses translation costs during review — when Issue, Phase, Worktree mean the same thing in prose and code, new contributors do not have to translate. The inward-only import rule is the one invariant that keeps the other three (ports, DDD, testability) possible; once domain imports infrastructure, all three collapse.

How to apply. Greenfield: create the layer directories up front, name them after bounded contexts you can say out loud, and add scripts/check_layer_imports.py to CI before any domain code is written. Adoption: introduce ports.py first, migrate infrastructure behind Protocols one at a time, then add the import checker with an allowlist that shrinks per PR. Rename domain types to match the ubiquitous language as their first refactor — before anything else changes.

check_id	type	source	what	remediation
P2.1	STRUCTURAL	ADR-0003	`src/` directory exists	Move module code under `src/` to keep test discovery clean
P2.2	STRUCTURAL	docs/wiki/architecture.md	`src/ports.py` exists and defines at least one `Protocol`	Extract the first cross-layer boundary (likely the PR/VCS adapter)
P2.2a	STRUCTURAL	docs/wiki/architecture.md	Each infrastructure boundary the project uses has a Protocol in `ports.py` (VCS, workspace, runner, LLM if applicable)	Add a Protocol the first time you reach for `AsyncMock` in a unit test; that is the signal a port is missing
P2.3	STRUCTURAL	docs/wiki/architecture.md	`scripts/check_layer_imports.py` exists	Port the HydraFlow script; configure the layer map for this repo
P2.4	BEHAVIORAL	ADR-0003	`make layer-check` exits 0 (no upward imports)	Refactor the offending import behind a `ports.py` Protocol
P2.5	STRUCTURAL	docs/wiki/architecture.md	A composition root module (e.g. `service_registry.py`) wires layers	Centralise dependency assembly so tests can swap fakes cleanly
P2.6	STRUCTURAL	docs/wiki/architecture.md	Composition root is the only module allowed to import across layer boundaries (explicit ALLOWLIST entry)	Layer checker treats the root as a documented exception, not a blanket escape hatch
P2.7	STRUCTURAL	docs/wiki/architecture.md	Domain layer has no imports from infrastructure, runners, or third-party adapter SDKs	The layer-check must special-case this to a hard failure; domain purity is the load-bearing invariant
P2.8	BEHAVIORAL	docs/wiki/architecture.md	Domain types carry behaviour (methods), not just `@dataclass`/Pydantic fields	Anaemic domain is a sign logic leaked into application or infra; audit samples `src/<domain>/*.py` and warns on files with zero methods on public types
P2.9	CULTURAL	docs/wiki/architecture.md	Ubiquitous language: domain type names appear in `docs/wiki/architecture.md` and in `CLAUDE.md` with matching semantics	When the doc says "Issue" and the code says "Task", translation overhead accumulates; keep one name per concept
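
The inward-only rule behind P2.4 and P2.6 can be sketched with the standard `ast` module. This is not the real scripts/check_layer_imports.py: the layer ranking (domain innermost, infrastructure outermost), the `layer_map` from top-level module name to layer, and the function names are assumptions for illustration.

```python
import ast

# Lower rank = further inward. The ordering follows the Rule's listing and is assumed.
LAYER_RANK = {"domain": 0, "application": 1, "runners": 2, "infrastructure": 3}


def imported_modules(source: str) -> set[str]:
    """Top-level names of every absolute import in a source file."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def upward_imports(
    module: str,
    source: str,
    layer_map: dict[str, str],
    allowlist: frozenset[str] = frozenset(),
) -> list[str]:
    """Report imports that point from an inner layer to an outer one."""
    if module in allowlist:  # e.g. the composition root (P2.6)
        return []
    own_rank = LAYER_RANK[layer_map[module]]
    violations = []
    for target in sorted(imported_modules(source)):
        target_layer = layer_map.get(target)  # unmapped names are ignored here
        if target_layer is not None and LAYER_RANK[target_layer] > own_rank:
            violations.append(
                f"{module} ({layer_map[module]}) imports {target} ({target_layer})"
            )
    return violations
```

Third-party adapter SDKs (P2.7) are not covered by this sketch; one option is to add them to `layer_map` as infrastructure so that a domain import of an SDK is reported the same way.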

P3. Testing — MockWorld and Layered Tests

Rule. Tests are organised into five concentric rings — unit, integration, scenario, E2E (smoke + browser), regression — with a stateful MockWorld fixture driving the scenario ring. Coverage floor is 70%. Scenarios gate release.

Why. AsyncMock-based tests pass with the wrong call shape; stateful fakes catch real interaction bugs. Concentric layering means fast tests run every commit and slow E2E runs in CI. 70% coverage is pragmatic — higher thresholds drive test-coverage theatre, lower drive regressions. MockWorld's value comes from how it fakes (state you can inspect, time you can advance, services you can fault-inject) — a mock_world fixture that delegates to AsyncMock passes the shape check but defeats the point.

How to apply. Greenfield: scaffold tests/scenarios/ with conftest.py, fakes/, and at least one happy/sad/edge scenario on day one. Adoption: add MockWorld alongside existing tests; do not retro-fit old tests, but require all new pipeline tests to use it. E2E and browser rings are conditional — only required when a dashboard or UI exists.

check_id	type	source	what	remediation
P3.1	STRUCTURAL	ADR-0022	`tests/scenarios/` directory exists	Create the directory and seed a happy-path scenario
P3.2	STRUCTURAL	ADR-0022	`tests/scenarios/conftest.py` provides a `mock_world` fixture	Port the fixture wiring from HydraFlow
P3.3	STRUCTURAL	ADR-0022	`tests/scenarios/fakes/` contains ≥3 stateful fakes (VCS, LLM, workspace at minimum)	Replace `AsyncMock` fakes one boundary at a time
P3.4	STRUCTURAL	docs/wiki/testing.md	`tests/conftest.py` exists with shared fixtures	Centralise env isolation and factory fixtures
P3.5	STRUCTURAL	docs/wiki/testing.md	At least one factory class (e.g. `IssueFactory`) exists under `tests/`	Introduce factories before the third duplicated fixture
P3.6	BEHAVIORAL	docs/wiki/testing.md	Coverage floor of 70% configured in `pyproject.toml`	Add `fail_under = 70` under `[tool.coverage.report]`
P3.7	BEHAVIORAL	docs/wiki/testing.md	`make test` runs the unit tier and exits 0	Wire `pytest tests/` into `make test`
P3.8	BEHAVIORAL	ADR-0022	`make scenario` runs the scenario tier and exits 0	Add the `scenario` pytest marker and a `make` target
P3.9	BEHAVIORAL	docs/wiki/patterns.md	`make smoke` target exists and exits 0	Smoke is the minimal cross-system path that must pass on every push
P3.10	STRUCTURAL	ADR-0022	Scenario tests are release-gating (CI blocks release branch promotion on scenario red)	Wire `make scenario` into the release or RC workflow (see ADR-0042 for the promotion model)
P3.11	STRUCTURAL	docs/scenarios/README.md	When `ui/` exists, browser E2E directory (`tests/scenarios/browser/` or equivalent) exists	Add Playwright harness with at least one dashboard smoke test; skip if no UI
P3.12	STRUCTURAL	ADR-0022	A `ScenarioResult` / `IssueOutcome`-shaped dataclass exists for scenario inspection	Return structured results from `world.run_pipeline()` so assertions read state, not call counts
P3.13	STRUCTURAL	ADR-0022	`FakeClock` (or equivalent deterministic time fake) exists	Scenarios must not depend on wall-clock time; inject a clock fake
P3.14	BEHAVIORAL	ADR-0022	Fakes expose stateful inspection (`world.vcs.issue(1).labels` or similar), not just `assert_called_with`	Rebuild the offending fake as a stateful class; `AsyncMock` subclasses do not count
P3.15	STRUCTURAL	ADR-0022	`MockWorld` exposes fault-injection API (`fail_service` / `heal_service` or equivalent)	Wire fault injection before the first retry/recovery scenario is written
P3.16	STRUCTURAL	docs/wiki/testing.md	`tests/regressions/` directory exists	Add the directory; every bug fix lands with a regression test there
P3.17	STRUCTURAL	docs/wiki/testing.md	`integration` and `scenario` pytest markers registered in `pyproject.toml`	Declare them in the `markers` list under `[tool.pytest.ini_options]` and pass `--strict-markers` so typos fail CI
P3.18	BEHAVIORAL	docs/wiki/testing.md	At least one `*_integration.py` test file exists to drive the integration ring	Start with a cross-module wiring test; pure unit tests do not satisfy this
P3.19	BEHAVIORAL	docs/wiki/gotchas.md	No top-level imports of optional dependencies in test files	Move imports inside the test function; top-level imports break collection when deps are absent
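
To make the three properties in P3.13 to P3.15 concrete (inspectable state, controllable time, fault injection), here is a deliberately small sketch. It is not HydraFlow's MockWorld: only the names `FakeClock`, `fail_service`, `heal_service`, and the `world.vcs.issue(1).labels` access path come from the table above; everything else is hypothetical.

```python
from dataclasses import dataclass, field


class ServiceDown(RuntimeError):
    """Raised by a fake while it is fault-injected."""


class FakeClock:
    """Deterministic clock: time only moves when the test advances it."""

    def __init__(self, start: float = 0.0) -> None:
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


@dataclass
class FakeIssue:
    number: int
    labels: set[str] = field(default_factory=set)


class FakeVCS:
    """Stateful fake: tests assert on stored state, not on call counts."""

    def __init__(self) -> None:
        self._issues: dict[int, FakeIssue] = {}
        self.healthy = True

    def create_issue(self, number: int, labels=()) -> None:
        self._issues[number] = FakeIssue(number, set(labels))

    def issue(self, number: int) -> FakeIssue:
        return self._issues[number]

    def add_label(self, number: int, label: str) -> None:
        if not self.healthy:
            raise ServiceDown("vcs")
        self._issues[number].labels.add(label)


class MockWorld:
    def __init__(self) -> None:
        self.clock = FakeClock()
        self.vcs = FakeVCS()
        self._services = {"vcs": self.vcs}

    def fail_service(self, name: str) -> None:
        self._services[name].healthy = False

    def heal_service(self, name: str) -> None:
        self._services[name].healthy = True
```

A scenario can then assert `world.vcs.issue(1).labels == {...}` after the code under test has run, which an `AsyncMock` cannot express without re-implementing the state by hand.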

P4. Quality Gates

Rule. make quality is the single command a developer runs before declaring work complete. It composes lint, typecheck, security, test, and layer-check into one fail-fast pipeline. make quality-lite runs the non-test checks for quick iteration.

Why. Quality tools only help when they run. One canonical target removes the "did I run all of them" ambiguity and gives CI a single command to mirror.

How to apply. Greenfield: add all five tools on day one, even with empty configs — it is cheaper than retrofitting. Adoption: introduce tools one at a time via quality-lite so the first PR that adds make quality-lite is small; make quality (which includes tests) follows once coverage is credible.

check_id	type	source	what	remediation
P4.1	BEHAVIORAL	docs/wiki/patterns.md	`make lint-check` target exists and exits 0	Add `ruff check` + `ruff format --check` behind the target
P4.2	BEHAVIORAL	docs/wiki/patterns.md	`make typecheck` target exists and exits 0	Add `pyright` with config in `pyproject.toml`
P4.3	BEHAVIORAL	docs/wiki/patterns.md	`make security` target exists and exits 0	Add `bandit -r src/ --severity-level medium`
P4.4	BEHAVIORAL	docs/wiki/patterns.md	`make test` target exists and exits 0	Wire pytest behind the target
P4.5	BEHAVIORAL	docs/wiki/patterns.md	`make quality-lite` composes lint + typecheck + security	Add the aggregate target
P4.6	BEHAVIORAL	docs/wiki/patterns.md	`make quality` composes quality-lite + test + layer-check	Add the final gate target
P4.7	STRUCTURAL	docs/wiki/patterns.md	Tool configs live in `pyproject.toml` (not a forest of dotfiles)	Move ruff/pyright/bandit/pytest configs into `pyproject.toml`

P5. CI and Branch Protection

Rule. CI mirrors make quality on every PR and enforces the same exit codes. main is protected and only advances through PRs. Pre-commit and pre-push hooks run the relevant subset locally.

Why. If CI and local gates diverge, one of them lies. Branch protection on main turns the quick rule "never commit to main" into a guarantee the remote enforces rather than a convention humans remember.

How to apply. Greenfield: push the first commit on a branch, set up branch protection before merging it, and wire the .github/workflows/ci.yml file in the same PR. Adoption: enable branch protection first, then migrate local make commands into CI one at a time to avoid a big-bang green/red switch.

check_id	type	source	what	remediation
P5.1	STRUCTURAL	docs/wiki/patterns.md	`.github/workflows/` directory contains at least one workflow	Port `ci.yml` from HydraFlow as a starting point
P5.2	BEHAVIORAL	docs/wiki/patterns.md	Workflow runs `make quality-lite` or equivalent	Wire the make target into the workflow's steps
P5.3	BEHAVIORAL	docs/wiki/patterns.md	Workflow runs `make test` with the coverage gate	Add a `--cov-fail-under=70` invocation
P5.4	STRUCTURAL	docs/wiki/gotchas.md	`.githooks/pre-commit` exists and is executable	Seed the hook from HydraFlow's template
P5.5	CULTURAL	docs/wiki/gotchas.md	`main` has branch protection with required PR review and CI	Enable via GitHub repo settings; audit cannot verify offline
P5.6	CULTURAL	CLAUDE.md	No direct pushes to `main` in the last 100 commits	Inspect `git log --first-parent main`; audit reports as a warning
P5.7	BEHAVIORAL	docs/wiki/patterns.md	`pytest` treats `RuntimeWarning` and `PytestUnraisableExceptionWarning` as errors	Add `filterwarnings = ["error::RuntimeWarning", "error::pytest.PytestUnraisableExceptionWarning"]` to `pyproject.toml`; warnings-are-errors turns async lifecycle bugs into red CI instead of silent drift
P5.8	STRUCTURAL	docs/wiki/patterns.md	`.githooks/pre-push` exists and runs `make quality-lite`	Pre-commit gates staged Python; pre-push gates the branch before the remote sees it
P5.9	BEHAVIORAL	docs/wiki/patterns.md	Pre-commit hook implements self-repair (on lint-check failure, run `make lint-fix` and re-stage before escalating)	Agent sessions stall indefinitely on formatting errors otherwise; self-repair keeps the loop moving
P5.10	STRUCTURAL	CLAUDE.md	Pre-commit hook refuses deletion or net content removal of `CLAUDE.md`	Load-bearing file; silent loss of the Quick Rules section would remove the project's guardrails without notice
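
The self-repair flow in P5.9 might be sketched as follows, assuming the hook is a Python script and that the `lint-check` and `lint-fix` make targets exist as described in P4 and P5.9. The injectable `run` parameter and the function names are hypothetical; they exist so the control flow can be exercised without invoking make or git.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def pre_commit(run: Runner = _run) -> int:
    """Return the hook's exit code: 0 lets the commit proceed."""
    if run(["make", "lint-check"]) == 0:
        return 0
    # Self-repair: apply the auto-fixes, then re-stage tracked files.
    run(["make", "lint-fix"])
    # Caveat: --update re-stages every modified tracked file; a real hook
    # would likely restrict this to the files that were originally staged.
    run(["git", "add", "--update"])
    # Escalate only if the fix did not bring lint back to green.
    return run(["make", "lint-check"])
```

A hook script would end with `raise SystemExit(pre_commit())`.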

P6. Agents — Loops, Labels, Background Workers

Rule. Pipeline work runs as N concurrent async loops (N=5 for HydraFlow) coordinating on a GitHub label state machine. Auxiliary long-running work runs as BaseBackgroundLoop subclasses wired through a five-checkpoint registration (service registry, orchestrator dict, UI constants, dashboard route bounds, config interval).

Why. Concurrent loops let the system make progress on independent phases without queue coordination. Labels as the state machine mean every state transition is visible on the GitHub timeline — debuggable by reading a PR. The five-checkpoint wiring is how we avoid "half-registered" loops that run but don't show up in the UI.

How to apply. Greenfield orchestration project: scaffold the BaseBackgroundLoop base class and the wiring test on day one. Non-orchestration project: this principle is informational; note in the audit output that P6 is optional for the repo type and skip the failures.

check_id	type	source	what	remediation
P6.1	STRUCTURAL	ADR-0001	`src/orchestrator.py` exists with concurrent loop structure	Only applicable to orchestration-shaped projects; mark N/A otherwise
P6.2	STRUCTURAL	ADR-0002	Label names are centralised in config (not scattered strings)	Collect labels into a single config module or dataclass
P6.3	STRUCTURAL	ADR-0029	`BaseBackgroundLoop` base class exists	Port from HydraFlow when the first long-running job appears
P6.4	BEHAVIORAL	docs/wiki/architecture.md	Loop-wiring completeness test covers all five checkpoints (service registry, orchestrator dict, UI constants, dashboard-route bounds, config interval + env override)	Port HydraFlow's `test_loop_wiring_completeness.py`; half-wired loops run but vanish from the dashboard
P6.5	STRUCTURAL	ADR-0002	Atomic label-swap helper exists (no ad-hoc add/remove call sites)	Add a `swap_pipeline_labels` function and forbid direct calls
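
One way to read P6.5 is that a transition computes the full new label set and writes it in a single call, so an issue is never observed with zero or two pipeline labels. The sketch below assumes a port with `get_labels` and `set_labels`; the label names and the `LabelPort` protocol are hypothetical, and only `swap_pipeline_labels` is named in the table.

```python
from typing import Protocol

# Hypothetical pipeline labels; a real repo would load these from config (P6.2).
PIPELINE_LABELS = frozenset({"hf-plan", "hf-implement", "hf-review"})


class LabelPort(Protocol):
    def get_labels(self, issue: int) -> set[str]: ...

    def set_labels(self, issue: int, labels: set[str]) -> None: ...


def swap_pipeline_labels(vcs: LabelPort, issue: int, new_label: str) -> None:
    """Replace whichever pipeline label is present with new_label in one write."""
    if new_label not in PIPELINE_LABELS:
        raise ValueError(f"not a pipeline label: {new_label!r}")
    current = vcs.get_labels(issue)
    # Non-pipeline labels (e.g. "bug") are preserved untouched.
    vcs.set_labels(issue, (current - PIPELINE_LABELS) | {new_label})
```

The read-then-write pair is still not atomic against a concurrent writer; the sketch only guarantees that this call site never issues a separate remove followed by an add.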

P7. Observability — Sentry, Structured Logging, Repo Wiki

Rule. Sentry events are filtered by a _BUG_TYPES gatekeeper so transient errors never page a human. Logging uses structured levels (warning for expected transient failures, error only for real bugs). Knowledge captured from past runs is stored in a per-repo wiki under repo_wiki/<repo_slug>/ and injected into runner prompts.

Why. Unfiltered Sentry becomes noise and gets muted. Unstructured logs mean incident response starts from zero every time. The repo wiki is how the system compounds learnings rather than re-discovering them every session.

How to apply. Greenfield: define _BUG_TYPES the first time you wire Sentry; seed repo_wiki/ with the first post-mortem. Adoption: introduce the filter in a dedicated PR so the drop in event volume is visible; migrate noisy logger.error calls to logger.warning in a follow-up.

check_id	type	source	what	remediation
P7.1	STRUCTURAL	docs/wiki/patterns.md	`_BUG_TYPES` tuple exists where Sentry is initialised	Define the tuple with real-bug exceptions only
P7.2	BEHAVIORAL	docs/wiki/patterns.md	Sentry `before_send` callback uses `_BUG_TYPES`	Wire the filter in the init call
P7.3	STRUCTURAL	ADR-0032	`repo_wiki/` directory exists (or project-equivalent knowledge base)	Create the directory; seed from post-mortems
P7.3a	STRUCTURAL	ADR-0032	Wiki has the three-layer shape: raw sources, synthesised wiki pages, index/schema	A flat dumping ground of markdown is not a wiki; the compiler/librarian pattern requires all three
P7.3b	BEHAVIORAL	ADR-0032	Wiki store exposes ingest / query / lint operations (or project equivalents)	Port `RepoWikiStore`; without ingest the wiki stagnates, without lint it accumulates stale entries
P7.3c	BEHAVIORAL	ADR-0032	Runner prompts inject relevant wiki content before agent invocation	`_inject_repo_wiki` pattern or equivalent; a wiki that is never read has no value
P7.4	CULTURAL	docs/wiki/patterns.md	No `except: pass` or bare `except:` in `src/`	Audit greps; remediate by logging at `warning` minimum
P7.5	BEHAVIORAL	docs/wiki/patterns.md	No `logger.error(value)` without a format string (audit greps `logger\.error\(\w+\)$`)	Format strings preserve structure for log aggregation; bare-value error calls flatten to opaque strings
P7.6	STRUCTURAL	docs/wiki/patterns.md	The audit and init tooling (`scripts/hydraflow_audit/`, `scripts/hydraflow_init/`) route unhandled exceptions through the P7.1/P7.2 Sentry filter	The tooling must follow its own principle; silent audit failures poison the signal the audit is supposed to provide
P7.7	STRUCTURAL	docs/wiki/patterns.md	Observability is behind a port (`ObservabilityPort` or equivalent) so the Sentry adapter can be swapped for OTLP / structured logs / a sidecar without touching call sites	Preserves future optionality without committing to a second backend today
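
A sketch of the P7.1 and P7.2 gatekeeper. The membership of `_BUG_TYPES` here is illustrative, not HydraFlow's actual tuple, and passing non-exception events through unchanged is an assumption. The `(event, hint)` signature, the `exc_info` hint key, and "return `None` to drop" follow the Sentry Python SDK's `before_send` contract.

```python
# Illustrative membership: exception types that indicate a programming error.
_BUG_TYPES = (TypeError, KeyError, AttributeError, IndexError, AssertionError)


def before_send(event: dict, hint: dict):
    """Forward only events caused by a real bug; drop transient failures."""
    exc_info = hint.get("exc_info")
    if exc_info is None:
        return event  # not an exception event (assumed to be intentional)
    _exc_type, exc_value, _traceback = exc_info
    if isinstance(exc_value, _BUG_TYPES):
        return event
    return None  # returning None tells the SDK to discard the event


# Wiring, shown as a comment so the sketch runs without the SDK installed:
#   sentry_sdk.init(dsn=..., before_send=before_send)
```

Transient failures such as timeouts never reach Sentry under this filter, which is why the Rule pairs it with logging them at warning level.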

P8. Superpowers / Skills Integration

Rule. The repo is wired to the superpowers skill pack so sessions start with brainstorming for greenfield work, TDD for features, systematic debugging for bugs, writing-plans for multi-step changes, and verification-before-completion before commits. Hooks in .claude/hooks/ enforce the guardrails that the human-driven skills encode.

Why. Skills are the operational playbook. Without them each session re-litigates how to approach a task. Hooks make the "always" rules in CLAUDE.md actually always-on instead of best-effort.

How to apply. Greenfield: seed .claude/settings.json and .claude/hooks/ from HydraFlow. Adoption: add one hook at a time, starting with block-destructive-git (high value, low controversy).

check_id	type	source	what	remediation
P8.1	STRUCTURAL	CLAUDE.md	`.claude/` directory exists	Seed from HydraFlow's `.claude/` layout
P8.2	STRUCTURAL	CLAUDE.md	`.claude/settings.json` or `settings.local.json` exists	Configure hooks and skill references
P8.3	STRUCTURAL	CLAUDE.md	`.claude/hooks/` contains at least one PreToolUse hook	Start with `block-destructive-git`
P8.4	CULTURAL	CLAUDE.md	CLAUDE.md references the six core superpowers skills by name	Must mention: `brainstorming`, `test-driven-development`, `systematic-debugging`, `writing-plans`, `verification-before-completion`, `requesting-code-review`. A vague "use skills" line does not count
P8.5	STRUCTURAL	CLAUDE.md	`.claude/hooks/` includes at least one hook of each enforced kind: PreToolUse, PostToolUse, Stop	Seed from HydraFlow: `block-destructive-git` (PreToolUse), `auto-lint-after-edit` (PostToolUse), `hf.session-retro` (Stop)
P8.6	STRUCTURAL	docs/self-improving-harness.md	In-process trace collector writes subprocess traces per phase/run	Port `trace_collector.py`; without traces, session retros have nothing to mine
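
A rough sketch of what a `block-destructive-git` PreToolUse hook could decide. The pattern list is hypothetical and is not HydraFlow's actual hook. The payload shape (`tool_name`, `tool_input.command`) and the use of exit code 2 to block the call are assumptions about the hook contract that should be checked against the hook documentation before use.

```python
import re
import sys

# Hypothetical deny-list; extend or narrow per repo policy.
DESTRUCTIVE_PATTERNS = [
    r"\bgit\s+push\b.*(--force\b|\s-f\b)",
    r"\bgit\s+reset\s+--hard\b",
    r"\bgit\s+clean\b.*\s-[a-zA-Z]*f",
    r"\bgit\s+commit\b.*--no-verify\b",
]


def is_destructive(command: str) -> bool:
    """True when the shell command matches any deny-list pattern."""
    return any(re.search(pattern, command) for pattern in DESTRUCTIVE_PATTERNS)


def exit_code_for(payload: dict) -> int:
    """Map a hook payload to an exit code (assumed: 2 blocks, 0 allows)."""
    command = payload.get("tool_input", {}).get("command", "")
    if payload.get("tool_name") == "Bash" and is_destructive(command):
        print(f"blocked destructive git command: {command}", file=sys.stderr)
        return 2
    return 0


# A hook script would finish with something like:
#   sys.exit(exit_code_for(json.load(sys.stdin)))
```

Including `--no-verify` in the list ties the hook back to the Quick rules in P1.12, so the rule is enforced rather than merely documented.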

P9. Persistence and Data Layout

Rule. All run-time state lives under a single configurable root (config.data_root, default .hydraflow/), scoped by repo slug. Writes are atomic. Cross-process coordination goes through named stores (StateTracker for phase state, DedupStore for idempotency) rather than ad-hoc files. No run-time state is written to the repo working tree outside data_root.

Why. A single root makes ops trivial — one directory to back up, one to blow away, one to gitignore. Repo-slug scoping means multiple target repos coexist without collision. Atomic writes prevent corrupted state from a killed process becoming permanent. Named stores concentrate the race-prone logic in one place; ad-hoc JSON files scattered across modules grow inconsistent invariants.

How to apply. Greenfield: define data_root in config on day one, even if the first feature only needs one file. Adoption: introduce data_root and migrate one persisted file at a time; keep backward-compatible reads until the migration is complete.

check_id	type	source	what	remediation
P9.1	STRUCTURAL	ADR-0021	Config exposes a `data_root` (or equivalent) field with a documented default	Add to the config dataclass; default under the user's home or a repo-gitignored path, not the working tree
P9.2	BEHAVIORAL	ADR-0021	`data_root` is overridable via environment variable	Wire `HYDRAFLOW_DATA_ROOT` (or project-namespaced equivalent) through config loading
P9.3	STRUCTURAL	ADR-0021	Persisted state is scoped per repo slug inside `data_root`	Path shape: `<data_root>/<repo_slug>/...` so multi-repo runs never collide
P9.4	STRUCTURAL	ADR-0021	`StateTracker`-shaped abstraction exists for phase/run state	Centralise state transitions; disallow direct JSON read/write from phase code
P9.5	STRUCTURAL	ADR-0021	`DedupStore`-shaped abstraction exists for idempotency tracking	Background loops must guard against double-processing across restarts
P9.6	BEHAVIORAL	ADR-0021	All state writes go through atomic write helper (write-to-temp + rename)	A `kill -9` mid-write must not leave a half-valid file
P9.7	STRUCTURAL	ADR-0021	`data_root` (default `.hydraflow/`) is in `.gitignore`	Run-time state is never committed; enforce via gitignore
P9.8	STRUCTURAL	ADR-0021	No run-time state is written inside `src/` or the repo working tree	Grep for `open(.*"w")` with repo-relative paths outside `data_root`; migrate hits to the store abstractions
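
The write-to-temp-plus-rename helper from P9.6, together with the path shape from P9.2 and P9.3, can be sketched with the standard library alone. The function names are hypothetical, and replacing `/` in the repo slug with `-` is an assumption about how slugs are made filesystem-safe.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Mapping


def resolve_data_root(env: Mapping[str, str] = os.environ) -> Path:
    """Environment override first (P9.2), then the documented default."""
    return Path(env.get("HYDRAFLOW_DATA_ROOT", ".hydraflow"))


def state_path(data_root: Path, repo_slug: str, name: str) -> Path:
    """<data_root>/<repo_slug>/<name>, with the slug flattened (assumed)."""
    return data_root / repo_slug.replace("/", "-") / name


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap of the directory entry
    except BaseException:
        try:
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        except FileNotFoundError:
            pass
        raise
```

A reader sees either the previous file or the complete new one; a process killed mid-write leaves at most an orphaned `.tmp` file next to the intact target.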

P10. TDD Workflow Discipline

Rule. Every feature and bug fix lands through the test-first loop: write a failing test that names the intended behaviour, make it pass with the smallest credible change, then refactor. Bug fixes land with the regression test that would have caught them. The superpowers:test-driven-development skill is the default for implementation work.

Why. Test-first locks the specification before the implementation can cheat it; test-after retrofits the spec to whatever the code does. Bug fixes without regression tests are open invitations for the bug to return. TDD is also the tightest feedback loop for agent work — a red test makes the success criterion machine-checkable.

How to apply. Greenfield: the first feature's first commit is a failing test. Adoption: introduce TDD for new features only; do not retro-fit, but every bug fix from today forward lands with a regression test.

check_id	type	source	what	remediation
P10.1	CULTURAL	CLAUDE.md	CLAUDE.md documents test-first as the default workflow and names `superpowers:test-driven-development`	Add a "Workflow" section pointing at the skill
P10.2	BEHAVIORAL	docs/wiki/testing.md	Every directory under `src/` with production code has a corresponding test file (unit ring coverage)	Audit walks `src/` and expects a matching `tests/test_<module>.py` or similar; orphan modules surface in the report
P10.3	CULTURAL	docs/wiki/testing.md	Bug-fix commits land with a regression test in `tests/regressions/`	Audit scans last 50 merged PRs tagged `bug`/`fix`; reports PRs missing a regression-test delta as a warning
P10.4	STRUCTURAL	docs/wiki/testing.md	Test names describe behaviour, not implementation (e.g. `test_merges_when_all_checks_pass` not `test_merge_function`)	Enforce via a test-name linter or review rubric; audit samples names and flags ones matching `test_<funcname>$`
P10.5	BEHAVIORAL	docs/wiki/testing.md	Test files use the Arrange / Act / Assert structure visibly	Prefer factories + one assertion per test; multi-assert tests are a smell

Consequences

Positive

- New repos get a one-command (make audit) readout of their conformance.
- Adoption path is measurable: the audit's pass count moves PR by PR.
- Principles have one home (ADR-0044) — no more "is it in CLAUDE.md or docs/wiki or an ADR?" ambiguity.
- ADR tables and audit code stay in lockstep because the code reads the ADR at runtime; a dangling check_id fails the audit.
- Self-documenting: every check cites its source, so remediation hints point at the real decision record, not a paraphrase.
- Self-observing: audit/init runtime failures surface through the same Sentry filter the principles require, so the tooling eats its own dogfood and runtime regressions feed back into the learning loop.

Negative

- Any change to a principle now requires an ADR edit, not just a doc tweak. This is the intended friction, but it is friction.
- Audit is Python-only today; polyglot repos (e.g. a Node frontend with a Python backend) only get the Python-side checks until checks are generalised.
- CULTURAL checks under-cover reality — an audit can say "no main commits in the recent log" but cannot verify the remote branch-protection setting. The make init prompt compensates by asking the user to confirm.

Neutral

- P6 (agents/loops/labels) is only meaningful for orchestration-shaped projects. Non-orchestration repos will see P6 as "N/A" rather than FAIL.
- The bar is 10 principles today; the list is expected to grow. Adding P11 is an ADR amendment, not a code refactor.
- Several checks (P2.8 anaemic-type detection, P2.9 ubiquitous-language, P10.2 orphan-module coverage) are heuristic — the audit reports them as warnings even when the numeric threshold is met, so reviewers apply judgement rather than treating a green audit as proof of correctness.

Alternatives considered

Inline principles in CLAUDE.md. Rejected because CLAUDE.md is a table of contents, not a decision log. Principles are decisions and belong in the ADR directory alongside the other architectural commitments.

YAML sidecar for check tables (0044-principles.checks.yaml). Rejected because the sidecar splits the rule from its rationale — a reader of the ADR sees only half the contract. Markdown tables are fiddly to parse but not prohibitively so, and the audit has a sharp schema: five columns, first row is headers.

Generate principles from scanning the HydraFlow repo. Rejected because it inverts the direction of authority. Principles should drive the code, not the other way around. An ADR that merely describes existing code is a snapshot that rots.

Related

CLAUDE.md — the table of contents this ADR formalises
ADR-0001 — five concurrent async loops (P6)
ADR-0002 — labels as the state machine (P6)
ADR-0003 — worktree isolation (P5)
ADR-0021 — persistence layout (P9)
ADR-0022 — MockWorld harness (P3)
ADR-0029 — BaseBackgroundLoop (P6)
ADR-0032 — repo wiki (P7)
scripts/hydraflow_audit/ — the audit tool that reads this ADR's tables
scripts/hydraflow_init/ — the prompt emitter that reads the audit's report