# ADR-0021: Persistence Architecture and Data Layout
- Status: Accepted
- Enforced by: `tests/test_state_persistence.py`, `tests/test_event_persistence.py`
- Date: 2026-02-28
## Context
HydraFlow must persist state across process restarts, track active issues through the pipeline, store event history, and organise logs/plans/memory/metrics for each managed repository. The persistence layer must support:
- Crash recovery: if the orchestrator is killed mid-run, it must resume without re-processing completed work or losing track of in-flight issues.
- Multi-repo isolation: when a supervisor spawns one HydraFlow process per repository, each process must write to its own data directory without collisions.
- Operational visibility: dashboards and CLI tools need structured access to metrics snapshots, event streams, and session history.
- Configuration flexibility: operators must be able to relocate the data directory (e.g. to a mounted volume in Docker) without code changes.
Prior to formalising this decision, the data layout had grown organically. This ADR captures the current architecture and makes the conventions explicit so that future features (e.g. multi-repo dashboard aggregation, remote state backends) build on a documented foundation.
## Decision
All HydraFlow persistent data lives under `config.data_root`, which defaults to
`<repo_root>/.hydraflow/`. The data root is resolved at startup with the following
precedence:

1. `HYDRAFLOW_HOME` environment variable (highest priority).
2. Explicit `data_root` field in `HydraFlowConfig`.
3. Default: `<repo_root>/.hydraflow/`.
Full config load order is: Pydantic defaults, config file, env vars, CLI args.
Path resolution runs via the resolve_defaults validator in seven steps:
_resolve_base_paths— repo_root, worktree_base, data_root_resolve_repo_and_identity— repo slug, gh_token, git identity_resolve_repo_scoped_paths— state_file, event_log_path, config_file_apply_env_overrides— env-var overrides for labels, tokens, etc._apply_profile_overrides— grouped tool/model defaults for profiles_harmonize_tool_model_defaults— tool and model consistency_validate_docker— Docker configuration validation
## Data layout
Note: the layout below reflects the target architecture mandated by ADR-0010 (Worktree and Path Isolation Architecture). `log_dir`, `plans_dir`, and `memory_dir` remain flat under `data_root/` in the current implementation — see [^1].
```
<data_root>/                # default: <repo_root>/.hydraflow/
    config.json             # Persisted config snapshot
    <repo_slug>/            # Per-repo namespace (owner-repo)
        state.json          # StateTracker crash-recovery state
        events.jsonl        # EventBus append-only event log
        sessions.jsonl      # Session history
        logs/               # Transcript and log files
        plans/              # Saved implementation plans
        memory/             # Memory and review-insight files
        metrics/            # Per-repo metrics
            snapshots.jsonl # Dashboard-consumable metric snapshots
        runs/               # Run artifacts
        manifest/           # Codebase manifest snapshots
        cache/              # Ephemeral caches
        verification/       # Verification artifacts
```
[^1]: `log_dir`, `plans_dir`, and `memory_dir` are flat under `<data_root>/` in the current implementation (`config.py` properties return `data_root / "logs"` etc.). ADR-0010 (Worktree and Path Isolation Architecture) mandates migrating them to `<data_root>/<repo_slug>/logs/` etc. The layout tree above documents the target structure; the derived-paths table documents current behaviour.
## Persistence guarantees
- Atomic writes: `StateTracker.save()` uses `atomic_write()` (from `file_util`) to write `state.json`, preventing corruption from partial writes or crashes mid-flush.
- Immediate persistence: every state mutation calls `save()` immediately; there is no batching or deferred flush.
- Pydantic validation on load: `StateTracker.load()` deserialises with `StateData.model_validate()`. Corrupt or unreadable state files are reset to empty defaults and logged, rather than crashing the process.
- Automatic migration: legacy `bg_worker_states` entries are migrated to the newer `worker_heartbeats` schema on first load.
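The write-temp-then-rename pattern behind `atomic_write()` can be sketched as follows (the real helper lives in `file_util`; this signature and body are illustrative):

```python
import os
import tempfile
from pathlib import Path

def atomic_write(path: Path, data: str) -> None:
    """Sketch of an atomic write: readers see either the old file or the
    new one, never a half-written state.json."""
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem (os.replace is only atomic in that case).
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp, path)     # atomic swap into place
    except BaseException:
        os.unlink(tmp)            # clean up the orphaned temp file on failure
        raise
```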
## Config snapshot (`config.json`)
An optional `config.json` at `data_root / "config.json"` persists the current
`HydraFlowConfig` as a JSON snapshot. Persistence is disabled by default
(`config_file = None`); operators opt in by setting `config_file` explicitly.
When present, `load_config_file()` reads it at startup and `save_config_file()`
writes it on hot-config updates.
## Session history (`sessions.jsonl`)
`sessions.jsonl` is an append-only JSONL file that records one `SessionLog`
entry per pipeline session (plan, implement, review). It lives alongside
`state.json` at `repo_data_root / "sessions.jsonl"` and is managed by
`StateTracker` (via `_session.py`). Legacy flat-layout files at
`data_root / "sessions.jsonl"` are automatically migrated to the repo-scoped
location on first load (see `_resolve_repo_scoped_paths` in `config.py`).
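The append-only discipline can be sketched as follows (plain dicts stand in for `SessionLog` models; the helper names are invented for illustration):

```python
import json
from pathlib import Path

def append_session(path: Path, session: dict) -> None:
    """Append one session record as a single JSON line."""
    with path.open("a") as f:
        f.write(json.dumps(session) + "\n")

def read_sessions(path: Path) -> list[dict]:
    """Read back every recorded session, one JSON object per line."""
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text().splitlines() if line]
```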
## Multi-repo namespacing
- Per-repo artifacts are scoped under `data_root/<repo_slug>/` via `_resolve_repo_scoped_paths()` in `config.py`, where `repo_slug` is `config.repo.replace("/", "-")`. State, events, and sessions are fully repo-scoped today; `log_dir`, `plans_dir`, and `memory_dir` remain flat under `data_root/` in the current implementation — see [^1].
- `config.repo_data_root` provides a general-purpose repo-scoped subdirectory at `data_root / repo_slug`.
- The supervisor spawns isolated processes per repo with separate `HYDRAFLOW_HOME` values, giving each process its own `data_root`.
- Worktrees are scoped under `worktree_base / repo_slug / issue-{N}` to prevent collisions between same-numbered issues across repositories.
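The slug and worktree scoping rules above can be sketched in a few lines (function names are illustrative, not the real API):

```python
from pathlib import Path

def repo_slug(repo: str) -> str:
    # "owner/repo" -> "owner-repo", mirroring config.repo.replace("/", "-")
    return repo.replace("/", "-")

def worktree_path(worktree_base: Path, repo: str, issue: int) -> Path:
    """Scope worktrees as worktree_base / repo_slug / issue-{N}."""
    return worktree_base / repo_slug(repo) / f"issue-{issue}"
```

Two repositories with the same issue number therefore get distinct worktrees, e.g. `acme-api/issue-42` versus `acme-web/issue-42`.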
## Derived paths
The following `HydraFlowConfig` properties derive directories from `data_root`:
| Property | Path |
|---|---|
| `config_file` | `None` by default (persistence disabled); conventional location `data_root / "config.json"` when opted in |
| `repo_data_root` | `data_root / repo_slug` |
| `state_file` | `data_root / repo_slug / "state.json"` |
| `event_log_path` | `data_root / repo_slug / "events.jsonl"` |
| `sessions.jsonl` (no config property; implicit path) | `repo_data_root / "sessions.jsonl"` |
| `log_dir` | `data_root / "logs"` [^1] |
| `plans_dir` | `data_root / "plans"` [^1] |
| `memory_dir` | `data_root / "memory"` [^1] |
All paths can be individually overridden via their respective config fields,
but the defaults ensure a single `data_root` change relocates everything.
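A sketch of how these derived-path properties hang off `data_root` (an illustrative class, not the real `HydraFlowConfig`; it reflects current behaviour, with `log_dir` etc. flat under `data_root`):

```python
from pathlib import Path

class DerivedPaths:
    """Illustrative stand-in for HydraFlowConfig's derived-path properties."""

    def __init__(self, data_root: Path, repo: str) -> None:
        self.data_root = data_root
        self.repo_slug = repo.replace("/", "-")

    @property
    def repo_data_root(self) -> Path:
        return self.data_root / self.repo_slug

    @property
    def state_file(self) -> Path:
        return self.repo_data_root / "state.json"

    @property
    def event_log_path(self) -> Path:
        return self.repo_data_root / "events.jsonl"

    @property
    def log_dir(self) -> Path:
        # Flat today; ADR-0010 targets repo_data_root / "logs"
        return self.data_root / "logs"

    @property
    def plans_dir(self) -> Path:
        return self.data_root / "plans"

    @property
    def memory_dir(self) -> Path:
        return self.data_root / "memory"
```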
## Consequences
Positive:
- Single knob relocation: setting `HYDRAFLOW_HOME` or `data_root` moves all persistent data, which is essential for Docker volume mounts and CI environments.
- Crash resilience: atomic writes and Pydantic-validated loads mean the orchestrator can be killed at any point and resume cleanly.
- Multi-repo safety: repo-slug namespacing in metrics and worktrees prevents data collisions when multiple repos are managed from a shared filesystem.
- Structured layout: dedicated subdirectories for logs, plans, memory, and metrics make it straightforward to add retention policies, backup scripts, or dashboard queries targeting specific data categories.
Negative / Trade-offs:
- Single-file state bottleneck: `state.json` is a single file rewritten on every mutation. This is adequate for current throughput but would not scale to hundreds of concurrent issues without moving to a database or partitioned state.
- No built-in replication: the data directory is local-only. High-availability setups require external solutions (e.g. shared NFS, object-store sync).
- Flat event log: `events.jsonl` grows without bound until rotation triggers (configurable via `event_log_max_size_mb` and `event_log_retention_days`). Large deployments should tune these values.
- Implicit directory creation: subdirectories are created on first use rather than at startup, which can obscure the full layout until the system has exercised all code paths.
## Alternatives considered
- SQLite for state: would remove the single-file bottleneck and enable concurrent reads, but adds a dependency and complicates atomic snapshot export. Not justified at current scale.
- Per-issue state files: would reduce write contention, but complicates querying across issues and makes crash recovery harder (must scan many files).
- Remote state backend (S3/Redis): would enable multi-instance deployments, but introduces network failure modes and latency. Deferred until there is a concrete multi-instance requirement.
- Embedded Dolt state backend (briefly implemented, removed 2026-04-24): `DoltBackend` versioned `state.json`, dedup sets, and insight stores via the embedded `dolt` CLI. It was opt-in (active only when `dolt` was on PATH) and never installed in production containers or EC2, so the file fallback was the de-facto path everywhere except one developer machine. Removed because the dual-path branches in seven consumers carried real maintenance cost for a feature that hadn't earned an ADR or a deployment.
## Related
- Source memory: #1624 — HydraFlow persistence architecture and data layout
- This ADR: #1633
- `src/state:StateTracker` — crash-recovery state persistence
- `src/state/_session.py` — session history persistence (`sessions.jsonl`)
- `src/config.py:_resolve_base_paths`, `src/config.py:_resolve_repo_and_identity`, `src/config.py:_resolve_repo_scoped_paths` — data root and path resolution
- `src/config.py:load_config_file`, `src/config.py:save_config_file` — config snapshot persistence
- `src/config.py:HydraFlowConfig.data_root` — data root configuration
- `src/metrics_manager.py` — repo-slug namespaced metrics
- `src/file_util.py:atomic_write` — atomic file write helper
- ADR-0003 (Git Worktrees for Issue Isolation) — worktree isolation (complementary filesystem layout)
- ADR-0006 (RepoRuntime Isolation Architecture), superseded by ADR-0009 (Multi-Repo Process-Per-Repo Model) — RepoRuntime isolation (per-repo process boundaries)
- ADR-0009 (Multi-Repo Process-Per-Repo Model) — `_resolve_repo_scoped_paths()` scoping that places state files under `data_root/<repo_slug>/`
- ADR-0010 (Worktree and Path Isolation Architecture) — mandates repo-slug scoping for `log_dir`, `plans_dir`, `memory_dir` to `data_root/<repo_slug>/`