Data Backups¶

Off-machine snapshot + restore system for work-buddy's vital SQLite databases. Hot-backup -> tarball -> manifest -> GitHub Releases. Tiered retention, gh-CLI driven, integrates with the health Component system.

Details¶

The system is built on SQLite's hot-backup API. Each snapshot is tarballed with a structured manifest, pushed to a user-owned private GitHub Releases bucket, and recoverable on a freshly installed machine through a schema-aware restore pipeline.

Lives in work_buddy/backups/. The system has four moving parts (local snapshot, manifest, remote push, restore) plus a health-Component for setup and observability.

Why it exists¶

The task store is the single source of truth for everything work-buddy knows about the user's work -- claims, archives, action items, tags, state history. A single bug that issues a wide-fanout DELETE against it (or a corrupted disk, or a fat-fingered rm -rf .data/) would be catastrophic and not recoverable from any other system surface. The backup system + the soft-delete discipline (see tasks/task_delete) are two halves of the same safety net: soft-delete prevents accidental destruction of individual rows; backups protect against categorical loss of the whole store.

Vital DBs that get backed up (declared in work_buddy/backups/local.py as VITAL_DBS):

Logical name	On-disk file	Owner
`tasks`	`.data/db/tasks/task_metadata.db`	`obsidian/tasks-plugin`
`projects`	`.data/db/projects.db`	`projects/`
`messages`	`.data/db/messages.db`	`messaging/`
`threads`	`.data/db/threads.db`	`threads/`
`entities`	`.data/db/entities.db`	`entities/`
`settings`	`.data/db/settings/settings.db`	`settings`

The logical name is what appears in the manifest and the snapshot tag; the on-disk filename is preserved inside the tarball so restore can reconstruct the directory layout.

Snapshot pipeline (`work_buddy/backups/local.py`)¶

1. For each vital DB, open it and call sqlite3.Connection.backup(dest). This is SQLite's hot-backup API -- a page-by-page logical copy under the lock protocol that does not block writers and is WAL-coherent. Output: .data/backups/<snapshot_id>/<dbname>.db.
2. Write MANIFEST.json alongside.
3. Tar+gzip the directory via Python's tarfile stdlib (cross-platform, no shell-out).
4. Sweep retention (see Retention).
5. If backups.github.repo is configured and gh is authenticated, push to GitHub Releases (see Remote push).
6. Write .data/backups/last_run.json recording success/fail + duration + sizes. Health checks read this file -- they never hit GitHub on the hot path.
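The local part of this pipeline (hot backup, manifest, tarball) can be sketched with the standard library alone. This is a minimal illustration, not the contents of local.py: the function names hot_backup and take_snapshot, their parameters, and the placeholder manifest body are assumptions made for the example.

```python
import json
import sqlite3
import tarfile
from pathlib import Path


def hot_backup(src: Path, dest: Path) -> None:
    """Copy a live SQLite database to dest via the online backup API."""
    src_conn = sqlite3.connect(src)
    dest_conn = sqlite3.connect(dest)
    try:
        # Connection.backup copies page by page and tolerates concurrent writers.
        src_conn.backup(dest_conn)
    finally:
        dest_conn.close()
        src_conn.close()


def take_snapshot(vital_dbs: dict[str, Path], out_root: Path, snapshot_id: str) -> Path:
    """Back up every DB into out_root/<snapshot_id>/ and tar+gzip the result.

    vital_dbs maps a logical name to the live DB path (hypothetical shape).
    """
    snap_dir = out_root / snapshot_id
    snap_dir.mkdir(parents=True)
    for name, path in vital_dbs.items():
        hot_backup(path, snap_dir / f"{name}.db")

    # Placeholder manifest: the real key set is described under "Manifest format".
    manifest = {"snapshot_id": snapshot_id, "dbs": sorted(vital_dbs)}
    (snap_dir / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))

    tarball = out_root / f"{snapshot_id}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(snap_dir, arcname=snapshot_id)
    return tarball
```

Because both the backup and the archive step are pure Python, the same code path runs unchanged on Windows, macOS, and Linux.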

Snapshot IDs are ISO-timestamped: snap-<utc-isoformat>. Manual snapshots (triggered via /wb-backup-now or data_backup(manual=True)) get a -manual suffix and live in their own retention bucket.

Manifest format (`work_buddy/backups/manifest.py`)¶

Keys:

- snapshot_ts -- ISO UTC timestamp of the snapshot.
- work_buddy_version, work_buddy_commit, work_buddy_branch, work_buddy_dirty -- code provenance at snapshot time. work_buddy_dirty=True flags an uncommitted working tree as an audit signal; does not block restore.
- host -- hostname of the snapshotting machine.
- schema_versions -- map of logical DB name -> PRAGMA user_version at snapshot time. Restore uses this to refuse forward-time travel and to drive forward-migration.
- row_counts -- map of table -> row count at snapshot time. Restore validates counts after schema upgrade against this, with tolerance for migration-added rows.
- manifest_version -- integer; future-proofs the manifest format itself. Restore checks it and refuses unknown values.
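The two per-database fields, schema_versions and row_counts, can both be read straight from SQLite. The helpers below are a sketch of how such values might be collected; the function names are hypothetical and manifest.py may gather them differently.

```python
import sqlite3
from pathlib import Path


def schema_version(db_path: Path) -> int:
    """Return PRAGMA user_version, the value stored under schema_versions."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()


def row_counts(db_path: Path) -> dict[str, int]:
    """Return {table: row count} for every user table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master, so quoting them is sufficient here.
        return {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
    finally:
        conn.close()
```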

Retention (tiered, per-tier capped)¶

Sweep runs after every snapshot, mirrored locally and remotely. Both sweeps bucket a snapshot by the timestamp encoded in its snap-<isots> id/tag -- never by a filesystem mtime or a GitHub release's createdAt -- so the local set and the remote set converge on the same tiered selection. The remote sweep deletes out-of-bucket releases with gh release delete.

Tier	Cadence	Cap
Hourly	every hour	24
Daily	one per day	7
Weekly	one per ISO week	4
Monthly	one per calendar month	12
Annual	one per calendar year	unbounded
Manual	user-triggered	20 (independent bucket)

Steady-state local footprint at ~3 MB compressed per snapshot is ~156 MB across the ~52 retained slots. Manual snapshots are deliberately a small bucket -- they are anchor points a user takes before something risky, not archival.

The tier caps are defined by the RETENTION dict in work_buddy/backups/local.py.
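One way to implement a tiered sweep that depends only on the timestamp in the snapshot id is sketched below. Several details are assumptions rather than facts about local.py: the id layout snap-YYYY-MM-DDTHH:MM:SS, the choice to keep the newest snapshot in each bucket, and the names RETENTION_CAPS, MANUAL_CAP, parse_ts, and select_retained.

```python
from datetime import datetime

# Caps taken from the table above; None means unbounded.
RETENTION_CAPS = {"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12, "annual": None}
MANUAL_CAP = 20

# Each tier maps a timestamp to the bucket it falls in.
BUCKET_KEYS = {
    "hourly": lambda t: (t.year, t.month, t.day, t.hour),
    "daily": lambda t: (t.year, t.month, t.day),
    "weekly": lambda t: tuple(t.isocalendar())[:2],  # (ISO year, ISO week)
    "monthly": lambda t: (t.year, t.month),
    "annual": lambda t: (t.year,),
}


def parse_ts(snapshot_id: str) -> datetime:
    """Extract the timestamp from an id, ignoring any -manual suffix (assumed layout)."""
    body = snapshot_id.removeprefix("snap-").removesuffix("-manual")
    return datetime.fromisoformat(body)


def select_retained(snapshot_ids: list[str]) -> set[str]:
    """Return the ids to keep; everything else is eligible for deletion."""
    manual = sorted((s for s in snapshot_ids if s.endswith("-manual")),
                    key=parse_ts, reverse=True)
    keep = set(manual[:MANUAL_CAP])  # manual snapshots are an independent bucket

    scheduled = sorted((s for s in snapshot_ids if not s.endswith("-manual")),
                       key=parse_ts, reverse=True)
    for tier, cap in RETENTION_CAPS.items():
        newest_per_bucket: dict[tuple, str] = {}
        for sid in scheduled:  # newest first, so the first id seen wins its bucket
            newest_per_bucket.setdefault(BUCKET_KEYS[tier](parse_ts(sid)), sid)
        winners = list(newest_per_bucket.values())  # ordered newest bucket first
        keep.update(winners if cap is None else winners[:cap])
    return keep
```

Because the selection is a pure function of the id strings, running it over the local directory listing and over the remote tag list yields the same retained set, which is the convergence property described above.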

Remote push (`work_buddy/backups/remote.py`)¶

The remote target is a user-owned private GitHub repository. Snapshots are uploaded as GitHub Release assets, one release per snapshot, tagged with the snapshot ID. We subprocess the gh CLI rather than embed PyGithub because:

- The user's existing GitHub credentials are managed by gh; we never touch a PAT.
- gh release create / gh release upload support private repos natively and need no Python GitHub client.
- The gh release list --json query lets the restore pipeline enumerate remote snapshots without a Python GitHub client.
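Enumerating remote snapshots through gh might look like the sketch below. The helper names run_gh and list_remote_snapshots, the injectable run parameter, and the limit default are assumptions for illustration; the only gh features relied on are the release list subcommand with its --repo, --json tagName, and --limit flags.

```python
import json
import subprocess
from typing import Callable, Sequence


def run_gh(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the gh CLI and capture its output as text."""
    return subprocess.run(["gh", *args], capture_output=True, text=True)


def list_remote_snapshots(repo: str, run: Callable = run_gh, limit: int = 200) -> list[str]:
    """Return snapshot tags found among the repo's releases, newest first.

    Ordering relies on ids sharing one timestamp layout, so that string
    order matches chronological order (an assumption of this sketch).
    """
    result = run(["release", "list", "--repo", repo,
                  "--json", "tagName", "--limit", str(limit)])
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "gh release list failed")
    tags = [entry["tagName"] for entry in json.loads(result.stdout)]
    # Ignore releases that are not snapshots (for example, hand-made tags).
    return sorted((t for t in tags if t.startswith("snap-")), reverse=True)
```

Passing the runner as a parameter keeps the function testable without network access or an authenticated gh.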

Transient-fault handling: push_snapshot retries a push that fails with a network/DNS fault (e.g. intermittent resolution of uploads.github.com) up to three attempts with a short backoff -- well inside the hourly cron window. Permanent faults (gh missing, unauthenticated) are not retried. gh release create uploads the asset after creating the release object; if an earlier attempt created the release but its asset upload failed, the retry detects the "already exists" error and falls back to gh release upload --clobber, so a retried push converges instead of looping.
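The retry-and-converge behaviour can be illustrated as follows. This is a sketch under stated assumptions, not the body of push_snapshot: the function name push_with_retry, the TRANSIENT_MARKERS substrings used to classify a fault as transient, the linear backoff, and the injectable run and sleep parameters are all invented for the example.

```python
import subprocess
import time
from typing import Callable, Sequence

# Assumed substrings that mark a network/DNS fault; the real classifier may differ.
TRANSIENT_MARKERS = ("could not resolve host", "timeout", "connection reset")


def run_gh(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the gh CLI and capture its output as text."""
    return subprocess.run(["gh", *args], capture_output=True, text=True)


def push_with_retry(repo: str, tag: str, asset: str, run: Callable = run_gh,
                    attempts: int = 3, backoff_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Create a release with one asset, retrying transient faults only."""
    for attempt in range(1, attempts + 1):
        result = run(["release", "create", tag, asset, "--repo", repo,
                      "--title", tag, "--notes", "automated snapshot"])
        if result.returncode == 0:
            return True
        err = result.stderr.lower()

        if "already exists" in err:
            # An earlier attempt created the release but lost the upload:
            # overwrite the asset instead of creating the release again.
            result = run(["release", "upload", tag, asset, "--repo", repo, "--clobber"])
            if result.returncode == 0:
                return True
            err = result.stderr.lower()

        if not any(marker in err for marker in TRANSIENT_MARKERS):
            return False  # permanent fault: retrying cannot help
        if attempt < attempts:
            sleep(backoff_s * attempt)
    return False
```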

No encryption layer. A private GitHub repo is the same trust model as the user's other private code repositories -- the threat model is account compromise, not in-transit interception. Adding GPG encryption would buy nothing and add a key-management failure mode.

Fresh-repo gotcha: the first push to an empty repo fails with the error "Repository is empty". The fix_backup_repo_configured fixer avoids this by creating the repo with gh repo create --private --add-readme, which seeds the default branch.

Restore pipeline (`work_buddy/backups/restore.py`)¶

data_restore(snapshot_id) (capability) executes:

1. Download <tag>.tar.gz from GitHub Releases into a staging directory.
2. Read MANIFEST.json and validate: manifest_version is recognized; for each DB, snapshot's schema_versions[db] <= code's max migration (forward-time-travel guard).
3. Unpack into .data/db.staging_<ts>/.
4. Open each staged DB through its migration authority (see architecture/migrations) -- the ladder rolls the staged schema forward to current. The Settings database uses its own versioned ladder and the same forward-version guard.
5. PRAGMA integrity_check + PRAGMA foreign_key_check per DB. Refuse on either failure.
6. Verify row counts after schema upgrade match the manifest, with tolerance for migration-added rows.
7. Move current .data/db/ to .data/db.pre_restore_<ts>/ (auto-rollback safety net).
8. Move staging into place.

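Steps 3-6 work only on the staging directory, which is discarded on any failure, so the live .data/db/ is untouched until the swap in steps 7-8. The swap is itself reversible: the previous live directory remains at .data/db.pre_restore_<ts>/.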
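The manifest guard and the per-database integrity checks can be sketched as two small functions. The names RestoreRefused, check_manifest, and check_staged_db, the set of supported manifest versions, and the decision to refuse a database the running code does not know are assumptions for illustration, not the contents of restore.py.

```python
import sqlite3
from pathlib import Path

SUPPORTED_MANIFEST_VERSIONS = {1}  # assumed value for the example


class RestoreRefused(Exception):
    """Raised when a snapshot must not be restored."""


def check_manifest(manifest: dict, code_max_versions: dict[str, int]) -> None:
    """Refuse unknown manifest formats and snapshots newer than the running code."""
    if manifest.get("manifest_version") not in SUPPORTED_MANIFEST_VERSIONS:
        raise RestoreRefused("unrecognized manifest_version")
    for db, snap_version in manifest["schema_versions"].items():
        # Forward-time-travel guard: the code cannot downgrade a newer schema.
        if snap_version > code_max_versions.get(db, -1):
            raise RestoreRefused(f"{db}: snapshot schema {snap_version} is newer than code")


def check_staged_db(db_path: Path) -> None:
    """Run the integrity and foreign-key checks on one staged database."""
    conn = sqlite3.connect(db_path)
    try:
        if conn.execute("PRAGMA integrity_check").fetchall() != [("ok",)]:
            raise RestoreRefused(f"{db_path.name}: integrity_check failed")
        # foreign_key_check returns one row per violation, none when clean.
        if conn.execute("PRAGMA foreign_key_check").fetchall():
            raise RestoreRefused(f"{db_path.name}: foreign_key_check failed")
    finally:
        conn.close()
```

Both checks raise before anything outside the staging directory is modified, which keeps a refused restore free of side effects on the live store.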

Health system integration¶

Registered as a non-core opt-in Component github_backups (see architecture/health). Three Requirements with their own Fixers:

Requirement	Fix kind	Fixer behaviour
`gh-cli-installed`	`agent_handoff`	Spawns a Claude Code session that walks the user through OS-appropriate install.
`gh-authenticated`	`agent_handoff`	Walks through `gh auth login --web`.
`repo-configured`	`input_required`	Form for repo name, calls `gh repo create --private --add-readme` if absent, writes `backups.github.repo` to `config.local.yaml`.

The Component declares one custom check (check_github_backup_freshness) that reads .data/backups/last_run.json and returns success/warning/failure based on whether the last snapshot landed inside the configured cadence window. It never polls GitHub directly.
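A freshness check that reads only the local record might look like the sketch below. The last_run.json schema is not specified here, so the finished_at and success keys are assumptions, as are the function name check_freshness and the rule that a snapshot up to twice the window old yields a warning rather than a failure.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def check_freshness(last_run_path: Path, window: timedelta,
                    now: Optional[datetime] = None) -> str:
    """Classify the last recorded run as 'success', 'warning', or 'failure'.

    Reads only the local file; it never contacts GitHub.
    """
    now = now or datetime.now(timezone.utc)
    try:
        record = json.loads(last_run_path.read_text())
        finished = datetime.fromisoformat(record["finished_at"])  # assumed key
    except (OSError, ValueError, KeyError, TypeError):
        return "failure"  # missing, unreadable, or malformed record
    if finished.tzinfo is None:
        finished = finished.replace(tzinfo=timezone.utc)  # treat naive stamps as UTC
    if not record.get("success"):  # assumed key
        return "failure"

    age = now - finished
    if age <= window:
        return "success"
    if age <= 2 * window:  # assumed grace period before escalating
        return "warning"
    return "failure"
```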

A domain:backups entry in work_buddy/control/graph_static.py makes the Component surface in the Settings tab's domain list. No frontend work beyond adding "domain:backups" to domainOrder -- the card auto-renders from the control graph (see architecture/control-graph).

Cron + slash commands¶

- sidecar_jobs/data-backup.md -- hourly cron, calls data_backup capability. Skips silently if the Component is unwanted or its Requirements are unmet.
- /wb-backup-now -- manual one-off snapshot. Used as an anchor point before a risky operation.
- /wb-backup-restore [snapshot-id] -- list remote snapshots or restore a specified one.

There are no /wb-backup-setup, /wb-backup-status, or /wb-backup-config slash commands by design -- those surface via the Settings tab's auto-rendered card. The slash-command surface is reserved for the two recurring user-initiated operations (snapshot now, restore).

Capabilities (registered in `work_buddy/mcp_server/registry.py`)¶

- data_backup(manual: bool = False) -- take a snapshot, push to remote.
- data_backup_list() -- list local + remote snapshots with sizes and timestamps.
- data_restore(snapshot_id: str, force: bool = False) -- restore from a given snapshot.
