feat(api): export user domain data as a tar.gz archive #322

Merged
james merged 2 commits from feat/data-export into main 2026-06-28 19:01:47 +00:00
Owner

What

Adds GET /api/export, the read side of #286 — the first slice of import/export. The authenticated user downloads all of their domain data as a portable carol-export-<date>.tar.gz: YAML docs at the archive root plus a media/ dir for blobs.

carol-export/
  manifest.yaml        # formatVersion: 1, exportedAt, appVersion, per-entity counts
  profile.yaml         # profile (+ contacts), picture -> media/profile-avatar.webp
  skills.yaml          # skill_sections (+ skills)
  education.yaml       # educations
  experience.yaml      # jobs -> positions -> contributions
  projects.yaml        # projects
  people.yaml          # people (+ contacts, notes, relatives, met_through, organizations)
  organizations.yaml   # organizations (+ links, key_people)
  notes.yaml           # notes
  media/profile-avatar.webp
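
As a rough illustration of the manifest fields and the dated download name described above, the sketch below builds per-entity counts and a filename. The `counts` key, the `buildManifest` and `exportFilename` helpers, and the choice of the UTC calendar date are assumptions made for illustration; the actual code in this PR may be shaped differently.

```typescript
// Field names formatVersion, exportedAt and appVersion come from the layout above.
// The `counts` key and its per-entity entries are hypothetical.
interface ExportManifest {
  formatVersion: 1;
  exportedAt: string; // ISO 8601 timestamp
  appVersion: string;
  counts: Record<string, number>;
}

// Counts the rows of each exported document and stamps the manifest.
function buildManifest(
  docs: Record<string, readonly unknown[]>,
  appVersion: string,
  now: Date,
): ExportManifest {
  const counts: Record<string, number> = {};
  for (const [docName, rows] of Object.entries(docs)) {
    counts[docName] = rows.length;
  }
  return { formatVersion: 1, exportedAt: now.toISOString(), appVersion, counts };
}

// carol-export-<YYYY-MM-DD>.tar.gz, taking the date part of the UTC timestamp (an assumption).
function exportFilename(now: Date): string {
  return `carol-export-${now.toISOString().slice(0, 10)}.tar.gz`;
}

const exportTime = new Date("2026-06-28T18:00:00Z");
const manifest = buildManifest({ projects: [{}, {}], notes: [{}] }, "0.0.0", exportTime);
console.log(manifest.counts, exportFilename(exportTime));
```

Passing the clock in as a parameter keeps both helpers deterministic, which makes manifest assertions in a route test straightforward.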

How

  • apps/api/lib/export/serialize.ts: buildExportData(db, userId) gathers every domain entity through the existing repositories, nests children under their parents (parent UUID is the nesting key), and omits user_id from every emitted row (it is re-scoped to the importer on import, and omitting it keeps the archive instance-portable). Pure data, with no YAML/tar here. Resolves the profile-picture blob into media/ (skips silently if absent).
  • apps/api/lib/export/archive.ts: buildExportArchive(data) YAML-serializes each doc (yaml), packs with tar-stream, gzips (zlib), and builds the manifest (formatVersion: 1, exportedAt, appVersion from apps/api/package.json, per-entity counts).
  • apps/api/app/api/export/route.ts: authed GET (getAuthIdentity → unauthorized() on null), returns application/gzip with Content-Disposition: attachment; filename="carol-export-<YYYY-MM-DD>.tar.gz". Mirrors the binary response shape of GET /api/profile/picture. Not added to the public-routes allowlist; auth is required.
  • OpenAPI — registered under a new data tag; openapi.json regenerated; openapi:check + openapi:coverage pass.
  • ADR-0028 — documents the gzipped-tar-of-a-directory format, domain-only scope (auth/identity/settings excluded and why), the three (future) import modes, dry-run-preview semantics, user_id re-scoping, and the transaction decision (real db.transaction() + a file-backed SQLite test for the future import apply path, superseding the libsql in-memory caveat at apps/api/db/repositories/skill-sections.ts).
  • Tests — both-engine serializer test (tests/db/export-serialize.test.ts) and an API-level route test (tests/api/export.test.ts) that gunzips + untars the response and verifies manifest counts, YAML contents, media bytes, and a 401 unauth case.
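
The nesting and `user_id` stripping described for the serializer can be sketched roughly as follows. This is a minimal illustration, not the code from `serialize.ts`: the row shapes, the `section_id` and `title`/`name` columns, and the `omitUserId` and `nestSkills` helpers are all hypothetical.

```typescript
// Hypothetical row shapes; the real columns live in the repositories.
interface SkillSectionRow { id: string; user_id: string; title: string }
interface SkillRow { id: string; user_id: string; section_id: string; name: string }

// Returns a copy of the row without user_id; the input is not mutated.
function omitUserId<T extends { user_id: string }>(row: T): Omit<T, "user_id"> {
  const { user_id: _dropped, ...rest } = row;
  return rest;
}

// Groups skills by their parent section UUID, then attaches each group to its section.
function nestSkills(sections: SkillSectionRow[], skills: SkillRow[]) {
  const bySection = new Map<string, Omit<SkillRow, "user_id">[]>();
  for (const skill of skills) {
    const bucket = bySection.get(skill.section_id) ?? [];
    bucket.push(omitUserId(skill));
    bySection.set(skill.section_id, bucket);
  }
  return sections.map((section) => ({
    ...omitUserId(section),
    skills: bySection.get(section.id) ?? [],
  }));
}

const exportedSections = nestSkills(
  [{ id: "s1", user_id: "u1", title: "Languages" }],
  [
    { id: "k1", user_id: "u1", section_id: "s1", name: "TypeScript" },
    { id: "k2", user_id: "u1", section_id: "s1", name: "Go" },
  ],
);
console.log(JSON.stringify(exportedSections, null, 2));
```

Because the result is plain data, the same structure could be handed to any YAML serializer afterwards, which matches the split between the serialize and archive steps.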

Corrections to #286's sketch

  • projects.yaml + contributions is a ticket error. contributions are children of positions (contributions.position_id → positions → jobs), so they nest under experience.yaml. projects.yaml holds projects only (projects have no child rows).
  • Contracts. Contracts share the jobs table (is_contract), but JobsRepository.listByUserId filters them out. They are domain data and their positions would otherwise be orphaned, so the serializer reads every job for the user directly and keeps the is_contract flag on each emitted row.
  • Media naming. Profiles have no surrogate id (the row is keyed by user_id), so the single avatar is stored at the fixed media/profile-avatar.webp rather than media/<uuid>-<name>. Documented in ADR-0028.
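
To illustrate the contracts point, the sketch below shows how positions end up orphaned when contract jobs are filtered out of the job list, and how reading every job avoids that. Only `is_contract` comes from the description; the other column names and the `orphanedPositions` helper are hypothetical.

```typescript
// Hypothetical row shapes; only is_contract is taken from the PR description.
interface JobRow { id: string; company: string; is_contract: boolean }
interface PositionRow { id: string; job_id: string; title: string }

// Returns the positions whose parent job is missing from the given job list.
function orphanedPositions(jobs: JobRow[], positions: PositionRow[]): PositionRow[] {
  const jobIds = new Set(jobs.map((job) => job.id));
  return positions.filter((position) => !jobIds.has(position.job_id));
}

const allJobs: JobRow[] = [
  { id: "j1", company: "Acme", is_contract: false },
  { id: "j2", company: "Globex", is_contract: true },
];
const allPositions: PositionRow[] = [
  { id: "p1", job_id: "j1", title: "Engineer" },
  { id: "p2", job_id: "j2", title: "Consultant" },
];

// Mimics a listing that hides contracts, as described for JobsRepository.listByUserId.
const withoutContracts = allJobs.filter((job) => !job.is_contract);

console.log(orphanedPositions(withoutContracts, allPositions).length); // 1 (p2 has no parent)
console.log(orphanedPositions(allJobs, allPositions).length); // 0
```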

Out of scope (filed as follow-ups)

Import preview (#317), import apply with three modes (#318), settings UI (#319), and the cross-cutting i18n + remaining import tests (#320). This PR also lands ADR-0028 and the export tests (tracked under #320).

Verification

  • pnpm -F @carol/api test — full suite green; new export tests pass (SQLite leg locally; the Postgres leg runs in CI).
  • pnpm -F @carol/api openapi:check && pnpm -F @carol/api openapi:coverage — no drift, route covered.
  • pnpm -F @carol/api build — compiles; /api/export registered.
  • pnpm -F @carol/api lint — clean.

Closes #316
Refs #286

🤖 Generated with Claude Code

feat(api): export user domain data as a tar.gz archive
Some checks failed
Commits / Conventional Commits (pull_request) Successful in 10s
PR / OSV-Scanner (pull_request) Successful in 1m23s
PR / pnpm audit (pull_request) Successful in 1m49s
PR / Typecheck (pull_request) Failing after 2m3s
PR / Lint (pull_request) Successful in 2m56s
PR / OpenAPI (pull_request) Failing after 3m1s
PR / Client (web export smoke) (pull_request) Successful in 3m9s
PR / Test (sqlite) (pull_request) Successful in 3m16s
PR / Package age policy (soft) (pull_request) Successful in 1m12s
Secrets / gitleaks (pull_request) Successful in 27s
PR / Test (postgres) (pull_request) Successful in 3m26s
PR / Build (pull_request) Successful in 4m1s
PR / Coverage (soft) (pull_request) Successful in 2m27s
PR / Trivy (image) (pull_request) Successful in 3m29s
PR / Static analysis (pull_request) Successful in 5m46s
1c368ee0df
Add GET /api/export, the read side of #286. The authenticated user
downloads all of their domain data as a portable
carol-export-<date>.tar.gz: YAML docs at the archive root plus a
media/ dir for blobs.

- lib/export/serialize.ts gathers every domain entity via the
  repositories, nests children under parents, and omits user_id from
  every emitted row (re-scoped on import; keeps the archive portable).
  Reads all jobs (incl. contracts) directly so contract positions
  aren't orphaned.
- lib/export/archive.ts YAML-serializes each doc, packs with
  tar-stream, gzips, and builds the manifest (formatVersion 1,
  exportedAt, appVersion, per-entity counts).
- OpenAPI registration under a new `data` tag; regenerated openapi.json.
- ADR-0028 documents the archive format, domain-only scope, the three
  (future) import modes, dry-run preview, user_id re-scoping, and the
  transaction decision for the import apply path.
- Both-engine serializer test + API-level route test (gunzip+untar,
  manifest counts, YAML contents, media bytes, 401 unauth).

Closes #316
Refs #286

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

📊 Test coverage

Patch coverage: no testable lines changed.

Overall (app/, lib/, db/, excluding UI per ADR-0019):

| Metric | Value | Soft target |
|---|---|---|
| Lines | 82.3% ✅ | ≥ 50% |
| Branches | 73.4% ⚠️ | ≥ 75% |
| Functions | 91.6% | informational |

Soft thresholds per ADR-0019. Coverage is informational and does not block merge.

fix(api): typecheck export test + regenerate api-client types
All checks were successful
Commits / Conventional Commits (pull_request) Successful in 4s
PR / OSV-Scanner (pull_request) Successful in 1m54s
PR / Static analysis (pull_request) Successful in 2m3s
PR / pnpm audit (pull_request) Successful in 2m13s
PR / Test (sqlite) (pull_request) Successful in 2m13s
PR / Build (pull_request) Successful in 2m21s
PR / Test (postgres) (pull_request) Successful in 2m26s
PR / OpenAPI (pull_request) Successful in 2m53s
PR / Lint (pull_request) Successful in 2m59s
PR / Client (web export smoke) (pull_request) Successful in 3m13s
PR / Package age policy (soft) (pull_request) Successful in 59s
Secrets / gitleaks (pull_request) Successful in 1m0s
PR / Typecheck (pull_request) Successful in 3m20s
PR / Coverage (soft) (pull_request) Successful in 1m58s
PR / Trivy (image) (pull_request) Successful in 2m12s
a04e59bd63
The export-serialize test indexed arrays under noUncheckedIndexedAccess
(TS2532) and the api-client generated types were not regenerated after
adding GET /api/export, failing the Typecheck and OpenAPI CI jobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
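
For context on the TS2532 failure mentioned in the commit message above: with `noUncheckedIndexedAccess` enabled, indexing an array yields `T | undefined`, so the element has to be narrowed before use. The sketch below shows one common way to do that; it is not necessarily the fix applied in this commit, and the `firstName` helper is hypothetical.

```typescript
// Under noUncheckedIndexedAccess, rows[0] has type `{ name: string } | undefined`,
// so writing `rows[0].name` directly is rejected with TS2532
// ("Object is possibly 'undefined'").
function firstName(rows: readonly { name: string }[]): string {
  const first = rows[0];
  if (first === undefined) {
    // Failing loudly keeps a test honest when the fixture is unexpectedly empty.
    throw new Error("expected at least one row");
  }
  return first.name; // narrowed to { name: string }
}

// Optional chaining is an alternative when undefined is an acceptable result.
function firstNameOrUndefined(rows: readonly { name: string }[]): string | undefined {
  return rows[0]?.name;
}

console.log(firstName([{ name: "Ada" }]), firstNameOrUndefined([]));
```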
james force-pushed feat/data-export from a04e59bd63
to 966ca62988
All checks were successful
Commits / Conventional Commits (pull_request) Successful in 3s
PR / OSV-Scanner (pull_request) Successful in 1m28s
PR / Static analysis (pull_request) Successful in 1m46s
PR / pnpm audit (pull_request) Successful in 2m18s
PR / Client (web export smoke) (pull_request) Successful in 2m31s
PR / OpenAPI (pull_request) Successful in 3m3s
PR / Lint (pull_request) Successful in 3m8s
PR / Typecheck (pull_request) Successful in 3m20s
PR / Test (postgres) (pull_request) Successful in 3m21s
PR / Build (pull_request) Successful in 3m35s
PR / Package age policy (soft) (pull_request) Successful in 1m15s
Secrets / gitleaks (pull_request) Successful in 1m6s
PR / Test (sqlite) (pull_request) Successful in 3m44s
PR / Trivy (image) (pull_request) Successful in 2m20s
PR / Coverage (soft) (pull_request) Successful in 2m21s
2026-06-28 18:48:04 +00:00
james merged commit d13f7ca265 into main 2026-06-28 19:01:47 +00:00
james deleted branch feat/data-export 2026-06-28 19:01:48 +00:00