import/export all user's data #286

Open
opened 2026-06-24 22:47:52 +00:00 by james · 0 comments
Owner

Let a user export all of their domain data to a portable archive and import it back — into the same instance or a different one. UI lives in the settings screen.

Format — gzipped directory tree (.tar.gz)

Evolved from the original "single gzipped yaml": binary media don't belong inline, so the archive is a gzipped tar of a directory: YAML files at the root, a media/ dir for blobs.

carol-export/
  manifest.yaml          # formatVersion, exportedAt, appVersion, per-entity counts
  profile.yaml           # profiles + profile_contacts
  skills.yaml            # skill_sections + skills
  education.yaml         # educations
  experience.yaml        # jobs + positions
  projects.yaml          # projects + contributions
  people.yaml            # people + person_contacts/notes/relatives/met_through/organizations
  organizations.yaml     # organizations + organization_links + organization_key_people
  notes.yaml             # notes
  media/
    <uuid>-<name>        # avatars / profile + org photos from blob storage
  • UUID linking is free. Every domain table already uses text PKs populated by randomUUID(), so FKs are real UUIDs today. Export emits the existing IDs; cross-file references are by UUID. Each YAML file is a list of entities keyed by their UUID; child rows nest under their parent or carry the parent UUID.
  • manifest.yaml carries formatVersion: 1 (forward-compat gate), exportedAt, the Carol appVersion (ties to the versioning work in #285), and per-entity counts (drives the import preview).
  • media/ holds the raw blobs; YAML references them by relative path (media/<file>). Empty/absent when the user has no uploaded assets.
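
The forward-compat gate on manifest.yaml could be sketched roughly as below. This is only an illustration, assuming the manifest has already been parsed from YAML into a plain object and that exportedAt arrives as a string; the `counts` key name, `ExportManifest`, `checkManifest`, and `SUPPORTED_FORMAT_VERSION` are hypothetical names, not part of the ticket.

```typescript
// Hypothetical shape of manifest.yaml after YAML parsing.
// formatVersion, exportedAt, and appVersion come from the ticket;
// the `counts` key name is an assumption.
interface ExportManifest {
  formatVersion: number;
  exportedAt: string; // assumed to be an ISO 8601 string
  appVersion: string;
  counts: Record<string, number>; // per-entity counts, e.g. { people: 12 }
}

const SUPPORTED_FORMAT_VERSION = 1;

type ManifestCheck =
  | { ok: true; manifest: ExportManifest }
  | { ok: false; errors: string[] };

function checkManifest(raw: unknown): ManifestCheck {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, errors: ["manifest.yaml must contain a mapping"] };
  }
  const m = raw as Record<string, unknown>;
  const errors: string[] = [];

  // Forward-compat gate: refuse archives written by a newer format.
  const version = m["formatVersion"];
  if (typeof version !== "number" || !Number.isInteger(version) || version < 1) {
    errors.push("formatVersion must be a positive integer");
  } else if (version > SUPPORTED_FORMAT_VERSION) {
    errors.push(
      `formatVersion ${version} is newer than supported version ${SUPPORTED_FORMAT_VERSION}`,
    );
  }

  const exportedAt = m["exportedAt"];
  if (typeof exportedAt !== "string" || Number.isNaN(Date.parse(exportedAt))) {
    errors.push("exportedAt must be a parseable timestamp string");
  }

  const appVersion = m["appVersion"];
  if (typeof appVersion !== "string" || appVersion.length === 0) {
    errors.push("appVersion must be a non-empty string");
  }

  // Counts drive the import preview, so they must be non-negative integers.
  const counts = m["counts"];
  const countsOk =
    typeof counts === "object" &&
    counts !== null &&
    !Array.isArray(counts) &&
    Object.values(counts).every(
      (n) => typeof n === "number" && Number.isInteger(n) && n >= 0,
    );
  if (!countsOk) {
    errors.push("counts must map entity names to non-negative integers");
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, manifest: m as unknown as ExportManifest };
}
```

Collecting every error instead of failing on the first one fits the preview step, which reports validation errors back to the user in one pass.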

Scope — domain data only

Included: profile (+contacts), skill sections + skills, education, jobs + positions, projects + contributions, people (+ all person_* children), organizations (+ links + key people), notes — plus their media.

Excluded: users/auth (password hashes, OAuth identities, sessions, PATs, login attempts), and user settings/preferences. Auth is instance-bound and unsafe to ship; settings are out of scope for this pass. Revisit if a "full account backup" ticket lands.

Import — user chooses the mode, with a dry-run preview first

Import is two-step, and the file's user_id is never trusted (every imported row is re-scoped to the authenticated importer, which prevents cross-user injection):

  1. Preview (dry-run). Upload the archive → server validates (format version, per-file schema, referential integrity, media presence, tar path-traversal/zip-slip guard, size/content-type limits reusing blob-storage validation) and returns a summary: counts per entity, format version, and what each mode would do to the current account. No writes.
  2. Apply. User picks a mode + confirms. Runs in a single transaction through the DB abstraction (must pass on SQLite and Postgres); all-or-nothing.
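
The path-traversal (zip-slip) guard from the preview step might look something like the sketch below. It assumes entry names are read from the tar headers as POSIX-style strings and that every entry must sit under the carol-export/ root; `safeEntryPath` is a hypothetical helper, and a real validator would presumably also reject symlink and hard-link entries, which this sketch does not inspect.

```typescript
// Returns a normalized relative path for a tar entry, or null if the entry
// must be rejected. Assumption: all archive entries live under `root`.
function safeEntryPath(entryName: string, root = "carol-export"): string | null {
  // Reject empty names, NUL bytes, and Windows-style separators outright.
  if (entryName.length === 0 || entryName.includes("\0") || entryName.includes("\\")) {
    return null;
  }
  // Absolute paths would escape the extraction directory.
  if (entryName.startsWith("/")) return null;

  // Drop empty and "." segments (e.g. "a//b", "a/./b").
  const segments = entryName.split("/").filter((s) => s !== "" && s !== ".");

  // Any ".." segment is refused, even if it would resolve inside the root.
  // This is stricter than normalizing, but simpler to reason about.
  if (segments.some((s) => s === "..")) return null;

  // Entries outside the expected top-level directory are refused.
  if (segments[0] !== root) return null;

  return segments.join("/");
}
```

For example, `safeEntryPath("carol-export/media/abc-photo.png")` returns the path unchanged, while `safeEntryPath("carol-export/../etc/passwd")` and `safeEntryPath("/etc/passwd")` both return `null`.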

Modes (offered in the UI at import time):

  • Replace — delete the importer's domain rows, then restore from the archive (UUIDs preserved). Exact snapshot restore. Destructive → explicit warning + confirm.
  • Merge by UUID — upsert: rows whose UUID matches are updated, the rest inserted. Children reconciled under their parent. Good for syncing two instances.
  • Import as copies — regenerate every UUID (consistently remapping FKs) and always insert. Never touches existing data; can duplicate.
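
The "Import as copies" mode hinges on remapping every UUID consistently, so a rough two-pass sketch may help. It assumes rows have been parsed into plain objects and that each table declares which columns hold foreign keys; `Row`, `TableDump`, `remapAsCopies`, and the `job_id` column in the usage note are hypothetical. The ID generator is passed in so the example stays deterministic; in the app it would presumably be `randomUUID`.

```typescript
interface Row {
  id: string;
  [column: string]: unknown;
}

interface TableDump {
  rows: Row[];
  fkColumns: string[]; // columns holding UUIDs of other exported rows
}

function remapAsCopies(
  tables: Record<string, TableDump>,
  importerUserId: string,
  newId: () => string,
): Record<string, TableDump> {
  // Pass 1: mint a fresh ID for every row in the archive.
  const idMap = new Map<string, string>();
  for (const table of Object.values(tables)) {
    for (const row of table.rows) {
      idMap.set(row.id, newId());
    }
  }

  // Pass 2: rewrite primary keys and foreign keys through the same map.
  const out: Record<string, TableDump> = {};
  for (const [name, table] of Object.entries(tables)) {
    const rows = table.rows.map((row) => {
      const copy: Row = { ...row, id: idMap.get(row.id) as string };
      for (const col of table.fkColumns) {
        const ref = row[col];
        if (typeof ref !== "string") continue; // nullable FK, nothing to remap
        const mapped = idMap.get(ref);
        if (mapped === undefined) {
          throw new Error(`${name}.${col} references unknown id ${ref}`);
        }
        copy[col] = mapped;
      }
      // Never trust the file's user_id: re-scope to the importer.
      if ("user_id" in row) copy["user_id"] = importerUserId;
      return copy;
    });
    out[name] = { rows, fkColumns: table.fkColumns };
  }
  return out;
}
```

Given a `jobs` table with row `j1` and a `positions` table whose row `p1` has `job_id: "j1"`, the returned `positions` row points at whatever new ID `j1` received, and the input objects are left unmodified.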

API (auth required, user-scoped, RFC 7807 + zod)

  • GET /api/export → application/gzip, Content-Disposition: attachment; filename=carol-export-<date>.tar.gz. Streams the archive for the authenticated user.
  • POST /api/import/preview (multipart upload) → dry-run summary JSON (counts, per-mode effects, validation errors). Stateless — the client keeps the file.
  • POST /api/import (multipart: file + mode + confirm) → applies in one transaction, returns a result summary. The client re-sends the same file, so there is no server-side staging to expire.

File-transfer endpoints, not lists → cursor pagination N/A.
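
The per-mode effects returned by POST /api/import/preview could be derived from two sets of IDs, as in this minimal sketch. It assumes the server can list the importer's existing row IDs for one entity type, and it ignores child reconciliation under Merge, which may add further changes; `ImportMode`, `ModeEffect`, and `previewEffects` are hypothetical names.

```typescript
type ImportMode = "replace" | "merge" | "copies";

interface ModeEffect {
  deleted: number;
  updated: number;
  inserted: number;
}

// Computes what each mode would do for one entity type, without writing.
function previewEffects(
  existingIds: ReadonlySet<string>,
  archiveIds: readonly string[],
): Record<ImportMode, ModeEffect> {
  // Rows present both in the account and in the archive.
  const matching = archiveIds.filter((id) => existingIds.has(id)).length;
  const total = archiveIds.length;

  return {
    // Replace: wipe the importer's rows, then restore everything.
    replace: { deleted: existingIds.size, updated: 0, inserted: total },
    // Merge by UUID: update matches, insert the rest.
    merge: { deleted: 0, updated: matching, inserted: total - matching },
    // Import as copies: fresh UUIDs, so everything is an insert.
    copies: { deleted: 0, updated: 0, inserted: total },
  };
}
```

With existing IDs `a, b, c` and an archive holding `b, c, d`, Replace would delete 3 and insert 3, Merge would update 2 and insert 1, and Import as copies would insert 3.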

UI — settings screen (apps/client/app/(app)/account.tsx)

New "Your data" card. Strings via the account.* i18n namespace; Carol's voice (first person, sentence case, no emoji).

  • Export — "Export my data" → builds + downloads the archive (web: blob download; native: expo-file-system save / share sheet). Spinner while building.
  • Import — file picker (expo-document-picker) → POST /api/import/preview → summary screen showing counts + a mode selector (Replace / Merge / Import as copies) with a destructive-action warning on Replace → confirm → POST /api/import → success/error result.

Cross-cutting

  • ADR documenting the format (gzipped tar dir), domain-only scope, the three import modes, and dry-run-preview semantics (per CLAUDE.md's ADR rule).
  • Both-engine tests for the import apply path (round-trip export→import on SQLite and Postgres).
  • README: no new runtime env var expected (temp/extraction uses existing storage conventions); add one only if extraction needs a configurable dir.

Suggested breakdown (child tickets)

  1. Export serializer + GET /api/export (tar.gz assembly, media copy from blob storage).
  2. Import parser + validator + POST /api/import/preview (schema, referential integrity, zip-slip/size guards).
  3. Import apply — three modes, transactional, both engines, user_id re-scoping.
  4. Settings UI — export action + import flow (picker → preview → mode picker → confirm → result).
  5. ADR + i18n strings + both-engine tests.

Out of scope (today)

Auth/identity/token export, settings export, scheduled/automatic backups, archive encryption. Pull in via a follow-up ticket if needed.

Reference
james/carol#286