diff --git a/docs/README.md b/docs/README.md index 4155163..2de86de 100644 --- a/docs/README.md +++ b/docs/README.md @@ -30,6 +30,7 @@ These will be deleted once the backend services are stood up: - [guides/backend-linear-tickets.md](./guides/backend-linear-tickets.md) - [guides/template-recommendation-matrix.md](./guides/template-recommendation-matrix.md) - [guides/ops-backend-deploy.md](./guides/ops-backend-deploy.md) — technical deploy handoff + cutover plan (Cloudron, env vars, health checks, follow-up tickets). +- [guides/ops-runbook.md](./guides/ops-runbook.md) — steady-state operator runbook: deploy, rollback, restore drill, single-instance limits. ## Cursor rules diff --git a/docs/guides/backend-linear-tickets.md b/docs/guides/backend-linear-tickets.md index 740b3b2..61440c4 100644 --- a/docs/guides/backend-linear-tickets.md +++ b/docs/guides/backend-linear-tickets.md @@ -632,9 +632,9 @@ _Section B — Final Review screen `+` button per category:_ **Depends on:** Tickets 1–8 complete enough to deploy a vertical slice. -**Server / admin:** Cloudron admin access on `my.medlab.host` granted. Scope of this ticket is the **handoff doc + cutover plan** — exactly what's in place, what the side-by-side cutover looks like, and what open product/infra questions remain. The steady-state operator runbook is split out into [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) (we write it after we've done the work). +**Server / admin:** Cloudron admin access on `my.medlab.host` granted. Scope of this ticket is the **handoff doc + cutover plan** — exactly what's in place, what the side-by-side cutover looks like, and closed product/infra decisions. The steady-state operator runbook is [`ops-runbook.md`](ops-runbook.md) ([CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) — **Done**). -**Goal:** Short doc that captures (a) granted access + auto-injected vs. 
manually-set env vars + platform settings, (b) the side-by-side → apex cutover plan with the legacy `communityrule.info` service, and (c) the remaining open questions (apex vs. permanent-subdomain final URL, legacy `rules` data communication, container registry choice). +**Goal:** Short doc that captures (a) granted access + auto-injected vs. manually-set env vars + platform settings, (b) the side-by-side → apex cutover plan with the legacy `communityrule.info` service, and (c) closed product/infra decisions (final URL, legacy rules archive, container registry). **Platform context:** Target is **Cloudron at MEDLab** (`my.medlab.host`). The legacy `communityrule.info` is a single Cloudron **LAMP** app (`lamp.cloudronapp.php74@5.1.2`, 512 MiB at apex) hosting **three things stuffed into one container** under `/app/data/public/`: the static marketing site, the Express/MySQL backend at [`CommunityRule/CommunityRuleBackend`](https://git.medlab.host/CommunityRule/CommunityRuleBackend) (kept alive by a 30-min `run.sh` watchdog on port 3000; MySQL is the LAMP package's bundled MySQL, not a Cloudron addon), and the Flask chatbot at [`CommunityRule/CommunityRuleChatBot`](https://git.medlab.host/CommunityRule/CommunityRuleChatBot) (currently crash-looping with `ModuleNotFoundError`, last touched May 2024). New app is a properly packaged Cloudron app (Docker image + `CloudronManifest.json`, **postgresql + sendmail + localstorage** addons) and replaces all three — **no data migration**. Cloudron's container supervisor replaces the watchdog. @@ -646,7 +646,7 @@ _Section B — Final Review screen `+` button per category:_ - **§3 Env vars** split into Cloudron auto-injected (`CLOUDRON_POSTGRESQL_URL`, `CLOUDRON_MAIL_SMTP_*`) vs. manually-set (`SESSION_SECRET`, `SMTP_FROM`, `NEXT_PUBLIC_ENABLE_BACKEND_SYNC`). Notes that addons are manifest-declared, not platform-enabled, and that platform mail is SES-relayed on `communityrule.info` with custom-from allowed. 
- **§4 Platform settings** (`httpPort: 3000`, `healthCheckPath: /api/health`, 512 MiB to start, automatic backups already on). - **§5 Cutover plan** — staging at `staging.communityrule.info`, soft-launch, apex cutover at scheduled low-traffic window (~5–15 min downtime). - - **§6 Open questions** — apex vs. permanent subdomain final URL; legacy `rules` data communication; container registry choice. + - **§6 Decisions** — final URL (`communityrule.info` apex); legacy `rules` export to Gitea archive (§6.1); container registry (Gitea, done). - **§7 Old vs new deltas** (LAMP-package detail, watchdog, OTP→magic link, sender, API surface, chatbot). - **§8 Follow-up tickets** (the six tickets below). 2. Cross-links: [`docs/guides/backend-roadmap.md`](backend-roadmap.md) §11 (environments — names Cloudron at MEDLab) and §8 (migrations policy — never rewrite applied migrations). @@ -656,7 +656,7 @@ _Section B — Final Review screen `+` button per category:_ - [x] Admin handoff covers exactly the access that was needed (most self-serve via Cloudron admin login). - [x] Cutover plan is side-by-side and explicitly avoids in-place apex replacement. - [x] Six follow-up tickets enumerated and linked, with CR-99 + CR-101 scope corrected to reflect that legacy is one LAMP slot containing marketing + backend + chatbot (all retire together). -- [x] Open product/infra questions surfaced rather than assumed. +- [x] Closed product/infra decisions documented (§6 + §6.1). **Files:** [`docs/guides/ops-backend-deploy.md`](ops-backend-deploy.md), [`docs/guides/backend-roadmap.md`](backend-roadmap.md), [`docs/README.md`](../README.md), [`CONTRIBUTING.md`](../../CONTRIBUTING.md). 
@@ -672,9 +672,9 @@ All six are titled `[Backend] …`, assigned to Vinod, in the **community-rule** | 2 | [CR-97](https://linear.app/community-rule/issue/CR-97/backend-container-image-registry-choose-build-push) | `[Backend] Container image registry: choose, build, push` | **Done** — first image `0.1.0` verified | | 3 | [CR-98](https://linear.app/community-rule/issue/CR-98/backend-cloudron-staging-install-smoke) | `[Backend] Cloudron staging install + smoke` | Cloudron CLI token (§2) — **next** | | 4 | [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) | `[Backend] Cloudron production install + apex cutover` | CR-98 green for the agreed overlap window | -| 5 | [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) | `[Backend] Steady-state operator runbook` | CR-98 (write what we actually did) | +| 5 | [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) | `[Backend] Steady-state operator runbook` | **Done** — [ops-runbook.md](ops-runbook.md) | | 6 | [CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-communityrule-lamp-app) | `[Backend] Decommission legacy CommunityRule LAMP app` | CR-99 + sign-off window | -| 7 | [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) | `[Backend] Decide fate of legacy rules table (read-only export?)` | must resolve before CR-99 maintenance window | +| 7 | [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) | `[Backend] Legacy rules archive export` | execute during CR-99 window (§6.1) | ### PR plan (CR-96 – CR-102) @@ -684,16 +684,16 @@ All six are titled `[Backend] …`, assigned to Vinod, in the **community-rule** | ----- | ------ | ---------------- | ---- | ------ | ---------- | | 1 | 
[CR-96](https://linear.app/community-rule/issue/CR-96/backend-bridge-cloudron-env-vars-to-canonical-names) | `adilallo/Backend/BridgeCloudronEnv` — *[Backend] Cloudron-native environment variables* | repo | **Done** | — | | 2 | [CR-97](https://linear.app/community-rule/issue/CR-97/backend-container-image-registry-choose-build-push) | Container registry packaging + `docker-release.sh` | repo | **Done** | — | -| — | [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) | TBD — optional repo PR if export tooling/docs needed | product / repo | **Parallel** | row count from legacy MySQL (pre–CR-99 backup) | +| — | [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) | — (ops during CR-99; [ops-backend-deploy.md §6.1](ops-backend-deploy.md#61-legacy-rules-archive-cr-102)) | ops | **Parallel** | CR-99 window | | 3 | [CR-98](https://linear.app/community-rule/issue/CR-98/backend-cloudron-staging-install-smoke) | — (ops checklist; [ops-backend-deploy.md §10](ops-backend-deploy.md#10-staging-install--smoke-cr-98)) | ops | **Next** | Cloudron CLI token only | -| 4 | [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) | TBD — `docs/guides/ops-runbook.md` | docs | Backlog | CR-98 (write what we actually did) | -| 5 | [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) | — (ops; maintenance window) | ops | Backlog | CR-98 green + CR-102 resolved | +| 4 | [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) | [`ops-runbook.md`](ops-runbook.md) | docs | **Done** | — | +| 5 | [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) | — (ops; maintenance window + §6.1 export) | ops | Backlog | CR-98 green | | 6 | 
[CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-communityrule-lamp-app) | — (ops; uninstall LAMP slot) | ops | Backlog | CR-99 + sign-off window | **What's next:** **CR-98** — staging install + smoke at `staging.communityrule.info` ([ops-backend-deploy.md §10](ops-backend-deploy.md#10-staging-install--smoke-cr-98)). -Start **CR-102** product decision in parallel so it is resolved before the CR-99 -cutover window. +**CR-102** (legacy rules Gitea export) runs during the CR-99 cutover window +([§6.1](ops-backend-deploy.md#61-legacy-rules-archive-cr-102)). **Per-ticket detail:** @@ -705,7 +705,7 @@ cutover window. - **Configure:** `SESSION_SECRET`, `SMTP_FROM`, `NEXT_PUBLIC_ENABLE_BACKEND_SYNC=true`, `UPLOAD_ROOT=/app/data/uploads`. - **Acceptance:** `GET /api/health` → `{"ok":true,"database":"connected"}`; magic-link sign-in end-to-end; publish a rule succeeds. 4. **Cloudron production install + DNS cutover.** Acceptance: production subdomain resolves to the new app; old subdomain still works during overlap; sign-in + publish succeed against production; backups confirmed. -5. **Steady-state operator runbook.** Lives at `docs/guides/ops-runbook.md` (sibling to the handoff). Covers deploy a new version, rollback, restore drill cadence, multi-instance limitations from [`backend-roadmap.md`](backend-roadmap.md) §5/§7. Acceptance: a fresh reader can deploy + roll back using only this doc. +5. **Steady-state operator runbook.** **Done** — [`docs/guides/ops-runbook.md`](ops-runbook.md). Covers deploy, rollback, restore drill, single-instance limits. 6. **Decommission legacy Express/MySQL backend.** Acceptance: old Cloudron app stopped + uninstalled; old MySQL addon backed up once and removed; legacy Gitea repo README updated to point at this app. Priority: Low. 
--- @@ -848,10 +848,10 @@ Tickets **10–11** can be deferred without blocking the core “auth + drafts + | 12.1 | [CR-96](https://linear.app/community-rule/issue/CR-96/backend-bridge-cloudron-env-vars-to-canonical-names) | Cloudron-native env vars | **Done** | | 12.2 | [CR-97](https://linear.app/community-rule/issue/CR-97/backend-container-image-registry-choose-build-push) | Container image registry + CI | **Done** — image `0.1.0` | | 12.3 | [CR-98](https://linear.app/community-rule/issue/CR-98/backend-cloudron-staging-install-smoke) | Cloudron staging install + smoke | **Next** — [ops-backend-deploy.md §10](ops-backend-deploy.md#10-staging-install--smoke-cr-98) | -| 12.4 | [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) | Production install + apex cutover | Ops — after CR-98 + CR-102 | -| 12.5 | [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) | Steady-state operator runbook | Docs PR — after CR-98 | +| 12.4 | [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) | Production install + apex cutover | Ops — after CR-98; includes §6.1 export | +| 12.5 | [CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) | Steady-state operator runbook | **Done** — [ops-runbook.md](ops-runbook.md) | | 12.6 | [CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-communityrule-lamp-app) | Decommission legacy LAMP app | Ops — after CR-99 + sign-off | -| 12.7 | [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) | Legacy `rules` table fate / export | **Parallel** — before CR-99 | +| 12.7 | [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) | Legacy rules Gitea archive export | Ops — during CR-99 window (§6.1) | | 13 | 
[CR-84](https://linear.app/community-rule/issue/CR-84/backend-api-error-contract-request-id-logging) | API errors + request-id logging | — | | 14 | [CR-85](https://linear.app/community-rule/issue/CR-85/backend-custom-session-lifecycle-cleanup-invalidation-policy) | Session lifecycle + cleanup **Done** | — | | 15 | [CR-86](https://linear.app/community-rule/issue/CR-86/backend-profile-dashboard-account-figma-profile) | Profile + account (Figma 22143:900069) | — | diff --git a/docs/guides/backend-roadmap.md b/docs/guides/backend-roadmap.md index 0dc535b..1452f88 100644 --- a/docs/guides/backend-roadmap.md +++ b/docs/guides/backend-roadmap.md @@ -221,7 +221,7 @@ npm run dev **Optional QA:** Run automated tests against an **ephemeral** database in CI instead of maintaining a fourth long-lived server. -**Target platform:** **Cloudron at MEDLab** — same host as the legacy [`CommunityRule/CommunityRuleBackend`](https://git.medlab.host/CommunityRule/CommunityRuleBackend) (Express + MySQL). The new app is packaged as a proper Cloudron app (Docker image + `CloudronManifest.json`, **postgresql + sendmail + localstorage** addons). Cloudron's container supervisor replaces the legacy 30-min `run.sh` watchdog. Admin handoff (access, env vars, platform settings, open decisions): [`docs/guides/ops-backend-deploy.md`](ops-backend-deploy.md). The app reads Cloudron-injected `CLOUDRON_POSTGRESQL_URL` and `CLOUDRON_MAIL_SMTP_*` via [`lib/server/env.ts`](../../lib/server/env.ts) (CR-96). +**Target platform:** **Cloudron at MEDLab** — same host as the legacy [`CommunityRule/CommunityRuleBackend`](https://git.medlab.host/CommunityRule/CommunityRuleBackend) (Express + MySQL). The new app is packaged as a proper Cloudron app (Docker image + `CloudronManifest.json`, **postgresql + sendmail + localstorage** addons). Cloudron's container supervisor replaces the legacy 30-min `run.sh` watchdog. First-time install and cutover: [`docs/guides/ops-backend-deploy.md`](ops-backend-deploy.md). 
Steady-state deploy, rollback, and restore drill: [`docs/guides/ops-runbook.md`](ops-runbook.md). The app reads Cloudron-injected `CLOUDRON_POSTGRESQL_URL` and `CLOUDRON_MAIL_SMTP_*` via [`lib/server/env.ts`](../../lib/server/env.ts) (CR-96). **Admin / infra (coordinate with whoever runs the server):** diff --git a/docs/guides/ops-backend-deploy.md b/docs/guides/ops-backend-deploy.md index 585c518..8e49236 100644 --- a/docs/guides/ops-backend-deploy.md +++ b/docs/guides/ops-backend-deploy.md @@ -131,11 +131,13 @@ apex. only step with brief downtime (~5–15 min). Sequence: 1. Take one final manual backup of the legacy LAMP app (Cloudron *Backups* tab → *Backup now*). - 2. `cloudron uninstall` the legacy app at `communityrule.info`. - 3. `cloudron configure --location communityrule.info` to move the + 2. Export legacy `rules` + `version_history` to the Gitea archive + per [§6.1](#61-legacy-rules-archive-cr-102). + 3. `cloudron uninstall` the legacy app at `communityrule.info`. + 4. `cloudron configure --location communityrule.info` to move the validated staging install to the apex (or `cloudron install` fresh at apex if cleaner). - 4. Re-run `prisma migrate deploy`, re-set production env vars if + 5. Re-run `prisma migrate deploy`, re-set production env vars if not preserved by the move, smoke again. 4. **Decommission** — see [CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-expressmysql-backend). Hold the final LAMP backup ≥ 90 days for safety. @@ -155,11 +157,11 @@ Product decisions (closed): 1. **Final URL — `communityrule.info` apex.** New app fully replaces the legacy site, including the marketing surface. Brief cutover downtime (~5–15 min) is accepted. -2. **Legacy `rules` data — not migrated.** No data moves into the new - app's Postgres. 
A pre-cutover **read-only export** of the - `rules` + `version_history` MySQL tables is under consideration; - approach depends on the actual row count, which we'll pull as - part of the CR-99 pre-cutover backup. Tracked in +2. **Legacy `rules` data — not migrated; exported to Gitea.** No data + moves into the new app's Postgres. Before CR-99 uninstalls the + legacy MySQL, operators export the `rules` + `version_history` + tables to a new read-only Gitea repo on `git.medlab.host` (see + [§6.1](#61-legacy-rules-archive-cr-102)). Tracked in [CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export). Infra decision closed: @@ -184,6 +186,50 @@ Infra decision closed: Container Registry** app and re-tag against its hostname; no other changes required. +### 6.1 Legacy rules archive (CR-102) + +The legacy Express backend stores published rules in bundled MySQL +tables `rules` and `version_history` (soft-delete via a `deleted` +column). These do not map to the new app's Postgres schema and are +**not imported**. Instead, a one-time export preserves the library for +posterity and operator lookup. + +**Archive repo:** create +[`CommunityRule/legacy-rules-archive`](https://git.medlab.host/CommunityRule/legacy-rules-archive) +on `git.medlab.host` (same org as the other CommunityRule repos). +Mark the repo **archived** after the cutover push. + +**When:** during the [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) +maintenance window — after the final Cloudron backup (§5 phase 3 step +1) and **before** `cloudron uninstall` (step 3). + +**Export steps (operator):** + +1. From the legacy LAMP container or a restored backup, pull row counts + for `rules` and `version_history` (include non-deleted vs soft-deleted + if useful for the README summary). +2. `mysqldump` both tables to SQL files (`rules.sql`, + `version_history.sql`). +3. 
Derive human-readable exports (JSON and/or CSV) from the dump for + anyone browsing the archive without MySQL tooling. +4. Commit artifacts + a `README.md` to the archive repo. The README + should record: + - cutover date; + - row counts and a brief activity summary; + - a short field glossary (`deleted`, version rows, etc.); + - a pointer to the new app at `communityrule.info`. +5. Tag the commit (e.g. `legacy-rules-YYYY-MM-DD`) and archive the + Gitea repo. + +**Safety net:** the final Cloudron LAMP backup is retained ≥ 90 days +([CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-communityrule-lamp-app)) +for operator recovery if manual lookup from the export is ever needed. + +**User discoverability:** link to the archive repo (or a release +download) from the new app — footer, help page, or a static +`/legacy-archive` page — so users looking for pre-cutover rules can +find it without knowing Gitea exists. + ## 7. Old vs new deltas So nothing surprises anyone at cutover: @@ -231,19 +277,20 @@ All filed in Linear, titled `[Backend] …`, assigned to me, in the 4. [**CR-99**](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover) — `[Backend] Cloudron production install + apex cutover`. Side-by-side cutover at scheduled low-traffic window per §5. - Blocked by CR-98 green + CR-102 resolved. + Blocked by CR-98 green. Includes legacy rules export (§6.1) before + uninstall. 5. [**CR-100**](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook) - — `[Backend] Steady-state operator runbook`. Blocked by CR-98 - (write what we actually did). + — `[Backend] Steady-state operator runbook` (**Done** — + [`ops-runbook.md`](ops-runbook.md)). 6. [**CR-101**](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-communityrule-lamp-app) — `[Backend] Decommission legacy CommunityRule LAMP app`. 
Uninstall the entire LAMP slot (marketing + Express backend + chatbot in one go); preserve final backup ≥ 90 days. Blocked by CR-99 + sign-off window. Priority: Low. 7. [**CR-102**](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export) - — `[Backend] Decide fate of legacy rules table (read-only export?)`. - Count rows + decide whether to publish a static archive before - CR-99 uninstalls the legacy MySQL. Priority: Low. + — `[Backend] Legacy rules archive export`. Decision: export to Gitea + (§6.1). Execute during the CR-99 maintenance window before + uninstall. Priority: Low. ## 9. Build and push image workflow @@ -440,10 +487,12 @@ apex cutover. The app uses an **in-memory** rate limiter in [`lib/server/rateLimit.ts`](../../lib/server/rateLimit.ts) (magic-link requests, organizer inquiry, etc.). This is sufficient for the current **single Cloudron container** per environment. -**Before horizontal scale-out** (multiple app instances behind a load balancer), replace or back the limiter with a shared store (e.g. Redis) so per-IP / per-user windows apply across instances. Until then, document expected limits in the steady-state runbook ([CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook)). +**Before horizontal scale-out** (multiple app instances behind a load balancer), replace or back the limiter with a shared store (e.g. Redis) so per-IP / per-user windows apply across instances. Until then, see [`ops-runbook.md` §6](ops-runbook.md#6-single-instance-limitations). ## 12. Related docs +- [`docs/guides/ops-runbook.md`](ops-runbook.md) — steady-state deploy, + rollback, restore drill, single-instance limits ([CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook)). - [`docs/guides/backend-roadmap.md`](backend-roadmap.md) §11 (environments) and §8 (Prisma migrations policy). 
- [`docs/guides/backend-linear-tickets.md`](backend-linear-tickets.md) diff --git a/docs/guides/ops-runbook.md b/docs/guides/ops-runbook.md new file mode 100644 index 0000000..5a66797 --- /dev/null +++ b/docs/guides/ops-runbook.md @@ -0,0 +1,274 @@ +# Steady-state operator runbook + +Day-to-day deploy, rollback, and recovery for CommunityRule on MEDLab +Cloudron. Assumes staging or production is already installed and smoke-tested. + +> **First-time install, apex cutover, and legacy decommission** live in +> [`ops-backend-deploy.md`](ops-backend-deploy.md). Use this doc once an +> environment is already running. + +## 1. Quick reference + +| Item | Value | +| ---- | ----- | +| Cloudron dashboard | `https://my.medlab.host` | +| Cloudron CLI login | `cloudron login my.medlab.host` | +| Staging app | `staging.communityrule.info` | +| Production app | `communityrule.info` (after apex cutover) | +| Container image | `git.medlab.host/communityrule/community-rule:` | +| Health check | `GET /api/health` → `200 {"ok":true,"database":"connected"}` | +| Manifest version | [`CloudronManifest.json`](../../CloudronManifest.json) `version` field (must increase for each release) | +| Current manifest | `0.1.8` at time of writing — always read the file before deploying | + +Replace `` below with the Cloudron location (`staging.communityrule.info` +or `communityrule.info`). + +## 2. Prerequisites (one-time per operator) + +1. **Cloudron access** — admin login on `my.medlab.host` and a CLI API token + (*Profile → API Tokens* on the dashboard). Save the token in 1Password. +2. **Cloudron CLI** — logged in: + ```bash + cloudron login my.medlab.host + ``` +3. **Docker + buildx** — Docker Desktop or equivalent with `docker buildx`. +4. **Gitea registry auth** — personal access token on `git.medlab.host` with + `read:package` + `write:package`; then: + ```bash + docker login git.medlab.host + ``` +5. 
**Repo checkout** — clone + [`CommunityRule/community-rule`](https://git.medlab.host/CommunityRule/community-rule) + and work from a clean commit that matches the release you intend to ship. + +Images are **`linux/amd64` only** (Cloudron host is x86_64). On Apple +Silicon, the release script still builds amd64 via buildx; a bare +`docker pull` without `--platform linux/amd64` failing on arm64 is expected. + +## 3. Deploy a new version + +Typical release flow: bump manifest → build/push image → `cloudron update` +→ smoke. + +### 3.1 Build and push + +1. Check out the commit to release (`main` or a release branch). +2. **Bump** [`CloudronManifest.json`](../../CloudronManifest.json) `version` + (e.g. `0.1.8` → `0.1.9`). Cloudron requires the manifest version to + **increase** for `cloudron update --image` to be accepted. +3. From the repo root, build and push (tag should match the manifest version + for sanity): + + ```bash + TAG=0.1.9 ./scripts/docker-release.sh + # equivalent: + TAG=0.1.9 npm run docker:release + ``` + + Omit `TAG=` to push `git rev-parse --short HEAD` instead — only do that + for ad-hoc staging experiments; production releases should use semver tags + aligned with the manifest. + +4. **Verify anonymous pull** (simulates Cloudron): + ```bash + docker logout git.medlab.host + docker pull --platform linux/amd64 \ + git.medlab.host/communityrule/community-rule:0.1.9 + ``` + +5. **Commit the manifest bump** in git alongside the code that shipped in + this build. + +Registry details and one-time Gitea setup: [`ops-backend-deploy.md` §9](ops-backend-deploy.md#9-build-and-push-image-workflow). + +### 3.2 Update Cloudron + +```bash +cloudron update --app staging.communityrule.info \ + --image git.medlab.host/communityrule/community-rule:0.1.9 +``` + +Use `communityrule.info` for production. Cloudron pulls the image (no registry +credentials on the host), restarts the container, and runs +[`scripts/start.sh`](../../scripts/start.sh), which: + +1. 
`chown`s `/app/data` (localstorage mount), +2. runs **`prisma migrate deploy`**, +3. execs the Next.js standalone server. + +Watch the app **Logs** tab in the Cloudron dashboard for a clean migration and +`Listening on port 3000`. + +### 3.3 Migrations + +**Normal case:** migrations apply automatically on container start — no +separate step. + +**Manual re-run** (only if debugging a failed deploy or verifying before +traffic): + +```bash +cloudron exec --app staging.communityrule.info -- npm run db:deploy +``` + +(`npm run db:deploy` → `prisma migrate deploy`.) + +**Policy:** never run `prisma migrate reset` against staging or production. +Never edit migration files already applied to a shared database. Fix schema +drift by adding a **new** migration locally (`prisma migrate dev`) and +deploying a new image. See [`backend-roadmap.md` §8](backend-roadmap.md#8-prisma-migrations-policy). + +### 3.4 Seed data (not every deploy) + +Template + facet seed (`MethodFacet` rows for create-flow “Recommended” tags) +is **not** applied at boot. Run once per environment after first install, or +when recommendations return all-zero scores: + +```bash +cloudron exec --app staging.communityrule.info -- \ + node prisma/seed.bundle.cjs +``` + +Re-running is safe (idempotent upserts). JSON lives at `/app/seed-data/` in +the image — not under `/app/data` (Cloudron localstorage overwrites that +mount). + +### 3.5 Smoke after deploy + +**Automated** (from your laptop, repo root): + +```bash +./scripts/staging-smoke.sh staging.communityrule.info +# production: +./scripts/staging-smoke.sh communityrule.info + +# optional — exercises magic-link request (check inbox manually): +EMAIL=you@example.com ./scripts/staging-smoke.sh staging.communityrule.info +``` + +**Manual** (still required for full acceptance): + +- Click a magic link → signed in → `GET /api/auth/session` returns a user. +- Publish a rule end-to-end → public detail page loads. 
+- Optional: Save & Exit draft sync; upload with `UPLOAD_ROOT` set. + +Full checklist and failure table: [`ops-backend-deploy.md` §10](ops-backend-deploy.md). + +## 4. Roll back (code-only) + +To revert application code without touching the database: + +```bash +cloudron update --app staging.communityrule.info \ + --image git.medlab.host/communityrule/community-rule: +``` + +Pick a tag you know was healthy (previous manifest version or git tag recorded +at last good deploy). + +**Database implications:** + +- Rolling back the **image** does **not** undo migrations already applied. +- If the bad release added a migration, rolling back to an older image may + leave the DB schema **ahead** of what that code expects — usually safe if + the migration was additive (new nullable columns, new tables). +- If the bad release broke because of a **destructive or incompatible** + migration, do **not** reset production. Restore from a Cloudron backup + (§5) or fix forward with a corrective migration. + +**Never** `prisma migrate reset` on staging or production. + +## 5. Restore drill (quarterly) + +Verify Cloudron backups are restorable without touching the live app. + +**Cadence:** at least once per quarter, or after any backup-policy change. + +**Steps:** + +1. In the Cloudron dashboard, pick a recent automatic backup of + `` (*Backups* tab). +2. **Restore to a scratch location** — e.g. + `restore-drill-YYYYMMDD.communityrule.info` — not over the live app. +3. After restore completes, confirm the container starts and migrations are + current: + ```bash + curl -sS "https://restore-drill-YYYYMMDD.communityrule.info/api/health" + ``` + Expect `200` with `"database":"connected"`. +4. Optional: `cloudron exec --app restore-drill-YYYYMMDD.communityrule.info -- npm run db:deploy` + if logs show pending migrations on an older snapshot. +5. Run `./scripts/staging-smoke.sh restore-drill-YYYYMMDD.communityrule.info`. +6. **Uninstall** the scratch app when done. 
+
+Record the drill date and outcome in your ops notes. Cloudron retains
+automatic backups per platform defaults; confirm retention in the dashboard.
+
+## 6. Single-instance limitations
+
+The current Cloudron deploy runs **one container per environment**. Do not
+scale to multiple app instances without addressing these per-process limits:
+
+### 6.1 In-memory rate limiter
+
+[`lib/server/rateLimit.ts`](../../lib/server/rateLimit.ts) stores windows in
+process memory. Limits apply **per container**, not globally across instances.
+
+| Route / action | Key | Min interval |
+| -------------- | --- | ------------ |
+| Magic-link request | per email | 60 s |
+| Magic-link request | per IP | 20 s |
+| Email change request | per email / IP / user | 60 s |
+| Organizer inquiry | per email / IP | 60 s / 20 s |
+| Publish with stakeholder invites | per IP | 60 s |
+| Stakeholder add / resend | per IP / invite | 60 s |
+| File upload | per user | 5 s |
+
+Before horizontal scale-out, replace with a shared store (e.g. Redis) or edge
+rate limits. See [`backend-roadmap.md` §5](backend-roadmap.md#5-session-and-authentication-v1).
+
+### 6.2 Web vitals storage
+
+Production defaults to **`external`** mode: vitals are structured log lines, not
+written to Postgres or local files. Setting `WEB_VITALS_STORAGE=local` uses a
+**per-process** file store under `.next/web-vitals` — suitable for dev/admin
+only, not multi-instance. See [`backend-roadmap.md` §7](backend-roadmap.md#7-api-responses-errors-and-observability).
+
+## 7. Environment variables (steady-state)
+
+Cloudron **auto-injects** addon vars (`CLOUDRON_POSTGRESQL_URL`,
+`CLOUDRON_MAIL_SMTP_*`). Operators set these manually once per app; they
+persist across image updates unless changed:
+
+| Variable | Purpose |
+| -------- | ------- |
+| `SESSION_SECRET` | Session cookie signing (≥ 16 chars). Rotating logs everyone out. |
+| `SMTP_FROM` | Visible From on sign-in emails (e.g. `Community Rule `). |
+| `NEXT_PUBLIC_ENABLE_BACKEND_SYNC` | `true` in staging/production — Postgres draft persistence. |
+| `UPLOAD_ROOT` | `/app/data/uploads` on Cloudron — required for file uploads. |
+
+Full detail: [`ops-backend-deploy.md` §3](ops-backend-deploy.md#3-environment-variables).
+
+## 8. Troubleshooting
+
+| Symptom | Likely cause | Action |
+| ------- | ------------ | ------ |
+| Image pull error on update | Private repo, wrong tag, or amd64 manifest missing | Confirm repo is public; verify pull with `--platform linux/amd64` (§3.1) |
+| Health `503` / `database: disconnected` | Postgres addon or `CLOUDRON_POSTGRESQL_URL` missing | Cloudron app → Environment |
+| Container crash on start | Migration failure | App logs around `prisma migrate deploy`; fix forward with new migration |
+| Magic link not sent | Mail addon or `SMTP_FROM` | Cloudron mail logs; `CLOUDRON_MAIL_SMTP_*` vars |
+| Upload `server_misconfigured` | `UPLOAD_ROOT` unset | `cloudron env set --app UPLOAD_ROOT=/app/data/uploads` |
+| No “Recommended” on method cards | Seed not run | §3.4 — `node prisma/seed.bundle.cjs` |
+| Rate limit too aggressive after deploy | Expected per §6.1 | Single instance only; limits reset on container restart |
+
+App logs: Cloudron dashboard → *Logs* tab, or `cloudron logs --app -f`.
+
+## 9. Related docs
+
+- [`ops-backend-deploy.md`](ops-backend-deploy.md) — first install, cutover
+  plan, legacy rules archive, build/push deep dive.
+- [`backend-roadmap.md`](backend-roadmap.md) — migrations policy (§8),
+  rate limiting (§5), environments (§11).
+- [`../relaunch-brief.md`](../relaunch-brief.md) — plain-language summary
+  for MEDLab admin.
+- [`../../CONTRIBUTING.md`](../../CONTRIBUTING.md) — local dev setup.
diff --git a/docs/relaunch-brief.md b/docs/relaunch-brief.md
index 96355ad..b86965d 100644
--- a/docs/relaunch-brief.md
+++ b/docs/relaunch-brief.md
@@ -23,7 +23,7 @@ All three retire together when the new app goes live. The chatbot is **not** bei
 ## What does NOT carry over
 
 - **No user accounts.** New sign-ins start fresh.
-- **No published rules from the old database.** We'll count the existing `rules` table before cutover and decide whether to publish a read-only archive (CSV/JSON) somewhere for anyone looking for their old work.
+- **No published rules from the old database.** Pre-cutover rules are exported to a read-only Gitea archive (`CommunityRule/legacy-rules-archive` on `git.medlab.host`); they are not imported into the new app. See [`docs/guides/ops-backend-deploy.md`](guides/ops-backend-deploy.md) §6.1.
 - **No chatbot.**
 
 ## How the cutover will work
@@ -35,7 +35,7 @@ until the new one is verified.
    `staging.communityrule.info` (auto-provisioned by Cloudron). Legacy app at the apex is not touched. Quiet testing within MEDLab/stakeholders.
 2. **Cutover phase.** When staging is green and we're ready, schedule a low-traffic window. During the window (roughly 5–15 minutes of apex downtime):
    - Take a final backup of the legacy app (Cloudron one-click).
-   - Pull a copy of the legacy `rules` table if we decided to publish an archive.
+   - Export the legacy `rules` + `version_history` tables to the Gitea archive (see ops-backend-deploy §6.1).
    - Uninstall the legacy app at the apex `communityrule.info`.
    - Move the new app to the apex.
    - Smoke-test, confirm backups are on, done.
@@ -53,6 +53,7 @@ Roughly this order:
 3. **Install at staging** subdomain, smoke test, soft launch (CR-98).
 4. **Apex cutover window** — the brief downtime above.
 5. **Uninstall legacy**, archive legacy repos.
-6. **Write the steady-state runbook** based on what actually worked.
+6. ~~**Write the steady-state runbook** based on what actually worked
+   ([`ops-runbook.md`](guides/ops-runbook.md), CR-100).~~ **Done.**
 
 Staging should be ready to deploy in 1-2 weeks, and we can go from there.