Files
community-rule/docs/guides/ops-runbook.md
T
2026-05-24 16:00:11 -06:00

275 lines
11 KiB
Markdown

# Steady-state operator runbook
Day-to-day deploy, rollback, and recovery for CommunityRule on MEDLab
Cloudron. Assumes staging or production is already installed and smoke-tested.
> **First-time install, apex cutover, and legacy decommission** live in
> [`ops-backend-deploy.md`](ops-backend-deploy.md). Use this doc once an
> environment is already running.
## 1. Quick reference
| Item | Value |
| ---- | ----- |
| Cloudron dashboard | `https://my.medlab.host` |
| Cloudron CLI login | `cloudron login my.medlab.host` |
| Staging app | `staging.communityrule.info` |
| Production app | `communityrule.info` (after apex cutover) |
| Container image | `git.medlab.host/communityrule/community-rule:<tag>` |
| Health check | `GET /api/health``200 {"ok":true,"database":"connected"}` |
| Manifest version | [`CloudronManifest.json`](../../CloudronManifest.json) `version` field (must increase for each release) |
| Current manifest | `0.1.8` at time of writing — always read the file before deploying |
Replace `<app>` below with the Cloudron location (`staging.communityrule.info`
or `communityrule.info`).
## 2. Prerequisites (one-time per operator)
1. **Cloudron access** — admin login on `my.medlab.host` and a CLI API token
(*Profile → API Tokens* on the dashboard). Save the token in 1Password.
2. **Cloudron CLI** — logged in:
```bash
cloudron login my.medlab.host
```
3. **Docker + buildx** — Docker Desktop or equivalent with `docker buildx`.
4. **Gitea registry auth** — personal access token on `git.medlab.host` with
`read:package` + `write:package`; then:
```bash
docker login git.medlab.host
```
5. **Repo checkout** — clone
[`CommunityRule/community-rule`](https://git.medlab.host/CommunityRule/community-rule)
and work from a clean commit that matches the release you intend to ship.
Images are **`linux/amd64` only** (Cloudron host is x86_64). On Apple
Silicon, the release script still builds amd64 via buildx; a bare
`docker pull` without `--platform linux/amd64` failing on arm64 is expected.
## 3. Deploy a new version
Typical release flow: bump manifest → build/push image → `cloudron update`
→ smoke.
### 3.1 Build and push
1. Check out the commit to release (`main` or a release branch).
2. **Bump** [`CloudronManifest.json`](../../CloudronManifest.json) `version`
(e.g. `0.1.8` → `0.1.9`). Cloudron requires the manifest version to
**increase** for `cloudron update --image` to be accepted.
3. From the repo root, build and push (tag should match the manifest version
for sanity):
```bash
TAG=0.1.9 ./scripts/docker-release.sh
# equivalent:
TAG=0.1.9 npm run docker:release
```
Omit `TAG=` to push `git rev-parse --short HEAD` instead — only do that
for ad-hoc staging experiments; production releases should use semver tags
aligned with the manifest.
4. **Verify anonymous pull** (simulates Cloudron):
```bash
docker logout git.medlab.host
docker pull --platform linux/amd64 \
git.medlab.host/communityrule/community-rule:0.1.9
```
5. **Commit the manifest bump** in git alongside the code that shipped in
this build.
Registry details and one-time Gitea setup: [`ops-backend-deploy.md` §9](ops-backend-deploy.md#9-build-and-push-image-workflow).
### 3.2 Update Cloudron
```bash
cloudron update --app staging.communityrule.info \
--image git.medlab.host/communityrule/community-rule:0.1.9
```
Use `communityrule.info` for production. Cloudron pulls the image (no registry
credentials on the host), restarts the container, and runs
[`scripts/start.sh`](../../scripts/start.sh), which:
1. `chown`s `/app/data` (localstorage mount),
2. runs **`prisma migrate deploy`**,
3. execs the Next.js standalone server.
Watch the app **Logs** tab in the Cloudron dashboard for a clean migration and
`Listening on port 3000`.
### 3.3 Migrations
**Normal case:** migrations apply automatically on container start — no
separate step.
**Manual re-run** (only if debugging a failed deploy or verifying before
traffic):
```bash
cloudron exec --app staging.communityrule.info -- npm run db:deploy
```
(`npm run db:deploy` → `prisma migrate deploy`.)
**Policy:** never run `prisma migrate reset` against staging or production.
Never edit migration files already applied to a shared database. Fix schema
drift by adding a **new** migration locally (`prisma migrate dev`) and
deploying a new image. See [`backend-roadmap.md` §8](backend-roadmap.md#8-prisma-migrations-policy).
### 3.4 Seed data (not every deploy)
Template + facet seed (`MethodFacet` rows for create-flow “Recommended” tags)
is **not** applied at boot. Run once per environment after first install, or
when recommendations return all-zero scores:
```bash
cloudron exec --app staging.communityrule.info -- \
node prisma/seed.bundle.cjs
```
Re-running is safe (idempotent upserts). JSON lives at `/app/seed-data/` in
the image — not under `/app/data` (Cloudron localstorage overwrites that
mount).
### 3.5 Smoke after deploy
**Automated** (from your laptop, repo root):
```bash
./scripts/staging-smoke.sh staging.communityrule.info
# production:
./scripts/staging-smoke.sh communityrule.info
# optional — exercises magic-link request (check inbox manually):
EMAIL=you@example.com ./scripts/staging-smoke.sh staging.communityrule.info
```
**Manual** (still required for full acceptance):
- Click a magic link → signed in → `GET /api/auth/session` returns a user.
- Publish a rule end-to-end → public detail page loads.
- Optional: Save & Exit draft sync; upload with `UPLOAD_ROOT` set.
Full checklist and failure table: [`ops-backend-deploy.md` §10](ops-backend-deploy.md).
## 4. Roll back (code-only)
To revert application code without touching the database:
```bash
cloudron update --app staging.communityrule.info \
--image git.medlab.host/communityrule/community-rule:<previous-tag>
```
Pick a tag you know was healthy (previous manifest version or git tag recorded
at last good deploy).
**Database implications:**
- Rolling back the **image** does **not** undo migrations already applied.
- If the bad release added a migration, rolling back to an older image may
leave the DB schema **ahead** of what that code expects — usually safe if
the migration was additive (new nullable columns, new tables).
- If the bad release broke because of a **destructive or incompatible**
migration, do **not** reset production. Restore from a Cloudron backup
(§5) or fix forward with a corrective migration.
**Never** `prisma migrate reset` on staging or production.
## 5. Restore drill (quarterly)
Verify Cloudron backups are restorable without touching the live app.
**Cadence:** at least once per quarter, or after any backup-policy change.
**Steps:**
1. In the Cloudron dashboard, pick a recent automatic backup of
`<app>` (*Backups* tab).
2. **Restore to a scratch location** — e.g.
`restore-drill-YYYYMMDD.communityrule.info` — not over the live app.
3. After restore completes, confirm the container starts and migrations are
current:
```bash
curl -sS "https://restore-drill-YYYYMMDD.communityrule.info/api/health"
```
Expect `200` with `"database":"connected"`.
4. Optional: `cloudron exec --app restore-drill-YYYYMMDD.communityrule.info -- npm run db:deploy`
if logs show pending migrations on an older snapshot.
5. Run `./scripts/staging-smoke.sh restore-drill-YYYYMMDD.communityrule.info`.
6. **Uninstall** the scratch app when done.
Record the drill date and outcome in your ops notes. Cloudron retains
automatic backups per platform defaults; confirm retention in the dashboard.
## 6. Single-instance limitations
The current Cloudron deploy runs **one container per environment**. Do not
scale to multiple app instances without addressing these per-process limits:
### 6.1 In-memory rate limiter
[`lib/server/rateLimit.ts`](../../lib/server/rateLimit.ts) stores windows in
process memory. Limits apply **per container**, not globally across instances.
| Route / action | Key | Min interval |
| -------------- | --- | ------------ |
| Magic-link request | per email | 60 s |
| Magic-link request | per IP | 20 s |
| Email change request | per email / IP / user | 60 s |
| Organizer inquiry | per email / IP | 60 s / 20 s |
| Publish with stakeholder invites | per IP | 60 s |
| Stakeholder add / resend | per IP / invite | 60 s |
| File upload | per user | 5 s |
Before horizontal scale-out, replace with a shared store (e.g. Redis) or edge
rate limits. See [`backend-roadmap.md` §5](backend-roadmap.md#5-session-and-authentication-v1).
### 6.2 Web vitals storage
Production defaults to **`external`** mode: vitals are structured log lines, not
written to Postgres or local files. Setting `WEB_VITALS_STORAGE=local` uses a
**per-process** file store under `.next/web-vitals` — suitable for dev/admin
only, not multi-instance. See [`backend-roadmap.md` §7](backend-roadmap.md#7-api-responses-errors-and-observability).
## 7. Environment variables (steady-state)
Cloudron **auto-injects** addon vars (`CLOUDRON_POSTGRESQL_URL`,
`CLOUDRON_MAIL_SMTP_*`). Operators set these manually once per app; they
persist across image updates unless changed:
| Variable | Purpose |
| -------- | ------- |
| `SESSION_SECRET` | Session cookie signing (≥ 16 chars). Rotating logs everyone out. |
| `SMTP_FROM` | Visible From on sign-in emails (e.g. `Community Rule <hello@communityrule.info>`). |
| `NEXT_PUBLIC_ENABLE_BACKEND_SYNC` | `true` in staging/production — Postgres draft persistence. |
| `UPLOAD_ROOT` | `/app/data/uploads` on Cloudron — required for file uploads. |
Full detail: [`ops-backend-deploy.md` §3](ops-backend-deploy.md#3-environment-variables).
## 8. Troubleshooting
| Symptom | Likely cause | Action |
| ------- | ------------ | ------ |
| Image pull error on update | Private repo, wrong tag, or amd64 manifest missing | Confirm repo is public; verify pull with `--platform linux/amd64` (§3.1) |
| Health `503` / `database: disconnected` | Postgres addon or `CLOUDRON_POSTGRESQL_URL` missing | Cloudron app → Environment |
| Container crash on start | Migration failure | App logs around `prisma migrate deploy`; fix forward with new migration |
| Magic link not sent | Mail addon or `SMTP_FROM` | Cloudron mail logs; `CLOUDRON_MAIL_SMTP_*` vars |
| Upload `server_misconfigured` | `UPLOAD_ROOT` unset | `cloudron env set --app <app> UPLOAD_ROOT=/app/data/uploads` |
| No “Recommended” on method cards | Seed not run | §3.4 — `node prisma/seed.bundle.cjs` |
| Rate limit too aggressive after deploy | Expected per §6.1 | Single instance only; limits reset on container restart |
App logs: Cloudron dashboard → *Logs* tab, or `cloudron logs --app <app> -f`.
## 9. Related docs
- [`ops-backend-deploy.md`](ops-backend-deploy.md) — first install, cutover
plan, legacy rules archive, build/push deep dive.
- [`backend-roadmap.md`](backend-roadmap.md) — migrations policy (§8),
rate limiting (§5), environments (§11).
- [`../relaunch-brief.md`](../relaunch-brief.md) — plain-language summary
for MEDLab admin.
- [`../../CONTRIBUTING.md`](../../CONTRIBUTING.md) — local dev setup.