# Backend deploy — admin handoff + cutover plan
This doc captures everything needed to deploy the new CommunityRule
(Next.js + Postgres) onto MEDLab's Cloudron and replace the legacy
LAMP-packaged service at `communityrule.info`. Cloudron admin access
has been granted, and CR-96 (Cloudron-native env vars) and CR-97 (container
registry + first image push) are done; the remaining gate is
[CR-98](https://linear.app/community-rule/issue/CR-98/backend-cloudron-staging-install-smoke)
(staging install + smoke — §10).
> **For a plain-language summary to hand to MEDLab's Cloudron admin,
> see [`../relaunch-brief.md`](../relaunch-brief.md).** This doc is the
> technical version.
## 1. Context
- This app **fully replaces** the existing `communityrule.info`
service — both the marketing site and the backend API.
- The existing service is a single Cloudron **LAMP** app
(`lamp.cloudronapp.php74@5.1.2`, installed at the
`communityrule.info` apex, 512 MiB) that hosts three things stuffed
into one container under `/app/data/public/`:
1. The static **marketing site** (HTML / CSS / images).
2. The **Express/MySQL backend** at
[`CommunityRule/CommunityRuleBackend`](https://git.medlab.host/CommunityRule/CommunityRuleBackend),
kept alive by a 30-min `lsof`-based `run.sh` watchdog on port
3000. MySQL is the LAMP package's bundled MySQL, persisted
inside `/app/data` (not a Cloudron addon).
3. A **Flask chatbot** at
[`CommunityRule/CommunityRuleChatBot`](https://git.medlab.host/CommunityRule/CommunityRuleChatBot)
on port 5000, also watchdog-supervised; currently crash-looping
with `ModuleNotFoundError: No module named 'flask'` and last
touched in May 2024. **Not migrated.** Dies with the LAMP
container at decommission.
- The new app is a **properly packaged Cloudron app** (Docker image +
`CloudronManifest.json`, postgresql + sendmail + localstorage
addons). Cloudron's container supervisor replaces the watchdog.
- **Greenfield Postgres.** No data migration from the LAMP container's
internal MySQL. Old auth (4-digit OTP in `email_otp`) is replaced
by hashed magic-link tokens. Old API and `rules` /
`version_history` tables do not map to anything in the new app.
## 2. Access — granted
Cloudron admin login on `my.medlab.host` has been granted. Note that this
is the **Cloudron dashboard**, not `cloud.medlab.host`, which is MEDLab's
Nextcloud file portal. From the dashboard the deployer can self-serve:
- [x] **Cloudron admin login** (full admin on the MEDLab instance).
- [x] **DNS for `communityrule.info`** — domain is managed inside
Cloudron, so new subdomains and TLS certs are one-click.
- [x] **App log access** — Cloudron web log viewer.
- [x] **Read of legacy app config** — visible in admin UI.
- [ ] **`cloudron` CLI token** — generate at *Profile → API Tokens*
before first install. Save in 1Password.
## 3. Environment variables
### Cloudron auto-injects (provisioned by addons declared in `CloudronManifest.json`)
Cloudron addons are not "enabled" platform-wide; they are requested
per-app in the manifest and provisioned at install time.
- `CLOUDRON_POSTGRESQL_URL` — from the **postgresql** addon. The app
reads this name directly (Prisma + [`lib/server/env.ts`](../../lib/server/env.ts)).
- `CLOUDRON_MAIL_SMTP_SERVER` / `_PORT` / `_USERNAME` / `_PASSWORD`,
from the **sendmail** addon. The platform Mail server is configured
for `communityrule.info` with **Amazon SES relay** + "allow custom
from address" on, so `SMTP_FROM` of our choice will deliver. The app
assembles a Nodemailer transport URL from these four vars in
[`lib/server/env.ts`](../../lib/server/env.ts).
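
As a quick preflight for the injected variables listed above, a minimal sketch like the following could be run in a shell inside the app container. The function name `missing_cloudron_vars` is hypothetical and not part of the repo; only the variable names come from this doc.

```bash
# Sketch: print the name of every Cloudron-injected variable that is
# unset or empty in the current environment. Empty output means all
# five are present. The function name is hypothetical.
missing_cloudron_vars() (
  for name in \
    CLOUDRON_POSTGRESQL_URL \
    CLOUDRON_MAIL_SMTP_SERVER \
    CLOUDRON_MAIL_SMTP_PORT \
    CLOUDRON_MAIL_SMTP_USERNAME \
    CLOUDRON_MAIL_SMTP_PASSWORD
  do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
)
```

If the function prints anything, the matching addon was probably not provisioned for this app.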
### I set manually via `cloudron env set --app <id/location>`
- `SESSION_SECRET` — long random (`openssl rand -hex 32`). Required,
≥ 16 chars. Rotating it logs everyone out.
- `SMTP_FROM` — visible "From:" address on sign-in emails. Cloudron
does not inject this. Use `hello@communityrule.info` (continuity
with the legacy service; SES relay accepts it).
- `NEXT_PUBLIC_ENABLE_BACKEND_SYNC=true` — turns on Postgres draft
persistence for signed-in users. Required in production.
- `UPLOAD_ROOT` — absolute path to a writable directory on the Cloudron
**localstorage** mount for `POST /api/uploads` (community photo +
custom-method attachments). Use **`/app/data/uploads`** on Cloudron
(`start.sh` chowns `/app/data` for the `node` user). When unset, upload
routes return `server_misconfigured`. See [CONTRIBUTING.md](../../CONTRIBUTING.md)
API table.
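
The constraints on the manually set variables can be checked before a release. The sketch below assumes all four variables are wanted in production and encodes only the rules stated above (length of `SESSION_SECRET`, `true` for the sync flag, absolute path for `UPLOAD_ROOT`). The function name `check_manual_env` is hypothetical.

```bash
# Sketch: validate the manually set variables. Prints one line per
# problem; exit status 0 means nothing was flagged.
check_manual_env() (
  problems=0

  # SESSION_SECRET is required and must be at least 16 characters.
  secret=${SESSION_SECRET:-}
  if [ "${#secret}" -lt 16 ]; then
    echo "SESSION_SECRET is missing or shorter than 16 characters"
    problems=1
  fi

  # Cloudron does not inject SMTP_FROM, so it has to be set by hand.
  if [ -z "${SMTP_FROM:-}" ]; then
    echo "SMTP_FROM is not set"
    problems=1
  fi

  if [ "${NEXT_PUBLIC_ENABLE_BACKEND_SYNC:-}" != "true" ]; then
    echo "NEXT_PUBLIC_ENABLE_BACKEND_SYNC is not 'true'"
    problems=1
  fi

  # UPLOAD_ROOT must be an absolute path.
  case "${UPLOAD_ROOT:-}" in
    /*) ;;
    *) echo "UPLOAD_ROOT is unset or not an absolute path"; problems=1 ;;
  esac

  [ "$problems" -eq 0 ]
)
```

The check reads the current shell environment, so it is meant to run wherever the values are being assembled, not as a substitute for `cloudron env set`.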
## 4. Platform settings
- Container `httpPort`: **3000** (matches [`Dockerfile`](../../Dockerfile)
`ENV PORT=3000`).
- Health-check path: **`/api/health`**
([`app/api/health/route.ts`](../../app/api/health/route.ts) returns
`200 {"ok":true,"database":"connected"}` when healthy, `503`
otherwise).
- Memory limit: **768 MiB** in
[`CloudronManifest.json`](../../CloudronManifest.json) (`memoryLimit:
805306368`). The legacy LAMP app ran at 512 MiB; raise further only if
Next.js standalone OOMs under load.
- Backups: Cloudron's automatic backups are already on for the host
(legacy app shows weekly snapshots ~451 MB each). Same default
applies to new apps.
- TLS / DNS / SPF / DKIM: handled by Cloudron for any subdomain of
`communityrule.info`.
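
The `memoryLimit` figure in the manifest is the MiB value expressed in bytes. A small worked conversion, with a hypothetical helper name, makes it easy to recompute if the limit is raised:

```bash
# Sketch: convert MiB to the byte count used by memoryLimit.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 768   # new app: prints 805306368
mib_to_bytes 512   # legacy LAMP app: prints 536870912
```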
## 5. Cutover plan (side-by-side, never in-place)
The legacy app is at the apex `communityrule.info` and is still
serving real traffic. Best practice is **side-by-side cutover** — new
app gets validated at a fresh subdomain before any swap touches the
apex.
### Phases
1. **Staging install** — from a checkout whose
[`CloudronManifest.json`](../../CloudronManifest.json) `version` matches the
pushed image tag, run:
```bash
cloudron install --location staging.communityrule.info \
--image git.medlab.host/communityrule/community-rule:<tag>
```
Set manual env vars from §3. `prisma migrate deploy` runs automatically in
[`scripts/start.sh`](../../scripts/start.sh) on container start. Then run the
smoke checklist for
[CR-98](https://linear.app/community-rule/issue/CR-98/backend-cloudron-staging-install-smoke)
(§10).
2. **Soft launch / acceptance** — share the staging URL with a small
group, exercise sign-in + publish + draft sync end-to-end. Hold
here until confident.
3. **Apex cutover at a scheduled low-traffic window** — this is the
only step with brief downtime (~5–15 min). Sequence:
1. Take one final manual backup of the legacy LAMP app (Cloudron
*Backups* tab → *Backup now*).
2. `cloudron uninstall` the legacy app at `communityrule.info`.
3. `cloudron configure --location communityrule.info` to move the
validated staging install to the apex (or `cloudron install`
fresh at apex if cleaner).
4. Re-run `prisma migrate deploy`, re-set production env vars if
not preserved by the move, smoke again.
4. **Decommission** — see [CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-expressmysql-backend).
Hold the final LAMP backup ≥ 90 days for safety.
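
The apex sequence in phase 3 can be rehearsed as a dry run before the window. This is only a sketch: the helper names are hypothetical, and the `--app` flags on `cloudron uninstall` and `cloudron configure` are assumptions that should be confirmed with `cloudron <command> --help` before anything is executed for real.

```bash
# Sketch: phase 3 as a dry-run-by-default helper. With DRY_RUN=1 (the
# default) it only prints the commands; with DRY_RUN=0 it runs them.
cutover() (
  legacy=communityrule.info
  staging=staging.communityrule.info

  # Print the command in dry-run mode, execute it otherwise.
  run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "+ $*"
    else
      "$@"
    fi
  }

  echo "# Step 1: take the final manual backup of $legacy in the dashboard first."
  # The && chain stops at the first failing step.
  run cloudron uninstall --app "$legacy" &&
    run cloudron configure --app "$staging" --location "$legacy" &&
    echo "# Step 4: re-check env vars, then smoke-test https://$legacy/api/health"
)
```

Running `cutover` with no environment changes prints the plan. Keeping dry-run as the default means the destructive uninstall only happens after an explicit `DRY_RUN=0`.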
### Why not in-place?
Uninstalling the legacy app and installing the new one at apex
without a staging step means the live site is down for the entire
duration of the first install — and the first install is exactly when
all the env-var / addon / port surprises happen. Side-by-side keeps
those surprises out of view.
## 6. Decisions — status
Product decisions (closed):
1. **Final URL — `communityrule.info` apex.** New app fully replaces
the legacy site, including the marketing surface. Brief cutover
downtime (~5–15 min) is accepted.
2. **Legacy `rules` data — not migrated.** No data moves into the new
app's Postgres. A pre-cutover **read-only export** of the
`rules` + `version_history` MySQL tables is under consideration;
approach depends on the actual row count, which we'll pull as
part of the CR-99 pre-cutover backup. Tracked in
[CR-102](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export).
Infra decision closed:
3. **Container registry — Gitea Container Registry on `git.medlab.host`.**
Same host as Cloudron (`193.46.198.90`). The
[`CommunityRule/community-rule`](https://git.medlab.host/CommunityRule/community-rule)
repo must be **public** so the container package inherits public visibility
(Gitea does not expose per-package visibility toggles — visibility follows
the owning repo). Public pull sidesteps the [same-host docker-login
"socket hangup"
bug](https://forum.cloudron.io/topic/14572/private-docker-registry-in-cloudron),
so Cloudron pulls without credentials. Push auth from operator laptops uses
a Gitea personal access token (`read:package` + `write:package`). Canonical
image ref: `git.medlab.host/communityrule/community-rule:<tag>`. Images are
built **`linux/amd64` only** (Cloudron host is x86_64). Operator build/push
workflow lives in [§9](#9-build-and-push-image-workflow). First verified
image: `…:0.1.0` (digest
`sha256:e652f9f4bfa4154412cc9d8b63d55c94a128e8935579d101b5ab8977e2080e52`).
Tracked in [CR-97](https://linear.app/community-rule/issue/CR-97/backend-container-image-registry-choose-build-push)
(**Done**). Fallback if same-host pull ever breaks: install the **Cloudron
Container Registry** app and re-tag against its hostname; no other changes
required.
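
Because images are built for `linux/amd64` only, pulls from other architectures need an explicit platform flag. A minimal sketch, assuming `uname -m` reports `x86_64` or `amd64` on hosts that can pull natively (the helper name is hypothetical):

```bash
# Sketch: print the docker flag needed to pull the amd64-only image on
# the given architecture (defaults to the local machine's).
platform_flag() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64) ;;                      # native pull, no flag needed
    *) echo "--platform linux/amd64" ;;   # e.g. arm64 on Apple Silicon
  esac
}

# Usage (the unquoted substitution is intentional so the flag splits
# into two words):
#   docker pull $(platform_flag) git.medlab.host/communityrule/community-rule:<tag>
```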
## 7. Old vs new deltas
So nothing surprises anyone at cutover:
- Legacy is a **LAMP package** with bundled MySQL inside the
container. New app uses the Cloudron **postgresql + sendmail +
localstorage** addons — entirely different storage, no shared
state.
- Legacy stuffs three apps (marketing + Node backend + Python
chatbot) into one container with a `run.sh` watchdog. New app is
one Next.js process, supervised by Cloudron natively.
- Old auth = plaintext 4-digit OTP. New auth = hashed magic **link**
in email. If users report "I'm not getting a code," remind them to
look for a link instead.
- Old code hardcoded `from: 'hello@communityrule.info'` in
[`controllers/emailController.js`](https://git.medlab.host/CommunityRule/CommunityRuleBackend/raw/branch/master/controllers/emailController.js)
because Cloudron does not inject a `MAIL_FROM`. New app reads
`SMTP_FROM` — see §3.
- Old API surface (`/api/send_otp`, `/api/publish_rule`, etc.) and
schema (`rules` + `version_history` tables, soft-delete via
`deleted` column) **do not overlap** with the new app. No data
migration.
- The Flask chatbot at
[`CommunityRule/CommunityRuleChatBot`](https://git.medlab.host/CommunityRule/CommunityRuleChatBot)
is currently crash-looping inside the LAMP container and is **not
being migrated** — confirmed with admin. It dies when the LAMP
container is uninstalled in [CR-101](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-expressmysql-backend).
## 8. Follow-up tickets
All filed in Linear, titled `[Backend] …`, assigned to me, in the
**Community-rule** team. They were filed in the **Backlog** state; current
status is noted per ticket.
1. [**CR-96**](https://linear.app/community-rule/issue/CR-96/backend-bridge-cloudron-env-vars-to-canonical-names)
— `[Backend] Cloudron-native env vars` (**Done** — app reads
`CLOUDRON_POSTGRESQL_URL` and `CLOUDRON_MAIL_SMTP_*` only).
2. [**CR-97**](https://linear.app/community-rule/issue/CR-97/backend-container-image-registry-choose-build-push)
— `[Backend] Container image registry: choose, build, push` (**Done**).
Registry decided (§6.3); packaging + build/push workflow shipped (§9).
First image pushed and verified via anonymous `docker pull` (§9).
3. [**CR-98**](https://linear.app/community-rule/issue/CR-98/backend-cloudron-staging-install-smoke)
— `[Backend] Cloudron staging install + smoke` at
`staging.communityrule.info`. **Next** — checklist in §10. Requires
Cloudron CLI token (§2) only; CR-96 and CR-97 are done.
4. [**CR-99**](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover)
— `[Backend] Cloudron production install + apex cutover`.
Side-by-side cutover at scheduled low-traffic window per §5.
Blocked by CR-98 green + CR-102 resolved.
5. [**CR-100**](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook)
— `[Backend] Steady-state operator runbook`. Blocked by CR-98
(write what we actually did).
6. [**CR-101**](https://linear.app/community-rule/issue/CR-101/backend-decommission-legacy-communityrule-lamp-app)
— `[Backend] Decommission legacy CommunityRule LAMP app`.
Uninstall the entire LAMP slot (marketing + Express backend +
chatbot in one go); preserve final backup ≥ 90 days. Blocked by
CR-99 + sign-off window. Priority: Low.
7. [**CR-102**](https://linear.app/community-rule/issue/CR-102/backend-decide-fate-of-legacy-rules-table-read-only-export)
— `[Backend] Decide fate of legacy rules table (read-only export?)`.
Count rows + decide whether to publish a static archive before
CR-99 uninstalls the legacy MySQL. Priority: Low.
## 9. Build and push image workflow
The repo is packaged as a Cloudron app via
[`CloudronManifest.json`](../../CloudronManifest.json),
[`Dockerfile`](../../Dockerfile),
[`scripts/start.sh`](../../scripts/start.sh), and
[`scripts/docker-release.sh`](../../scripts/docker-release.sh). The
manifest declares `httpPort 3000`, `healthCheckPath /api/health`,
`memoryLimit 768 MiB`, `minBoxVersion 9.0.0`, and the
`postgresql + sendmail + localstorage` addons. The Dockerfile reuses
the base image's `node` user (uid 1000), installs `gosu` for the
privilege drop, and symlinks `.next/cache → /tmp/next-cache` so
Next.js ISR works on Cloudron's read-only rootfs. `start.sh` runs as
root to chown `/app/data` (localstorage mount), then drops to
`node:node`, applies `prisma migrate deploy`, and execs the Next.js
standalone server.
### One-time setup (per operator)
1. **Generate a Gitea PAT.** In Gitea web UI: avatar → Settings →
Applications → Manage Access Tokens → Generate New Token. Check
`read:package` and `write:package`. Save in 1Password.
2. **`docker login git.medlab.host`** with your Gitea username and the
PAT as password. Expect `Login Succeeded`.
3. Confirm you have package-write rights on the `CommunityRule` org
(you do if you can push commits to the repo).
### Per-release workflow
1. **Bump the manifest version.** Edit
[`CloudronManifest.json`](../../CloudronManifest.json):
- increment `version` (e.g. `0.1.0` → `0.1.1`) — Cloudron requires
it to **increase** for `cloudron update --image` to be accepted.
2. **Run the release script** from the repo root:
```bash
./scripts/docker-release.sh
# or, equivalently:
npm run docker:release
```
Override the tag with `TAG=0.1.1 ./scripts/docker-release.sh` for
semver releases. The script prints the exact `cloudron install` /
`cloudron update --image …` commands to run next.
3. **First push only:** confirm the
[`CommunityRule/community-rule`](https://git.medlab.host/CommunityRule/community-rule)
repo is **Public** (Settings → General). Gitea inherits container-package
visibility from the repo; there is no per-package visibility toggle. Org-owner
rights are not required if you have repo-admin rights on this repo.
4. **Verify the pull works without credentials** (simulates Cloudron's
anonymous pull):
```bash
docker logout git.medlab.host
# Image is linux/amd64 only. On Apple Silicon, add --platform:
docker pull --platform linux/amd64 git.medlab.host/communityrule/community-rule:<tag>
```
A bare `docker pull` on arm64 Macs fails with "no matching manifest for
linux/arm64" — that is expected and does **not** indicate an auth problem.
Cloudron (x86_64) pulls the amd64 manifest without `--platform`.
5. **Commit the manifest change** alongside any code changes that
shipped in this build, so the manifest and image stay in lockstep.
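
Since an update is only accepted when the manifest `version` increases, a pre-release check can compare the new version with the last deployed one. The sketch below handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release suffixes). The function name is hypothetical, and how Cloudron itself compares versions is not covered here.

```bash
# Sketch: succeed only if version $1 is strictly greater than $2.
# Missing components are treated as 0.
version_gt() (
  IFS=.
  new=$1
  old=$2
  # Unquoted on purpose: IFS=. splits each version on its dots.
  set -- $new; n1=${1:-0}; n2=${2:-0}; n3=${3:-0}
  set -- $old; o1=${1:-0}; o2=${2:-0}; o3=${3:-0}

  # Compare major, then minor, then patch; the first difference decides.
  if [ "$n1" -ne "$o1" ]; then
    [ "$n1" -gt "$o1" ]
  elif [ "$n2" -ne "$o2" ]; then
    [ "$n2" -gt "$o2" ]
  else
    [ "$n3" -gt "$o3" ]
  fi
)

# Usage:
#   version_gt 0.1.1 0.1.0 && echo "ok to release"
```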
### Install / update on Cloudron
From the repo dir on the operator's machine, with `cloudron` CLI
logged in to `my.medlab.host`:
```bash
# First install (staging):
cloudron install --location staging.communityrule.info \
--image git.medlab.host/communityrule/community-rule:<tag>
# Subsequent updates:
cloudron update --app staging.communityrule.info \
--image git.medlab.host/communityrule/community-rule:<tag>
```
Pass the registry image with `--image`; it is not a field in
[`CloudronManifest.json`](../../CloudronManifest.json).
### CI — deferred (stretch goal)
CR-97 acceptance lists a stretch goal of building and pushing on merge
to `main` via Gitea Actions. Deferred: no hosted runners are available
today, and the manual workflow above is acceptable for v1 staging and
production. Revisit when runners return or when release cadence
justifies the runner cost.
## 10. Staging install + smoke (CR-98)
**Goal:** Install the pushed image at `staging.communityrule.info`, configure
production env vars, and verify the vertical slice before apex cutover
([CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover)).
**Prerequisites (all satisfied unless noted):**
- [x] **CR-96** — app reads `CLOUDRON_POSTGRESQL_URL` and
`CLOUDRON_MAIL_SMTP_*` only ([`lib/server/env.ts`](../../lib/server/env.ts),
[`prisma/schema.prisma`](../../prisma/schema.prisma)). No `DATABASE_URL` /
`SMTP_URL` shim.
- [x] **CR-97** — image pushed to
`git.medlab.host/communityrule/community-rule:0.1.0` (or current tag in
manifest); repo is **public**; anonymous amd64 pull verified (§9).
- [ ] **Cloudron CLI token** — generate at *Profile → API Tokens* on
`my.medlab.host`; save in 1Password (§2).
- [x] **Cloudron admin login** on `my.medlab.host` (§2).
- [x] **DNS** — `communityrule.info` managed in Cloudron; staging subdomain
will be provisioned at install time.
**Install steps:**
1. **Checkout** a commit whose [`CloudronManifest.json`](../../CloudronManifest.json)
`version` and `memoryLimit` match the image you intend to run (currently
`0.1.1` → `git.medlab.host/communityrule/community-rule:0.1.1`).
2. **Log in to Cloudron CLI:**
```bash
cloudron login my.medlab.host
```
3. **Install or update** from the repo root (manifest is read for addons;
image comes from `--image`):
```bash
cloudron update --app staging.communityrule.info \
--image git.medlab.host/communityrule/community-rule:0.1.1
```
(Use `cloudron install --location … --image …` only if staging is not
already installed.) Cloudron provisions **postgresql**, **sendmail**, and
**localstorage** addons from the manifest, pulls the image (no registry
credentials needed), and starts the container. `scripts/start.sh` chowns
`/app/data`, runs `prisma migrate deploy`, then execs the Next.js server.
4. **Set manual env vars** (Cloudron does not inject these):
```bash
cloudron env set --app staging.communityrule.info \
SESSION_SECRET="$(openssl rand -hex 32)" \
SMTP_FROM="Community Rule <hello@communityrule.info>" \
NEXT_PUBLIC_ENABLE_BACKEND_SYNC=true \
UPLOAD_ROOT=/app/data/uploads
```
Rotating `SESSION_SECRET` logs everyone out. `SMTP_FROM` must be an address
SES accepts on `communityrule.info` (platform mail addon is SES-relayed with
custom-from allowed — §3).
5. **Confirm the app is running** in the Cloudron dashboard (Logs tab). Look
for a clean `prisma migrate deploy` and Next.js listening on port 3000.
6. **Seed facet data (one-time per environment)** — templates + `MethodFacet`
rows for create-flow "Recommended" tags are **not** applied at boot. After
first install (or when recommendations return all-zero scores), run:
```bash
cloudron exec --app staging.communityrule.info -- \
node prisma/seed.bundle.cjs
```
JSON lives at `/app/seed-data/` (`SEED_DATA_DIR`); do not use `/app/data`
(Cloudron localstorage overwrites it). Re-running the seed after a deploy is
safe (idempotent upserts / per-section swaps).
**Smoke checklist (acceptance):**
Automated curl checks: `./scripts/staging-smoke.sh staging.communityrule.info`
(optional `EMAIL=you@example.com` to exercise magic-link request). Manual UI
steps below are still required.
- [ ] **Health:** `curl -sS https://staging.communityrule.info/api/health`
returns `200` with `{"ok":true,"database":"connected"}`.
- [ ] **Magic link:** request sign-in from the UI → email arrives at a real
inbox → click link → land signed in →
`GET /api/auth/session` returns a user. Confirm the link host matches
`staging.communityrule.info` (reverse proxy / `Host` alignment).
- [ ] **Publish:** complete create flow → publish a rule → public rule detail
loads.
- [ ] **Draft sync (optional):** signed-in Save & Exit persists to Postgres;
resume works after re-login.
- [ ] **Upload (optional):** with `UPLOAD_ROOT` set, attach a community photo
in create flow and confirm it renders after publish.
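
The health item in the checklist can be evaluated mechanically. The following is a sketch, separate from `scripts/staging-smoke.sh`, whose contents are not shown here. It assumes the body is compact JSON with no whitespace around the colons, matching the response quoted in §4. The function name is hypothetical.

```bash
# Sketch: succeed only if the status is 200 and the body reports both
# ok:true and database:connected.
health_ok() {
  # $1: HTTP status code, $2: response body
  [ "$1" = "200" ] || return 1
  case "$2" in *'"ok":true'*) ;; *) return 1 ;; esac
  case "$2" in *'"database":"connected"'*) ;; *) return 1 ;; esac
  return 0
}

# Usage against staging:
#   status=$(curl -sS -o /tmp/health.json -w '%{http_code}' \
#     https://staging.communityrule.info/api/health)
#   health_ok "$status" "$(cat /tmp/health.json)" && echo healthy
```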
**If something fails:**
| Symptom | Likely cause | Check |
| ------- | ------------ | ----- |
| Image pull error on install | Repo still private, or wrong tag passed to `--image` | §6.3; `docker pull --platform linux/amd64 …` from laptop |
| Health `503` / `database: disconnected` | Postgres addon not provisioned or URL missing | Cloudron app → Environment; expect `CLOUDRON_POSTGRESQL_URL` |
| Magic link not sent | Mail addon or `SMTP_FROM` | Cloudron mail logs; `CLOUDRON_MAIL_SMTP_*` vars |
| Upload `server_misconfigured` | `UPLOAD_ROOT` unset | Set to `/app/data/uploads` (§3) |
| Container crash on start | Migration failure | App logs around `prisma migrate deploy` |
| No "Recommended" on method cards | `MethodFacet` not seeded | §10 step 6; API should return `matches.score > 0` for some methods when `facet.*` set |
| `seed.bundle.cjs` ENOENT on `/app/data/...` | Old image without `/app/seed-data` | Deploy ≥ 0.1.8; JSON is at `SEED_DATA_DIR=/app/seed-data` |
**Done when:** all smoke checklist items pass. Then proceed to soft-launch
(§5 phase 2) and, when ready, [CR-99](https://linear.app/community-rule/issue/CR-99/backend-cloudron-production-install-apex-cutover)
apex cutover.
## 11. Rate limiting (single-instance deploys)
The app uses an **in-memory** rate limiter in [`lib/server/rateLimit.ts`](../../lib/server/rateLimit.ts) (magic-link requests, organizer inquiry, etc.). This is sufficient for the current **single Cloudron container** per environment.
**Before horizontal scale-out** (multiple app instances behind a load balancer), replace or back the limiter with a shared store (e.g. Redis) so per-IP / per-user windows apply across instances. Until then, document expected limits in the steady-state runbook ([CR-100](https://linear.app/community-rule/issue/CR-100/backend-steady-state-operator-runbook)).
## 12. Related docs
- [`docs/guides/backend-roadmap.md`](backend-roadmap.md) §11
(environments) and §8 (Prisma migrations policy).
- [`docs/guides/backend-linear-tickets.md`](backend-linear-tickets.md)
Ticket 12 / CR-83 — this doc satisfies it.
- [`CONTRIBUTING.md`](../../CONTRIBUTING.md) — local dev setup
(Postgres, magic-link, draft sync).