Maskinporten Key Rotation
Operator runbook for rotating the production Maskinporten signing keypair (MASKINPORTEN_KID + MASKINPORTEN_PRIVATE_KEY).
[Cite this as: Apier.no Docs v0.1.0 — last updated 2026-05-19]
Overview
This runbook is for Apier operators rotating the live Maskinporten signing keypair. It documents the six-step rotation procedure (generate the new keypair, upload to Samarbeidsportalen, flip the two relevant env vars in Vercel, verify via readiness probe, wait for old-token expiry, then remove the old kid to close the validity-overlap window), the rollback path that stays available until the final cleanup, and the Sentry observability signals an on-call engineer reads while watching the rotation land.
Use it when running a scheduled rotation, when the keypair hits a certificate-expiry deadline, during a security incident, or when there is reason to suspect the active signing key has been compromised. The first scheduled deadline is fixed by the test-environment keypair expiring on 2027-04-16 (tracked in APPROVALS-CHECKLIST row #5), so the procedure here will be exercised before that date.
The runbook does NOT cover Samarbeidsportalen account access or the initial cutover from mock to live; both are documented in Maskinporten Production Setup. Read that first if Apier is not yet in live mode — a rotation against a not-yet-live integration is a setup task, not a rotation.
Prerequisites
Before starting Step 1, confirm three things:
- MASKINPORTEN_MODE=live is currently set in the Vercel production env. A rotation against a mock-mode deployment is a no-op because the mock adapter does not sign with the configured PEM.
- The readiness probe currently returns 200 with status: "ready". The probe is documented in Maskinporten Production Setup → Readiness probe; a probe that is not green before rotation will not be green after rotation, and rolling into a rotation while live mode is already broken just makes the post-rotation triage harder.
- On-call coordination is in place. Production key rotation is a maintenance-window event, not a fire-and-forget deploy: every step has a verification action attached, and the validity-overlap window between Steps 2 and 6 is the safety net that lets you roll back without service degradation.
Step 1 — Generate the new keypair
Generate a fresh RSA-2048 private key locally, then extract the public half so it can be uploaded to Samarbeidsportalen in Step 2. Two openssl commands cover both halves:
openssl genrsa -out maskinporten-new.pem 2048
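openssl rsa -in maskinporten-new.pem -pubout -out maskinporten-new.pub.pem
The first command writes the private key; that PEM body becomes the MASKINPORTEN_PRIVATE_KEY env var in Step 3. With OpenSSL 3.x, genrsa emits PKCS#8 (-----BEGIN PRIVATE KEY-----) by default. Older 1.1.x releases emit the traditional PKCS#1 form (-----BEGIN RSA PRIVATE KEY-----), which would need converting to PKCS#8 (for example with openssl pkcs8 -topk8 -nocrypt) to carry the header lines that Step 3 expects. The second command extracts the matching public key as a PEM file. Samarbeidsportalen's Nøkler page accepts the PEM directly (it converts to JWK server-side at upload time), so no extra conversion step is required for the most common workflow.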
Pick a new kid value — Apier follows the UUID v4 convention documented in the MASKINPORTEN_KID schema comment in src/lib/env.ts. Capture all three artifacts for the next steps: the new kid string, the private-key PEM (including -----BEGIN PRIVATE KEY----- / -----END PRIVATE KEY----- lines verbatim), and the public-key PEM.
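Before moving on, the three artifacts can be sanity-checked for shape. The following is a minimal sketch, not Apier's actual validation: the names RotationArtifacts, hasPemEnvelope, and checkArtifacts are hypothetical, and the check only looks at the UUID v4 pattern and the PEM header/footer lines, not at the key material itself.

```typescript
// Hypothetical pre-flight check for the three Step 1 artifacts.
// None of these names come from src/lib/env.ts; they are illustrative only.
interface RotationArtifacts {
  kid: string;
  privateKeyPem: string;
  publicKeyPem: string;
}

// Standard UUID v4 shape: version nibble 4, variant nibble 8, 9, a, or b.
const UUID_V4 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// True when the PEM text starts and ends with the expected envelope lines.
function hasPemEnvelope(pem: string, label: string): boolean {
  const trimmed = pem.trim();
  return (
    trimmed.startsWith(`-----BEGIN ${label}-----`) &&
    trimmed.endsWith(`-----END ${label}-----`)
  );
}

// Returns a list of human-readable problems; an empty list means "looks fine".
function checkArtifacts(a: RotationArtifacts): string[] {
  const problems: string[] = [];
  if (!UUID_V4.test(a.kid)) {
    problems.push("kid is not a UUID v4");
  }
  if (!hasPemEnvelope(a.privateKeyPem, "PRIVATE KEY")) {
    problems.push("private key is missing the BEGIN/END PRIVATE KEY lines");
  }
  if (!hasPemEnvelope(a.publicKeyPem, "PUBLIC KEY")) {
    problems.push("public key is missing the BEGIN/END PUBLIC KEY lines");
  }
  return problems;
}
```

A PKCS#1 key (BEGIN RSA PRIVATE KEY) is reported as a problem by this sketch, which is intentional given the header lines the runbook asks for.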
Keep the new private key out of source control and out of any chat channel. It belongs in your local keyring until it is set as an env var in Step 3.
Step 2 — Upload the new public key to Samarbeidsportalen
Open the client management UI at sjolvbetjening.samarbeid.digdir.no, navigate to your Apier client, and open the Nøkler page. Upload the public PEM from Step 1 (maskinporten-new.pub.pem) and assign the new kid value chosen in Step 1.
DO NOT delete the old key entry yet. This action opens the validity-overlap window: both kids are now trusted at Maskinporten, so a token signed under either kid will be accepted. The overlap window stays open until Step 6 closes it, and the entire rollback path documented below depends on it being open.
Step 3 — Update Vercel env vars
Flip exactly TWO env vars in the Vercel production environment:
- MASKINPORTEN_KID → the new kid string from Step 1.
- MASKINPORTEN_PRIVATE_KEY → the new PEM body from Step 1 (including the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines).
The other four Maskinporten env vars stay constant: MASKINPORTEN_MODE, MASKINPORTEN_CLIENT_ID, MASKINPORTEN_TOKEN_ENDPOINT, MASKINPORTEN_ISSUER. Trigger a redeploy after saving the two flipped values.
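The "exactly two change, four stay constant" rule lends itself to a mechanical check before triggering the redeploy. The sketch below assumes you have the old and new values available as plain key/value snapshots; EnvSnapshot and rotationDiffProblems are hypothetical names, and how the snapshots are obtained is left open.

```typescript
// Hypothetical helper: compares two env snapshots against the Step 3 rule.
type EnvSnapshot = Record<string, string>;

// The two vars that must change during a rotation.
const ROTATED_VARS = ["MASKINPORTEN_KID", "MASKINPORTEN_PRIVATE_KEY"];

// The four vars that must stay constant.
const CONSTANT_VARS = [
  "MASKINPORTEN_MODE",
  "MASKINPORTEN_CLIENT_ID",
  "MASKINPORTEN_TOKEN_ENDPOINT",
  "MASKINPORTEN_ISSUER",
];

// Returns a list of rule violations; an empty list means the diff is clean.
function rotationDiffProblems(before: EnvSnapshot, after: EnvSnapshot): string[] {
  const problems: string[] = [];
  for (const name of ROTATED_VARS) {
    if (before[name] === after[name]) {
      problems.push(`${name} was not changed`);
    }
  }
  for (const name of CONSTANT_VARS) {
    if (before[name] !== after[name]) {
      problems.push(`${name} changed but should stay constant`);
    }
  }
  return problems;
}
```

A half-flipped rotation, where only one of the two rotated vars was updated, shows up as a single "was not changed" entry.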
Step 4 — Verify post-rotation
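Once the redeploy completes, exercise the readiness probe. The host apier.no redirects to www.apier.no with a 308, so run curl with -L to follow the redirect. Be aware that curl 7.58.0 and later does not forward a custom Authorization header when a redirect changes host; if the probe answers 401 after the redirect, call https://www.apier.no directly or use --location-trusted in place of -L: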
curl -L -H "Authorization: Bearer $INTERNAL_API_SECRET" \
  https://apier.no/api/v1/_internal/maskinporten/readiness
Expected response:
{"status":"ready","maskinporten":{"kid":"<NEW_KID>"}}
The probe is more than a configuration sanity check. It runs a dry-run JWS sign that exercises the PEM parse + RS256 sign path with the new key, so a green response confirms that boot validation accepted the new PEM AND that the live adapter can produce a client_assertion JWT under the new kid. A 503 here means the new key never reaches the live exchange in the first place, and the failure is structural rather than transient.
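If the probe is checked from a script rather than by eye, the comparison against the expected response can be sketched as below. This assumes the response body has the shape shown in the expected response and nothing more; ProbeVerdict and checkReadiness are hypothetical names, and the HTTP call itself is deliberately left out.

```typescript
// Hypothetical verdict type for a scripted readiness check.
interface ProbeVerdict {
  ok: boolean;
  reason: string;
}

// Interprets an already-fetched probe response. The caller supplies the HTTP
// status, the raw body, and the kid chosen in Step 1.
function checkReadiness(httpStatus: number, body: string, expectedKid: string): ProbeVerdict {
  if (httpStatus !== 200) {
    return { ok: false, reason: `probe returned HTTP ${httpStatus}` };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return { ok: false, reason: "body is not valid JSON" };
  }
  // Only the two fields shown in the expected response are inspected.
  const doc = parsed as { status?: unknown; maskinporten?: { kid?: unknown } } | null;
  if (doc?.status !== "ready") {
    return { ok: false, reason: "status is not \"ready\"" };
  }
  if (doc?.maskinporten?.kid !== expectedKid) {
    return { ok: false, reason: "probe reports a different kid than the one chosen in Step 1" };
  }
  return { ok: true, reason: "ready under the expected kid" };
}
```

Checking the kid as well as the status guards against the case where the probe is green but the deployment is still running with the previous env values.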
Cache behaviour worth understanding: by construction the token-manager cacheKey is ${scope}|${kid} (per PR-MASKINPORTEN-CACHE-KID), so the first live request after the redeploy reads the new kid env value, produces a new cacheKey, sees a cache miss, and fetches a fresh token under the new kid. There is no "wait for cache eviction" delay — the rotation is effective on the very first post-redeploy request that hits the adapter.
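The effect of including the kid in the cache key can be illustrated with a small stand-in cache. Only the key format `${scope}|${kid}` comes from the text above; the Map-based cache, the synchronous fetchToken callback, and the function names are simplifying assumptions and do not mirror the real token manager.

```typescript
// Hypothetical stand-in for the token-manager cache.
const tokenCache = new Map<string, string>();

// Key format taken from the runbook: scope and kid joined by a pipe.
function cacheKey(scope: string, kid: string): string {
  return `${scope}|${kid}`;
}

// Returns the cached token for (scope, kid), or fetches and stores a new one.
// fetchToken is synchronous here purely to keep the sketch short.
function getToken(
  scope: string,
  kid: string,
  fetchToken: (kid: string) => string,
): { token: string; cacheHit: boolean } {
  const key = cacheKey(scope, kid);
  const cached = tokenCache.get(key);
  if (cached !== undefined) {
    return { token: cached, cacheHit: true };
  }
  const token = fetchToken(kid);
  tokenCache.set(key, token);
  return { token, cacheHit: false };
}
```

Calling getToken twice with the old kid gives a miss followed by a hit; the first call with the new kid is a miss again because the key differs, which is why no eviction step is needed.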
Step 5 — Wait for old-kid token expiry
Let any access tokens previously issued under the OLD kid expire naturally. Token lifetime is capped at 120 seconds and the cache REFRESH_THRESHOLD_MS is 60 seconds, so after roughly three minutes (120 s + 60 s = 180 s) no live request is still using a token obtained under the old kid.
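The wait can be turned into a concrete "not before" timestamp for Step 6. This is a small sketch of the arithmetic only: the two constants restate the figures given above, while earliestSafeDeletion and the choice of the redeploy completion time as the starting point are assumptions made for illustration.

```typescript
// Figures from the runbook, expressed in milliseconds.
const TOKEN_LIFETIME_MS = 120_000; // token lifetime cap
const REFRESH_THRESHOLD_MS = 60_000; // cache refresh threshold

// Hypothetical helper: earliest time at which Step 6 should run, measured
// from the moment the Step 3 redeploy finished.
function earliestSafeDeletion(redeployCompletedAt: Date): Date {
  return new Date(
    redeployCompletedAt.getTime() + TOKEN_LIFETIME_MS + REFRESH_THRESHOLD_MS,
  );
}
```

For a redeploy that completed at 12:00:00 UTC, the helper returns 12:03:00 UTC; waiting longer is harmless because the overlap window stays open until Step 6.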
Skipping this wait does not break anything while the overlap window is open — the old kid is still trusted at Maskinporten — but the wait is what guarantees that Step 6 cannot accidentally cause a 401 on an in-flight token that was already issued under the old kid.
Step 6 — Remove the old key from Samarbeidsportalen
Return to sjolvbetjening.samarbeid.digdir.no, open the Nøkler page, and delete the previous kid entry. This closes the validity-overlap window: only the new kid is now trusted at Maskinporten.
After this step the rollback path documented below requires re-uploading the old public key as a new Samarbeidsportalen entry, which starts a new validity-overlap cycle. Plan the timing so you are confident in the new key before deleting the old one.
Rollback path
If the readiness probe in Step 4 returns 503 with failures[0].field matching MASKINPORTEN_PRIVATE_KEY or MASKINPORTEN_KID, the new key was not accepted by boot validation. Roll back in this order:
- (a) Revert both env vars in Vercel to their prior values.
- (b) Trigger a redeploy.
- (c) Re-verify that the readiness probe returns 200 with status: "ready".
- (d) Leave the old key entry in Samarbeidsportalen intact. Step 6 has not yet executed, so the old kid is still uploaded and trusted.
Rollback is always available between Steps 4 and 6, because the validity-overlap window remains open until Step 6 closes it. Once Step 6 deletes the old key, a rollback requires the additional step of re-uploading the old public key as a new Samarbeidsportalen entry with a kid value assigned, which is effectively the start of a new validity-overlap cycle.
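The rollback decision described above can be written down as a small function. In this sketch only failures[0].field and the two field names come from the runbook; the ProbeResult shape beyond that, the httpStatus property, and the rollbackPlan name are assumptions.

```typescript
// Hypothetical shapes for an already-parsed probe result.
interface ProbeFailure {
  field: string;
}
interface ProbeResult {
  httpStatus: number;
  failures?: ProbeFailure[];
}

// Fields whose failure means boot validation rejected the new key.
const KEY_FIELDS = ["MASKINPORTEN_PRIVATE_KEY", "MASKINPORTEN_KID"];

// Returns the ordered rollback steps, or an empty list when the probe result
// does not point at a rejected key.
function rollbackPlan(probe: ProbeResult, oldKeyStillUploaded: boolean): string[] {
  const firstField = probe.failures?.[0]?.field;
  const keyRejected =
    probe.httpStatus === 503 &&
    firstField !== undefined &&
    KEY_FIELDS.includes(firstField);
  if (!keyRejected) {
    return [];
  }
  const steps = [
    "Revert both env vars in Vercel to their prior values",
    "Trigger a redeploy",
    "Re-verify that the readiness probe returns 200 with status ready",
  ];
  if (!oldKeyStillUploaded) {
    // After Step 6 the old public key has to be trusted again first.
    steps.unshift("Re-upload the old public key to Samarbeidsportalen");
  }
  return steps;
}
```

Before Step 6 the plan has three steps; after Step 6 the re-upload is placed first, since reverting the env vars only helps once the old key is trusted again.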
Observability
A clean rotation produces zero token_request_exhausted Sentry events and zero MASKINPORTEN_AUTH_FAILED codes in the consumer reliability surface. The four PR-MASKINPORTEN-OBSERVABILITY breadcrumb categories — token_request_start, token_request_retry, token_request_success, token_request_exhausted — appear in the same proportions as on any other day; rotation is not a load event.
A botched rotation, by contrast, is loud: the most common failure mode is a redeploy that flips MASKINPORTEN_KID but leaves MASKINPORTEN_PRIVATE_KEY pointing at the old PEM (or vice-versa), and the resulting client_assertion JWT is signed under the wrong key. Maskinporten rejects with a 401 on the token exchange, which propagates to consumers as MASKINPORTEN_AUTH_FAILED — the same error code documented in Maskinporten Production Setup → MASKINPORTEN_AUTH_FAILED.
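A mismatch between the private key in the env var and the public key uploaded under the kid can be caught locally before it reaches Maskinporten. The sketch below uses Node's built-in crypto module to derive the public key from a private-key PEM and compare it with a public-key PEM. The keysMatch name is hypothetical, the demo keys are throwaway pairs generated in-process, and this check is not part of the readiness probe described earlier.

```typescript
import { createPublicKey, generateKeyPairSync } from "node:crypto";

// Derives the public key from the private-key PEM and compares it, in SPKI
// DER form, with the public-key PEM that was uploaded.
function keysMatch(privateKeyPem: string, uploadedPublicKeyPem: string): boolean {
  const derived = createPublicKey(privateKeyPem).export({ type: "spki", format: "der" });
  const uploaded = createPublicKey(uploadedPublicKeyPem).export({ type: "spki", format: "der" });
  return derived.equals(uploaded);
}

// Demo helper: a throwaway RSA-2048 pair, PEM-encoded like the Step 1 output.
function makeDemoPair(): { publicKey: string; privateKey: string } {
  return generateKeyPairSync("rsa", {
    modulusLength: 2048,
    publicKeyEncoding: { type: "spki", format: "pem" },
    privateKeyEncoding: { type: "pkcs8", format: "pem" },
  });
}

const demoOld = makeDemoPair();
const demoNew = makeDemoPair();
console.log(keysMatch(demoNew.privateKey, demoNew.publicKey)); // true: matching halves
console.log(keysMatch(demoOld.privateKey, demoNew.publicKey)); // false: stale private key
```

Comparing DER bytes rather than PEM text avoids false mismatches from line-wrapping or trailing-newline differences.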
The outcome tag enum on the token_request_exhausted capture is the fastest disambiguation surface: client_error indicates a 401/403 from Maskinporten (likely the signing-key mismatch above), exhausted indicates a 5xx loop that exhausted the retry budget without authentication ever entering the picture, and timeout indicates that the final attempt was an AbortSignal.timeout rather than an authoritative upstream response.
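The disambiguation above maps naturally onto a lookup that an on-call engineer could keep next to the Sentry view. This sketch models only the three outcome values named in the text; the real enum may contain others, and the Outcome type and triageHint function are hypothetical names.

```typescript
// Only the three outcome values named in the runbook are modelled here.
type Outcome = "client_error" | "exhausted" | "timeout";

// Maps an outcome tag to a first triage hint for a rotation in progress.
function triageHint(outcome: Outcome): string {
  switch (outcome) {
    case "client_error":
      return "401/403 from Maskinporten: check that MASKINPORTEN_KID and MASKINPORTEN_PRIVATE_KEY were flipped together";
    case "exhausted":
      return "5xx loop exhausted the retry budget: authentication was never reached, so the key is unlikely to be the cause";
    case "timeout":
      return "final attempt hit AbortSignal.timeout: no authoritative upstream response was received";
  }
}
```

During a rotation, client_error is the outcome that most directly suggests a rollback; the other two point away from the new key.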
Related reading
- Maskinporten Production Setup: operator setup, readiness probe walkthrough, and the full MASKINPORTEN_AUTH_FAILED triage flow.
- Error handling and the Compliance Explainer: consumer-facing framing of MASKINPORTEN_AUTH_FAILED and the other reliability error codes.
- Maskinporten Developer Guide: Maskinporten OAuth2 background and the JWS assertion contract.