Validator Key Rotation Playbook
Revision: 2025-10-12
This playbook outlines how DATU validator operators initiate, execute, and close out key rotation events. It is designed to satisfy Phase 1 compliance requirements while remaining actionable for on-call engineers during an incident.
1. When to Rotate Keys
| Trigger | Examples | Decision Owner |
|---|---|---|
| Compromise suspected | Unauthorized signature observed, HSM tamper alert, validator host breach. | Security incident commander. |
| Scheduled maintenance | Annual hygiene rotation, HSM firmware upgrade requiring new key material. | Validator operator lead. |
| Operational change | Custodian transfer, validator tier upgrade/downgrade, or quorum weight modification. | Steering committee representative. |
Rotation events must be logged in the ledger security register with timestamps, custodians, and supporting evidence before execution begins.
2. Preparation Checklist
- Convene the response team. Confirm that the incident commander, validator operator, governance liaison, and compliance observer have joined a dedicated bridge. Capture minutes in the incident tracker.
- Snapshot current state. Export the existing validator public key, quorumset entry, and SCP envelope hashes for the last 10 ledgers. Store artefacts in the secure evidence locker for post-event audit.
- Stage replacement hardware. Validate that spare HSMs or logical partitions are online, backed up, and accessible from the validator host. Run vendor self-tests and confirm firmware baselines.
- Generate replacement keys. Use the vendor tooling to create a new ED25519 keypair. Record:
  - Public key string (`G...`).
  - Slot/partition identifier.
  - Custodian contact information.
  - Hash of the key metadata bundle.
- Draft network updates. Prepare pull requests for:
  - `infrastructure/stellar-fork/config/quorumsets.toml` (update public key and weight).
  - `infrastructure/stellar-fork/config/validators/<validator>/stellar-core.cfg` (replace `NODE_SEED`).
  - `docs/operations/ledger-security-register.md` (append metadata entry).
- Notify stakeholders. Brief governance and compliance stakeholders on the planned timeline. Obtain written approval to proceed for scheduled rotations, or incident commander sign-off for emergency events.
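The recording step in the checklist above could be supported by two small shell helpers. This is a minimal sketch, not the project's tooling: the function names `looks_like_public_key` and `bundle_hash` are hypothetical, the key check only tests the shape of a Stellar-style public key (56 characters, leading `G`, base32 alphabet) without verifying its checksum, and SHA-256 is an assumption because the playbook does not name a hash algorithm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Generate replacement keys" record.

# Shape check only: 56 characters, leading G, base32 alphabet (A-Z, 2-7).
# This does NOT validate the embedded checksum of the key.
looks_like_public_key() {
  [ "${#1}" -eq 56 ] && printf '%s' "$1" | LC_ALL=C grep -Eq '^G[A-Z2-7]{55}$'
}

# Print the SHA-256 digest of a metadata bundle file.
# SHA-256 is an assumed choice; use whatever the register requires.
bundle_hash() {
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 1
  fi
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | awk '{print $1}'
  else
    # Fallback for systems that ship shasum instead of sha256sum.
    shasum -a 256 < "$1" | awk '{print $1}'
  fi
}
```

Both functions only read their inputs, so they can be run before any change is made to the validator host.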
3. Execution Steps
- Pause consensus participation.
  - For Tier 0/Tier 1 nodes: set `UNSAFE_QUORUM=true` temporarily if quorum intersection would otherwise break.
  - Use `docker compose stop <validator>` to halt the affected container.
- Export and archive history. Trigger `stellar-core --c "history-cmd catchup current 1"` to ensure the history archive contains the final ledgers signed with the old key.
- Install new credentials.
  - Update the validator host to point `SIGNING_KEY_SEED_PATH` (or the PKCS#11 URI) at the new HSM slot.
  - Mount the refreshed credential path into the Docker Compose profile. Validate permissions (`chmod 600`).
- Deploy configuration updates. Merge the prepared pull requests once reviewer approvals are captured. Redeploy the validator container using `docker compose up -d <validator>`.
- Rejoin consensus. Monitor the validator via `stellar-core http-command info` and confirm the `ledger` field advances in lockstep with peers. Validate quorum status via `stellar-core http-command quorum`.
- Post-rotation validation.
  - Run `scripts/stellar-fork/start.sh --with-validators` locally to ensure the new key participates in test quorum.
  - Execute the HSM smoke test (`scripts/stellar-fork/hsm-smoketest.sh`) against the refreshed slot to confirm the quickstart container can read the credentials and initialise `stellar-core` successfully.
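The "Rejoin consensus" check above could be partly scripted as follows. This is a sketch under assumptions: the helper names `ledger_num` and `ledger_advanced` are hypothetical, and the parser assumes the `info` output is pretty-printed JSON whose current ledger sequence appears in a numeric `"num"` field, as in upstream stellar-core. Verify that against the fork before relying on it. The check only shows that the local ledger is advancing; comparing against peers remains a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the post-restart ledger check.

# Read `info` output on stdin and print the value from the first line
# that carries a numeric "num" field. Text matching is fragile; a
# JSON-aware parser is sturdier where one is installed.
ledger_num() {
  sed -n 's/.*"num"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed when the second sample is strictly greater than the first.
# Returns 2 when either argument is not a plain non-negative integer.
ledger_advanced() {
  for n in "$1" "$2"; do
    case "$n" in
      ''|*[!0-9]*) return 2 ;;
    esac
  done
  [ "$2" -gt "$1" ]
}

# Example (not executed here; the sleep interval is arbitrary):
#   before=$(stellar-core http-command info | ledger_num)
#   sleep 30
#   after=$(stellar-core http-command info | ledger_num)
#   ledger_advanced "$before" "$after" && echo "ledger is advancing"
```

Keeping the parser and the comparison separate makes it easy to swap in a different parser without touching the check itself.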
4. Communication Requirements
| Audience | Message | Channel | Timing |
|---|---|---|---|
| Steering committee | Rotation approved and executed; include the ledger range covered. | Governance Slack / email | Within 1 hour of completion. |
| Compliance liaison | Updated custodian roster and evidence location. | Compliance tracker | Within 4 hours. |
| Public status page | For emergency rotations, publish a transparency note summarising impact and mitigations. | Status page | Within 24 hours. |
5. Post-Event Activities
- Update documentation. Confirm the ledger security register, quorum configuration, and any operational guides reference the new key.
- Conduct a retrospective. Within 3 business days, hold a 30-minute review covering root cause, detection gaps, and follow-up actions.
- Audit trail packaging. Bundle logs, history snapshots, and meeting notes. Store them under the incident ID per the artifact retention policy.
- Compliance attestation. Obtain written acknowledgement from the compliance liaison that rotation controls were executed.
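The audit trail packaging step above could be sketched like this. The function name `package_audit_trail`, the archive naming scheme, and the use of a SHA-256 sidecar file are all assumptions made for illustration; the artifact retention policy may prescribe a different layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle an evidence directory into
# <out_dir>/<incident_id>.tar.gz and write its SHA-256 digest beside it.
# Keep out_dir outside src_dir so the archive does not include itself.
package_audit_trail() {
  local incident_id="$1" src_dir="$2" out_dir="$3" archive
  if [ -z "$incident_id" ] || [ ! -d "$src_dir" ] || [ -z "$out_dir" ]; then
    echo "usage: package_audit_trail <incident-id> <src-dir> <out-dir>" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  archive="$out_dir/$incident_id.tar.gz"
  # -C stores paths relative to the evidence directory.
  tar -czf "$archive" -C "$src_dir" . || return 1
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$archive" | awk '{print $1}' > "$archive.sha256"
  else
    shasum -a 256 < "$archive" | awk '{print $1}' > "$archive.sha256"
  fi
  # Print the archive path so callers can record it in the register.
  printf '%s\n' "$archive"
}
```

Recording the digest next to the archive gives the compliance liaison a simple way to confirm that the stored bundle has not changed since it was packaged.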
6. Appendix – Quick Commands

    # Export last signed ledger hash before shutdown
    # (quoted so the shell does not treat "?" as a glob character)
    stellar-core http-command 'ledgers?cursor=now'

    # Validate quorum status after restart
    stellar-core http-command quorum

    # Rebuild validator container with new credentials
    SIGNING_KEY_SEED_PATH=/secure/path/new-key.txt docker compose up -d datu-tier1-east

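For rehearsals, the stop, redeploy, and verify commands from this playbook could be wrapped so that they are printed instead of executed. This is a sketch only: `run`, `rotate_validator`, and the `DRY_RUN` variable are hypothetical names, and the sequence covers only the container and quorum commands, not the history export or the HSM steps.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Stop the validator, bring it back with the new seed path, then query quorum.
rotate_validator() {
  local validator="$1" seed_path="$2"
  if [ -z "$validator" ] || [ -z "$seed_path" ]; then
    echo "usage: rotate_validator <validator> <seed-path>" >&2
    return 1
  fi
  run docker compose stop "$validator" || return 1
  run env SIGNING_KEY_SEED_PATH="$seed_path" docker compose up -d "$validator" || return 1
  run stellar-core http-command quorum
}
```

With `DRY_RUN=1` set, calling `rotate_validator datu-tier1-east /secure/path/new-key.txt` prints the three commands prefixed with `+` and touches nothing, which lets the response team review the exact invocation on the bridge before running it for real.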
Keep this playbook under version control. Proposed changes require approval from security engineering and the steering committee before merging.