
Validator Key Rotation Playbook

Revision: 2025-10-12

This playbook outlines how DATU validator operators initiate, execute, and close out key rotation events. It is designed to satisfy Phase 1 compliance requirements while remaining actionable for on-call engineers during an incident.

1. When to Rotate Keys

| Trigger | Examples | Decision Owner |
| --- | --- | --- |
| Compromise suspected | Unauthorized signature observed, HSM tamper alert, validator host breach. | Security incident commander. |
| Scheduled maintenance | Annual hygiene rotation, HSM firmware upgrade requiring new key material. | Validator operator lead. |
| Operational change | Custodian transfer, validator tier upgrade/downgrade, or quorum weight modification. | Steering committee representative. |

Rotation events must be logged in the ledger security register with timestamps, custodians, and supporting evidence before execution begins.

2. Preparation Checklist

  1. Convene the response team. Confirm incident commander, validator operator, governance liaison, and compliance observer on a dedicated bridge. Capture minutes in the incident tracker.
  2. Snapshot current state. Export the existing validator public key, quorumset entry, and SCP envelope hashes for the last 10 ledgers. Store artefacts in the secure evidence locker for post-event audit.
  3. Stage replacement hardware. Validate that spare HSMs or logical partitions are online, backed up, and accessible from the validator host. Run vendor self-tests and confirm firmware baselines.
  4. Generate replacement keys. Use the vendor tooling to create a new ED25519 keypair. Record:
     - Public key string (G...).
     - Slot/partition identifier.
     - Custodian contact information.
     - Hash of the key metadata bundle.
  5. Draft network updates. Prepare pull requests for:
     - infrastructure/stellar-fork/config/quorumsets.toml (update public key and weight).
     - infrastructure/stellar-fork/config/validators/<validator>/stellar-core.cfg (replace NODE_SEED).
     - docs/operations/ledger-security-register.md (append metadata entry).
  6. Notify stakeholders. Brief governance and compliance stakeholders on the planned timeline. Obtain written approval to proceed for scheduled rotations, or incident commander sign-off for emergency events.
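The following Python helper is an added example, not code from the playbook or the DATU repository. It checks that the G... public key string recorded in step 4 is a well-formed Stellar strkey (version byte 48, base32, CRC-16/XMODEM checksum) before the value enters the ledger security register, and it assembles the register entry with the metadata bundle hash. The playbook does not name a hash algorithm, so SHA-256 over canonical JSON is an assumption here, as are the custodian fields and the all-zero and sequential sample keys, which are constructed test inputs rather than real validator keys.

```python
import base64
import hashlib
import json

# Stellar strkey version byte for an Ed25519 public key: 6 << 3 = 48,
# which is why every encoded public key starts with "G".
VERSION_ED25519_PUBLIC = 6 << 3


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by strkey."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def encode_public_key(raw: bytes) -> str:
    """Encode a 32-byte Ed25519 public key as a 56-character G... string."""
    if len(raw) != 32:
        raise ValueError("Ed25519 public key must be exactly 32 bytes")
    payload = bytes([VERSION_ED25519_PUBLIC]) + raw
    checksum = crc16_xmodem(payload).to_bytes(2, "little")
    return base64.b32encode(payload + checksum).decode("ascii")


def decode_public_key(strkey: str) -> bytes:
    """Decode a G... string back to raw key bytes; raise ValueError on any defect."""
    if len(strkey) != 56 or not strkey.startswith("G"):
        raise ValueError("public key strkey must be 56 characters starting with G")
    try:
        decoded = base64.b32decode(strkey)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValueError(f"invalid base32: {exc}") from exc
    if len(decoded) != 35 or decoded[0] != VERSION_ED25519_PUBLIC:
        raise ValueError("wrong length or version byte")
    payload, checksum = decoded[:-2], decoded[-2:]
    if crc16_xmodem(payload).to_bytes(2, "little") != checksum:
        raise ValueError("checksum mismatch: key string is corrupted or mistyped")
    return payload[1:]


def is_valid_public_key(strkey: str) -> bool:
    """Boolean wrapper for checklists and CI gates."""
    try:
        decode_public_key(strkey)
        return True
    except ValueError:
        return False


def build_register_entry(strkey: str, slot: str, custodian: dict) -> dict:
    """Assemble the ledger security register record for the new key.

    The bundle hash is SHA-256 over canonical JSON (sorted keys, no whitespace),
    an assumption since the playbook does not specify the algorithm.
    """
    decode_public_key(strkey)  # refuse to record a malformed key
    bundle = {"public_key": strkey, "slot": slot, "custodian": custodian}
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode()
    return {**bundle, "bundle_sha256": hashlib.sha256(canonical).hexdigest()}


if __name__ == "__main__":
    # Constructed sample key (bytes 0..31), not a real validator key.
    sample_raw = bytes(range(32))
    sample_key = encode_public_key(sample_raw)
    entry = build_register_entry(
        sample_key,
        slot="hsm-partition-example",  # hypothetical slot identifier
        custodian={"name": "Example Custodian", "contact": "custodian@example.invalid"},
    )
    print(json.dumps(entry, indent=2))
```

The decoder rejects the three failure modes that a copy-and-paste rotation typically produces: a truncated string, a wrong prefix (for example pasting an S... seed where the public key belongs), and a single mistyped character, which the 16-bit CRC always catches.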

3. Execution Steps

  1. Pause consensus participation.
     - For Tier 0/Tier 1 nodes: set UNSAFE_QUORUM=true temporarily if quorum intersection would otherwise break.
     - Use docker compose stop <validator> to halt the affected container.
  2. Export and archive history. Trigger stellar-core --c "history-cmd catchup current 1" to ensure the history archive contains the final ledgers signed with the old key.
  3. Install new credentials.
     - Update the validator host to point SIGNING_KEY_SEED_PATH (or the PKCS#11 URI) at the new HSM slot.
     - Mount the refreshed credential path into the Docker Compose profile. Validate permissions (chmod 600).
  4. Deploy configuration updates. Merge the prepared pull requests once reviewer approvals are captured. Redeploy the validator container using docker compose up -d <validator>.
  5. Rejoin consensus. Monitor the validator via stellar-core http-command info and confirm the ledger field advances in lockstep with peers. Validate quorum status via stellar-core http-command quorum.
  6. Post-rotation validation.
     - Run scripts/stellar-fork/start.sh --with-validators locally to ensure the new key participates in test quorum.
     - Execute the HSM smoke test (scripts/stellar-fork/hsm-smoketest.sh) against the refreshed slot to confirm the quickstart container can read the credentials and initialise stellar-core successfully.
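As an added example (not part of the playbook or the project's scripts), the Python below automates two of the checks above: the chmod 600 validation of the mounted credential file from the "Install new credentials" step, and the "ledger advances in lockstep with peers" check from the "Rejoin consensus" step. The JSON layout assumed for stellar-core http-command info output (an "info" object with "state" and "ledger.num") and the snapshot values are constructed for this example; confirm the field names against the core version you run before wiring this into an on-call check.

```python
import os
import stat
import tempfile


def check_credential_permissions(path: str) -> list:
    """Findings for the seed file mounted into the validator; an empty list means OK.

    Enforces the playbook's chmod 600 requirement: a regular file (no symlink
    indirection), owner read/write only, nothing for group or others.
    """
    findings = []
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        findings.append("credential path is a symlink; mount the file itself")
    elif not stat.S_ISREG(st.st_mode):
        findings.append("credential path is not a regular file")
    mode = stat.S_IMODE(st.st_mode)
    if mode != 0o600:
        findings.append(f"mode is {mode:04o}, expected 0600")
    return findings


def check_sync_progress(before: dict, after: dict, peer_ledgers: list, max_lag: int = 2) -> list:
    """Compare two parsed `info` snapshots taken a few seconds apart.

    Assumed layout (constructed for this example): {"info": {"state": "Synced!",
    "ledger": {"num": N}}}. `peer_ledgers` holds the ledger numbers reported by
    other validators at the time of the second snapshot.
    """
    findings = []
    state = after["info"].get("state")
    if state != "Synced!":
        findings.append(f"node state is {state!r}, expected 'Synced!'")
    num_before = before["info"]["ledger"]["num"]
    num_after = after["info"]["ledger"]["num"]
    if num_after <= num_before:
        findings.append(f"ledger did not advance ({num_before} -> {num_after})")
    if peer_ledgers:
        head = max(peer_ledgers)
        if head - num_after > max_lag:
            findings.append(f"{head - num_after} ledgers behind peers (max allowed {max_lag})")
    return findings


# Constructed sample snapshots and peer view; not real core output.
INFO_BEFORE = {"info": {"state": "Synced!", "ledger": {"num": 120400}}}
INFO_AFTER = {"info": {"state": "Synced!", "ledger": {"num": 120406}}}
INFO_CATCHING_UP = {"info": {"state": "Catching up", "ledger": {"num": 120406}}}
PEER_LEDGERS = [120407, 120406, 120406]


def demo_permission_findings(mode: int) -> list:
    """Create a throwaway placeholder seed file with the given mode and check it."""
    with tempfile.NamedTemporaryFile(prefix="new-key-", suffix=".txt", delete=False) as fh:
        fh.write(b"placeholder-seed-material-not-a-real-key\n")
        path = fh.name
    try:
        os.chmod(path, mode)
        return check_credential_permissions(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print("0600 findings:", demo_permission_findings(0o600))
    print("0644 findings:", demo_permission_findings(0o644))
    print("sync findings:", check_sync_progress(INFO_BEFORE, INFO_AFTER, PEER_LEDGERS))
    print("stalled findings:", check_sync_progress(INFO_AFTER, INFO_BEFORE, PEER_LEDGERS))
```

Run the permission check against the real SIGNING_KEY_SEED_PATH on the validator host before docker compose up -d, and feed check_sync_progress with two info snapshots taken a few seconds apart plus the ledger numbers from peer validators; any non-empty findings list blocks the "Rejoin consensus" step from being marked complete.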

4. Communication Requirements

| Audience | Message | Channel | Timing |
| --- | --- | --- | --- |
| Steering committee | Rotation approved and executed, include ledger range covered. | Governance Slack / email | Within 1 hour of completion. |
| Compliance liaison | Updated custodian roster and evidence location. | Compliance tracker | Within 4 hours. |
| Public status page | For emergency rotations, publish a transparency note summarising impact and mitigations. | Status page | Within 24 hours. |

5. Post-Event Activities

  1. Update documentation. Confirm the ledger security register, quorum configuration, and any operational guides reference the new key.
  2. Conduct a retrospective. Within 3 business days, hold a 30-minute review covering root cause, detection gaps, and follow-up actions.
  3. Audit trail packaging. Bundle logs, history snapshots, and meeting notes. Store them under the incident ID per the artifact retention policy.
  4. Compliance attestation. Obtain written acknowledgement from the compliance liaison that rotation controls were executed.

6. Appendix – Quick Commands

# Export last signed ledger hash before shutdown
stellar-core http-command ledgers?cursor=now

# Validate quorum status after restart
stellar-core http-command quorum

# Rebuild validator container with new credentials
SIGNING_KEY_SEED_PATH=/secure/path/new-key.txt docker compose up -d datu-tier1-east

Keep this playbook under version control. Proposed changes require approval from security engineering and the steering committee before merging.