Runbooks

Overview

This section provides step-by-step operational procedures for common validator tasks and emergency scenarios.

These runbooks are designed for production validator operations and assume you have basic familiarity with Linux system administration, the Cosmos SDK, and the evmd command-line interface.

Important: Always test procedures in a non-production environment first. For critical operations affecting validator uptime, ensure you have recent backups and a rollback plan.

Runbook 1: Validator Restart

Purpose: Gracefully restart validator without getting jailed

Prerequisites:

SSH access to validator
Backup of critical files

Steps:

Pre-restart checks:

# Check current status
curl -s http://localhost:26657/status | jq '.result.sync_info'

# Check validator signing
# Note: Set $VALCONS if not already set (see "Checking Validator Status" section)
evmd query slashing signing-info $VALCONS --node http://localhost:26657
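
How much downtime a restart can absorb depends on the chain's slashing parameters. The sketch below works through that arithmetic, assuming the standard Cosmos SDK slashing fields signed_blocks_window and min_signed_per_window (from evmd query slashing params) and missed_blocks_counter (from the signing-info query above). The function name and the sample numbers are illustrative and are not values from this network.

```shell
# Estimate how many more blocks the validator can miss before the
# downtime threshold is crossed (sketch; sample values are made up).
#   $1 = signed_blocks_window   (from `evmd query slashing params`)
#   $2 = min_signed_per_window  (decimal fraction, e.g. 0.5)
#   $3 = missed_blocks_counter  (from the signing-info query)
missed_block_budget() {
  awk -v window="$1" -v min_signed="$2" -v missed="$3" 'BEGIN {
    # Blocks that must be signed within the window, rounded to nearest
    must_sign = int(window * min_signed + 0.5)
    # Misses tolerated per window, minus those already recorded
    print (window - must_sign) - missed
  }'
}

# Example: window of 100 blocks, 50% must be signed, 10 already missed
missed_block_budget 100 0.5 10   # prints 40
```

Multiplying the result by the average block time gives a rough downtime allowance. If the budget is small, it is probably worth postponing the restart until the counter has recovered.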

Stop validator:

sudo systemctl stop evmd

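# Verify stopped (no output means no evmd process remains)
# The [e] bracket keeps grep from matching its own command line
ps aux | grep '[e]vmd'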
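
In a scripted restart it can help to wait for the old process to exit, with a deadline, before replacing the binary. The sketch below polls a PID with kill -0. The function name and the 30-second default are assumptions for illustration.

```shell
# Wait until a process has exited, or give up after a timeout (sketch).
#   $1 = PID to watch
#   $2 = timeout in seconds (default 30, an arbitrary example value)
# Returns 0 once the PID is gone, 1 if it is still alive at the deadline.
wait_for_pid_exit() {
  pid=$1
  timeout=${2:-30}
  elapsed=0
  # kill -0 sends no signal; it only tests whether the PID exists
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

The PID has to be captured before the service is stopped, for example with systemctl show --property MainPID --value evmd. Run the check as root or as the user that owns the evmd process, because kill -0 also fails when the caller lacks permission to signal the process.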

Perform maintenance:
```
# Update binary if needed
make install

# Verify new version
evmd version
```
Note: The make install command must be run from the Bitplanet repository directory. It builds the binary with the correct version information and installs it to your $GOPATH/bin (typically ~/go/bin/evmd).

Start validator:
```
sudo systemctl start evmd
```

Post-restart validation:

# Check service status
sudo systemctl status evmd

# Monitor logs for about 2 minutes (press Ctrl+C to stop following)
sudo journalctl -u evmd -f --output cat

# Verify syncing
curl -s http://localhost:26657/status | jq '.result.sync_info.catching_up'

# Check signing (VALCONS should already be set from "Checking Validator Status" section)
sleep 60
evmd query slashing signing-info $VALCONS --node http://localhost:26657

Expected Duration: 2-5 minutes

Rollback Plan: If the new binary fails to start or sign, stop the service, restore the previous binary from backup, and start the service again

Runbook 2: Consensus Failure Recovery

Purpose: Recover from consensus failure and rejoin network

Symptoms:

Node not producing blocks
Persistent errors in logs
Validator jailed

Steps:

Immediate actions:

# Stop node
sudo systemctl stop evmd

# Backup current state
cp -r $HOME/.evmd/data $HOME/.evmd/data.backup-$(date +%Y%m%d-%H%M%S)
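
The data directory can be large, and a copy that runs out of disk space leaves a partial backup behind. A minimal pre-check, assuming standard du and df are available, might look like this. The function name is hypothetical.

```shell
# Check that the destination filesystem can hold a copy of a directory (sketch).
#   $1 = directory to back up
#   $2 = directory on the filesystem that will receive the copy
# Returns 0 if free space exceeds the source size, 1 otherwise.
has_space_for_backup() {
  # du -sk: total size in KiB; df -Pk: POSIX layout, column 4 is available KiB
  needed_kb=$(du -sk "$1" | awk '{print $1}')
  free_kb=$(df -Pk "$2" | awk 'NR == 2 {print $4}')
  [ "$free_kb" -gt "$needed_kb" ]
}
```

For the backup above, this could be called as has_space_for_backup "$HOME/.evmd/data" "$HOME/.evmd" before running cp.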

Diagnose issue:

# Check last logs
sudo journalctl -u evmd --no-pager | tail -100

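# Check database integrity
# Caution: unsafe-reset-all deletes chain data. Confirm that --dry-run appears in
# `evmd tendermint unsafe-reset-all --help` for your build before relying on this check.
evmd tendermint unsafe-reset-all --home $HOME/.evmd --dry-run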

Recovery options:
Option A: Reset Tendermint state (if app state is intact)
```
evmd tendermint unsafe-reset-all --home $HOME/.evmd
sudo systemctl start evmd
```
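What this does: Removes all blockchain data (blocks, consensus state, transaction index) but keeps your configuration and keys. It also resets priv_validator_state.json, the record of the last height the validator signed; the backup taken under "Immediate actions" preserves the previous copy. The node then resyncs from the network, starting from genesis or from a state sync snapshot depending on your config.toml. This is useful when consensus state is corrupted but your keys and configuration are fine.

Option B: Restore from snapshot (if corruption is severe)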
```
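rm -rf $HOME/.evmd/data
# Recreate the directory; tar -C fails if the target does not exist
mkdir -p $HOME/.evmd/data
tar -xzf latest-snapshot.tar.gz -C $HOME/.evmd/data
# If the snapshot lacks priv_validator_state.json, restore it from the backup taken earlier
sudo systemctl start evmd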
```
Option C: Resync from genesis (last resort)
```
evmd tendermint unsafe-reset-all --home $HOME/.evmd
# Ensure genesis.json is correct
sudo systemctl start evmd
```
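
Option B deletes the data directory before extracting the snapshot, so a truncated or corrupt download would leave the node with nothing to start from. One way to reduce that risk is to test the archive first. This is a sketch, and the function name is an assumption.

```shell
# Confirm a .tar.gz archive is readable end to end before relying on it (sketch).
#   $1 = path to the snapshot archive
# Returns 0 if tar can list every entry, non-zero if the file is missing or corrupt.
verify_snapshot() {
  [ -s "$1" ] || return 1           # must exist and be non-empty
  tar -tzf "$1" > /dev/null 2>&1    # listing forces a full decompression pass
}
```

Running verify_snapshot latest-snapshot.tar.gz before the rm -rf step catches a bad download early. Listing the first few entries with tar -tzf also shows whether the archive already contains a top-level data/ directory, which determines the correct -C target.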

Monitor recovery:

# Watch sync progress
watch -n 5 'curl -s http://localhost:26657/status | jq ".result.sync_info"'

# Monitor logs
sudo journalctl -u evmd -f
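
For unattended recovery, the same status endpoint can be polled until catching_up turns false. The sketch below wraps that in a function with a poll limit. RPC_URL, the function names, and the default limits are assumptions for illustration.

```shell
RPC_URL=${RPC_URL:-http://localhost:26657}

# Print the node's catching_up flag ("true" or "false").
fetch_catching_up() {
  curl -s "$RPC_URL/status" | jq -r '.result.sync_info.catching_up'
}

# Poll until the node reports catching_up=false (sketch).
#   $1 = maximum number of polls (default 720, an example value)
#   $2 = seconds between polls (default 5, matching the watch interval above)
# Returns 0 when synced, 1 if the poll limit is reached first.
wait_until_synced() {
  max_polls=${1:-720}
  interval=${2:-5}
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    if [ "$(fetch_catching_up)" = "false" ]; then
      return 0
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
  return 1
}
```

While the node is down, the RPC call returns nothing, so the loop keeps polling rather than reporting success. Calling wait_until_synced before the unjail step keeps the transaction from being sent too early.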

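Re-enable validator (if it was jailed):

# Wait until fully synced
# Then unjail (gas flags match the other transactions in these runbooks)
evmd tx slashing unjail --from validator --keyring-backend file --home $HOME/.evmd --chain-id 9001 \
  --gas auto --gas-adjustment 1.5 --gas-prices 10000000bplcoin --yes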
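
An unjail transaction sent before the jail period ends is rejected and still costs fees. The signing-info query reports a jailed_until timestamp, and the sketch below compares it with the current time. It assumes the timestamp is in UTC and RFC 3339 form, as the Cosmos SDK normally prints it. The function name is hypothetical.

```shell
# Decide whether the jail period has elapsed (sketch).
#   $1 = jailed_until in UTC, RFC 3339 (e.g. 2025-01-31T12:00:00.123Z)
#   $2 = optional "now" as YYYYMMDDHHMMSS in UTC; defaults to the current time
# Returns 0 if now is at or past jailed_until, 1 otherwise.
jail_period_over() {
  # Keep "YYYY-MM-DDTHH:MM:SS" (19 chars), then strip separators -> 14 digits
  until_num=$(printf '%s' "$1" | cut -c1-19 | tr -cd '0-9')
  now_num=${2:-$(date -u +%Y%m%d%H%M%S)}
  [ "$now_num" -ge "$until_num" ]
}
```

A validator that was tombstoned for double signing cannot be unjailed at all, so this check only applies to downtime jailing.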

Expected Duration: 30 minutes to several hours (depending on sync method)

Runbook 3: Reward Claim Failure Resolution

Purpose: Troubleshoot and resolve issues with claiming validator rewards

Symptoms:

Transaction fails when claiming rewards
"insufficient funds" error
Rewards not visible

Steps:

Check rewards availability:

# Query commission rewards
evmd query distribution commission $(evmd keys show validator --bech val -a --keyring-backend file --home $HOME/.evmd) \
  --node http://localhost:26657

# Query delegation rewards
evmd query distribution rewards $(evmd keys show validator -a --keyring-backend file --home $HOME/.evmd) \
  --node http://localhost:26657

Check account balance for fees:

evmd query bank balances $(evmd keys show validator -a --keyring-backend file --home $HOME/.evmd) \
  --node http://localhost:26657
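
An "insufficient funds" error usually means the balance is below the fee, which is the gas limit multiplied by the gas price. The sketch below does that comparison with the gas limit (300000) and gas price (10000000bplcoin) used in this runbook. Balances in base units can exceed 64-bit shell arithmetic, so the comparison works on digit strings. Both function names are assumptions.

```shell
# Compare two non-negative integers of arbitrary length, written without
# leading zeros. Returns 0 if $1 >= $2.
uint_ge() {
  [ "${#1}" -gt "${#2}" ] && return 0
  [ "${#1}" -lt "${#2}" ] && return 1
  # Same number of digits: string order equals numeric order.
  # The "x" prefix forces expr to compare as strings.
  expr "x$1" ">=" "x$2" > /dev/null
}

# Check whether a balance covers the worst-case fee (sketch).
#   $1 = balance in base units
#   $2 = gas limit
#   $3 = gas price in base units
has_fee_balance() {
  fee=$(( $2 * $3 ))   # assumes the product itself fits in 64 bits
  uint_ge "$1" "$fee"
}
```

With the values above, the worst-case fee is 300000 × 10000000 = 3000000000000 bplcoin, so has_fee_balance "$BALANCE" 300000 10000000 succeeds only when the bplcoin amount from the balance query is at least that large.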

Attempt reward withdrawal:

# Withdraw commission
evmd tx distribution withdraw-rewards $(evmd keys show validator --bech val -a --keyring-backend file --home $HOME/.evmd) \
  --commission \
  --from validator \
  --keyring-backend file \
  --home $HOME/.evmd \
  --chain-id 9001 \
  --gas auto \
  --gas-adjustment 1.5 \
  --gas-prices 10000000bplcoin \
  --yes

Note: With the --commission flag, the command withdraws the validator's commission in addition to the delegation rewards from that validator (the rewards on your own staked tokens). Omit --commission to withdraw the delegation rewards only.

If transaction fails:

# Increase gas limit
evmd tx distribution withdraw-rewards $(evmd keys show validator --bech val -a --keyring-backend file --home $HOME/.evmd) \
  --commission \
  --from validator \
  --keyring-backend file \
  --home $HOME/.evmd \
  --chain-id 9001 \
  --gas 300000 \
  --gas-prices 10000000bplcoin \
  --yes

Verify withdrawal:

# Check updated balance
evmd query bank balances $(evmd keys show validator -a --keyring-backend file --home $HOME/.evmd) \
  --node http://localhost:26657

Expected Duration: 5-10 minutes

Runbook 4: Emergency Validator Shutdown

Purpose: Emergency procedure for immediate validator shutdown

When to Use:

Security breach detected
Double signing risk
Critical infrastructure failure

Steps:

Immediate shutdown:

# Stop service immediately
sudo systemctl stop evmd

# Kill any remaining processes
sudo pkill -9 evmd

# Verify all processes stopped (no output means none remain)
ps aux | grep '[e]vmd'

Secure validator keys:
```
# Backup keys to secure location
tar -czf emergency-backup-$(date +%Y%m%d-%H%M%S).tar.gz \
  $HOME/.evmd/config/priv_validator_key.json \
  $HOME/.evmd/config/node_key.json \
  $HOME/.evmd/keyring-file/

# Move to secure offline storage
# DO NOT leave on the server
```
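
Before the archive is moved off the server, a checksum recorded alongside it makes it possible to confirm later that the offline copy is intact. A minimal sketch, assuming GNU coreutils sha256sum is installed; the function name is hypothetical.

```shell
# Write a SHA-256 checksum next to an archive and verify it immediately (sketch).
#   $1 = path to the backup archive
# Creates "$1.sha256"; returns non-zero if hashing or verification fails.
seal_backup() {
  dir=$(dirname "$1")
  file=$(basename "$1")
  # Work inside the archive's directory so the checksum file stores a bare filename
  (
    cd "$dir" &&
    sha256sum "$file" > "$file.sha256" &&
    sha256sum -c "$file.sha256" > /dev/null
  )
}
```

After copying both files to offline storage, running sha256sum -c on the .sha256 file there confirms that the transfer did not alter the archive. A checksum detects corruption only; it does not replace encrypting the backup.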
Critical: The priv_validator_key.json is your validator's consensus key. If this is compromised, an attacker could double-sign blocks using your validator identity. Always store backups encrypted and in multiple secure locations (hardware security module, encrypted USB drive, secure cloud storage with strong encryption).

Prevent automatic restart:
```
sudo systemctl disable evmd
```

Document incident:

# Create incident report
cat > incident-report-$(date +%Y%m%d-%H%M%S).txt <<EOF
Timestamp: $(date)
Reason: [DOCUMENT REASON]
Actions Taken: Emergency shutdown
Status: Validator offline
Next Steps: [DOCUMENT RECOVERY PLAN]
EOF

Notify stakeholders:
- Inform delegators via social media
- Update status page
- Contact network coordinators if needed

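Recovery: Once the cause is resolved, re-enable the service with sudo systemctl enable evmd, then follow Runbook 1 (Validator Restart) or Runbook 2 (Consensus Failure Recovery) based on the incident cause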

Next Steps

Review hardware requirements in Hardware Requirements
Check setup procedures in Setup & Configuration
Learn about daily operations in Operations
Access quick reference in Additional Resources


Last updated 1 month ago
