Sort
Goal: Reduce time required for maintenance.
Targets:
- Infrastructure changes are rolled out automatically...
  - for VMs
  - for physical hosts
- Monitoring makes sure OPS gets notified if a
  - host has problems (storage, memory, load) or is not available (SSH, ping)
  - service is misbehaving (port, metrics)
- We have remote access to all important machines, also during boot time.
  - Automatic HDD unlock with secure boot
  - Remote management option on important hosts
- Commits are tested automatically to reduce risk of breaking things.
- There is a disaster recovery path for our services that are
  - consumer-facing (cloud, vault)
  - OPS-critical (backups, VPN)
  - developer-facing (git, CI)
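
The host and service checks named in the monitoring target could look roughly like the sketch below. It is only an illustration: the `HostMetrics` fields, the threshold values, and the function names are all assumptions and are not taken from any existing setup. A real deployment would more likely rely on an established monitoring system than on hand-written checks.

```python
import socket
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostMetrics:
    """Hypothetical snapshot of one host's resource usage."""
    disk_used_pct: float
    mem_used_pct: float
    load_per_core: float


# Illustrative thresholds only; these values are assumptions.
THRESHOLDS: Dict[str, float] = {
    "disk_used_pct": 90.0,
    "mem_used_pct": 90.0,
    "load_per_core": 1.5,
}


def host_problems(metrics: HostMetrics,
                  thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the names of all metrics that exceed their threshold."""
    return [
        name for name, limit in thresholds.items()
        if getattr(metrics, name) > limit
    ]


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A notification step would then fire whenever `host_problems(...)` returns a non-empty list or `port_is_open(...)` returns `False` for a service that is expected to be reachable, for example SSH on port 22.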