How do you write an SLO for a service when higher reliability is needed only during special events?
How do you test changes to Jenkins plugins before deploying them?
How do you simulate sudo su <user> in Ansible?
How do you securely deploy a large number of Kubernetes components in isolation?
How do you monitor the status of multiple Docker containers?
How do you know which secrets and credentials of your production services were used, and by whom/what?