Manager, Site Reliability Engineer
Command Alkon
Title: Manager, Site Reliability Engineer (SRE)
Summary of Role
The Site Reliability Engineer (SRE) Manager leads the teams responsible for ensuring the availability, performance, and reliability of mission-critical systems. This role bridges the gap between software engineering and operations by implementing automation, observability, and scalability practices. The SRE Manager sets the vision for reliability engineering, enabling rapid product delivery while maintaining high service uptime and customer trust.
Responsibilities span building resilient infrastructure, driving incident management processes, optimizing system performance, and fostering a culture of continuous improvement. The role also requires staying ahead of industry practices in monitoring, automation, and distributed systems, ensuring the organization delivers secure, reliable, and scalable services.
How You’ll Succeed
-Ensure Reliability & Uptime: Monitor and manage the reliability of production systems, maintaining high availability and scalability across global environments.
-Incident Leadership: Lead incident response, root cause analysis, and postmortem reviews while driving systemic improvements.
-Automation First: Reduce manual work by implementing automation for deployments, monitoring, capacity management, and self-healing systems.
-Service Level Ownership: Define, track, and enforce SLAs, SLOs, and SLIs across services, ensuring alignment with business objectives.
-Operational Excellence: Optimize performance, capacity planning, disaster recovery, and resilience engineering practices.
-Cross-Team Collaboration: Partner with product engineering, DevOps, and security to design for reliability from the ground up.
-Team Leadership: Mentor, coach, and guide SRE teams to drive technical growth and operational maturity.
-Process Improvement: Continuously refine reliability engineering processes, integrating lessons learned into new standards and SOPs.
-Customer-Centric Mindset: Advocate for reliability as a core feature, ensuring customer experience and trust are at the forefront.
What You Bring
-Strong leadership experience managing SRE or operations-focused engineering teams.
-Expertise in distributed systems, cloud-native architectures, and large-scale production environments.
-Proficiency with observability tools, performance monitoring, and incident management frameworks.
-Deep knowledge of automation, CI/CD, infrastructure as code, and cloud services (AWS, Azure, GCP).
-Familiarity with chaos engineering, resilience design patterns, and capacity planning.
-Clear understanding of SLAs, SLOs, and SLIs and how to implement them across services.
-Solid background in system security, compliance, and risk management in production environments.
-Ability to balance reliability with speed of delivery by partnering closely with development and product leaders.
-Proven ability to develop talent, build high-performing teams, and cultivate collaboration.
Who You Are
Manages Complexity – You make sense of complex, high-quantity, and sometimes contradictory information to solve problems effectively.
Decision Quality – You make good and timely decisions that keep the organization moving forward.
Optimizes Work Processes – You know the most effective and efficient processes to get things done, with a focus on continuous improvement.
Builds Effective Teams – You build strong-identity teams that apply their diverse skills and perspectives to achieve common goals.
Strategic Mindset – You see ahead to future possibilities and translate them into breakthrough strategies.
All Company Core Competencies
Customer Focus: You build strong customer relationships and deliver customer-centric solutions.
Cultivates Innovation: You create new and better ways for the organization to be successful.
Collaborates: You build partnerships and work collaboratively with others to meet shared objectives.
Instills Trust: You gain the confidence and trust of others through honesty, integrity, and authenticity.
Self-Development: You actively seek new ways to grow and be challenged using both formal and informal development channels.
Develops Talent (Mgmt Only): You develop people to meet both their career goals and the organization's goals.