Manager - Site Reliability Engineer (SRE) Operations

Exciting opportunity for Manager - Site Reliability Engineer (SRE)
Company: Retail Firm
Role: Manager - Site Reliability Engineer (SRE)
Location: Remote, Toronto
Duration: Full Time

Your new company

Our client, an established leader in the Retail sector, is hiring an experienced Manager - Site Reliability Engineer (SRE) for a permanent opportunity.

Your new role

Reporting to the AVP, Tech Deliver SRE OPS, the SRE Manager will be responsible for building and leading a team of SREs who define and manage SLOs, onboard new applications into the SRE framework while maintaining the existing portfolio, and champion SRE. This role is also an active participant in all aspects of Site Reliability Engineering, including technical vision, telemetry and observation decisions, automation strategy, framework development, solution delivery, and incident and problem management.

Responsibilities:

  • Design and build the team from the ground up. This includes creating the SRE framework, processes, and procedures that will set the group up for success
  • Provide short-term and long-term goals for the team. This includes roadmap development and strategic decisions for the framework, vendors, and tools used by SRE and other Supply Chain teams
  • Contribute in a leadership capacity to day-to-day activities such as SLI & SLO development, runbooks and automation, and monitoring and alerting
  • Ensure the team has no roadblocks ahead of them. This will require cross-functional collaboration with multiple teams across the Canadian Tire organization and being a strong proponent of SRE
  • Analyze new services (in the production or design stage) to align them with industry best practices & the CTC monitoring framework
  • Lead weekly operational state reviews covering performance trends, anomalies, errors, and other availability events with SREs, product owners, and development teams
  • Lead workshops to drive the implementation of SRE principles and techniques

What you bring

  • Experience managing or leading a software development or operations team, setting measurable success criteria for engineering roles
  • Strong technical & analytical skills in troubleshooting and correlating information. Previous developer/application or system administrator experience
  • SRE experience creating and designing meaningful SLO/SLI/SLA and error budget definitions
  • Experience with monitoring, logging & telemetry tools like New Relic, Sumologic, Grafana, Splunk, Azure Monitor or similar, and with leveraging these tools to provide metric-based analytics that support decision making
  • Hands-on experience in cloud technologies, including defining solutions and supporting applications during migration to the cloud
  • Ability to identify toil and remove redundant tasks leveraging scripting and automation
  • Excellent ability to liaise with business users, IT personnel, and vendors to gather requirements and deliver solutions

What you'll need to succeed:

  • 5+ years building or managing a group of support engineers supporting a critical production environment in a DevOps or SRE domain
  • 5+ years of experience creating, collecting, tuning & responding to all things monitoring: alerts, events, metrics, tracing & dashboarding
  • 5+ years of experience using APM tools such as New Relic, Grafana, Datadog, Sumologic, Splunk or similar & applying them to SRE monitoring techniques (golden signals, USE & RED methodologies)
  • Systems engineering basics including networking, DNS, virtualization, containers, & various OS (Linux, AIX, Windows)
  • Python, Bash, or similar programming/scripting technologies
  • Cloud technologies including Azure, AWS, GCP

Skills

  • Experience with Collaboration & Change Management tools: Jira, Confluence, ServiceNow
  • Experience with database management
  • PowerBI, Tableau & other BI tools
  • Familiarity with microservices architecture & system integrations
  • Knowledge of Retail Business/Supply Chain

What you need to do now

If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now. #1101122

Summary

Job Type: Permanent
Industry: Retail & Consumer Goods
Location: Toronto
Specialism: Data & Advanced Analytics
Pay: Competitive salary
Ref: 1101122

Talk to a consultant

Talk to Dana Palmer, the specialist consultant managing this position, located in Toronto (EN)
8 King Street East, 20th Floor

Telephone: 4162453013