Technical Site Reliability Engineering (SRE) Lead

Nov 26, 2024
Montréal, Canada
Not specified
Intermediate
Full time
Office work

As a Technical Site Reliability Engineering (SRE) Lead within Ubisoft’s IT department, you will manage a team of SREs to ensure the reliability, scalability, and performance of our IT platform. You will play a pivotal role in shaping the architecture and operations of our cloud-native infrastructure, with a strong focus on automation and large-scale system management.

Responsibilities:

  • Leadership: Manage and mentor a team of SREs, fostering a culture of continuous learning and improvement.
  • Design and Development: Oversee the design and development of tools and solutions for the smooth operation of the Kubernetes environments.
  • Maintenance and Operation: Ensure the maintenance and operation of various components of the Ubisoft IT Platform, emphasizing documented and automated installation and support procedures.
  • Continuous Improvement: Drive enhancements in continuous integration and delivery systems, ensuring they meet the highest standards of reliability and performance.
  • Collaboration: Collaborate closely with Developer teams to assess their needs and ensure the platform is designed for operability and ease of use.
  • Advocacy: Champion the use of Kubernetes and other cloud-native technologies within Ubisoft.
  • Evaluation: Steer the evaluation of new requirements, technical designs, and standards to ensure they align with best practices and organizational goals.
  • Strategic Planning: Contribute to strategic planning and decision-making processes to guide the future direction of the platform.

Qualifications:

This role involves on-call.

  • Expertise in cloud-native architectures, Kubernetes (e.g., CRD, CNI, admission controllers), and Linux systems.
  • Strong CI/CD capabilities with tools like GitLab CI and ArgoCD, plus experience with public cloud providers (Azure, AWS, GCP).
  • Proficient in scripting or development (preferably Go and/or Python) and infrastructure automation with Terraform.
  • Advanced understanding of Linux networking, system configuration, and network administration.
  • Effective collaboration skills, including experience working with remote teams.

Bonus:

  • Familiarity with OpenStack, Docker, Flask, OPA, and other DevOps tools.
  • Previous leadership experience managing large-scale production systems.

Just a heads up: If you require a work permit, your eligibility may depend on your education and years of relevant work experience, as required by the government.

Skills and competencies show up in different forms and can be based on different experiences. That is why we strongly encourage you to apply even if you do not meet all the requirements listed above.

At Ubisoft, we embrace diversity in all its forms. We’re committed to fostering an inclusive and respectful work environment for all. We know the importance of providing a pleasant interview experience, so if you need any accommodation, please let us know what we can do to facilitate the interview process.