ROLE: Site Reliability Engineer II
EXP: 5-8 years
TECH STACK:
Containerisation & Orchestration : Docker, Kubernetes, Rancher, EKS, ECS, GKE, Elastic Beanstalk, Google App Engine
Cloud Platform : AWS, GCP
IaC : Terraform, AWS CloudFormation / GCP Cloud Deployment Manager, Ansible
Infra Monitoring : Prometheus, Datadog, Alertmanager, Thanos, AWS CloudWatch
CI/CD : GitLab CI/CD, Jenkins
Scripting : Python, Golang
VCS : GitLab, Perforce, Subversion
OS : Ubuntu, CentOS, Amazon Linux, Red Hat Linux
Nice to Have : Experience supporting systems orchestrated with AWS OpsWorks
RESPONSIBILITIES:
- Implement, own, maintain, monitor & support the backend server & microservices infrastructure for studio titles, which runs on a wide variety of tech stacks
- Implement and maintain automation tools for development, testing, operations and IT infrastructure
- Be available for on-call duty to respond to production outages as part of a 24/7 PagerDuty rotation
- Work closely with all disciplines and stakeholders, and keep them informed of everything that affects them
- Define and set development, test, release, update and support processes for SRE operations
- Apply excellent troubleshooting skills across systems infrastructure engineering
- Monitor adherence to the processes throughout the entire lifecycle, and update existing processes or create new ones to improve them and minimise workflow times
- Encourage and build automated processes wherever possible
- Identify and deploy cybersecurity measures through continuous vulnerability assessment and risk management
- Handle incident management and root cause analysis