SRE - Resilience
An SRE focused on resilience: someone who can look at a complex system of services, products, applications, and content that work together to deliver a full end-to-end (E2E) customer experience at a telco company, and identify areas for improvement that make it more robust, stable, and reliable.
The role is closely related to Google's definition of an SRE, but focuses almost exclusively on resilience itself. This work can take place before, during, or after code has been written for a given product.
As part of your job, you will:
- Define, create, and implement standards, and drive the adoption of resilient design
- Understand what happens when a downstream service fails: how is the upstream response handled, and what is the impact on the customer experience?
- Define, create, and implement fallback mechanisms and circuit breakers, and assess whether one is appropriate at all. Define the logic for these circuit breakers (experience shows that today's implementations can have a negative impact)
- Determine how to tackle E2E resilience across a customer journey
- Define, create, and implement E2E timeout settings (these have caused negative outcomes in the past)
- Participate in resolving complex E2E operational issues, identifying root causes and architectural solutions (or other improvements) to prevent recurrence
- Work closely with the architecture team and Tech Leads in the early stages of the SDLC
The environment includes services built with mobile, web, integration, or backend technologies, based on Google Cloud and exposed through Apigee. Some of the technologies involved are:
- Strapi CMS
- Squid Proxy
- Kotlin and Swift
Availability to travel is important: the project requires trips to the UK once every two months.
- Ability to adapt to different contexts, teams, and clients
- Teamwork skills as well as a sense of autonomy
- Motivation for international projects, including ones that involve travel
- Willingness to collaborate with other players
- Strong communication skills