
Senior Distributed Systems & SRE Lead

Almada

Position description

We are looking for a Senior Distributed Systems & SRE Lead. This is a hybrid role with 3 days per week on site.

You will be joining a high-impact engineering engagement responsible for the health and performance of a data infrastructure comprising over 2,750 nodes across 11 distinct technologies (Oracle, PostgreSQL, MySQL, MariaDB, Cassandra, Solr, Elasticsearch, OpenSearch, Redis, Kafka, RabbitMQ, MapR). You’ll provide L3 support to an existing operational team for all technologies listed, plus some L2 support for Oracle and PostgreSQL. The mission is to support and resolve major incidents while building an automation library (Ansible/Terraform) to standardize operations across the global footprint.


Responsibilities:
  • Ensure distributed systems (Kafka, Cassandra, Elasticsearch, Solr, OpenSearch, MapR/Hadoop) are running optimally and provide first-line response for major incidents on the database side.
  • Serve as a Subject Matter Expert for relational, non-relational, and messaging technologies.
  • Collaborate with other engineering teams on complex lifecycle events, such as upgrades and migrations.
  • Lead deep-dive forensic Root Cause Analysis (RCA) for production outages and recurring issues, so that recurring problems are permanently eliminated.
  • Manage the health and scaling of distributed clusters (Kafka, Cassandra, Elasticsearch, OpenSearch, Solr), including partition rebalancing and node decommissioning.
  • Lead the configuration and tuning of Elasticsearch/OpenSearch and Solr indexing strategies for shard stability and search optimization.
  • Develop Terraform modules and Ansible roles to standardize and automate environment management and deployments.
  • Maintain the stability of large-scale big data solutions (MapR/Hadoop).
  • Participate in a 24/7 on-call rotation and cover weekend interventions as required, with flexible scheduling during the business week.




Requirements:
  • Bachelor's or Master's degree in Computer Science.
  • 7+ years of hands-on experience managing Apache Kafka (brokers/ZooKeeper/KRaft), Apache Cassandra (ring management/repair), Elasticsearch/OpenSearch, and Solr.
  • Solid experience with Redis, RabbitMQ, and MapR/Hadoop.
  • Expert-level Linux/Unix administration and shell scripting (Bash).
  • Proven track record with Ansible and Terraform for automation, deployment, and patch management.
  • Schedule flexibility (working hours: 09:00-18:00 or 14:00-23:00; must accommodate team meeting schedules).
  • Willingness and availability to take part in a 24/7 on-call rotation.
  • Commitment to flexible shift patterns, including weekends, balanced with business week time-off to ensure a sustainable allocation.

Nice to Have:
  • Willingness to cross-train and handle triage for any technologies in scope (including relational and NoSQL databases, and messaging systems).
  • Experience with Oracle, PostgreSQL, MySQL, MariaDB, Cassandra, Solr, Elasticsearch, OpenSearch, Redis, Kafka, RabbitMQ, and MapR.


