RemoteOtter

Senior Site Reliability Engineer - Remote

Posted Yesterday
DevOps / Sysadmin
Full Time
Worldwide

Overview

As a Senior Site Reliability Engineer in our Platform Engineering Organization, you help build and run large-scale, distributed, fault-tolerant systems. In this role, you are involved in the complete lifecycle of our products, from inception to operation, ensuring they are reliable and performant and that they meet appropriate uptime and availability targets. You design and maintain resilient systems, implement robust observability through metrics, logging, and tracing, and build automation that improves deployment, monitoring, and incident response workflows. This includes leveraging AI/ML for intelligent alerting, anomaly detection, and predictive incident response to enhance system reliability and scalability. Working with others in the organization, you help develop and influence the operational tooling, best practices, and standards that empower the engineering organization and keep Rithum's operations effective and efficient. As a Senior Engineer, you operate independently, prioritize your own work, and design and lead projects from start to completion, engaging with stakeholders to ensure successful delivery. You also mentor less experienced team members and coach them to improve their skills.

In Short

  • Collaborate with developers, Client Support, and cross-functional teams to build production automation and analysis tools and to improve reliability and performance.
  • Design, implement, and maintain robust application monitoring and observability systems for a distributed, highly available, and scalable software stack, leveraging AI/ML to detect anomalies and assist with incident response.
  • Analyse and resolve problems in legacy environments while designing and implementing modern, scalable solutions from the ground up.
  • Participate in the rotating on-call schedule, ensuring that user emergencies, platform alerts, and support requests are addressed.
  • Drive automation and operational efficiency.

Requirements

  • 3+ years' experience working as an SRE, DevOps Engineer, or in a related role.
  • Experience with logging and monitoring systems such as CloudWatch, Grafana, or Prometheus.
  • Experience with AWS foundations, including compute, storage, and security.
  • Good AWS knowledge including application design, migration support, cost planning, capacity allocation, and application resiliency.
  • Expertise in building multi-region cloud systems with a solid disaster recovery plan.
  • Experience with both high-level and scripting languages such as Python, Bash, or TypeScript.
  • Experience troubleshooting and debugging complex, distributed applications.
  • Infrastructure as Code (IaC) experience, automating infrastructure with CDK, Terraform, or Ansible.
  • Experience with continuous deployment pipelines, containerization, and container orchestration services such as EKS or ECS.
  • Strong understanding of software engineering fundamentals, including object-oriented design, modular architecture, and maintainable coding practices.

Benefits

  • Medical, dental, and psychology benefits.
  • Life insurance and disability benefits.
  • Competitive time-off package with 25 days of PTO, 13 company-paid holidays, 2 wellness days, and 1 paid volunteer day.
  • Voucher program for Transportation, Meals & Childcare.
  • Remote Working Stipend: €40/month automatically applied in payroll.
  • Access to tools to support your wellbeing such as the Calm App and an Employee Assistance Program.
  • Professional development stipend and learning and development offerings to help you build the skills and connections you need to move forward in your career.
  • Charitable contribution match per team member.

Rithum LinkedIn Board

Rithum™ is a leading commerce network that facilitates seamless collaboration between brands, suppliers, and retailers to enhance e-commerce experiences. With over 40,000 companies relying on its platform, Rithum enables businesses to accelerate growth, optimize operations, and improve margins across various channels, representing more than $50 billion in annual gross merchandise volume (GMV). The company offers comprehensive commerce, marketing, and delivery solutions that help customers create optimized shopping journeys. Rithum fosters a supportive and inclusive work environment, prioritizing employee well-being and professional development while maintaining a commitment to diversity and equal opportunity.

Similar Jobs:

Senior Site Reliability Engineer - Remote

Fullsteam Personnel

3 days ago

Join Fullsteam as a Senior Site Reliability Engineer to ensure the reliability and performance of our infrastructure and applications.

USA
Full-time
DevOps / Sysadmin

Senior Site Reliability Engineer - Remote

Virta Health

4 days ago

Join Virta Health as a Senior Site Reliability Engineer to enhance system reliability and implement AI-driven observability.

USA
Full-time
DevOps / Sysadmin
$167,249 - $216,000/year

Senior Site Reliability Engineer - Remote

Tilt

6 days ago

Tilt is seeking a Senior Site Reliability Engineer to enhance the reliability and performance of their systems while integrating AI and automation tools.

Worldwide
Full-time
DevOps / Sysadmin
$165,000 - $175,000/year

Senior Site Reliability Engineer - Remote

TensorWave

1 week ago

TensorWave is seeking a Senior Site Reliability Engineer to build and maintain scalable infrastructure while ensuring platform reliability.

Worldwide
Full-time
DevOps / Sysadmin

Senior Site Reliability Engineer - Remote

Citizen Health

1 week ago

Citizen Health is looking for a Senior Site Reliability Engineer to ensure the resilience and performance of their AI-powered healthcare platform.

CA, USA
Full-time
DevOps / Sysadmin