Site Reliability Engineer - Remote

Posted 46 weeks ago

DevOps / Sysadmin

Full Time

LATAM

Site Reliability Engineering

Overview

This role involves ensuring the reliability, performance, and scalability of our MarTech SaaS platform, which serves millions of users running thousands of marketing campaigns daily.

In Short

Monitor systems, respond to incidents, and implement automation to improve platform reliability.
Design, implement, and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and DataDog.
Lead incident response efforts, conduct root cause analyses, and implement preventive measures.
Build and maintain automation tools and processes to reduce manual work and enhance system resilience.
Identify and implement reliability improvements across our platform.
Monitor system performance trends and plan for scaling needs.
Create and maintain runbooks, procedures, and system documentation.

Requirements

3+ years of hands-on experience in site reliability engineering, DevOps, or similar roles.
Strong knowledge of SRE best practices including SLIs/SLOs, error budgets, and reliability engineering principles.
Google Cloud Platform experience with services such as Compute Engine, Kubernetes, Cloud SQL, and related infrastructure components.
Expertise with DataDog or a similar tool for monitoring, alerting, and observability.
Backend development experience with Java, PHP, and/or Node.js.
Incident management skills including on-call experience and troubleshooting under pressure.
Automation mindset with experience in scripting and Infrastructure as Code principles.

Benefits

Remote-first culture with flexible working arrangements.
High-impact role in a small, collaborative team.
Growth opportunities as we scale our platform and expand our engineering team.
Competitive compensation and benefits package.
Learning budget for professional development and certifications.
Modern tech stack with opportunities to work with cutting-edge solutions.

SproutLoud Latam S.A.S

SproutLoud Latam S.A.S is a dynamic MarTech SaaS company that specializes in providing innovative marketing technology solutions to businesses. With a focus on reliability, performance, and scalability, the company serves millions of users and supports thousands of marketing campaigns daily. SproutLoud fosters a remote-first culture, promoting flexibility and collaboration within a small, high-impact team. The company is committed to professional development, offering growth opportunities and a modern tech stack to its employees.
