RemoteOtter

Director of Reliability Engineering - Remote

Posted 3 weeks ago
DevOps / Sysadmin
Full Time
USA
$260,000 - $290,000/year

Overview

Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is the company behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. Astro accelerates building reliable data products that unlock insights, unleash AI value, and power data-driven applications. Trusted by more than 700 of the world's leading enterprises, Astronomer lets businesses do more with their data.

In Short

  • Define and lead the strategic direction for SRE, reliability, and operational excellence across the organization.
  • Collaborate with Software Engineers and Product Managers on projects that impact users and be directly responsible for service uptime.
  • Own end-to-end availability and performance of key services; build automation to prevent recurrence of issues and automate responses to all non-exceptional service conditions.
  • Design, write, and deliver software to improve the availability, scalability, latency, and efficiency of services.
  • Champion observability, automation, and self-healing systems to proactively prevent downtime and reduce manual toil.
  • Evolve and manage our incident and change management processes, including root cause analysis and postmortems.
  • Drive adoption of SLOs, SLIs, and error budgets to align engineering efforts with business priorities.
  • Work with operational support to manage global on-call rotations using a follow-the-sun model to ensure around-the-clock coverage.
  • Support on-call culture by defining best practices for incident response, escalation policies, and operational readiness.
  • Partner closely with engineering, product, security, and program management teams to improve reliability without slowing innovation.
  • Cultivate a culture of continuous improvement, high accountability, and blameless incident management.
  • Lead and mentor the team, establishing credibility through high-quality technical execution.
  • Provide strong mentorship and leadership to grow the next generation of reliability and engineering leaders.

Requirements

  • 10+ years of experience in software engineering, SRE, or DevOps roles.
  • 5+ years in a technical leadership capacity, ideally in a high-growth, cloud-native SaaS environment.
  • Proven success operating and scaling large-scale, distributed, mission-critical systems.
  • Deep expertise in public cloud platforms (AWS, Azure, or GCP).
  • Hands-on knowledge of infrastructure as code (Terraform, CloudFormation), container orchestration (Kubernetes), and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
  • Experience implementing and managing CI/CD pipelines and secure development practices.
  • Demonstrated ability to hire, grow, and lead globally distributed SRE teams.
  • Strong decision-making, communication, and cross-functional collaboration skills.

Benefits

  • The estimated salary for this role ranges from $260,000 to $290,000, along with an equity component.
  • This range is merely an estimate, and the width of the range reflects willingness to consider candidates with broad prior seniority.
  • Actual compensation may deviate from this range based on skills, experience, and qualifications.
  • Astronomer is a remote-first company.
  • We value diversity and are an equal opportunity employer.

Astronomer

Astronomer is a rapidly growing, globally distributed company that specializes in data orchestration and observability through its platform, Astro, powered by Airflow. The company is dedicated to empowering data teams to create mission-critical analytics, AI, and software solutions. With a focus on diverse experiences and unconventional career paths, Astronomer fosters a collaborative environment for learners and innovators. The team is committed to delivering reliable data products that unlock insights and drive data-driven applications.
