RemoteOtter

Senior Site Reliability Engineer - Remote

Posted 2 days ago

Overview

Dremio is the unified lakehouse platform for self-service analytics and AI, serving hundreds of global enterprises, including Maersk, Amazon, Regeneron, NetApp, and S&P Global. Customers rely on Dremio for cloud, hybrid, and on-prem lakehouses to power their data mesh, data warehouse migration, data virtualization, and unified data access use cases. Built on open-source technologies, including Apache Iceberg and Apache Arrow, Dremio provides an open lakehouse architecture that enables the fastest time to insight and platform flexibility at a fraction of the cost.

In Short

  • Drive continuous improvements to our usage of Kubernetes, our Operators, and the GitOps deployment paradigm.
  • Extend our networking, service mesh and Kubernetes systems to support connectivity between GCP, AWS and Azure.
  • Collaborate with Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning, production readiness and service reviews.
  • Help define and instrument service level indicators and objectives (SLIs/SLOs) with service owners in the Engineering teams.
  • Collaborate within our virtual Observability team to develop and improve observability of the Dremio Cloud product.
  • Debug and optimize code written by others, and automate routine tasks.
  • Evangelize and advocate for resilience engineering and reliability practices across our organization.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Join an on-call rotation for systems and services that the SRE team owns.
  • Practice sustainable incident response and post-incident analysis.

Requirements

  • 10+ years of relevant experience in SRE, DevOps, distributed systems, cloud operations, or software engineering.
  • Expertise in Kubernetes, Istio, Terraform, Terragrunt, and ArgoCD/Flux.
  • Expertise with software-defined networking infrastructure.
  • Excellent command of cloud services on GCP, AWS, and Azure, and of CI/CD pipelines.
  • Moderate to advanced experience in Python or Go, and at least a reading knowledge of Java.
  • Systematic problem-solving approach with strong communication skills.
  • Ability to debug and optimize code and automate routine tasks.
  • Solid background in software development and architecting resilient applications.

Benefits

  • Workplace Wednesdays to improve cross-team communication.
  • Hybrid work environment.
  • Lunch catering and meal credits provided in the office.
  • Local socials aligned with Workplace Wednesdays.

Similar Jobs:

Senior Site Reliability Engineer - Remote

Airalo

4 days ago

Join Airalo as a Senior Site Reliability Engineer to develop and maintain reliable systems in a remote-first environment.

Tags: Site Reliability Engineering, AWS, Kubernetes, Terraform
Location: Worldwide | Type: Full-time | Category: DevOps / Sysadmin

Senior Site Reliability Engineer - Remote

Paxos

5 days ago

Join Paxos as a Senior Site Reliability Engineer to enhance cloud infrastructure reliability and performance.

Tags: AWS, RDS, PostgreSQL, Aurora
Location: USA | Type: Full-time | Category: DevOps / Sysadmin
Salary: $157,254 - $185,005 USD/year

Senior Site Reliability Engineer - Remote

Point Wild

7 days ago

Join Point Wild as a Senior Site Reliability Engineer to maintain and enhance the reliability and performance of our systems.

Tags: Site Reliability Engineering, DevOps, AWS, Azure
Location: Worldwide | Type: Full-time | Category: DevOps / Sysadmin

Senior Site Reliability Engineer - Remote

Modernizing Medicine

1 week ago

Join Modernizing Medicine as a Senior Site Reliability Engineer to enhance cloud infrastructure and mentor junior engineers.

Tags: AWS, DataDog, Kubernetes, Jenkins
Location: India | Type: Full-time | Category: DevOps / Sysadmin

Senior Site Reliability Engineer - Remote

Modernizing Medicine

1 week ago

Join ModMed as a Senior Site Reliability Engineer to enhance cloud infrastructure and empower developers.

Tags: AWS, Cloud Infrastructure, Site Reliability Engineering, DataDog
Location: USA | Type: Full-time | Category: DevOps / Sysadmin