RemoteOtter

Site Reliability Engineer - Remote

Posted 35 weeks ago

Overview

As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient.

In Short

  • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
  • Establish standards and best practices for reliability and performance across the infrastructure.
  • Automate processes when relevant, particularly for managing CI/CD pipelines.
  • Own products and projects end-to-end, functioning as both an engineer and a project manager.
  • Collaborate with cross-functional teams to understand project requirements.
  • Mentor junior team members and contribute to knowledge sharing.
  • Navigate ambiguity and exercise good judgment on tradeoffs and tools.
  • Demonstrate pride, ownership, and accountability for your work.

Requirements

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field.
  • 3+ years of professional work experience in a fast-paced, high-growth environment.
  • Extensive experience with Kubernetes.
  • Experience in building and maintaining scalable infrastructure.
  • Experience with infrastructure-as-code tools and CI/CD tooling.
  • Relevant OSS observability experience is a plus.
  • Ability to own projects end-to-end, from project specification to execution.
  • No prior machine learning experience is required, but you should be open to learning about it.

Benefits

  • Competitive compensation package (unlimited PTO, 401(k), covered healthcare premiums).
  • A unique opportunity to be part of a rapidly growing startup.
  • An inclusive and supportive work culture that fosters learning and growth.
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Similar Jobs:

Site Reliability Engineer - Remote

Software Mind

2 days ago

Software Mind is looking for a Site Reliability Engineer to enhance the reliability of their software systems in a flexible and supportive work environment.

Site Reliability Engineering
Cloud Native Applications
Azure
AWS
LATAM
Full-time
DevOps / Sysadmin

Site Reliability Engineer - Remote

Jackbox Games

1 week ago

Join Jackbox Games as a Site Reliability Engineer to maintain AWS infrastructure and develop applications in Go.

Site Reliability Engineering
AWS
Go
ECS
USA
Full-time
DevOps / Sysadmin
$103,326 - $190,465/year

Site Reliability Engineer - Remote

Pinterest

1 week ago

Pinterest is seeking a Site Reliability Engineer to ensure the reliability of its large-scale distributed systems.

Site Reliability Engineering
Python
Go
Linux
USA
Full-time
Software Development

Site Reliability Engineer - Remote

Printify

1 week ago

Join our team as a Site Reliability Engineer, responsible for ensuring the reliability of our distributed systems and platforms in a dynamic international environment.

Site Reliability Engineering
System Design
Development
Configuration
Worldwide
Full-time
DevOps / Sysadmin

Site Reliability Engineer - Remote

Zepz

1 week ago

Join Zepz as a Site Reliability Engineer to enhance service stability and resilience through innovative automation and observability practices.

SRE
DevOps
Automation
Monitoring
South Africa
Full-time
DevOps / Sysadmin