RemoteOtter

Site Reliability Engineer - Remote

Posted 48 weeks ago
DevOps / Sysadmin
Full Time
CA, USA

Overview

As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient.

In Short

  • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
  • Establish standards and best practices for reliability and performance across the infrastructure.
  • Automate processes when relevant, particularly for managing CI/CD pipelines.
  • Own products and projects end-to-end, functioning as both an engineer and a project manager.
  • Collaborate with cross-functional teams to understand project requirements.
  • Mentor junior team members and contribute to knowledge sharing.
  • Navigate ambiguity and exercise good judgment on tradeoffs and tools.
  • Demonstrate pride, ownership, and accountability for your work.

Requirements

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field.
  • 3+ years of professional work experience in a fast-paced, high-growth environment.
  • Extensive experience with Kubernetes.
  • Experience in building and maintaining scalable infrastructure.
  • Experience with infrastructure-as-code tools and CI/CD tooling.
  • Relevant OSS observability experience is a plus.
  • Ability to own projects end-to-end, from project specification to execution.
  • No prior machine learning experience is required, but you should be open to learning about it.

Benefits

  • Competitive compensation package, including unlimited PTO, 401(k), and covered healthcare premiums.
  • A unique opportunity to be part of a rapidly growing startup.
  • An inclusive and supportive work culture that fosters learning and growth.
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Baseten

Baseten is a rapidly growing startup focused on creating a positive employee experience through innovative HR practices. As the first dedicated HR hire, the People Operations Specialist will play a crucial role in shaping the company's HR systems, managing employee benefits, overseeing onboarding and offboarding processes, and ensuring compliance with international hiring regulations. The company prides itself on its inclusive and supportive work culture, offering competitive compensation, unlimited PTO, and opportunities for professional growth and networking within the machine learning startup community.
