RemoteOtter

Site Reliability Engineer - Remote

Posted 18 weeks ago
DevOps / Sysadmin
Full Time
Worldwide
$152,000 - $175,000/year

Overview

RunPod is pioneering the future of AI and machine learning, offering cutting-edge cloud infrastructure for full-stack AI applications. Founded in 2022, we are a rapidly growing, well-funded, remote-first company with a globally distributed team. Our mission is to empower innovators and enterprises to unlock AI's true potential, advancing technology and transforming industries. Join us as we shape the future of AI.

In Short

  • Seeking a full-time, remote Site Reliability Engineer.
  • Design, implement, and maintain robust, scalable systems.
  • Work with cutting-edge GPU/AI technologies.
  • Manage large-scale, distributed systems across multiple data centers.
  • Focus on automation, reliability, and performance.
  • Collaborate with cross-functional teams.
  • Participate in on-call rotations for 24/7 support.
  • Competitive compensation including stock options.
  • Flexible remote work culture.
  • Opportunity to contribute to work with global impact.

Requirements

  • Deep knowledge of Linux kernel internals and networking.
  • Extensive experience with distributed system troubleshooting.
  • Proficiency in Python or Golang.
  • Experience with SLIs and SLOs.
  • Familiarity with configuration management tools like Chef or Puppet.
  • Ability to manage large-scale bare-metal fleets.
  • Strong background in security best practices.
  • Understanding of OSI model Layers 3, 4, and 7.
  • Successful completion of a background check.

Benefits

  • Base pay ranges from $152,000 to $175,000 per year.
  • Stock options.
  • Flexible remote work environment.
  • Generous vacation policy.
  • Opportunity for growth in an innovative company.
  • Inclusive and collaborative team culture.

RunPod

RunPod is a pioneering platform that empowers developers to build, run, and scale AI models efficiently. With the ability to deploy AI models to 37 global data centers in just 78 seconds, RunPod has become the go-to choice for over 100,000 developers looking to enhance their applications with AI capabilities. The company is focused on creating a robust PaaS ecosystem that bridges frontend applications and cloud systems, ensuring seamless interaction and scalability. RunPod is committed to innovation, user-centric design, and fostering a diverse and inclusive workplace.
