RemoteOtter

Senior Site Reliability Engineer - Remote

Posted 2 days ago
DevOps / Sysadmin
Full Time
USA
$130,000 - $140,000/year

Overview

The Discogs Platform team is focused on several objectives: building and supporting performant, cost-effective, reliable infrastructure; developer experience tooling and mentorship; and creating "golden paths" for organization-wide standards and velocity. As a Platform member, the Senior Site Reliability Engineer will contribute to the Platform team’s centralized infrastructure, including maintenance, monitoring, and automation of services ranging from databases to Kubernetes; lead incident response and postmortem efforts; and work closely with other engineering teams to understand their needs and drive improvements to both our technologies and processes.

In Short

  • Maintaining the organization’s cloud presence in AWS
  • Automating and deploying infrastructure configurations using Infrastructure as Code (IaC)
  • Mentoring engineering squads on Platform best practices for Kubernetes, MySQL, Kafka, and other software development lifecycle areas
  • Assisting engineering squads with capacity planning, infrastructure budgeting, and production readiness
  • Writing documentation and runbooks that contribute to the engineering organization’s knowledge base
  • Implementing monitoring and alerting systems with Discogs observability tools
  • Working in a containerized, orchestrated environment
  • Participating in on-call rotation, responding to incidents, and troubleshooting data and other operations issues
  • Contributing to efforts on the reliability and design patterns of our Kafka, Kafka Connect, and database implementations

Requirements

  • Experience with AWS and cloud infrastructure
  • Proficiency in Kubernetes and container orchestration
  • Strong knowledge of MySQL and Kafka
  • Experience with Infrastructure as Code tools
  • Ability to mentor and guide engineering teams
  • Strong troubleshooting and incident response skills
  • Excellent documentation skills

Benefits

  • Competitive salary
  • Remote work flexibility
  • Opportunities for professional development
  • Supportive team culture
  • Access to music and record collecting resources

Discogs

Discogs is the largest crowd-sourced, community-driven database of recorded music information in the world, where millions of users connect to learn about music and buy or sell vinyl records, CDs, and cassettes. As a growing company, Discogs values individual contributions and emphasizes quality, critical thinking, and continuous improvement. The team operates collaboratively across geographical locations, utilizing open-source tools to enhance their work. Discogs is dedicated to serving the music community and is looking for motivated individuals to help realize its mission.

Similar Jobs:

Senior Site Reliability Engineer - Remote

Docplanner

2 days ago

Join Docplanner as a Senior Site Reliability Engineer to ensure reliable and high-performance software solutions in a remote-friendly environment.

Spain
Full-time
Software Development

Senior Site Reliability Engineer - Remote

DuckDuckGo

3 days ago

Join DuckDuckGo as a Senior Site Reliability Engineer to enhance infrastructure and reliability for millions of users.

Worldwide
Full-time
DevOps / Sysadmin
$178,500 USD/year

Senior Site Reliability Engineer - Remote

Thumbtack

1 week ago

Join Thumbtack as a Senior Site Reliability Engineer to enhance the reliability and scalability of our platform.

Worldwide
Full-time
DevOps / Sysadmin
$170,800 - $259,900/year

Senior Site Reliability Engineer - Remote

Airtasker

1 week ago

Join Airtasker as a Senior Site Reliability Engineer to manage infrastructure and support product teams.

Worldwide
Full-time
DevOps / Sysadmin

Senior Site Reliability Engineer - Remote

Gorgias

2 weeks ago

Join Gorgias as a Senior Site Reliability Engineer to ensure the reliability and performance of high-throughput systems.

Worldwide
Full-time
DevOps / Sysadmin