RemoteOtter

Principal Site Reliability Engineer - Remote

Posted 2 weeks ago

Overview

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.

Groupon is on a radical journey to transform our business with a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

In Short

  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
  • Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
  • Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
  • Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
  • Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
  • Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
  • Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
  • Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
  • Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
  • Guide architectural decisions that drive innovation and enhance system reliability.

Requirements

  • 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles.
  • Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
  • Proficiency in programming and scripting languages like Python, Go, and Bash.
  • Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
  • Deep understanding of networking, DNS, load balancing, and security principles.
  • Proven track record of managing high-availability systems in demanding environments.
  • Exceptional analytical and problem-solving skills.

Benefits

  • The opportunity to work with cutting-edge technologies in a transformative environment.
  • A collaborative and innovative work culture that values your expertise and contributions.
  • Professional growth and leadership development pathways tailored to your aspirations.
  • A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

Similar Jobs:

Groupon

Principal Site Reliability Engineer - Remote

Groupon

2 weeks ago

Join Groupon as a Principal Site Reliability Engineer to enhance the reliability and scalability of mission-critical systems.

Site Reliability Engineering
DevOps
Cloud Platforms
Kubernetes
Worldwide
Full-time
DevOps / Sysadmin

Groupon

Principal Site Reliability Engineer - Remote

Groupon

2 weeks ago

Join Groupon as a Principal Site Reliability Engineer to enhance the reliability and scalability of mission-critical systems.

Site Reliability Engineering
DevOps
Cloud Platforms
Kubernetes
Colombia
Full-time
DevOps / Sysadmin