Principal Site Reliability Engineer - Remote

Posted 9 weeks ago
DevOps / Sysadmin
Full Time
Colombia

Overview

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date, we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.

Groupon is on a radical journey to transform our business through a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company: big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

In Short

  • Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
  • Drive automation in infrastructure management and deployment using Terraform, Ansible, Kubernetes, and similar tools.
  • Create and optimize CI/CD pipelines to ensure reliable, secure, and efficient software delivery.
  • Build and enhance comprehensive observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
  • Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
  • Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis for critical issues.
  • Design and execute performance testing, capacity planning, and scalability strategies for evolving workloads.
  • Proactively identify and resolve bottlenecks, increasing system performance and developer efficiency.
  • Mentor junior engineers, fostering a collaborative and growth-oriented team environment.
  • Guide architectural decisions that drive innovation and enhance system reliability.

Requirements

  • 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles.
  • Expertise in cloud platforms (GCP, AWS) and in containerization and orchestration (Docker, Kubernetes).
  • Proficiency in programming and scripting languages such as Python, Go, and Bash.
  • Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
  • Deep understanding of networking, DNS, load balancing, and security principles.
  • Proven track record of managing high-availability systems in demanding environments.
  • Exceptional analytical and problem-solving skills.

Benefits

  • The opportunity to work with cutting-edge technologies in a transformative environment.
  • A collaborative and innovative work culture that values your expertise and contributions.
  • Professional growth and leadership development pathways tailored to your aspirations.
  • A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

Groupon

Groupon is a leading marketplace that connects customers with local businesses, offering a platform for discovering new experiences and services. With over a million merchant partners worldwide and more than 16 million customers, Groupon is dedicated to helping local businesses thrive in a competitive e-commerce landscape. The company fosters a culture of innovation and autonomy, allowing employees to make significant impacts while benefiting from the resources and scale of a large organization. Groupon is committed to transforming its business and enhancing customer experiences through a focus on performance and operational excellence.

Similar Jobs:

Principal Site Reliability Engineer - Remote

Groupon

9 weeks ago

Join Groupon as a Principal Site Reliability Engineer to enhance the reliability and scalability of mission-critical systems.

Worldwide
Full-time
DevOps / Sysadmin

Principal Site Reliability Engineer - Remote

Groupon

9 weeks ago

Join Groupon as a Principal Site Reliability Engineer to enhance the reliability and scalability of mission-critical systems.

Czech Republic
Full-time
DevOps / Sysadmin