Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date, we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.

Groupon is on a radical journey to transform our business with a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We’re a "best of both worlds" kind of company. We’re big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

In Short

Architect and maintain self-healing systems with 99.9%+ availability targets.
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns.
Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
Build AIOps-based observability and auto-remediation pipelines.
Apply predictive modeling to forecast failures before they impact users.
Lead chaos, performance, and resilience testing programs.
Map platform and service behavior to revenue impact, and improve revenue resilience through better infrastructure performance.
Mentor engineers and drive reliability standards across teams.
Partner with platform, data, and product teams to ensure stability aligns with business goals.
Support major incident response and incident reviews, and participate in on-call rotations.

Requirements

10+ years in software/systems engineering, including 5+ years in SRE or platform reliability.
Strong experience with GCP (preferred) or AWS, plus Kubernetes and Terraform.
Proficiency in Python or Go for automation and tooling.
Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy).
Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations.
Strong communication and influencing skills — data over hierarchy.

Benefits

The opportunity to work with cutting-edge technologies in a transformative environment.
Professional growth and leadership development pathways tailored to your aspirations.
A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

Groupon

Groupon is a leading marketplace that connects customers with local businesses, offering a platform for discovering new experiences and services. With over a million merchant partners worldwide and more than 16 million customers, Groupon is dedicated to helping local businesses thrive in a competitive e-commerce landscape. The company fosters a culture of innovation and autonomy, allowing employees to make significant impacts while benefiting from the resources and scale of a large organization. Groupon is committed to transforming its business and enhancing customer experiences through a focus on performance and operational excellence.
