RemoteOtter

Platform Reliability Engineer - Remote

Posted 16 hours ago
DevOps / Sysadmin
Full Time
Worldwide

Overview

The Platform Reliability Engineer will design and maintain the Kubernetes-based platform that hosts the company's autonomous AI agent, keeping it highly reliable so that marketing execution requires no manual intervention.

In Short

  • Design and maintain Kubernetes-based platforms for AI execution.
  • Automate infrastructure using Terraform and ArgoCD.
  • Optimize ML inference pipelines for real-time decision-making.
  • Establish SLOs for AI execution success rates.
  • Implement observability tools like OpenTelemetry and Grafana.
  • Design safety controls for autonomous agent execution.
  • Govern cloud spend and resource allocation.
  • Collaborate with cross-functional teams to enhance platform reliability.
  • Ensure sub-second latency for agent task execution.
  • Build self-healing systems to preemptively resolve failures.

Requirements

  • 6+ years in Platform Engineering, SRE, or Infrastructure roles.
  • Mastery of Terraform, ArgoCD, and GitOps workflows.
  • Expert-level Kubernetes networking, scaling, and security.
  • Hands-on experience with MLOps pipelines.
  • Proficiency in Python for automation tools.
  • Deep expertise in distributed tracing and monitoring.
  • Experience with high-frequency data pipelines.

Benefits

  • Work on cutting-edge AI technology.
  • Opportunity to shape the future of autonomous marketing.
  • Collaborative and innovative work environment.
  • Flexible work hours.
  • Competitive salary and benefits package.

Search Atlas

Search Atlas Group is a rapidly growing SaaS organization focused on empowering businesses globally through advanced digital marketing tools and SEO solutions. With a commitment to operational excellence and strategic growth, the company fosters a collaborative and innovative culture that encourages team members to take on ambitious projects while supporting one another. Recognized for its outstanding workplace environment and rapid growth, Search Atlas values continuous improvement, excellence, and a proactive mindset among its diverse team of professionals.
