Site Reliability Engineer - AI Platform - Remote

Posted 3 weeks ago

Overview

Algolia was built to help users deliver an intuitive search-as-you-type experience on their websites and mobile apps. We provide a search API used by thousands of customers in more than 100 countries. Billions of search queries are answered every month thanks to the code we push into production every day.

Join the AI Platform: Building core components to speed up AI delivery

The AI Platform is dedicated to enabling AI product delivery by providing other teams with turnkey tools, frameworks, and features, so that they can focus on their core business instead of redundant work that falls outside their expertise. The AI Platform covers two areas: helping teams quickly design new models (AI development), and generating and serving predictions in production (AI productionization).

We’re looking for problem solvers with an entrepreneurial mindset—people who focus on outcomes and use data to drive decisions. If you're passionate about reliability, scalability, and automation, and want to contribute to a platform that powers AI at scale, we’d love to hear from you!

The team brings together a variety of roles, from Site Reliability Engineers to Machine Learning specialists with a strong focus on Data Engineering. Most team members are fully remote, and they bring different skill sets and backgrounds. Your experience, knowledge, and perspective will add to this diversity and help the team deliver products that make a difference.

In Short

  • Implement, maintain, and improve the infrastructure that powers the AI Platform
  • Ensure the reliability and performance of Kubernetes-based deployments across cloud providers (GCP, AWS, Azure)
  • Develop and maintain infrastructure as code
  • Optimize CI/CD pipelines and deployment processes
  • Enhance monitoring, observability, and alerting systems
  • Contribute to incident response and post-mortem analysis

Requirements

  • Hands-on experience with Kubernetes and container orchestration in production environments
  • Experience with cloud providers (GCP, AWS, or Azure)
  • Experience with automation and infrastructure as code (e.g., Terraform)
  • Solid knowledge of CI/CD pipelines and deployment automation
  • Familiarity with monitoring and observability tools (e.g., Datadog)
  • A problem-solving mindset and a proactive approach to improving system reliability
  • Excellent spoken and written English skills

Benefits

  • Flexible workplace strategy
  • High-trust environment
  • Autonomy in work location
  • Inclusive and diverse workplace
