RemoteOtter

Senior Site Reliability Engineer, Model Serving Infrastructure - Remote

Posted 2 days ago
DevOps / Sysadmin
Full Time
USA

Overview

The Senior Site Reliability Engineer will be responsible for developing, deploying, and operating the AI platform that delivers Cohere's large language models through easy-to-use API endpoints. The role involves working with multiple teams to deploy optimized NLP models in low-latency, high-throughput, high-availability environments.

In Short

  • Join a team focused on building high-performance, scalable AI systems.
  • Work on deploying large language models via API endpoints.
  • Collaborate with various teams to ensure smooth operations.
  • Engage with customers for customized deployments.
  • Focus on low latency and high availability for NLP applications.
  • Contribute to the development of cutting-edge AI technologies.
  • Be part of a diverse and inclusive work environment.
  • Enjoy a range of employee benefits including health and wellness.
  • Work remotely with flexible office options.
  • Participate in a culture that values diversity and inclusion.

Requirements

  • 5+ years of engineering experience in production infrastructure.
  • Experience with Kubernetes and GPU workloads.
  • Familiarity with cloud platforms like GCP, Azure, and AWS.
  • Strong skills in Linux-based computing environments.
  • Experience in resource and cost management.
  • Excellent collaboration and troubleshooting skills.
  • Ability to adapt and solve complex technical challenges.
  • Knowledge of the computational characteristics of accelerators.
  • Strong understanding of distributed systems.
  • Experience with high-performance programming languages such as Golang or C++.

Benefits

  • An open and inclusive culture and work environment.
  • Work closely with a team at the cutting edge of AI research.
  • Weekly lunch stipend, in-office lunches & snacks.
  • Full health and dental benefits, including mental health support.
  • 100% Parental Leave top-up for eligible employees.
  • Personal enrichment benefits for various activities.
  • Remote-flexible work environment with office options.
  • 6 weeks of vacation.

Cohere

Cohere is a pioneering company dedicated to scaling intelligence to serve humanity through the development and deployment of advanced AI models. With a mission to enhance the capabilities of AI systems for developers and enterprises, Cohere focuses on creating transformative experiences in areas such as content generation, semantic search, and AI agents. The company prides itself on its diverse team of top-tier researchers, engineers, and designers who are committed to building high-quality products. Cohere fosters a culture of hard work, rapid innovation, and a strong emphasis on customer value, while also valuing inclusivity and diverse perspectives in the workplace.
