RemoteOtter

Staff DevOps Engineer | Research Infrastructure Operations - Remote

Posted 4 weeks ago
DevOps / Sysadmin
Full Time
Worldwide

Overview

As a Staff DevOps Engineer in Research Infrastructure Operations, you will play a critical role in architecting, designing, and operating our High-Performance Computing (HPC) infrastructure, contributing significantly to our AI development.

In Short

  • Design, plan, set up, administer, maintain, and troubleshoot GPU infrastructure.
  • Benchmark and optimize performance of GPU infrastructure systems.
  • Collaborate with researchers and developers to fine-tune applications for HPC environments.
  • Work on various projects to maintain an optimized state across all aspects of the infrastructure.
  • Support the team with unexpected issues and coordinate escalations.
  • Automate processes using an advanced toolchain.
  • Develop and implement custom monitoring checks for technical issues.
  • Engage with hardware vendors in a high-performance environment.

Requirements

  • Extensive experience managing GPU compute clusters.
  • Proficiency in containerization and orchestration (Docker, K8s).
  • Fluency in at least one programming language, preferably Go.
  • Expertise in patch and OS management at scale.
  • Experience in Linux performance benchmarking and troubleshooting.
  • Familiarity with distributed storage solutions like Lustre and Ceph.
  • Knowledge of networking technologies and protocols.
  • Proactive and solution-oriented mindset.
  • Excellent problem-solving skills.
  • Initiative-driven and ownership-oriented.

Benefits

  • Diverse and internationally distributed team.
  • Open communication and regular feedback.
  • Hybrid work with flexible hours.
  • Regular in-person team events.
  • Monthly full-day hacking sessions.
  • 30 days of annual leave.
  • Competitive benefits tailored to your location.

DeepL

DeepL is a global communications platform that leverages Language AI to facilitate seamless communication across languages. Founded in 2017, the company aims to eliminate language barriers through its human-like translations and intelligent writing suggestions, catering to over 100,000 businesses and millions of individuals worldwide. DeepL is committed to becoming the leader in Language AI, focusing on innovation and employee well-being within a diverse and inclusive work culture. The company values open communication, flexibility, and collaboration, offering a hybrid work environment and competitive benefits to support its growing international team.
