Member of Technical Staff - Cluster Management - Remote

Posted 7 weeks ago
DevOps / Sysadmin
Full Time
Worldwide

Overview

As a Member of Technical Staff on Cluster Management, you will be responsible for the reliability, performance, and scalability of our compute infrastructure, and you will design and maintain the tools that keep it running smoothly.

In Short

  • Be responsible for the reliability, performance, and scalability of our compute infrastructure.
  • Design, build, and maintain the tools that keep our systems running smoothly.
  • Monitor system performance, troubleshoot issues, and implement solutions to prevent future problems.
  • Collaborate with engineering and research teams to ensure our infrastructure meets their needs.
  • Manage machine and storage resources efficiently, and implement strategies to reduce infrastructure costs.

Requirements

  • Experience managing and troubleshooting large-scale distributed systems.
  • Strong scripting and automation skills (e.g., Python, Bash).
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana).
  • A deep understanding of cloud computing platforms (e.g., AWS, GCP, Azure).
  • Strongly desired: Experience with HPC/GPU cluster management tools (e.g., Slurm, GPU monitoring tools, distributed file systems).
  • The ability to build in a fast-paced environment under some uncertainty.

Benefits

  • Collaborate with top-tier engineers, researchers, and operators from renowned organizations.
  • Opportunity to design and manage large-scale clusters with the latest hardware.
  • Be part of a rapidly growing industry poised to transform multiple sectors globally.
  • Thrive in an open and inclusive work environment that values diverse perspectives.
  • We provide visa assistance, including H-1B and OPT transfers, for US employees.

Reka

Reka is a globally distributed foundation model startup headquartered in the San Francisco Bay Area, California, dedicated to building useful multimodal artificial intelligence to empower organizations and businesses. With a remote-first approach, Reka brings together top talent from around the world, including contributors to significant AI breakthroughs over the past decade. The company fosters a collaborative, mission-driven environment focused on advancing AI for meaningful applications, while promoting an inclusive culture that values diverse perspectives. Reka offers generous benefits and is positioned in a rapidly growing industry with massive market opportunities.
