RemoteOtter

Member of Technical Staff - AI Infrastructure - Remote

Posted 7 weeks ago
Software Development
Full Time
USA

Overview

As a Member of Technical Staff at Fluidstack, you will design, develop, and maintain software solutions that power our AI infrastructure and enable our customers to run complex ML workloads efficiently at scale.

In Short

  • Developing and optimizing job scheduling systems to maximize GPU utilization and throughput for ML workloads
  • Building and improving software interfaces for cluster management that support PyTorch, JAX, and other ML frameworks
  • Creating monitoring and observability tools for tracking training progress, resource usage, and system performance
  • Implementing data pipeline optimizations to accelerate training and inference workflows
  • Designing and developing APIs and services to integrate with MLflow, Kubeflow, Weights & Biases, and other ML tooling
  • Writing libraries and utilities to simplify the deployment and management of distributed training jobs

Requirements

  • You have developed software for training or serving large-scale ML models (1000+ GPU scale)
  • You have optimized distributed training performance across multiple nodes and accelerators
  • You have implemented APIs and interfaces for ML platforms that prioritize developer experience
  • You have experience with orchestration systems like Kubernetes or SLURM in the context of large-scale ML workloads
  • You have built or contributed to ML infrastructure tools (e.g., Ray, Horovod, DeepSpeed), and have experience with ML experiment tracking and workflow systems (MLflow, Kubeflow, W&B)

Benefits

  • Competitive total compensation package (cash + equity).
  • Retirement or pension plan, in line with local norms.
  • Health, dental, and vision insurance.
  • Generous PTO policy, in line with local norms.
  • Fluidstack is remote-first, with offices in key locations. For all other locations, we provide access to WeWork.

FluidStack

FluidStack is an innovative AI cloud company that collaborates with leading AI firms globally, including notable names like Poolside, Meta, Modal, and Reka. The company specializes in providing high-performance computing (HPC) as a service, ensuring that its GPU infrastructure operates at peak performance while offering exceptional support to its customers. FluidStack is committed to scaling its operations through automation and efficient deployment of new clusters, making it a key player in the AI cloud industry.
