RemoteOtter

GPU Cloud Platform Engineer - Remote

Posted 1 week ago
Software Development
Full Time
USA

Overview

We are seeking a GPU Cloud Platform Engineer to join our core infrastructure team and help build the next-generation AI compute cloud.

In Short

  • Design, deploy, and operate large-scale, multi-cluster GPU infrastructure.
  • Ensure high availability, performance, and efficiency of containerized AI workloads.
  • Conduct performance testing and evaluation of multi-node GPU clusters.
  • Deploy and orchestrate large models across multi-cluster environments using Kubernetes.
  • Participate in the design, development, and iteration of GPU cluster scheduling systems.
  • Build a unified multi-cluster management and monitoring system.
  • Coordinate with IDC (internet data center) providers to deploy large-scale GPU clusters.

Requirements

  • Bachelor's degree in Computer Science or a related field; 3+ years of experience in system engineering or DevOps.
  • 5+ years of experience in cloud-native development or AI engineering.
  • Familiarity with the Kubernetes ecosystem and multi-cluster management.
  • Proficient in Docker and containerization technologies.
  • Experience with monitoring tools like Prometheus and Grafana.
  • Hands-on experience with cloud platforms such as AWS, GCP, or Azure.
  • Experience with cluster management tools like Ray, Slurm, and Rancher is a plus.
  • Familiarity with distributed file systems and high-performance communication protocols.
  • Strong communication and team collaboration skills.

Benefits

  • Join a visionary team redefining AI infrastructure.
  • Work on cutting-edge technologies bridging AI and decentralized computing.
  • Collaborate with experts from leading institutions and tech companies.
  • Enjoy a flexible, remote work environment that values innovation.

Yotta Labs

Yotta Labs is at the forefront of developing a Decentralized Operating System (DeOS) designed for AI workload orchestration on a global scale. The company's mission is to democratize access to AI resources by aggregating geo-distributed GPUs, which enables high-performance computing for AI training and inference across a diverse range of hardware, from commodity to high-end GPUs. Yotta Labs supports major large language models (LLMs) and provides customizable solutions for new models, promoting elastic and efficient AI development. The company is committed to redefining AI infrastructure and fostering innovation in the AI and decentralized computing sectors.

Similar Jobs:

Cloud Platform Engineer - Remote

Acronis

3 weeks ago

Join Acronis as a Cloud Platform Engineer to manage and optimize cloud infrastructure while ensuring system stability and incident management.

Worldwide
Full-time
DevOps / Sysadmin

Cloud Platform Engineer - Remote

ASCENDING

12 weeks ago

Join our team as a Cloud Platform Engineer to build and manage scalable cloud infrastructures across AWS, Azure, and GCP.

USA
Full-time
DevOps / Sysadmin

Cloud Platform Engineer - Remote

PadSplit

16 weeks ago

Join PadSplit as a Cloud Platform Engineer to manage and optimize our cloud infrastructure for affordable housing solutions.

USA
Full-time
Software Development

Cloud Platform Engineer - Remote

PayPay Card

18 weeks ago

Join PayPay as a Cloud Platform Engineer to architect and build robust cloud infrastructure for a leading FinTech company.

Worldwide
Full-time
Software Development

Cloud Platform Engineer - Remote

PayPay

20 weeks ago

PayPay is seeking a Cloud Platform Engineer to enhance its cloud-based payment system.

Worldwide
Full-time
DevOps / Sysadmin