RemoteOtter

Staff Software Engineer, Training - Remote

Posted Yesterday
Software Development
Full Time
Worldwide

Overview

The Staff Software Engineer, Training will optimize the training stack, improving performance and efficiency across distributed systems and ML infrastructure.

In Short

  • Drive down wall-clock time to convergence by profiling and eliminating bottlenecks.
  • Design, build, and optimize distributed training systems using PyTorch.
  • Implement efficient low-level code with CUDA and custom kernels.
  • Optimize workloads for hardware efficiency, including CPU/GPU balance.
  • Develop monitoring and debugging tools for large-scale runs.

Requirements

  • 8+ years of deep experience in distributed systems, ML infrastructure, or high-performance computing.
  • Production-grade expertise in Python.
  • Low-level performance mastery with CUDA/cuDNN/Triton.
  • Experience with PyTorch and training jobs using various parallelism techniques.
  • System-level mindset with a track record of tuning hardware–software interactions.

Benefits

  • Work in a dynamic and innovative environment.
  • Opportunity to contribute to cutting-edge ML technologies.
  • Collaborate with a talented team of engineers and researchers.
  • Flexible work arrangements, including remote options.
  • Competitive salary and benefits package.

Genesis AI

Genesis AI is a forward-thinking technology company focused on developing innovative tools that empower engineers and researchers in the fields of robotics and simulation. The company is dedicated to enhancing productivity through the creation of intuitive user interfaces and powerful backend systems that facilitate the exploration and analysis of complex data. With a commitment to building general-purpose Physical AI, Genesis AI fosters collaboration among its teams to ensure the delivery of high-quality software solutions that streamline research workflows and improve model evaluation processes.
