RemoteOtter

Staff Software Engineer, ML Training - Remote

Posted 14 weeks ago
Software Development
Full Time
USA

Overview

Stack is developing revolutionary AI and advanced autonomous systems designed to enhance safety, reliability, and efficiency of modern operations. Stack's autonomous technology incorporates cutting-edge advancements in artificial intelligence, robotics, machine learning, and cloud technologies, empowering us to create innovative solutions that address the needs and challenges of the dynamic trucking transportation industry. With decades of experience creating and deploying real world systems for demanding environments, the Stack team is dedicated to developing an autonomous solution ecosystem tailored to the trucking industry's unique demands.

In Short

  • Set up efficiency monitoring for all our training jobs to identify models that need improvement
  • Work with customer teams to benchmark/profile their jobs and make improvements
  • Create standardized APIs for stack-wide abstractions like training datasets, bulk inference jobs, evaluation metrics
  • Optimize dataloaders / training data formats to ensure high GPU utilization
  • Optimize distributed training configurations (network topologies, sharding strategies, pipelines, etc.)

Requirements

  • Experience: 5+ years as a SWE, ideally building infrastructure or customer-facing products; experience in AV or robotics is also great.
  • Experience with both ML platforms and building ML-based applications (bonus points if you have modeling experience).
  • Experience building scalable, reliable infra in a fast-paced environment.
  • Experience building or using ML infra built for a large number of customer teams.
  • A deep understanding of design tradeoffs and ability to articulate those tradeoffs and work with others on getting alignment.
  • Experience with building ML models or ML infra in the domains of autonomous vehicles, perception, and decision making (desirable but not required).
  • Experience with model training, model optimization, or large data processing pipelines.
  • Machine learning expertise is preferred but not necessary.
  • Knows how to push the GPU to its limit from Python to CUDA kernel level.
  • Built the inference or training loop for a large model (ideally with LLM flavor).
  • Shipped ML products (NLP, computer vision, recommender systems, etc.) at scale to make business impact.
  • Knows how to build low-latency / high-throughput batch or stream processing pipelines.
  • Knows how to write (readable) high-performance C++.
  • Prior AV experience.

Benefits

  • High customer empathy, able to communicate with customers well
  • Comfortable reading papers / keeping up with SOTA ML literature

Stack AV

Stack AV is at the forefront of developing revolutionary AI and advanced autonomous systems aimed at enhancing the safety, reliability, and efficiency of modern operations, particularly within the trucking transportation industry. Leveraging decades of experience, Stack AV integrates cutting-edge advancements in artificial intelligence, robotics, machine learning, and cloud technologies to create innovative solutions tailored to the unique demands of the industry. The company is committed to building an autonomous solution ecosystem that addresses the challenges faced by the dynamic trucking sector.

Similar Jobs:

Staff Software Engineer, Training Platform - Remote

Reddit

15 weeks ago

Reddit is seeking a Staff Software Engineer to lead the development of machine learning infrastructure.

USA
Full-time
Software Development
$230,000 - $322,000 USD

AI Training Software Engineer - Remote

& Company

15 weeks ago

Join us as a freelance AI Training Software Engineer to help train generative AI models while working remotely.

Thailand
Freelance
Software Development

Staff Software Engineer, AI/ML - Remote

Maven Clinic

22 weeks ago

Maven Clinic is seeking a Staff Software Engineer to lead the development of their AI/ML platform and collaborate with cross-functional teams.

United States
Full-time
Software Development
$195,000 - $300,000/year

Staff Software Engineer - Trading Team - Remote

Ridgeline

19 weeks ago

Join Ridgeline as a Staff Software Engineer to build innovative trading applications using modern technologies.

USA
Full-time
Software Development
$165,000 - $200,000/year

Software Engineer, Pre-Training/AI - Remote

Eventual

18 weeks ago

Join Eventual as a Software Engineer focused on AI Pretraining, working on cutting-edge AI research and scalable data systems.

CA, USA
Full-time
Software Development