RemoteOtter

Senior Software Engineer, Distributed Systems Engineer - DGX Cloud - Remote

Posted 2 days ago
Software Development
Full Time
USA
152,000 USD - 241,500 USD/year

Overview

NVIDIA is hiring experienced software engineers to help scale up its AI infrastructure. You will join a team responsible for the production systems that let large, scalable GPU clusters serve a variety of AI workloads.

In Short

  • Design and develop a massively distributed, scalable platform.
  • Identify, diagnose, and remediate non-performant GPU assets.
  • Ensure production AI clusters run reliably and consistently.
  • Evaluate system failures and improve services based on incident management.
  • Collaborate with multi-functional teams across NVIDIA.
  • 5+ years in a software engineering role with demonstrable impact.
  • Experience with software engineering principles, tools, and techniques.
  • Technical knowledge in Go or Python, and understanding of data structures.
  • Advanced experience with cluster management systems.
  • Proven operational excellence in maintaining reliable infrastructure.

Requirements

  • BS in Computer Science, Engineering, Physics, Mathematics or equivalent experience.
  • Strong communication skills and ability to work with multi-functional teams.
  • Experience with large-scale production systems.
  • Technical competency in managing large-scale distributed systems.
  • Experience in asynchronous workflows and/or event-driven architecture.
  • Creative and autonomous mindset.

Benefits

  • Base salary determined by location, experience, and pay of similar positions.
  • Eligible for equity and benefits.
  • Work in a diverse and equal opportunity environment.
  • Join a team considered one of the technology world’s most desirable employers.
  • Opportunity to work with forward-thinking and hardworking individuals.

NVIDIA USA

NVIDIA Vietnam Company Limited is a subsidiary of NVIDIA, a global leader in accelerated computing. The company focuses on pioneering technologies in AI and digital twins, transforming major industries and making a significant impact on society. With a commitment to innovation, NVIDIA Vietnam plays a crucial role in the manufacturing and engineering processes, ensuring high standards of manufacturability and production capabilities in a fast-paced environment. The team collaborates closely with global contract manufacturers and engineering teams to enhance production efficiency and drive continuous improvement.

Similar Jobs:

Senior Software Engineer - Distributed Systems - Remote

Decentriq

53 weeks ago

Join Decentriq as a Senior Software Engineer to design and operate data pipelines while advancing machine learning models in a fully remote role.

Worldwide
Full-time
Software Development

Senior Software Engineer - Distributed Systems - Remote

Lambda

62 weeks ago

Join Lambda as a Senior Software Engineer to build and architect distributed systems for AI products.

CA, USA
Full-time
Software Development
$200,000 - $440,000/year

Senior Software Engineer - Distributed Systems - Remote

eToro

62 weeks ago

Join eToro as a Senior Software Engineer to lead the design and implementation of high-scale distributed systems.

Israel
Full-time
Software Development

Senior Software Engineer - Distributed Systems - Remote

MongoDB

62 weeks ago

Join MongoDB as a Senior Software Engineer to enhance the operational resilience of distributed systems.

USA
Full-time
Software Development
$137,000 - $270,000 USD/year

Senior Software Engineer - Distributed Systems - Remote

Sumo Logic

68 weeks ago

Join Sumo Logic as a Senior Software Engineer to design and develop distributed data processing capabilities in a fully remote role.

USA
Full-time
Software Development
$155,000 - $180,000/year