The Member of Technical Staff will contribute to model training pipelines, ship state-of-the-art models to production, and bridge the gap between research and production in a remote-friendly environment.
In Short
Design and write high-performant and scalable software for training.
Improve training setup from an infrastructure and codebase performance standpoint.
Craft and implement tools to speed up training cycles.
Research and experiment with ideas on supercompute and data infrastructure.
Work with top researchers in the field.
Strong software engineering skills required.
Proficiency in Python and ML frameworks like JAX and Pytorch.
Experience with distributed training infrastructures.
Hands-on experience with large model training.
Encouragement to apply even if not all qualifications are met.
Requirements
Extremely strong software engineering skills.
Proficiency in Python and related ML frameworks.
Experience with distributed training infrastructures.
Experience using large-scale distributed training strategies.
Hands-on experience on training large models at scale.