RemoteOtter

Inference Optimization Engineer - Remote

Posted 10 weeks ago
Software Development
Full Time
CA, USA

Overview

As an Inference Optimization Engineer, you will improve the speed and efficiency of large language models at three levels: GPU kernels, the inference engine, and distributed architectures.

In Short

  • Identify bottlenecks and optimize inference efficiency.
  • Build repeatable tests that model production traffic.
  • Reduce memory use and compute cost with mixed precision.
  • Improve batching, caching, load balancing, and model-parallel execution.
  • Write technical posts and contribute to the open-source community.

Requirements

  • Deep understanding of transformer architecture.
  • Hands-on experience with model serving optimizations.
  • Experience with inference engines like vLLM, SGLang, or TRT-LLM.
  • Proficiency in CUDA and profiling tools.
  • Track record of blog posts or conference talks in ML systems.

Benefits

  • Direct impact on distributed LLM inference.
  • Work remotely from anywhere.
  • Competitive salary and equity.
  • Learning budget and paid conference travel.

BentoML

BentoML provides inference platforms that help AI teams run large language models and generative AI workloads efficiently at scale. Backed by investors such as DCM, the company serves enterprises globally and focuses on consistent scalability and performance in production environments. Its portfolio includes both open-source and commercial products, with a mission to help teams use AI to build competitive advantages.

Similar Jobs:

ML Optimization Engineer - Remote

Augmodo

12 weeks ago

Join Augmodo as an ML Optimization Engineer to optimize and deploy cutting-edge computer vision algorithms on edge devices.

CA, USA
Full-time
Software Development
$155,000 - $210,000 USD/year

PnP Optimization Engineer - Remote

Waabi

11 weeks ago

Join Waabi as a PnP Optimization Engineer to develop and optimize self-driving technology.

USA, Canada
Full-time
Software Development

Cloud Optimization Engineer - Remote

Ubiminds

13 weeks ago

Join Ubiminds as a Cloud Optimization Engineer and leverage your skills in Azure to drive cost-effective cloud solutions.

Worldwide
Full-time
Software Development

Senior AI Systems Engineer (LLM Inference & Infra Optimization) - Remote

Sully.ai

10 weeks ago

Join Sully.ai as a Senior AI Systems Engineer to optimize and deploy large language models on advanced GPU infrastructure.

Worldwide
Full-time
Software Development

Search Engine Optimization Engineer - Remote

iPullRank

40 weeks ago

Join iPullRank as a Search Engine Optimization Engineer to enhance Organic Search visibility through technical SEO and innovative strategies.

USA
Full-time
Marketing
$100,000 - $120,000/year