Open Role at HUD - Remote

Posted 1 week ago
All others
Full Time
Worldwide

Overview

HUD is a fast-growing startup that develops agentic evaluations for AI agents that browse the web, with the aim of providing detailed evaluations across a wide range of tasks.

In Short

  • Building new evaluations/eval environments for HUD's Computer Use Agent (CUA) evaluation framework.
  • Building out the CUA evals framework.
  • Developing partnerships and improving developer experience for CUA developers.
  • Supporting teams of research engineers as they build out evals.
  • General startup operations as the company scales.

Requirements

  • Engagement with AI Safety and AI alignment.
  • Understanding of LLM evaluation frameworks, particularly multimodal and agentic evaluations.
  • Familiarity with using and deploying the latest AI tools for operational efficiency.
  • Experience in full-stack LLM deployment, particularly for multimodal and agentic AI evaluations.
  • Prior experience in fast-growing startup teams.

Benefits

  • Remote-friendly work environment.
  • Support for relocation and visa sponsorship for strong full-time candidates.
  • Opportunity to work with a team of talented individuals, including international Olympiad medallists and AI startup founders.
  • Motivated candidates are encouraged to apply even if they don't meet all criteria.

HUD

HUD (YC W25) develops agentic evaluations for Computer Use Agents (CUAs) that browse the web. The company describes its CUA Evals framework as the first comprehensive evaluation tool designed specifically for CUAs, addressing the need for detailed evaluations that check whether AI agents function effectively in real-world scenarios. Backed by Y Combinator, HUD collaborates closely with leading AI labs to provide scalable agent evaluation infrastructure. The team includes international Olympiad medallists and experienced AI startup founders.
