RemoteOtter

AI Evaluation Engineer - Remote

Posted 1 week ago
Software Development
Full-time
USA, Canada

Overview

As an AI Evaluation Engineer at P-1 AI, you will be responsible for ensuring that our AI, Archie, learns and retains necessary engineering skills through effective evaluation benchmarks.

In Short

  • Implement systems for organizing and reporting on eval benchmarks.
  • Ensure evals run effectively within our CI/CD system.
  • Collaborate with industrial partners and experts to refine evals.
  • Create methods for detecting common AI quality challenges.
  • Lead the implementation of automated tests across technology stacks.

Requirements

  • Experience in constructing test suites for software/AI systems.
  • Ability to design metrics for system evaluation and performance visualization.
  • Experience with LLM-based systems is a plus.
  • Strong communication skills with stakeholders.
  • Proficiency in Python and modern software development tools.
  • Ability to thrive in a fast-paced startup environment.

Benefits

  • Work remotely with flexibility.
  • Opportunity to collaborate with top minds in AI and engineering.
  • Participation in a dynamic startup culture.
  • Travel to the San Francisco office for co-working sessions.

P-1 AI

P-1 AI is developing engineering artificial general intelligence (AGI) with the goal of transforming the built world. The company was founded on the belief that AI can significantly enhance human capabilities in engineering. Its flagship product, Archie, is designed to function as an AI engineer, capable of quantitative and spatial reasoning akin to that of an entry-level design engineer. The founding team brings expertise in model-based engineering and deep learning, and the company recently secured $23 million in seed funding. P-1 AI aims to integrate Archie into engineering teams across industrial sectors, changing how engineering tasks are approached and executed.

Similar Jobs:

AI Systems Evaluation Engineer - Remote

Trunk Tools

11 weeks ago

Join Trunk Tools as an AI Systems Evaluation Engineer to drive automation in the construction industry.

USA
Full-time
Software Development

Test & Evaluation Engineer - Remote

Boeing Defence United Kingdom Limited

5 weeks ago

Join Boeing as a Test & Evaluation Engineer to support complex test operations for military aircraft in the UK.

UK
Full-time
Engineering

Test & Evaluation Engineer - Remote

Boeing Defence United Kingdom Limited

5 weeks ago

The Test & Evaluation Engineer will oversee testing procedures for the Chinook programme at Boeing Defence UK, ensuring safety and effectiveness in flight operations.

Worldwide
Full-time
All others

Field Evaluation Engineer - Remote

TÜV SÜD America

16 weeks ago

TÜV SÜD America Inc. is seeking a Field Evaluation Engineer to perform testing and inspection of electrical equipment with significant travel requirements.

USA
Full-time
All others

AI Automation Engineer - Remote

Persist Ventures

4 weeks ago

Join us as an AI Automation Engineer to develop a tool for automating outreach to viral content creators.

India
Full-time
Software Development