RemoteOtter

AI Evaluation Engineer - Remote

Posted Yesterday
Data Analysis
Contract
Colombia

Overview

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

In Short

  • We are looking for an AI Evaluation Engineer specialized in data analysis.
  • Design benchmark tasks that simulate real-world analytical workflows.
  • Create scenarios for AI systems to analyze large, messy, multi-source datasets.
  • Decompose tasks across multiple agents and produce clear, verifiable conclusions.
  • Commitment required: 8 hours per day, with a 4-hour overlap with PST.
  • Employment type: Contractor assignment (no medical/paid leave).
  • Contract duration: 4+ weeks.
  • Location: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam.
  • Interview: take-home assessment (60 min).

Requirements

  • Design and develop multi-agent benchmark tasks focused on complex data analysis workflows.
  • Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data).
  • Build tasks requiring cross-referencing across multiple data sources.
  • Conduct anomaly detection and contradiction identification.
  • Perform statistical analysis and interpretation.
  • Define task decomposition strategies across specialized sub-agents.
  • Develop verification logic to validate precise analytical outputs.
  • Implement evaluation pipelines using Python and SQL.
  • Create reproducible environments using Docker.
  • Analyze task performance and refine for clarity, difficulty, and scoring accuracy.

Benefits

  • Flexible working hours.
  • Opportunity to work with diverse datasets.
  • Engagement with cutting-edge AI technologies.
  • Collaboration with a global team.
  • Enhance your skills in data analysis and AI evaluation.

Gramian Consulting Group

Gramian Consulting Group is a boutique consultancy that specializes in IT professional services and engineering talent solutions. With a strong foundation in software engineering and leadership, the company focuses on helping organizations build high-performing teams by connecting them with professionals who meet their specific needs. They are currently representing a fast-growing AI startup based in San Francisco, which is dedicated to developing smart data tools that enhance the reliability and utility of large language models (LLMs) in real-world applications, particularly within the legal industry.
