RemoteOtter

AI Evaluation Dataset Creator - Remote

Posted Yesterday
Software Development
Contract
USA

Overview

Mercor is collaborating with a leading AI research lab to develop a next-generation evaluation dataset for frontier AI models. We are seeking experts with advanced domain knowledge across diverse fields to design extremely challenging prompts that cannot be solved by existing AI systems without internet search or browsing capabilities.

In Short

  • Create original, expert-level prompts that require tool use (e.g., web search, browsing, or code execution).
  • Ensure prompts are objective, self-contained, and yield clear, unambiguous answers.
  • Test prompts against advanced AI models and document failures/successes.
  • Provide reasoning steps and solutions for each prompt.
  • Classify prompts into subject domains for dataset organization.
  • Collaborate with reviewers for expert validation and prompt refinement.
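The deliverables above (prompt, domain classification, reasoning steps, answer, and model test results) could plausibly be captured in a record like the following sketch. All field names and values here are illustrative assumptions, not the lab's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PromptRecord:
    """Hypothetical shape of one dataset entry; field names are assumptions."""
    prompt: str                   # original, expert-level question
    domain: str                   # subject classification, e.g. "law"
    requires_tools: list          # tools needed, e.g. ["search", "browse"]
    answer: str                   # clear, unambiguous expected answer
    reasoning: str                # step-by-step solution for reviewers
    model_results: dict = field(default_factory=dict)  # model name -> outcome


# Example entry with placeholder content
record = PromptRecord(
    prompt="An expert-level question that cannot be answered without search.",
    domain="law",
    requires_tools=["search"],
    answer="A single unambiguous answer.",
    reasoning="1. Locate the primary source via search. 2. Verify the detail.",
    model_results={"model-a": "fail"},
)
print(json.dumps(asdict(record), indent=2))
```

Keeping each entry self-contained like this would make the classification and expert-validation steps straightforward to organize.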

Requirements

  • Advanced academic or professional expertise in a specialized subject (STEM, law, finance, history, cultural studies, etc.).
  • Strong ability to design precise, high-difficulty questions requiring deep knowledge and external references.
  • Experience in academic research, benchmarking, or test question design preferred.
  • Attention to detail and ability to provide concise reasoning explanations.
  • Familiarity with AI models and their limitations is a plus.

Benefits

  • Remote and asynchronous — set your own hours.
  • Expected commitment: ~10–20 hours/week.
  • Project duration: ~2 months, with possible extensions based on dataset needs.
  • Opportunity to contribute to high-impact AI safety and evaluation research.

