RemoteOtter

AI Evaluation Specialist - Remote

Posted 21 hours ago
Software Development
Full Time
HK

Overview

Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.

In Short

  • Participate in the entire software development lifecycle, from requirements analysis through test planning, execution, and defect tracking to product release and maintenance.
  • Act as the go-to person for AI agent evaluation and continuous monitoring.
  • Create comprehensive and effective test strategies and conduct hands-on testing to ensure the accuracy, reliability, and performance of AI and data applications.
  • Perform root cause analysis of test failures and product issues, and use the findings to drive future improvements.
  • Design and develop internal tools leveraging AI technology to improve engineering and testing work efficiency.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
  • Strong understanding of Large Language Models (LLMs), autonomous AI agents, and their system architectures.
  • Experience with AI evaluation methodologies, including offline benchmarking, online monitoring, and hybrid human-AI evaluation approaches.
  • Familiarity with software engineering best practices such as Test-Driven Development (TDD), Behavior-Driven Development (BDD), and their limitations in AI contexts.
  • Proficiency in designing adaptive, lifecycle-spanning evaluation frameworks incorporating both quantitative and qualitative metrics.
  • Experience with evaluation tools and frameworks (e.g., Opik, LangSmith) is a plus.
  • Ability to analyze complex system-level behaviors, including reasoning pipelines, tool integrations, and emergent agent actions.
  • Strong analytical skills with experience in data-driven diagnostics and root cause analysis.
  • Excellent communication skills to document evaluation plans, results, and recommendations clearly.
  • Experience working in cross-functional teams and managing feedback loops between evaluation and development.
  • Experience collaborating with infrastructure or platform teams to improve AI tooling and automation platforms.

Benefits

  • Opportunity to work in a leading global blockchain ecosystem.
  • Engage with cutting-edge AI technologies.
  • Collaborate with a diverse and talented team.
  • Contribute to innovative projects in the cryptocurrency space.
  • Flexible working environment.

Binance

Binance is a premier global blockchain ecosystem that operates the world's largest cryptocurrency exchange by trading volume and registered users. Trusted by over 230 million individuals across more than 100 countries, Binance is recognized for its industry-leading security, transparency in user funds, rapid trading engine, and deep liquidity. The company offers a diverse range of digital asset products, including trading, finance, education, research, payments, institutional services, and Web3 features. Binance is dedicated to leveraging digital assets and blockchain technology to create an inclusive financial ecosystem that enhances financial freedom and access for people worldwide.

