Scientist-Bench Leaderboard
Our benchmark evaluates how well AI-Researcher performs on automated scientific research tasks when it is powered by different language models. This leaderboard tracks how each model performs on our standardized tests.

Our benchmark has been released! Check it out and submit results from your own agents.