
Kaggle has announced that it now offers Community Benchmarks, enabling AI practitioners to design, run, and share their own benchmarks for evaluating AI models.
Kaggle is a community platform run by Google that offers models and resources for data scientists and machine learning practitioners. Last year, it introduced Kaggle Benchmarks to provide evaluations from research groups, such as Meta's MultiLoKo and Google's FACTS suite.
The latest announcement extends that capability to the community as a whole, allowing practitioners to create benchmarks specific to their own use cases. According to Google, AI capabilities are evolving so quickly that existing ways of benchmarking and evaluating them can't keep up. With Community Benchmarks, the company hopes to bridge this gap and provide a more flexible and transparent framework for evaluation.
Users start by creating a task, which tests an AI model's performance on a specific problem. Once multiple tasks are created, they can be grouped into a benchmark that can be run across a suite of AI models to produce a leaderboard.
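Conceptually, a task pairs a prompt with a scoring rule, and a benchmark aggregates per-task scores for each model into a leaderboard. The minimal Python sketch below illustrates that shape only; the Task and Benchmark classes and the stand-in "models" are hypothetical placeholders, not Kaggle's actual API (the documentation and Cookbook linked below describe the real interfaces).

```python
# Illustrative sketch only: the classes and run_model stubs here are
# hypothetical and do not reflect Kaggle's actual Community Benchmarks API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    """A single evaluation: a prompt plus a scoring function."""
    name: str
    prompt: str
    score: Callable[[str], float]  # maps a model response to a 0..1 score

@dataclass
class Benchmark:
    """A named group of tasks that can be run across several models."""
    name: str
    tasks: list[Task] = field(default_factory=list)

    def run(self, models: dict[str, Callable[[str], str]]) -> dict[str, float]:
        # Average each model's per-task scores into a leaderboard entry.
        leaderboard = {}
        for model_name, generate in models.items():
            scores = [task.score(generate(task.prompt)) for task in self.tasks]
            leaderboard[model_name] = sum(scores) / len(scores)
        return leaderboard

# Example: one exact-match arithmetic task run against two stand-in "models".
bench = Benchmark(
    name="toy-math",
    tasks=[Task("add", "What is 2 + 2?", lambda r: float("4" in r))],
)
models = {
    "model-a": lambda prompt: "The answer is 4.",
    "model-b": lambda prompt: "I think it is 5.",
}
print(sorted(bench.run(models).items(), key=lambda kv: -kv[1]))
```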
According to Google, the benefits of Community Benchmarks include free access to state-of-the-art models, reproducibility, rapid prototyping, and support for multimodal inputs, code execution, tool use, and multi-turn conversations.
“The future of AI progress depends on how models are evaluated. With Kaggle Community Benchmarks, Kagglers are no longer just testing models, they’re helping shape the next generation of intelligence,” Google wrote in a blog post.
To get started, users can read the documentation for a tutorial on creating tasks and benchmarks, and visit the Kaggle Benchmarks Cookbook for a collection of examples and patterns.