
Anthropic wants to support better AI benchmarks.


Anthropic is launching a program to fund the creation of new benchmarks capable of assessing the performance and impact of generative models, including its own Claude.

Revealed on Monday, Anthropic’s initiative will distribute payments to third-party organizations that can, as the company put it in a blog post, “effectively measure advanced capabilities in AI models.” Interested parties can submit applications, which will be evaluated on a rolling basis.

“Our investment in these evaluations aims to elevate the entire field of artificial intelligence safety, providing valuable tools that benefit the whole ecosystem,” Anthropic stated on its official blog. “Development of high-quality, safety-relevant evaluations remains difficult; the demand is exceeding the supply.”

As we have discussed before, artificial intelligence has a benchmarking problem. The benchmarks most frequently cited today do a poor job of capturing how the average person actually uses the systems being evaluated. It is also unclear whether some benchmarks, particularly those published before the advent of modern generative AI, still measure what they claim to measure, given their age.

The very high-level, harder-than-it-sounds solution Anthropic is proposing is to create demanding benchmarks, with a focus on AI security and societal implications, using new tools, infrastructure, and methods.

The company specifically calls for tests that assess a model’s ability to carry out tasks such as conducting cyberattacks, “enhancing” weapons of mass destruction (such as nuclear weapons), and manipulating or deceiving people (for example, through deepfakes or misinformation). Anthropic says in the blog post that it is committed to developing an “early warning system” for identifying and assessing AI risks to national security and defense.

Anthropic also says it expects its new program to encourage research into benchmarks and “end-to-end” projects that examine AI’s potential for aiding scientific study, facilitating multilingual conversations, minimizing inherent biases, and reducing self-censorship.

To achieve all this, Anthropic envisions new platforms that let subject-matter experts build their own assessments, along with large-scale trials of models involving “thousands” of users. The firm says it has hired a full-time coordinator for the program and may purchase or expand projects it believes have the potential to scale.

“We offer a range of funding options tailored to the needs and stage of each project,” Anthropic notes in the post; an Anthropic spokesperson declined to disclose further specifics about those options. Teams will also have the chance to engage directly with Anthropic’s domain experts from its frontier red team, fine-tuning, trust and safety, and other relevant teams.

Anthropic’s effort to support new AI benchmarks is commendable, assuming there is enough money and labor behind it. Given the company’s commercial ambitions in the AI race, however, it may be difficult to trust completely.

Anthropic is fairly open in the blog post that it wants the evaluations it funds to align with the AI safety categories it devised, with some input from third parties such as the nonprofit AI research organization METR. That is entirely within the company’s purview. But it may also force applicants to accept definitions of “safe” or “risky” AI that they disagree with.

Anthropic’s allusions to “catastrophic” and “deceptive” AI hazards, such as risks involving nuclear weapons, are also likely to irritate some members of the AI community. Many experts argue there is no evidence that AI as we know it will gain world-ending, human-outsmarting capabilities anytime soon, if ever. According to these experts, claims of “superintelligence” only serve to draw attention away from the pressing AI regulatory issues of the day, such as AI’s tendency to hallucinate.

In its post, Anthropic writes that it hopes its program will serve as “a catalyst for progress towards a future where comprehensive AI evaluation is an industry standard.” That is a goal the many open, corporate-unaffiliated efforts to create better AI benchmarks may well share. Whether those efforts are willing to team up with an AI vendor whose allegiance ultimately rests with shareholders remains to be seen.
