
Anthropic wants to support better AI benchmarks.


Anthropic is launching a program to fund the creation of new benchmarks capable of assessing the performance and impact of generative models, including its own Claude.

Revealed on Monday, Anthropic’s initiative will distribute payments to third-party organizations that can, as the company put it in a blog post, “effectively measure advanced capabilities in AI models.” Interested parties can submit applications, which will be evaluated on a rolling basis.

“Our investment in these evaluations aims to elevate the entire field of artificial intelligence safety, providing valuable tools that benefit the whole ecosystem,” Anthropic stated on its official blog. “Development of high-quality, safety-relevant evaluations remains difficult; the demand is exceeding the supply.”

As we have discussed before, artificial intelligence has a benchmarking problem. The benchmarks most frequently cited today do a poor job of capturing how the average person actually uses the systems being evaluated. It is also unclear whether some benchmarks, particularly those published before the advent of modern generative AI, still measure what they claim to measure, given their age.

The very high-level, harder-than-it-sounds solution Anthropic is proposing is to create demanding benchmarks, with a focus on AI security and societal implications, using new tools, infrastructure, and methods.

The company specifically calls for tests that assess a model’s ability to carry out tasks such as conducting cyberattacks, “enhancing” weapons of mass destruction (such as nuclear weapons), and manipulating or deceiving people (for example, through deepfakes or misinformation). Anthropic says in the blog post that it is committed to developing an “early warning system” for identifying and assessing AI risks to national security and defense.

Anthropic also says it expects its new program to encourage research into benchmarks and “end-to-end” projects that examine AI’s potential for aiding scientific study, facilitating multilingual conversations, minimizing inherent biases, and reducing self-censorship.

To achieve all this, Anthropic envisions new platforms that let subject-matter experts build their own assessments, along with large-scale trials of models involving “thousands” of users. The firm says it has hired a full-time coordinator for the program and may purchase or expand projects it believes have the potential to scale.

“We offer a range of funding options tailored to the needs and stage of each project,” Anthropic notes in the post; an Anthropic spokesperson declined to disclose further specifics about those options. Teams will also have the chance to engage directly with Anthropic’s domain experts from its frontier red team, fine-tuning, trust and safety, and other relevant teams.

Anthropic’s effort to support new AI benchmarks is commendable, assuming there is enough money and labor behind it. Given the company’s commercial ambitions in the AI race, however, it may be difficult to trust completely.

Anthropic is fairly open in the blog post that it wants the evaluations it funds to align with the AI safety categories it devised, with some input from third parties such as the nonprofit AI research organization METR. That is entirely within the company’s purview. But it may also force applicants to accept definitions of “safe” or “risky” AI that they disagree with.

Anthropic’s allusions to “catastrophic” and “deceptive” AI hazards, such as risks involving nuclear weapons, are also likely to irritate some members of the AI community. Many experts argue there is no evidence that AI as we know it will gain world-ending, human-outsmarting capabilities anytime soon, if ever. According to these experts, claims of “superintelligence” only serve to draw attention away from the pressing AI regulatory issues of the day, such as AI’s tendency to hallucinate.

In its post, Anthropic writes that it hopes its program will serve as “a catalyst for progress towards a future where comprehensive AI evaluation is an industry standard.” That is a goal the many open, corporate-unaffiliated efforts to create better AI benchmarks may well share. Whether those efforts are willing to team up with an AI vendor whose allegiance ultimately rests with shareholders remains to be seen.
