
OpenAI strengthens the safety team and gives the board veto authority over dangerous AI.

Image: OpenAI logo and font in isometric view on a dark background (3D render).

To guard against the threat of harmful AI, OpenAI is beefing up its internal safety processes. A new “safety advisory group” will sit above the technical teams and make recommendations to leadership, and the board has been granted veto authority; whether it will actually use it is another matter.

Normally the details of policies like this don’t warrant coverage, since in practice they amount to a slew of closed-door meetings with opaque functions and responsibility flows that outsiders are rarely privy to. Though that is probably also true here, the recent leadership squabble and the expanding debate over AI risk warrant a look at how the world’s largest AI development company is approaching safety.

OpenAI discusses its updated “Preparedness Framework” in a new document and blog post, which one imagines got a bit of a retool after November’s shake-up that removed the board’s two most “decelerationist” members: Ilya Sutskever (still at the company in a somewhat changed role) and Helen Toner (gone entirely).


The update’s main purpose seems to be to establish a clear path for identifying, analyzing, and deciding what to do about the “catastrophic” risks inherent in the models the company is building. As OpenAI defines it:

By catastrophic risk, we mean any danger that might cause hundreds of billions of dollars in economic loss or cause serious injury or death to a large number of people, including, but not limited to, existential risk.

(Existential danger is the “rise of the machines” kind of thing.)

A “safety systems” team oversees models in production; it handles things like systematic ChatGPT abuse that can be mitigated with API restrictions or tuning. Frontier models still in development get a “preparedness” team, which identifies and quantifies risks before a model is released. And then there’s the “superalignment” team, which is working on theoretical guide rails for “superintelligent” models, which we may or may not be anywhere near.

The first two categories, being real rather than hypothetical, have relatively straightforward criteria. Their teams assign each model a risk score across four categories: cybersecurity, “persuasion” (e.g., disinformation), model autonomy (i.e., acting on its own), and CBRN (chemical, biological, radiological, and nuclear threats, such as the ability to help create novel pathogens).

Various mitigations are assumed, such as a reasonable reluctance to explain the process of making napalm or pipe bombs. If, after accounting for known mitigations, a model is still rated as having “high” risk, it cannot be deployed, and if a model has any “critical” risks, it will not be developed further.
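To make the gating logic concrete, here’s a minimal sketch in Python of how those two thresholds might be expressed, with names of my own invention rather than anything from OpenAI’s framework or codebase: a model’s post-mitigation score in each tracked category determines whether it can be deployed at all, or developed further.

```python
from enum import IntEnum

# Illustrative only: these names mirror the article's description of the
# Preparedness Framework; they are not OpenAI code or an OpenAI API.
class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

TRACKED_CATEGORIES = ("cybersecurity", "persuasion", "model_autonomy", "cbrn")

def gate_model(post_mitigation_scores: dict) -> str:
    """Apply the framework's two thresholds to post-mitigation scores:
    any 'critical' score halts further development, any 'high' score
    blocks deployment, and anything else is eligible for deployment."""
    worst = max(post_mitigation_scores.get(c, RiskLevel.LOW) for c in TRACKED_CATEGORIES)
    if worst >= RiskLevel.CRITICAL:
        return "halt further development"
    if worst >= RiskLevel.HIGH:
        return "do not deploy"
    return "eligible for deployment"

# Example: a model still rated 'high' on cybersecurity after mitigations
print(gate_model({"cybersecurity": RiskLevel.HIGH, "persuasion": RiskLevel.MEDIUM}))
# -> do not deploy
```

The real framework is of course a human process, not a function call; the point is just that once the scores are set, the deploy/develop decisions are meant to be mechanical.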

In case you were wondering whether these risk levels were to be left to the judgment of some engineer or product manager, they are actually specified in the framework.

In the cybersecurity category, the most concrete of the four, it is a “medium” risk, for example, to “increase the productivity of operators… on key cyber operation tasks” by a certain factor. A high-risk model, by contrast, would “identify and develop proofs-of-concept for high-value exploits against hardened targets without human intervention.” At the critical level, the “model can devise and execute end-to-end novel strategies for cyberattacks against hardened targets given only a high-level desired goal.” We obviously don’t want that out there (though it would sell for a lot of money).

I’ve contacted OpenAI for more detail on how these categories are defined and refined, for instance whether a new risk like photorealistic fake video of people falls under “persuasion” or gets a category of its own, and will update this post if I hear back.

So, one way or another, only medium and high risks are to be tolerated. But the people building these models aren’t necessarily the best ones to evaluate them and make recommendations. For that reason, OpenAI is forming a “cross-functional safety advisory group” that will sit atop the technical side, reviewing the researchers’ reports and making recommendations from a higher vantage point. The hope, they suggest, is that this will surface some “unknown unknowns,” though by definition those are hard to catch.

The process requires that these recommendations be sent simultaneously to the board and to leadership, which we take to mean CEO Sam Altman and CTO Mira Murati, plus their lieutenants. Leadership will decide whether to ship a model or shelve it, but the board will have the authority to reverse those decisions.

This should help prevent anything like what was believed to have happened before the big drama: a high-risk product or process being greenlit without the board’s knowledge or approval. Of course, that drama ended with the departure of two of the board’s more critical voices and the appointment of some money-minded gentlemen (Bret Taylor and Larry Summers), who are sharp but far from AI specialists.

If an expert panel makes a recommendation and the CEO decides based on that information, will this friendly board really feel empowered to contradict them and hit the brakes? And if they do, will we hear about it? Beyond the commitment that OpenAI will solicit audits from independent third parties, transparency isn’t really addressed.

Say a model is developed that warrants a “critical” risk rating. OpenAI hasn’t been shy about tooting its own horn in the past; talking about how wildly powerful its models are, to the point of declining to release them, is great advertising. But if the risks are so substantial and OpenAI is so worried about them, do we have any assurance we’d be told? Maybe that would be a bad idea anyway. Either way, it isn’t really discussed.
