OpenAI says it is developing a tool that will let content creators “opt out” of AI training.

OpenAI is currently developing a platform that will give creators more precise control over how generative AI models use their content.

The tool, dubbed Media Manager, will let artists and content owners identify their works to OpenAI and specify whether they want them included in or excluded from AI research and training.

OpenAI aims to have the tool in place by 2025 and is working with artists, content owners, and regulators toward a standard, possibly through the industry steering committee it recently joined.

In a blog post, OpenAI said that building a first-of-its-kind tool to identify copyrighted text, images, audio, and video across many sources and reflect creators’ preferences will require advanced machine learning research. “In the future, we intend to gradually incorporate more options and functionalities,” the company added.

Media Manager appears to be OpenAI’s answer to mounting criticism of its approach to AI development, which relies heavily on scraping publicly available data from the web. Eight prominent U.S. newspapers, including the Chicago Tribune, recently sued OpenAI for intellectual property infringement, accusing the company of using their articles to train generative AI models that it then commercialised without crediting or compensating the source publications.

Generative AI models, such as those developed by OpenAI, can analyse and produce many forms of content, including text, images, and video. They are trained on enormous numbers of examples, typically drawn from public websites and datasets. OpenAI and other generative AI vendors argue that fair use, the legal doctrine permitting the use of copyrighted works to create transformative secondary works, shields their practice of scraping public data and using it for model training. Not everyone agrees.

OpenAI, for its part, has argued that it would be impossible to build useful AI models without copyrighted material.

Still, in an effort to placate critics and shield itself from potential lawsuits, OpenAI has taken steps to meet content creators in the middle.

Last year, OpenAI introduced a way for artists to opt out of, and remove their work from, the datasets used to train its image-generating models. The company also lets website owners use the robots.txt standard to indicate whether content on their site may be scraped for AI model training. And OpenAI continues to sign licensing agreements with large content owners, including news organisations, stock media libraries, and Q&A platforms such as Stack Overflow.
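Of these, the robots.txt route is the most mechanical. OpenAI publicly documents GPTBot as the user-agent string of its web crawler, so a site owner can refuse training scrapes with a few lines at the site root. A minimal example, assuming the crawler honours the standard (compliance with robots.txt is voluntary, and the directory path shown is hypothetical):

```
# robots.txt: block OpenAI's GPTBot crawler from the whole site
User-agent: GPTBot
Disallow: /

# Or shield only part of the site (hypothetical path):
# User-agent: GPTBot
# Disallow: /portfolio/
```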

Nevertheless, some content creators argue that these measures do not go far enough.

Artists have criticised OpenAI’s image opt-out process as onerous, since it requires submitting a separate copy of each image to be removed, along with a description of it. OpenAI also reportedly pays comparatively little to license content. And as OpenAI itself acknowledges in the recent blog post, its current options do not address cases where creators’ works are quoted, remixed, or reposted on platforms they do not control.

OpenAI is not alone: a number of third parties are also building provenance and opt-out tools for generative AI.

The startup Spawning AI, whose partners include Stability AI and Hugging Face, offers an application that identifies and tracks the IP addresses of scraper bots so they can be blocked, along with a database where artists can register their works so that vendors who choose to honour such requests can exclude them from training. Steg.AI and Imatag help creators assert ownership of their images by embedding invisible watermarks. And the University of Chicago’s Nightshade project “poisons” image data to make it useless, or actively disruptive, for AI model training.
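Steg.AI’s and Imatag’s watermarking methods are proprietary, but the general principle behind invisible watermarks can be shown with a toy least-significant-bit scheme: an owner identifier is written into the lowest bit of each pixel value, invisible to the eye but readable by anyone who knows where to look. The Python sketch below is an illustration only, with a made-up owner ID; real watermarks are engineered to survive compression, cropping, and re-encoding, which this one would not.

```python
import numpy as np

def embed_watermark(pixels: np.ndarray, message: str) -> np.ndarray:
    """Hide `message` in the least-significant bits of a uint8 image array."""
    bits = np.unpackbits(np.frombuffer(message.encode("utf-8"), dtype=np.uint8))
    flat = pixels.flatten()  # flatten() returns a copy, so the input is untouched
    if bits.size > flat.size:
        raise ValueError("image too small to hold the message")
    # Clear each carrier pixel's lowest bit, then write one message bit into it.
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    return flat.reshape(pixels.shape)

def extract_watermark(pixels: np.ndarray, length: int) -> str:
    """Read back a `length`-byte message from the low-order bits."""
    bits = pixels.flatten()[: length * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8")

# Stamp a hypothetical owner ID into a random "image" and recover it.
owner_id = "owner:example-artist-42"
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
marked = embed_watermark(img, owner_id)
print(extract_watermark(marked, len(owner_id)))  # owner:example-artist-42
```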

Author: Juliet P.
