On the emerging landscape of open AI

This article was contributed by Alek Tarkowski, director of strategy of Open Future Foundation, the think tank for an open movement.

The launch of open models like BLOOM and Stable Diffusion is a symbolic birth moment for the field of open AI. In recent months, principles that underpin Open Source programming and other efforts to build information commons have been applied to AI research and development. It’s a turning point, and it’s worthy of consideration by everyone who cares about the future of Open Source software.

Several weeks ago, I took a closer look at the emergent field and wrote an analysis that focused on the role of licensing models (followed by a Twitter thread). I wanted to understand how these developments relate to other, earlier fields of openness, like Open Source software, open science, or open data, and, in turn, how the approach to openness chosen by AI researchers might affect these other fields.

The “open AI” term is not ideal. After all, OpenAI is the name of a company. Consequently, we need to build this emerging field in the shadow of corporate branding. As my friend Paul Keller and I brainstormed the question of whether there might be better terms out there, he said that the question is about open sharing in times of “machines that eat content.”

In recent months, AI tools like large language models and text-to-image models have been released in ways that connect norms of open sharing with a vision of responsible AI. The BigScience Research Workshop recently released the BLOOM model under RAIL, a new “Responsible AI license.” This was followed in August by Stable Diffusion, a text-to-image model launched under CreativeML Open RAIL-M, a derivative license (RAIL is an open-ended family of licenses). Meta AI also released its OPT language model under a similar, bespoke license that permits research uses. Interestingly, it did not describe this as an “open” release.

These new licenses aim to ensure not just openness of resources, but also responsibility for the impact of AI models. They are tackling what we call the Paradox of Open: openness today both challenges and enables concentrations of power.

“Open vs responsible” is now a big topic in AI circles. But it also raises questions for the broader space of open sharing and for companies and organizations built on open frameworks. And it signals an urgent need to revisit open licensing frameworks. Anna Mazgal calls this “a singularity point for open licensing” and also argues for a review of open licenses from a fundamental rights perspective.

Consider this: Open AI tools have the same generative potential as open servers (Apache Software Foundation), browsers (Mozilla) or encyclopedias (Wikipedia). But for the first time, the debate is not just about sharing. Management of risk and responsible use are raised from the start, not as side issues related to openness, but as norms of equal importance. And in the case of the RAIL license, its creators pay more attention to figuring out how to enforce responsible behavior than to openness (which they largely take for granted). The license is meant to do what ethical guidelines fail to do. An excellent paper published at ACM FAccT 2022 by Danish Contractor, Daniel McDuff, Julia Haines, Jenny Lee, Christopher Hines, Brent Hecht, Nicholas Vincent, and Hanlin Li examines behavioral use licensing for responsible AI. It should be required reading for anyone interested in the topic.

Is giving up on the most permissive licenses really necessary, and is it worth it? Can a balance really be found? John Weitzmann from Wikimedia argued recently that use restrictions are not effective. Other issues raised by the new licenses include enforcement, governance, license proliferation, peer production, and the theory of change behind creating open alternatives. I have defined these issues in more detail in my recent notes on Open AI.

Some will ask whether the RAIL license is really an open license. I don’t think that’s the key question. The more important one is whether we need to revisit open licensing frameworks and definitions and embrace responsible licensing as a flavor of open.

I’ve recently published a white paper, co-authored with Zuzanna Warso, on the use of openly licensed photographs for face recognition training datasets. We followed previous work by Adam Harvey and his exposing.ai project (Adam also conducted a detailed exploration of aspects of the datasets related to open licensing). These controversial cases show how unethical uses of the information commons were a side effect of deploying key datasets for AI training. They also demonstrate the importance of assigning responsibility and ensuring responsible use—the same issues that the new RAIL licenses aim to solve.

In our paper, we argue that these datasets (and other elements of AI systems) should be governed as a commons. And as the AI commons gets designed, licensing is one of the key questions that need to be answered. Hopefully, the answer will be peer produced together by open advocates and AI researchers. Please find me on Twitter (@atarkowski) if you would like to be part of such a conversation.

Image generated by Stable Diffusion, using the prompt “blooming artificial intelligence”.