By Mike Linksvayer, Head of Developer Policy at GitHub
The drive to build AI and radically accelerate human progress is a thread through the history of computing. It’s no coincidence that free software was founded by an AI lab developer and the term open source was coined by the leader of an AI and nanotech think tank. While “AI” has had ups and downs, and changing definitions, it’s now clear that the deep learning revolution of the last decade will be transformative, whether merely as Software 2.0 or as something much more. GitHub is sponsoring Open Source Initiative’s Deep Dive: AI because we think it’s important for the community to unpack how open source software, process, and principles can help best deliver on the promise of AI.
Open [source] is at the core of AI development
Open source is an essential driver of AI development in three ways. First, the leading AI tools are all open source. Open source frameworks like PyTorch are ubiquitous infrastructure for training AI systems. Similarly, open source software provides essential tools for responsible AI development, enabling developers to increase transparency of AI systems with packages like InterpretML and measure bias with toolkits like AIF360.
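The kind of bias measurement a toolkit like AIF360 automates can be illustrated in a few lines of plain Python. The sketch below is not the AIF360 API; it computes one common fairness measure, the disparate impact ratio, over hypothetical loan-approval data:

```python
# Illustrative sketch (not the AIF360 API) of one common fairness
# measure: the disparate impact ratio between two groups.

def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Ratio of favorable-outcome rates: unprivileged / privileged.

    A value near 1.0 suggests parity; the common "80% rule" flags
    ratios below 0.8 as potentially discriminatory.
    """
    def favorable_rate(group):
        selected = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(selected) / len(selected)

    return favorable_rate(unprivileged) / favorable_rate(privileged)

# Hypothetical decisions (1 = approved) for two groups "a" and "b".
outcomes = [1, 0, 1, 0, 1, 1, 1, 1]
groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(disparate_impact(outcomes, groups, unprivileged="a", privileged="b"))
# Group "a" is approved half as often as group "b", so the ratio is 0.5.
```

Toolkits like AIF360 package dozens of such metrics, along with mitigation algorithms, behind a consistent interface.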
Second, open source collaboration provides a roadmap for developers to build AI systems.
We’re witnessing a proliferation of trained machine learning models placed into the commons under open source and other public licenses. This enables developers to use, train, modify, and redistribute models for their own purposes, building an AI development process that looks like the open source library and package ecosystems that underpin modern software development. Many of these projects are setting norms for how communities can collaboratively build AI. From a license perspective, this is yielding diverse results. EleutherAI is using open source licenses for their tooling and models. Others are releasing AI models under licenses that may grant any user permission to use, modify, and share the models, so long as users avoid applications the authors deem unethical, as enumerated in the license.
Third, free and open source software has inspired similar open movements in critical related areas, such as open access, open data, and free culture, to produce an “information commons”. These movements are foundational drivers for the democratization of AI, evidenced by the ubiquity of Wikipedia and scraped datasets such as Common Crawl and the Pile for training AI models. Without the information commons created by these open movements through a mix of norms, practices, legal affordances, and of course community, the development of AI would have been slower and largely confined to the largest entities with proprietary data.
AI will become core to [open source] software development
AI is changing how software gets made. Seemingly every week, developers encounter new AI-powered tools that may transform how software is built and maintained. Chief among these are code generation systems that act as pair programmers, helping developers write code faster. In the year since we launched GitHub Copilot, others, including Amazon, Carnegie Mellon, DeepMind, Meta, OpenAI, Replit, and Salesforce, have released their own AI systems for code generation. Such systems already help, or soon will, not only with generating new code (especially necessary but tedious boilerplate and tests), but also with documenting code and translating from one programming language to another. The promise of AI-powered developer tools to reduce programmer drudgery is high.
AI also holds promise to expand developer capabilities and opportunities in multiple dimensions: make it feasible for more developers to use advanced tools (like formal methods), enable more people to be developers (lowering barriers to writing useful code, accelerating learning), and increase software quality while decreasing its cost, which will increase the overall opportunity and demand for developers (as open source has done for decades).
AI itself presents new challenges to software correctness and supply chain security due to the relative opacity and complexity of AI models. We can expect an explosion in the use of AI models as dependencies, raising novel questions for software supply chain security and provenance. To use AI responsibly, we will have to apply the lessons learned from, and investments made in, securing the traditional software ecosystem. Global, open collaboration among developers, security researchers, and all of society (each in turn assisted by AI tools) will be essential to manage AI risks and drive alignment with human progress.
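One concrete way models-as-dependencies could borrow from traditional supply chain practice is digest pinning: recording a cryptographic hash of a model artifact, as package lockfiles do, and refusing to load anything that doesn't match. A minimal sketch, in which the file name and "lockfile" are hypothetical:

```python
# Hedged sketch: treat a model artifact like a pinned dependency by
# recording and verifying its SHA-256 digest. File names are hypothetical.
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_digest):
    """Refuse a model artifact whose digest does not match the pin."""
    actual = file_digest(path)
    if actual != expected_digest:
        raise ValueError(f"model digest mismatch: {actual}")
    return True

# Write a stand-in "model" file, then pin and check it as a lockfile would.
with open("model.bin", "wb") as f:
    f.write(b"example weights")
pinned = file_digest("model.bin")
verify("model.bin", pinned)
```

Hashes establish integrity, not provenance; tying a digest to who trained the model, on what data, is one of the open questions this piece raises.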
Towards strong community stewardship for open source and AI
Developers may be at an inflection point similar to the advent of the web. Open source development accelerated with the web as libraries, code, and practices were more readily available, but also raised questions in FOSS communities about user autonomy and transparency as code switched from distribution to service. AI may constitute another shift in the economics of software, one which prompts free software and open source activists to expand their policy ambitions from protecting privately created regulatory carve-outs (copyright licenses) to include advocacy around public regulation.
Both as the steward of the Open Source Definition, and as a focal point for metadiscourse on open source, the Open Source Initiative is uniquely positioned to lead a conversation on how the future of AI can embody the spirit of open.
One fundamental question: what does it mean for an AI system to be open source? For example, can a pretrained model ever be its own preferred form for modification? Or, what is the minimum set of precursors that need to be under an open source license for the model to be open source? How should training sets that contain personal or other sensitive data be handled in the context of producing an open source model? These questions will inform and be informed by not only what existing and new open source communities do, but also public policy—for example, drafts of the EU AI Act invoke open source AI systems, even though there is no settled definition of what open source AI is.
Another set of fundamental questions concern how open source AI projects are governed and what role open source plays in AI governance: at what layers (e.g., technical, community norms, standards, legal, public policy), with what methods, and whether differing approaches can be complementary or by necessity limit collaboration or interoperability across projects.
Finally, questions about the macro impacts of AI, from labor to geopolitics, and the role that open source can play in shaping good macro outcomes, are essential. How, for example, might global open collaboration on AI reduce the risks of runaway military competition or technological surprise, while increasing the benefit of using AI to spur innovations that will help address global challenges such as climate change?
Open source collaboration has grown from the small scale of individual labs, to a global and largely informal community, to an ecosystem that also includes huge corporate investment and that powers much of society’s critical infrastructure. This scaling continues to accelerate as governments become key players in open source, and as AI massively expands the role that software will play in society. The latter—AI—is a good prompt for the open source community to ask challenging questions, many of which were already bubbling under the surface, resulting from the increased surface area and criticality of open source and software across all sectors of society and all parts of the world.
We’re excited that OSI is stepping up to the challenge of engaging in such important and deep questions that go far beyond the initial practical context of open source—yet were always there in aspiration. We’re looking forward to contributing to and following the Deep Dive: AI conversation, and encourage everyone to join in.