This article is a hands-on guide to establishing an open governance structure for an Open Source project. I’m currently proposing an amendment to the existing charter of the ClearlyDefined project, and this article walks through the steps being taken to make that process as smooth as possible.
For background, ClearlyDefined was originally developed by Microsoft and was donated to the Open Source Initiative about 5 years ago. Microsoft still plays a key role in the project, but is seeking external contributors. Several individuals and companies have shown interest in helping with ClearlyDefined, but as of now there are no clear guidelines on how these contributors can play a role in the governance of the project. In other words, the project wants contributions from other organizations, but the governance structure that exists today gives those organizations no incentive to contribute.
I recently joined the ClearlyDefined project as a Community Manager, and one of my first steps was to reach out to several community members to understand their needs and desires. Besides scheduling calls with these members online, I also had the opportunity to attend the first ever ORT Community Day. The OSS Review Toolkit (ORT) is a Linux Foundation project that is currently being used by several organizations for managing Open Source supply chain compliance and security, and is one of the key projects that currently makes use of and promotes ClearlyDefined. It was in talking with the community face-to-face that I identified the clear need for an open governance model.
My second step was to conduct research about open governance models for Open Source projects. I identified several good resources, probably the most important being The Open Source Way 2.0, a comprehensive guidebook created by Red Hat and several prominent Open Source leaders. It’s also worth mentioning the GitHub Open Source Guide, which provides a good summary of open governance. There’s also a more recent article published by the CNCF, Outlining the structure of your Open Source software project, which provides a good explanation of how to structure an Open Source project based on its maturity level.
My next step was to study the different charters and by-laws from projects that are part of the same ecosystem as ClearlyDefined. We don’t want to create something from scratch, but seek inspiration from adjacent projects that already exist:
- ORT Governance
- OpenSSF Charter
- SPDX Governance
- FOSSology Governance
- OpenChain Charter
- CNCF Charter
- Todo Group Charter
- Eclipse Foundation Bylaws
Finally, it was time to start suggesting changes to the existing ClearlyDefined charter. This is a delicate process, because every change has to be well justified. Let’s go over each section of the charter:
The original mission is very generic. The mission is not “clearly defined”:
Help FOSS projects be more successful through clearly defined project data.
So let’s make it more specific and inspiring:
ClearlyDefined’s mission is to create a global database of licensing metadata for every Open Source software component ever published.
The original principles are rather good, so we left them unchanged:
Neutral – The project carries no affiliation or company driven focus;
Open – The data, infrastructure, and processes are open to all;
Factual – All data is factual. No interpretation or assessment is made;
Upstream – Enable upstream projects as much as possible;
Simple – Wherever possible the project will use the simple solution.
The community has asked for more open data, for example, and that’s already addressed here. We just need to follow our principles.
The original scope was overly ambitious, and aimed to address security and accessibility, among other issues:
ClearlyDefined will pursue any data that makes FOSS projects easier to consume and thus more successful. Initially this work will focus on licensing data that form the core of understanding and meeting the legal obligations related to using FOSS. This includes: License (declared and observed); Copyright holders; Source location (including revision/commit).
Why? The FOSS licensing and security information landscape is vast and varied. Projects without clear metadata are harder to adopt and so get fewer contributions and lower engagement – they enjoy less success. On the consumer side, enormous effort is required to discover and comply with licensing obligations and track security issues. Even simple things like the location of the source for a component can be painful to find. These ambiguities mean that FOSS cannot be consumed with confidence. This affects the success of FOSS projects. We want to break this vicious cycle.
What? Crowdsourcing the curation of licensing, security, accessibility data for FOSS projects. First clear licensing data. Later, clear security, accessibility data.
How? Harvesting data embedded in projects, curating the data in an open and collaborative process, contributing clearly defined project data back to the FOSS projects, and making the data freely and easily accessible. A virtuous cycle.
Future efforts will focus on further topics such as: Security – facilitating the reporting and tracking of vulnerabilities in projects; Accessibility – Characteristics and analysis of a project’s support of accessibility related technology and concerns; Project data – Governance model, principals, issue tracking, discussion forums, … The ordering of this work and effort applied will depend entirely on the community and their interests.
Let’s reduce the scope and focus on licensing metadata. Security, accessibility and project data are also important, of course, but there are other initiatives that are already working on these, so let’s collaborate with them.
This is the proposed scope:
ClearlyDefined will focus on licensing metadata that form the core of understanding and meeting the legal obligations and security best practices related to using Open Source software. This includes: License (declared and observed); Copyright holders; Source location (including revision/commit).
In the original charter, the motivation (Why?) was part of the scope section. Let’s break the motivation into a separate section to make things clearer. Also, let’s rephrase the motivation, making it shorter and more concise:
With the move towards SBOMs (Software Bill of Materials) everywhere for compliance and security reasons, organizations will face great challenges to generate these at scale for each stage on the supply chain, for every build or release.
Additionally, multiple organizations will have to fix the same missing or wrongly identified licensing metadata over and over again.
This is where ClearlyDefined comes in, by serving a cached copy of licensing metadata for each component through a simple API.
Organizations will also be able to contribute back with any missing or wrongly identified licensing metadata, helping to create a database that is accurate for the benefit of all.
In the original charter, the governance section described the project’s processes and community roles. But governance addresses how a system works at a higher level, and that was entirely missing from the original charter. So let’s create three sections: processes, community and governance.
The processes section remains mostly the same:
The continuing goal of ClearlyDefined is to help originating projects craft and maintain clarity around their in-scope data as a native part of their operation. Where that is not possible, the project will maintain the relevant data. This is viewed as a fork of the upstream project and, like code forks, should be minimized. Either way, the project serves as a one-stop-shop for the in-scope data making life easy for consumers.
The project undertakes four main operations in support of the stated goals and scope: Harvesting data embedded in projects; Curating the data in an open and collaborative process; Contributing clearly defined project data back to the FOSS projects; and Making the data freely and easily accessible.
These processes are further described below.
Harvesting is the act of getting data from upstream projects. This may be as simple as reading prescribed data from canonical locations to full-on analysis of the source code using a variety of open tools. The discovered data is stored in its entirety in its native form in the ClearlyDefined infrastructure and made available to the community on demand. The harvesting tools themselves are always fully open and accessible to the community for vetting and inspection. The project is open to including new tools subject to a vote, as described below.
Harvesting may be run by the ClearlyDefined project itself or by designated parties, typically curators. In all cases, only output from agreed-upon tools and configurations will be admitted to the system. Harvesting operators are free to focus on a given domain of projects that best suit their expertise and interests.
The curation process is fundamentally open and transparent. Curators (aka project committers or maintainers) work on harvested data, data contributed by the ClearlyDefined community, and with the origin project artifacts and community to validate presented information. All deliberations, discoveries and discussions are recorded and made available for community inspection.
Initially this workflow will happen in one or more GitHub repositories using standard Pull Request workflows on human-readable and diff-able curation artifacts. The project may develop additional tools to supplement or supplant this flow but will always ensure full transparency.
As with committers on typical FOSS projects, curators are free/expected to focus on particular domains that fit their interests and expertise.
At least initially, all curated data must be signed off by two curators. This is more in the interest of working through thought and mechanical processes and developing a common understanding of the data and determining what is admissible. This requirement may be removed through a vote, as described below.
Having curated data about a project, ClearlyDefined community members will seek to contribute the data upstream in a form most attractive to the receiving project. Given the anticipated scale of this effort, some automation will be used but with a sensitivity to spamming projects with pull requests. These contributions will include information supporting the inclusion and ongoing maintenance of the curated data.
Projects accepting the curated contribution will be deemed ClearlyDefined (see Badging below) and will no longer need curation. Validations will continue but by opting into this program, the projects are endeavoring to effectively self-curate the data.
Regardless of whether projects are self-curating or externally curated by ClearlyDefined, as a service to the consuming community, the ClearlyDefined project serves up the harvested and curated data both through programmatic (e.g., REST) APIs and through browsable web properties. The raw harvested data as well as the summarized and curated is made available through both access methods.
The community section remains mostly the same as well:
A data curator is akin to a project maintainer or committer in typical Open Source projects. Curators have write permissions to the curation repo(s) and are ultimately responsible for admitting data to the curated store. A curator is more librarian and data scientist than lawyer or developer. The role requires enough domain context to enable issue identification and resolution. The role also requires technical expertise in running the various tools used to detect and analyze components. Each curator must be, and be seen to be, vendor neutral and unbiased. This helps them in their other key role – working with upstream projects to incorporate the curated data into the original project.
As with committers and maintainers, curators are nominated and approved by the project community based on their merits and prior contributions. The role of curator relates to an individual, not an organization or a position in an organization. Under no circumstances is a curator held responsible for any errors or other flaws in the data merged into the service.
A ClearlyDefined data contributor is like a contributor on any other Open Source project — they identify bugs or improvements, fork the repo and contribute a pull request with their changes. For data contributors this could be a small change (e.g., spelling correction), a substantive change (e.g., identifying the license for a component), or wholesale data definition (e.g., providing data for a previously unknown component). Contributors should, as with any other project, expect to substantiate the changes with background information and proof of correctness.
A serial contributor of quality data is a candidate to become a curator.
A ClearlyDefined data consumer accesses the curated or harvested data. They understand that the data is provided as-is with no guarantees or warranties as to the correctness of the data or its suitability for any particular purpose. All data is fully qualified as to its origin and any clarifications made, and it is up to the consumer to use the data, or not.
While ClearlyDefined is focused on data, the project will develop a modest amount of code. Code committership is independent of data committership. As such, code committers are elected by a vote of the existing code committer community as described below. Code committers have complete control over and responsibility for the operation of the harvesting, curation and serving infrastructure of the project.
Removal from role
In the unlikely event that a committer or curator becomes disruptive or falls inactive for an extended period of time, they may be removed from the role through a unanimous vote of the remaining set of committers or curators.
Most decisions within the project can be done through informal consensus and recorded in the appropriate public record. When a formal decision is required, for example, when electing committers/curators, a vote is held using the following process: A topic for voting is tabled by a curator by notifying all other curators; Once tabled, curators may vote during an open voting period lasting no less than one working week. Voting will occur on an agreed to, mutually convenient, and open medium (e.g., email, GitHub issue, etc.); A minimum of two positive (+1) votes and no negative (-1) votes carries the topic. Note that negative votes must be substantiated; Abstention (0) votes do not affect the outcome.
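To make the curator voting rule above concrete, here is a minimal sketch of the tally in Python. The function name and the encoding of votes as the integers +1, 0 and -1 are assumptions made for illustration; the charter does not prescribe any tooling, and the sketch does not model the one-working-week voting period or the requirement that negative votes be substantiated.

```python
from collections import Counter
from typing import Iterable


def curator_vote_carries(votes: Iterable[int]) -> bool:
    """Return True if a tabled topic carries under the curator voting rule.

    Hypothetical helper: each vote is +1 (positive), -1 (negative)
    or 0 (abstention). A topic carries with at least two positive
    votes and no negative votes; abstentions do not affect the outcome.
    """
    tally = Counter(votes)
    unknown = set(tally) - {1, 0, -1}
    if unknown:
        raise ValueError(f"unexpected vote values: {sorted(unknown)}")
    return tally[1] >= 2 and tally[-1] == 0
```

For example, two positive votes and one abstention carry the topic, while three positive votes and a single negative vote do not.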
Recognition and promotion
The project may, from time to time, run programs that recognize and reward the efforts of a project to become and remain ClearlyDefined. For example, a badging program would enable eligible projects to show they are ClearlyDefined, thus increasing consumer confidence. Such recognitions may be made relative to a specific domain such as licensing or security, or in relation to the overall ClearlyDefined effort.
And finally, let’s describe the governance model. This section is entirely new, and was inspired by the governance models of other adjacent projects from the same ecosystem:
The Governing Board voting members shall consist of: the Executive Director of the Open Source Initiative, the Steering Committee Chair, and the Outreach Committee Chair.
The Governing Board responsibilities consist of:
– setting the overall strategic direction of the ClearlyDefined project, establishing main goals and identifying key priorities in accordance with feedback and input from the community;
– managing the resources of the ClearlyDefined project in a responsible and sustainable manner, including budget, infrastructure, and human resources;
– adopting and maintaining policies or rules and procedures for the ClearlyDefined project, such as a Code of Conduct and a trademark policy and any compliance or certification policies.
The Steering Committee shall be responsible for:
– setting the technical direction of the ClearlyDefined project, establishing main goals and identifying key technical priorities;
– overseeing all processes (harvest, curate, contribute, serve), ensuring that the underlying architecture enables these processes to run smoothly;
– empowering the community (data curator, data contributor, data consumer, and code committer/maintainer), providing all the technical support necessary to achieve ClearlyDefined’s mission;
– establishing open collaboration with adjacent projects that are part of the ecosystem.
The Outreach Committee shall be responsible for:
– planning and executing efforts to promote the ClearlyDefined project to potential users and contributors;
– organizing activities across events worldwide, both virtual and in person, to bring together existing community members and attract new ones;
– creating educational material (documentation, whitepapers, webinars, podcasts, etc) to help individuals and companies understand how to use and contribute to the ClearlyDefined project;
– managing communication across different channels (website, blog, social media, and press releases).
Members and Chairs
The Steering and Outreach Committees are made up of community members with a sustained contribution over time, and recognized as interested in the long term health of the ClearlyDefined project. Members of the Steering and Outreach Committees recommend and appoint new members and vote for a Chair for each Committee to serve for a one-year term. Members will be removed from a committee if they resign or if they are inactive in participating and contributing to the project for more than 6 months.
Governing Board meetings will be limited to the Governing Board members. They will be private unless decided otherwise by the Governing Board, as sensitive matters may not be made public. The Governing Board may choose to hold open, community meetings at its discretion.
Steering and Outreach Committee meetings should be held periodically (monthly or fortnightly) and are intended to be open to the public. They can be conducted electronically, via teleconference, or in person. The chair for each Committee should set the agenda ahead of time and preside over the meetings. The meeting minutes should be published and shared with the community through public channels.
While it is the goal of the ClearlyDefined project to operate as a consensus based community, if any decision requires a vote to move forward, the members of the Governing Board, Steering Committee, or Outreach Committee, as applicable, shall vote on a one vote per member basis.
Decisions by vote will be based on a majority vote, provided that at least sixty percent (60%) of the Governing Board, Steering Committee, or Outreach Committee members, as applicable, are present, participating electronically, or have voted electronically (e.g., by email, or in a form specified by the board materials) in advance of the meeting.
A two-thirds majority vote will be required for any vote amending this Charter. Amendments will be communicated to the community through public channels, and will take effect immediately.
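The quorum and majority rules above can be sketched as a small check. This is an illustration under stated assumptions, not part of the charter: the function name and parameters are hypothetical, "majority" is read as strictly more than half of the participating members, the two-thirds threshold is also measured against participating members, and the 60% quorum is assumed to apply to charter amendments as well.

```python
def decision_passes(
    total_members: int,
    participating: int,
    yes_votes: int,
    amendment: bool = False,
) -> bool:
    """Check a formal vote against the quorum and majority rules.

    Hypothetical helper. Integer arithmetic is used so that the
    60% and two-thirds thresholds are compared exactly.
    """
    if total_members <= 0 or not (0 <= yes_votes <= participating <= total_members):
        raise ValueError("inconsistent member or vote counts")

    # Quorum: at least 60% of the body's members take part.
    if participating * 100 < total_members * 60:
        return False

    if amendment:
        # Charter amendments: at least two-thirds in favor (assumed
        # to be measured against participating members).
        return 3 * yes_votes >= 2 * participating

    # Ordinary decisions: strictly more than half in favor.
    return 2 * yes_votes > participating
```

With the three-member Governing Board described above, two participating members already meet the quorum (about 67%), so `decision_passes(3, 2, 2)` passes, while a single participating member does not reach it.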
Clearly defined governance
Now the final and crucial step is to engage with all the stakeholders and ask for their feedback again. The initial feedback is what drove the changes to the charter in the first place, and we want to make sure that the proposed changes address the stakeholders’ concerns.
We want to avoid discussing these changes and making decisions behind closed doors. We should try to do everything we can to make the process as open and as transparent as possible from the very beginning.
By establishing a clear and open governance model, our hope is that ClearlyDefined will become more welcoming towards contributions from individuals and organizations, not just in terms of code or data contributions, but also contributions that will help govern the project itself.
Image from ktasimarr via Canva.com