Executive Summary
The Open Source Initiative (OSI) has successfully led a global, multi-stakeholder process to define and validate Open Source AI through a collaborative, inclusive, and iterative co-design approach. The resulting Open Source AI Definition (OSAID) v1.0 outlines the essential freedoms—use, study, modify, and share—that AI systems must provide to align with Open Source principles.
Key Outcomes
- The Open Source AI Definition v1.0
- Developed through a rigorous, multi-phase global consultation, this definition transposes the freedoms of Open Source software to AI systems.
- The Definition is supported by a diverse group of stakeholders, including AI developers, deployers, end users, and those impacted by AI systems.
- Initial list of Open Source AI systems
- The following systems successfully passed the validation process: Pythia (EleutherAI), OLMo (AI2), Amber and CrystalCoder (LLM360), and T5 (Google).
- Other systems, such as BLOOM, Starcoder2, and Falcon, would pass with licensing modifications.
- Systems such as Llama 2, Grok, Phi-2, and Mixtral did not meet the required criteria.
- A transparent and inclusive Co-Design Process
- The process engaged over 50 volunteers from nearly 30 countries, with contributions from underrepresented communities, including women, transgender, and nonbinary individuals, and people of color.
- Co-design phases included workshops, webinars, and validation processes, balancing in-person and virtual formats to maximize global accessibility.
- Lessons learned and future governance
- Balancing openness with structured processes emerged as a key challenge. Future iterations will emphasize hybrid consultation methods, clearer timelines, and governance frameworks.
- Expanding the knowledge commons through reusable resources like webinars, reports, and workshops ensures sustained community engagement.
- Next steps: promotion and collaboration
- The OSI will focus on global promotion of the Definition through conferences, webinars, and media outreach in 2025.
- Collaborations with organizations like Hugging Face, Mozilla, and Carnegie Mellon University are underway to refine the Definition’s practical implementation.
The Open Source AI Definition establishes a critical foundation for transparency, innovation, and equitable AI development worldwide. Moving forward, OSI will continue fostering dialogue, refining processes, and supporting stakeholders as they implement and evaluate AI systems against this Definition.
I. The Open Source AI Definition v1.0
Please view the text of the Open Source AI Definition v1.0 on our website and the announcement on our blog.
Initial list of Open Source AI systems
These models passed the Validation phase:
- Pythia (EleutherAI)
- OLMo (AI2)
- Amber and CrystalCoder (LLM360)
- T5 (Google)
A few others were analyzed and would pass if they changed their licenses:
- BLOOM (BigScience)
- Starcoder2 (BigCode)
- Falcon (TII)
Those that have been analyzed and don’t pass because they lack required components:
- Llama 2 (Meta)
- Grok (xAI)
- Phi-2 (Microsoft)
- Mixtral (Mistral)
These are all the systems analyzed to date. Expanding the process to verify that a system conforms to the Open Source AI Definition will be the next exercise.
II. Rationale document
Due to the technical, semantic and economic differences between software and AI systems, it became apparent quite quickly in 2022 that a simple translation of the Open Source Definition (OSD) would not be enough to apply the Open Source freedoms to AI.
We agreed that for AI, society needs at least the same essential freedoms of Open Source to enable AI developers, deployers and end users to enjoy those same benefits: autonomy, transparency, frictionless reuse and collaborative improvement.
The OSI board set a strategic objective to have a definition of Open Source for AI, a completely different domain, hoping to replicate the success of the Open Source Definition. We knew that this new Definition couldn’t be the work of a single individual, as it was for the free software definition and the subsequent Open Source Definition. In 2022 we started a global, multi-stakeholder discussion with the AI, data, and software communities to find how the principles of Open Source apply to AI. Our top objective was to understand the meaning of OSD#2, “Source code: The source code must be the preferred form in which a programmer would modify the program,” for AI.
At the beginning of 2023, we started pitching to community partners the idea of a process similar to the one used to define the GPLv3, to be executed during 2024 with a conclusion in 2025. The response was unanimous concern about the lengthy timeline, with every passing week increasing the risk that Open Source AI would become a generic term with no clear definition, and that EU regulators would come up with their own definition without community input. OSI was pushed to act fast: the term Open Source AI was already being used and abused, and the whole Open Source ecosystem required guidance. Therefore, the board set a deadline to finish the process by October 2024, with two additional constraints: the Open Source AI Definition must be supported by stakeholders that include developers, deployers, and end users of AI as well as subjects (those affected by AI decisions); additionally, it must provide positive examples of AI systems, rooted in current practice, to provide a reference for interested parties.
| System Creator | License Creator | Regulator | Licensee | End User | Subject |
|---|---|---|---|---|---|
| Makes an AI system and/or component that will be studied, used, modified, or shared through an open source license (e.g., ML researcher in academia or industry) | Writes or edits the open source license to be applied to the AI system or component; includes compliance (e.g., IP lawyer) | Writes or edits rules governing licenses and systems (e.g., government policy-maker) | Seeks to study, use, modify, or share an open source AI system (e.g., AI engineer, health researcher, education researcher) | Consumes a system output, but does not seek to study, use, modify, or share the system (e.g., student using a chatbot to write a report, artist creating an image) | Affected upstream or downstream by a system output without interacting with it intentionally; includes advocates for this group (e.g., people with a loan denied, or content creators) |
Stakeholder categories identified by OSI board
In the early months of 2023, OSI staff conducted consultations in formats suited to the expectations and needs of various kinds of stakeholders, finally bringing all their perspectives together in a public context, starting with the launch of the public co-design process at All Things Open in October 2023 and continuing through 2024.
III. Research and co-design process
The co-design method was chosen because a global definition requires a global consultation. Co-design is a methodology for making decisions with diverse stakeholders. Our goal was to design the Open Source AI Definition (OSAID) with the people who would create, deploy, use, and be subject to Open Source AI systems, and to be as global, equitable, and inclusive as possible in that work, giving a place to everyone and special favor to no one. The definition that starts this paragraph actually emerged from the co-design process, suggested during a workshop in Buenos Aires by an Argentinian open source strategist, Maria Cruz. The fact that even our co-design definition was created by a stakeholder is emblematic of the international, collaborative process used to create the OSAID.
This approach was not uncontroversial. Making global tech decisions through global consultation represents a departure from past methods, in which tech experts and activists from the Global North held disproportionate power as compared to the Global Majority (aka Global South) in deciding what is true, right, and best for Open Source. This has continued to be a point of challenge, illustrating that changing culture in this area will be ongoing work.
Co-design is a set of participatory methods that share knowledge and power. Every person who volunteered for a role in the co-design process was given one. We were also careful to ensure all the identified stakeholder groups were represented, further challenging traditional notions of “expertise”.

Also emblematic of the spirit of the OSAID co-design process is the story of Rahmat Akintola, who was featured on the OSI blog. Rahmat is the Program Lead for Women in Machine Learning and Data Science (WiMLDS) in Accra, Ghana. As part of its effort to ensure the inclusion of women of color from the Global South in the OSAID co-design process, Do Big Good, the co-design firm OSI hired in the fall of 2023, conducted focused outreach to this and similar organizations in Sub-Saharan Africa.
Rahmat joined the OSAID co-design process as a member of the OpenCV Workgroup and then volunteered to present the OSAID at the Deep Learning Indaba in Dakar in September, the premier AI/ML conference in Africa. This path from inclusive outreach to workgroup participation to public advocacy, funded by a grant from the Alfred P. Sloan Foundation, is what equitable and global co-design is all about, and it was crucial to achieving a definition that is global in scope.
Among the 50+ co-design volunteers in the process, nearly 30 countries of origin and residence are represented, including participants from Africa, Asia, Europe, and the Americas. We estimate that 31% are OSAI developers, 46% deployers, 90% end users, and nearly all have been subjects of OSAI through upstream or downstream data usage. Over 30% are women, transgender, and nonbinary and over 40% are black, indigenous, and other people of color.
This section describes the co-design phases in the development of the OSAID. The first phase describes OSI’s activities in 2022 through 2023. Phases two through five describe activities in late 2023 through 2024, when Do Big Good was brought in to manage and implement the co-design process.
Phase 1: Preliminary research (Jul 2022 – Dec 2023)
In 2022, the Open Source Initiative started coordinating a global process to sharpen collective knowledge and identify the principles that eventually led to the OSAID. Under the name “Deep Dive: AI”, the OSI mapped the issues of Open Source and AI. This project consisted of a global conversation made of six podcast episodes (with experts Pamela Chestek, Alek Tarkowski, Connor Leahy, David Gray Widder, Mo Zhou, and Bruce Draper) and four online panel discussions (with experts Astor Nummelin Carlberg, David Kanter, Sal Kimmich, Stella Biderman, Alek Tarkowski, Kat Walsh, Luis Villa, Carlos Muñoz Ferrandis, Kit Walsh, Pamela Chestek, Jennifer Lee, Danish Contractor, Adrin Jalali, Chris Albon, Ibrahim Haddad, Mark Surman, and Amy Heineike).
In early 2023, a comprehensive report was published to further socialize the outcomes and inform the next phases of work. The key learning from this initial phase was that the traditional view of Open Source software licensing is insufficient to cover the complexity of AI systems. Key questions emerged for the next phase: What does it mean for an AI system to be Open Source? What policies are needed to both nurture innovation and protect individuals and society as a whole from harm?
In September 2023, the OSI hosted a webinar series to better understand the AI space. Speakers from law, academia, NGOs, enterprise, and the Open Source community shared their thoughts on pressing issues and offered potential solutions in our development and use of AI systems. A total of 18 webinars were held, bringing together 37 experts. A second report was published in late 2023.
Phase 2: Four Freedoms refinement (Oct – Nov, 2023)
In 2023, with the participation of Do Big Good, OSI hosted three in-person co-design workshops in the United States and Africa to determine how the Free Software Foundation’s four freedoms to study, use, modify and share an Open Source system should apply to AI.

- Question: Use, study, modify, share: What should these open source principles mean for artificial intelligence?
- Method: In-person co-design workshops in Monterey, Raleigh, and Addis Ababa where participants drafted and edited the text of the four freedoms for OSAI. The results of that process still appear in the current version of the definition:
- Use: the system for any purpose and without having to ask for permission.
- Study: how the system works and inspect its components.
- Modify: the system for any purpose, including to change its output.
- Share: the system for others to use with or without modifications, for any purpose.
- Workshop Participants: During this phase of the co-design process, participants were not asked to publicly share their names and affiliations. This omission in transparency was remedied in subsequent co-design phases.
- Objective: Transpose the “four freedoms” of the free software definition to AI.
Phase 3: System analysis (Feb – Mar, 2024)
At the end of the second phase, we received stakeholder feedback that the co-design process was exclusionary because it was only happening in-person, and there were many stakeholders who could not attend the workshops (one of the reasons why we reached out to the Alfred P. Sloan Foundation to support a global outreach effort).
We took this feedback into account and, after one more in-person session at AI_dev in San Jose, we shifted to an entirely virtual process for the third phase. Co-design volunteers conducted small group analysis on four systems self-described as open to develop a proposal on which components should be included in the preferred form. This post clarifies that the intention of this phase was to explore avenues to unlock the conversation that got us stuck debating “data”: we needed to get a better sense from AI practitioners of what they need to exercise the four freedoms.

- Question: What components must be open in order for an AI system to be used, studied, modified, and shared?
- Method: One in-person session in San Jose, followed by four virtual workgroups focused on BLOOM, OpenCV, Llama 2, and Pythia, four systems with different approaches to OSAI openness.
- We started with a list of AI system components created by a pre-release of the Model Openness Framework (MOF), a Linux Foundation project.
- In February, workgroup members were invited to vote on whether or not each of the MOF components were required to study, use, modify, and share the system.
- Workgroup members voted using their initials, so it would be transparent which members saw which components. Votes were recorded and tabulated on a public spreadsheet.
- When tabulating, we didn’t notice that the Llama 2 group had a -1 option that subsequent groups lacked. This was an oversight that didn’t impact the result (when the discrepancy was highlighted on the forum in September 2024, we removed the -1 votes and re-tabulated the data, ending up with the same results).
- The purpose of the voting was to give a stakeholder-based signal of component priorities for the preferred form, which would then be commented upon and critiqued publicly in the forum. There has been ample opportunity to comment on the outcomes of the initial voting process. (A minimal sketch of this kind of tally appears after the member lists below.)
- We shared the results of the tabulation for comment on the forum on March 1st. The results were criticized for “wasting time” analyzing Llama, which clearly would never pass as Open Source.
- The recommendation results from the tabulation were:
- Required: Training, validation, and testing code; Inference code; Model architecture; Model parameters; Supporting libraries & tools
- Likely Required: Data preprocessing code
- Maybe Required: Training datasets; Testing datasets; Usage documentation; Research paper
- Likely Not Required: Model card; Evaluation code; Validation datasets; Benchmarking dataset; All other data documentation
- Further down in the thread, we clarified that a line was drawn arbitrarily between “maybe required” and “likely required” to test the hypothesis for the next co-design step: if the component “training dataset” is not required, do we have any clearly non-open-source bycatch (like Llama)?
- We integrated the recommended components in version 0.0.6 on March 10th, which was also shared for public comment.
- Members:
- These and other co-design groups were selected from two sources: those who responded to public calls for participation on the forum or listserv, and focused outreach by Mer Joyce and Kayla Cody-Lushuzi of Do Big Good to bring in excluded groups, such as women, trans, and nonbinary folks; black, indigenous, and other people of color; and people from Asia and the Global South.
- Llama 2 Workgroup
- Bastien Guerry DINUM / France
- Ezequiel Lanza Intel / Argentina
- Roman Shaposhnik Apache Software Foundation / Russia
- Davide Testuggine Meta / Italy
- Jonathan Torres Meta / USA
- Stefano Zacchiroli Polytechnic Institute of Paris / Italy
- Mo Zhou Debian, Johns Hopkins University / China
- Victor Lu independent consultant / USA
- BLOOM Workgroup
- George C. G. Barbosa Fundação Oswaldo Cruz / Brazil
- Daniel Brumund GIZ FAIR Forward – AI for All / Germany
- Danish Contractor BLOOM Model Governance Workgroup / Canada
- Abdoulaye Diack Google / Ghana
- Jaan Li University of Tartu, Phare Health / Estonia
- Jean-Pierre Lorre LINAGORA, OpenLLM / France
- Ofentse Phuti WiMLDS Gaborone / Botswana
- Caleb Fianku Quao Kwame Nkrumah University of Science and Technology, Kumasi / Ghana
- Pythia Workgroup
- Seo-Young Isabelle Hwang Samsung / South Korea
- Cailean Osborne University of Oxford / UK
- Stella Biderman EleutherAI Institute / USA
- Justin Colannino Microsoft / USA
- Hailey Schoelkopf EleutherAI Institute / USA
- Aviya Skowron EleutherAI Institute / Poland
- OpenCV Workgroup
- Rahmat Akintola WiMLDS Accra / Ghana
- Dr. Ignatius Ezeani Lancaster University, UK, Nnamdi Azikiwe University, Nigeria, Masakhane NLP / Nigeria
- Kevin Harerimana CMU Africa / Rwanda
- Satya Mallick OpenCV / USA
- David Manset ITU / France
- Phil Nelson OpenCV / USA
- Tlamelo Makati WiMLDS Gaborone, Technological University Dublin / Botswana
- Minyechil Alehegn Tefera Mizan Tepi University / Ethiopia
- Akosua Twumasi Ghana Health Service / Ghana
- Rasim Sen Oasis Software Technology Ltd. / UK
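For illustration, here is a minimal sketch, in Python, of the kind of tally used to turn workgroup votes into recommendation tiers. The component names come from the MOF list, but the data shape, vote values, and tier cutoffs below are hypothetical; the actual tabulation lived in public spreadsheets, not code.

```python
# A minimal sketch, not OSI's actual tabulation: the votes and tier
# cutoffs below are hypothetical.

# marks[component] = one entry per workgroup member: 1 = required, 0 = not
marks = {
    "Model parameters": [1, 1, 1, 1],
    "Training datasets": [1, 0, 1, 0],
    "Model card": [0, 1, 0, 0],
}

def tier(share: float) -> str:
    """Map a vote share to a recommendation tier (cutoffs are hypothetical)."""
    if share >= 0.75:
        return "Required"
    if share >= 0.5:
        return "Likely Required"
    if share >= 0.25:
        return "Maybe Required"
    return "Likely Not Required"

for component, votes in marks.items():
    share = sum(votes) / len(votes)
    print(f"{component}: {share:.0%} of votes -> {tier(share)}")
```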
Phase 4: System validation (May – Jul, 2024)
In the next phase, we sought to verify which AI systems met the criteria of the OSAID, a requirement of the Board and a common question among stakeholders. Enabled by the results of the previous phase, we tested a working hypothesis: if the training dataset is not required, do we keep Pythia (whose dataset is legally challenged in the US) in the Open Source AI fold while still not catching Grok, Phi-2, or Llama?

Volunteers reviewed 13 AI systems self-described as open, yet the process was difficult. Most volunteers could not find all the documentation necessary to verify that the required components were available to study, use, modify, and share.
We see the difficulty of the validation process as a reason for OSI to continue to certify licenses, as it does for software, rather than trying to certify individual AI systems. This means that the collaboration of system creators is necessary to certify systems, as they’re the best positioned to provide the list of components and their legal terms.
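To illustrate what validators were checking, here is a minimal sketch of a per-system component checklist. The required components follow the Phase 3 recommendation above; the system name and findings are made up, and the real reviews were conducted on public spreadsheets rather than in code.

```python
# A minimal sketch, assuming the "Required" components from the Phase 3
# recommendation; the system and the findings below are hypothetical.
REQUIRED = [
    "Training, validation, and testing code",
    "Inference code",
    "Model architecture",
    "Model parameters",
    "Supporting libraries & tools",
]

def review(system: str, findings: dict[str, bool]) -> None:
    """Report which required components a reviewer could locate and verify."""
    missing = [c for c in REQUIRED if not findings.get(c, False)]
    if missing:
        print(f"{system}: cannot be verified; missing: {', '.join(missing)}")
    else:
        print(f"{system}: all required components located")

# Hypothetical findings for a single system under review:
review("ExampleLM", {
    "Training, validation, and testing code": False,
    "Inference code": True,
    "Model architecture": True,
    "Model parameters": True,
    "Supporting libraries & tools": True,
})
```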
- Question: Which AI systems meet the criteria of the OSAID?
- Method: Through a public call for participation, volunteers signed up to review a total of 13 systems self-described as open (list below). They used versions 0.0.6 through 0.0.8 of the OSAID as references.
- All review spreadsheets were posted publicly to maximize transparency.
- Most of the review work took place in May, 2024.
- Whenever possible, each system was reviewed by at least one person not affiliated with the system. LLM360 is self-certified.
- Most volunteers were not able to complete their reviews or reach conclusions on the openness of the system because of difficulty finding necessary documentation publicly on the internet.
- The results we were able to collect are summarized in the list of Open Source AI systems in Section I above.
- Reviewers:
- 1. Arctic
- Jesús M. Gonzalez-Barahona Universidad Rey Juan Carlos / Spain
- 2. BLOOM
- Danish Contractor BLOOM Model Governance Workgroup / Canada
- Jaan Li University of Tartu, Phare Health / Estonia
- 3. Falcon
- Casey Valk Nutanix / USA
- Jean-Pierre Lorre LINAGORA, OpenLLM / France
- 4. Grok
- Victor Lu independent consultant / USA
- Karsten Wade Open Community Architects / USA
- 5. Llama 2
- Davide Testuggine Meta / Italy
- Jonathan Torres Meta / USA
- Stefano Zacchiroli Polytechnic Institute of Paris / Italy
- Victor Lu independent consultant / USA
- 6. LLM360
- Victor Miller LLM360 / USA
- 7. Phi-2
- Seo-Young Isabelle Hwang Samsung / South Korea
- 8. Mistral
- Mark Collier OpenInfra Foundation / USA
- Jean-Pierre Lorre LINAGORA, OpenLLM / France
- Cailean Osborne University of Oxford / UK
- 9. OLMo
- Amanda Casari Google / USA
- Abdoulaye Diack Google / Ghana
- 10. OpenCV
- Rasim Sen Oasis Software Technology Ltd. / UK
- 11. Pythia
- Seo-Young Isabelle Hwang Samsung / South Korea
- Stella Biderman EleutherAI Institute / USA
- Hailey Schoelkopf EleutherAI Institute / USA
- Aviya Skowron EleutherAI Institute / Poland
- 12. T5
- Jaan Li University of Tartu, Phare Health / Estonia
- 13. Viking
- Merlijn Sebrechts Ghent University / Belgium
Phase 5: Workshop about training data (Sept – Oct, 2024)

Because the OSAID position on training data was the most contentious outcome of the co-design process, we decided to host a workshop specifically to provide recommendations on how training datasets should be designed, licensed, and regulated in open source AI systems.
- Question: How should training datasets be designed, licensed, and regulated in open source AI?
- Method: On October 10th and 11th, we brought together 18 data and OSAI experts from 15 countries for a two-day workshop in Paris to co-design recommendations on OSAI data. Mer Joyce facilitated both days of the workshop. This was our process:
- Preparation – In September, Alek Tarkowski of Open Future wrote a draft of the white paper for participants to comment on before the workshop. From these comments emerged three topic areas (dataset design, licensing, and regulation), as well as the structure of the workshop, which would begin with brainstorming and end with small group development of proposals.
- Day 1 – We collected a broad array of solutions for open, public, obtainable, and unshareable data across the three topic areas, using post-its to record suggestions. The day ended with voting to prioritize these suggestions.
- Day 2 – On the second day, we split into small groups connected to the three thematic areas (design, licensing, regulation) and each group developed specific proposals in these areas, based on the brainstorming and prioritization from the day before. Participants self-documented their proposals and discussion notes.
- Next Steps – Recommendations from the participants have been incorporated into the white paper and shared again for comments in early November. The white paper is being finalized for publication.
- The framing of the discussion in Paris has been published in a blog post. The white paper is complementary to, but not contingent on, the release of the OSAID.
- Workshop Participants:
- Dr. Ignatius Ezeani – Lancaster University, UK, Nnamdi Azikiwe University, Nigeria, Masakhane NLP / Nigeria
- Masayuki Hatta – Surugadai University / Japan
- Aviya Skowron – EleutherAI Institute / Poland
- Stefano Zacchiroli – Polytechnic Institute of Paris / Italy
- Ricardo Mirón – Digital Public Goods Alliance / Mexico
- Kristina Podnar – Data and Trust Alliance / Croatia + USA
- Joana Varon – Coding Rights / Brazil
- Renata Avila – Open Knowledge Foundation / Guatemala
- Alek Tarkowski – Open Future / Poland
- Maximilian Gantz – Mozilla Foundation / Germany
- Stefaan Verhulst – GovLab / USA + Belgium
- Paul Keller – Open Future / Germany
- Thom Vaughn – Common Crawl / UK
- Julie Hunter – LINAGORA / USA
- Deshni Govender – GIZ FAIR Forward AI for All / South Africa
- Ramya Chandrasekhar – CNRS – Center for Internet and Society / India
- Anna Tumadóttir – Creative Commons / Iceland
- Stefano Maffulli – Open Source Initiative / Italy
Stakeholder Feedback
Below are quotes from participants who played a variety of roles in the co-design process:
The co-design process allowed me to see first hand the thought process of people all over the world about what is open source AI. It may never be possible for all the people to agree on the definition. But it is a wonderful start, and I think everyone will agree that the open discussions, seminars, town hall meetings, follow-up surveys, and emails are all very effective and “democratic” 🙂
– Victor Lu, Llama 2 Workgroup Member and System Validator
[What I appreciated about the workshop was] the diversity of attendees’ perspectives, how the conversation was facilitated (prep ahead of time so we could get a running start) and the constructive nature of taking this white paper forward. [I just wish we had] a bit more time… maybe starting earlier on Thursday would have been good. Else, everything was great. Thank you for making good use of time, creating a collaborative and open environment, and representing as much diversity as possible.
– Anonymous Participant, Data in OSAI Workshop
It was a great experience working with the open AI team and contributing to this important initiative. We look forward to seeing the release version and witnessing the impact it will have on the AI community.
– Rasim Sen, OpenCV Workgroup Member and System Validator
During the OSAID process, I had the chance to collaborate with members from various continents and time zones. It was an interesting experience as sometimes I found myself waking up at 2 am in my pajamas for a Zoom call! 😉 Through both synchronous discussions within our working group (WG) meetings and asynchronous conversations on the web forum, I gained valuable insights into diverse collaboration methods.
– Seo-Young Isabelle Hwang, Pythia Workgroup Member and System Validator
In my experience, the co-design process was seamless and straightforward to take part in. Even though the process was virtual, it was transparent and simple to follow at every stage.
– Rahmat Akintola, OpenCV Workgroup Member
The debate over what is or isn’t open source AI often seems like an infinite tug-of-war between those who argue for relatively light-touch requirements (basically, open-weight models) and those who argue for maximal transparency of models and their constituent parts, as well as all the various views and concerns in between these two poles.
While it is healthy to have divergent views in the open source AI community, it’s becoming more and more urgent to build consensus, especially as we now have regulations like the AI Act that introduce requirements and exceptions for the providers of open source AI systems even in the absence of a definition of open source AI systems.
Towards this end, the co-design process has been an excellent way to bring in the diverse views of experts from various corners of the world and through open debate figure out what we can agree on and what not.
Given the high stakes of the open source AI definition, I hope that the co-design process can continue and that we can work towards a definition that works for the community.
– Cailean Osborne, Pythia Workgroup Member
I like everything that has to do with the transparency of Artificial Intelligence algorithms. For my part, I am focused on the transparency of Machine Learning models: trying to decipher the billions of calculations that make them up and explaining them to those who are in the field and those who are not.
In the same way, I value the search for transparency in terms of the data with which these models are trained and the way in which they are obtained, as well as in the design of the code. This is why I highly value the work of the Open Source AI Definition and consider it vitally important to ensure transparency.
– OSAID Presentation Participant, Argentina (translated from Spanish)
IV. Timeline
Below is a list of all the consultation points (meeting dates and locations, podcasts, panels, webinars, and town halls), along with the contributors to each.
Deep Dive: AI Podcasts 2022
- Welcome to Deep Dive: AI (Stefano Maffulli – July, 2022)
- Copyright, selfie monkeys, the hand of God (Pamela Chestek – Aug 16, 2022)
- Solving for AI’s black box problem (Alek Tarkowski – Aug 23, 2022)
- When hackers take on AI: Sci-fi – or the future? (Connor Leahy – Aug 30, 2022)
- Building creative restrictions to curb AI abuse (David Gray Widder – Sep 6, 2022)
- Why Debian won’t distribute AI models any time soon (Mo Zhou – Sep 13, 2022)
- How to secure AI systems (Bruce Draper – Feb 9, 2023)
Deep Dive: AI Panels 2022
- Exploring the business side of AI (Astor Nummelin Carlberg, David Kanter, Sal Kimmich, Stella Biderman, Alek Tarkowski – October 11, 2022)
- Exploring the society side of AI (Kat Walsh, Luis Villa, Carlos Muñoz Ferrandis, Kit Walsh – October 13, 2022)
- Exploring the legal side of AI (Pamela Chestek, Jennifer Lee, Danish Contractor, Adrin Jalali – October 18, 2022)
- Exploring the academia side of AI (Chris Albon, Ibrahim Haddad, Mark Surman, Amy Heineike – October 20, 2022)
Deep Dive: AI Webinars 2023
- Deep Dive: AI Webinar Series (September, 2023)
- The Turing Way Fireside Chat: Who is building Open Source AI? (Jennifer Ding, Arielle Bennett, Anne Steele, Kirstie Whitaker, Marzieh Fadaee, Abinaya Mahendiran, David Gray Widder, Mophat Okinyi)
- Operationalising the SAFE-D principles for Open Source AI (Kirstie Whitaker, David Leslie, Victoria Kwan)
- Commons-based data governance (Alek Tarkowski, Zuzanna Warso)
- Preempting the Risks of Generative AI: Responsible Best Practices for Open-Source AI Initiatives (Monica Lopez)
- Data privacy in AI (Michael Meehan)
- Perspectives on Open Source Regulation in the upcoming EU AI Act (Katharina Koerner)
- Data Cooperatives and Open Source AI (Tarunima Prabhakar, Siddharth Manohar)
- Fairness & Responsibility in LLM-based Recommendation Systems: Ensuring Ethical Use of AI Technology (Rohan Singh Rajput)
- Challenges welcoming AI in openly-developed open source projects (Thierry Carrez, Davanum Srinivas, Diane Mueller)
- Opening up ChatGPT: a case study in operationalizing openness in AI (Andreas Liesenfeld, Mark Dingemanse)
- Open source AI between enablement, transparency and reproducibility (Ivo Emanuilov, Jutta Suksi)
- Federated Learning: A Paradigm Shift for Secure and Private Data Analysis (Dimitris Stripelis)
- Should OpenRAIL licenses be considered OS AI Licenses? (Daniel McDuff, Danish Contractor, Luis Villa, Jenny Lee)
- Copyright — Right Answer for Open Source Code, Wrong Answer for Open Source AI? (McCoy Smith)
- Should we use open source licenses for ML/AI models? (Mary Hardy)
- Covering your bases with IP Indemnity (Justin Dorfman, Tammy Zhu, Samantha Mandell)
- The Ideology of FOSS and AI: What “Open” means to platforms and black box systems (Mike Nolan)
OSAID Conferences and Meetings 2023/2024
June, 2023
- First OSAID Meeting (Jun. 2023 – San Francisco)
July, 2023
- FOSSY (Jul. 13-15, 2023 – Portland)
- Campus Party Brazil (Jul. 25-29, 2023 – Sao Paulo)
- The future of Artificial Intelligence: Sovereignty and Privacy with Open Source (Nick Vidal, Aline Deparis)
- Open Source Congress (Jul. 27-28, 2023 – Geneva)
September, 2023
- Open Source Summit Europe (Sept. 19-21, 2023 – Bilbao)
- Nerdearla (Sept. 26-30, 2023 – Buenos Aires)
October, 2023
- All Things Open (Oct. 15-17, 2023 – Raleigh)
- Latinoware (Oct. 18-20, 2023 – Foz do Iguacu)
- Linux Foundation Member Summit (Oct. 24-26, 2023 – Monterey)
November, 2023
- DPGA Member Meeting (Nov. 14, 2023 – Addis Ababa)
- Workshop: Define “Open AI” (Stefano Maffulli, Nicole Martinelli)
December, 2023
- AI.dev (Dec. 12-13, 2023 – San Jose)
February, 2024
- FOSDEM (Feb. 3-4, 2024 – Brussels)
- Columbia Convening on openness and AI (Feb. 29, 2024 – New York)
April, 2024
- Open Source Summit – North America (April 16, 2024 – Seattle)
- LLW Gothenburg (April 16, 2024 – Gothenburg)
June, 2024
- OW2conf (June 11-12, 2024 – Paris)
- OpenExpo Europe (June 13, 2024 – Spain)
- AI_Dev Europe (June 19-20, 2024 – Paris)
July, 2024
- OSPOs for Good (July 9-10, 2024 – New York)
- What’s Next for Open Source (July 11, 2024 – New York)
- Sustain Africa (July 15, 2024 – Online)
August, 2024
- KubeCon + AI_dev Hong Kong (Aug. 21-23, 2024 – Hong Kong)
- Open Source Congress (Aug. 25-27, 2024 – Beijing)
- Datasets, Privacy, and Copyright (Stefano Maffulli, Donnie Dong)
- The Open Source AI Definition (Stefano Maffulli)
September, 2024
- Deep Learning Indaba (Sept. 1-7, 2024 – Dakar)
- India FOSS (Sept. 7-8, 2024 – Bengaluru)
- Open Source Summit Europe (Sept. 16-18, 2024 – Vienna)
- Nerdearla (Sept. 24-28, 2024 – Buenos Aires)
October, 2024
- Open Forum for AI (Oct. 4, 2024 – Washington DC)
- Open Source AI Definition (Deb Bryant)
- Training Data in OSAI (Oct. 10-11, 2024 – Paris)
- Workshop (Ignatius Ezeani, Masayuki Hatta, Aviya Skowron, Stefano Zacchiroli, Ricardo Torres, Kristina Podnar, Joana Varon, Renata Avila, Alek Tarkowski, Maximilian Gantz, Stefaan Verhulst, Paul Keller, Thom Vaughn, Julie Hunter, Deshni Govender, Ramya Chandrasekhar, Anna Tumadóttir, Stefano Maffulli)
- Open Community Experience (Oct. 22-24, 2024 – Mainz)
- All Things Open (Oct 27-29, 2024 – Raleigh)
November, 2024
- SFSCON (Nov. 8-9, 2024 – Bolzano, Italy)
- Open Source in EU policy (Jordan Maris)
- Digital Public Goods Alliance Annual Members Meeting (Nov. 13-15, 2024 – Singapore)
- The Linux Foundation Member Summit (November 19-21, 2024 – Napa)
- OSI Open Source AI Definition Update and Q&A (Stefano Maffulli)
Co-Design Town Halls 2024
- January, 2024
- February, 2024
- March, 2024
- April, 2024
- May, 2024
- June, 2024
- July, 2024
- August, 2024
- September, 2024
- October, 2024
V. Initial list of supporters (Endorsements)
The list of endorsers announced at the launch of version 1.0 is below. The full and most up-to-date list is available on the OSI website.
Institutional
- Developers
- EleutherAI Institute
- CommonCrawl
- George Washington University OSPO
- LLM360
- LINAGORA
- Women In Machine Learning and Data Science – Accra
- Deployers
- Mozilla Foundation
- Mercado Libre
- SUSE
- Kaiyuanshe
- Eclipse Foundation
- End Users
- Bloomberg
- Open Infrastructure Foundation
- Interministerial Directorate of Digital Affairs (DINUM)
- Nextcloud
- sysarmy
- Subjects
- Digital Public Goods Alliance
- OpenForum Europe
- Academia
- Carnegie Mellon University OSPO
- Georgia Tech University OSPO
- Washington University OSPO
Individuals
- Sayash Kapoor
- Arvind Narayanan
- Percy Liang
- Victor Lu
- Kevin Harerimana
- George C. G. Barbosa
- Dr. Ignatius Ezeani
- Seo-Young Isabelle Hwang
- Cailean Osborne
- Tlamelo Makati
- Stefano Zacchiroli
- Shuji Sado
- Felix Reda
VI. Divergent opinions
As more and more groups express support for the Open Source AI Definition, we want to keep track of the concerns raised by others. Below is a list of issues raised so far, with no added commentary, explanation or judgment:
List of comments received
We received the following comments during the most heated discussions:
- On the availability of training data: All the data used to train an AI system should be openly available, as it’s essential for understanding and improving the model.
- Synthetic data: If releasing the original data is not feasible, providing synthetic data and a clear explanation can be helpful.
- Pre-training dataset distribution: The dataset used for pre-training should also be accessible to ensure transparency and allow for further development.
- Dataset documentation: The documentation for training datasets should be thorough and accurate to address potential issues.
- Versioning: To maintain consistency and reproducibility, versioned data is crucial for training AI systems.
- Reproducibility: The Definition should say that Open Source AI must be reproducible using the original training data, scripts, logs and everything else used by the original developer.
- About the co-design process:
- The co-design process as conducted was not democratic and was ultimately unfair: voting was the wrong method, the selection of volunteers was biased, and the results didn’t show any consensus, among many other issues.
- Some companies reported that they didn’t have the chance to offer an official position, whether of endorsement or of requests for modifications to the text. Although some people contributed to the co-design process as volunteers in their official capacity, the quick pace of development, together with the fully transparent process, didn’t leave company representatives time to escalate comments up the corporate decision chain so they could become official statements.
VII. Press release
The announcement was published on October 28th, 2024 on OSI’s official website.
VIII. Further revisions
In the short term, the OSI will use the forum to collect the experience of AI builders interacting with the Definition. The team will reach out to groups interested in evaluating AI systems for compliance and offer them guidance on how to interpret the wording of the Definition. We have started conversations with Hugging Face, Carnegie Mellon University, Mozilla, and Google; others have expressed interest.
The AI Committee will monitor the conversations and offer suggestions to review the text of the Definition at quarterly intervals.
IX. Lessons learned
The Open Source AI Definition (OSAID) process has been a pioneering initiative, and while it achieved significant milestones, it also provided valuable insights for future efforts. Key lessons learned from the process include the following:
1. Balancing openness with structure
The co-design methodology fostered inclusivity by welcoming diverse stakeholders, yet its openness also posed challenges. Some corporate stakeholders found the process too open-ended, leading to disengagement, while others critiqued the lack of cohesion among co-design activities. The lesson here is to establish a clear blueprint early on, ensuring participants have a shared understanding of the process and its objectives. Introducing mechanisms for greater cross-pollination and coordination across working groups could enhance cohesiveness and engagement.
2. Managing inclusivity and accessibility
Efforts to include voices from the Global South and underrepresented communities were a notable success, exemplified by stories like Rahmat Akintola’s journey from participant to advocate. However, different formats—such as in-person workshops and online forums—were alternately praised and criticized for their accessibility. Future processes should adopt a hybrid approach from the outset, carefully designing inclusive formats that balance accessibility and participation equity. Providing preparatory learning resources beforehand can help level the playing field for all participants.
3. Public feedback and consensus building
The decision to integrate stakeholder feedback only through public discussions increased the transparency of the OSAID process, but it disadvantaged some stakeholders. The rapid pace of the process also occasionally hindered consensus building. The use of voting in one phase was misunderstood, perceived by some as a tool of democratic representation. Others criticized the lack of time to engage their corporate employers to formally present opinions. In future initiatives, a longer timeline with built-in intervals for reflection, and different consensus-building processes adapted to stakeholders’ needs, can foster trust and allow participants to engage at a deeper level.
4. Expanding the knowledge commons
One of the most significant achievements was the creation of reusable resources, including podcasts, webinars, white papers, and recordings of town halls. These materials have contributed substantially to the knowledge commons, setting a benchmark for future projects. This demonstrates the value of documenting and sharing outputs systematically to extend their impact beyond the immediate project.
5. Reflections on governance and maintenance
The co-design process highlighted the need for ongoing governance, education, and maintenance of the OSAID. Establishing a clear governance framework with defined roles for stakeholders, mechanisms for periodic reviews, and strategies for addressing divergent opinions will be critical for the long-term success and credibility of the Definition.
A recurring theme in feedback was the need to use these lessons as a springboard for next steps, ensuring the OSAID remains a living document, reflective of and responsive to the needs of the Open Source AI ecosystem.
X. 2025 follow-up plan
Next year the activities of OSI will shift to promotion and education. In parallel, the OSI will partner with other organizations to continue validating the Open Source AI Definition v1.0 and to record its critical points.
OSI will present the results of the co-design process and version 1.0 at conferences around the world. We will engage volunteers to present the Definition, keeping travel costs and burden to a minimum while growing a community of supporters. An initial list of the top conferences OSI is directly aiming at appears below. OSI’s community manager is already reaching out to the co-design volunteers to identify local opportunities.
Besides the in-person conferences, the OSI will host a webinar/podcast series in the second half of 2025, interviewing AI builders to understand how they’re working in practice with the Open Source AI Definition v1.0.
Additionally, the leadership of OSI will start a media campaign to promote awareness of the OSAID, as well as comment on issues of importance to Open Source. The organization will maintain a robust social media presence, engaging with communities interested in expanding the role of Open Source in society.
Depending on budget availability and allocation, this plan may expand further or shrink.
Events list:
- FOSDEM, Brussels, February 1 – 2
- AI & Big Data Expo, London, February 5 – 6
- SCALE, California, March 6 – 9
- SXSW, Austin, March 7 – 15
- KHIPU, Santiago de Chile, March 10 – 14
- Open Expo Europe, Madrid, May 8
- ODSC East, Boston, May 13 – 15
- PyCon US, Pittsburgh, May 16 – 18
- AI For Good, Geneva, May
- OSS Summit NA, Denver, June 23 – 25
- AI Risk Summit, California, June
- R.AI.SE Summit, Paris, July 8 – 9
- OSPOs 4 Good, New York (United Nations), July
- Ghana Data Science Summit, Ghana, July
- Open Source Congress, August
- Ai4, Las Vegas, August
- DEF CON (AI Village), Las Vegas, August 7 – 10
- OSS Summit Europe, Amsterdam, August 25 – 27
- Deep Learning Indaba, Dakar, September
- Nerdearla Argentina, Buenos Aires, September
- The AI Conference, San Francisco, September
- All Things Open, Raleigh, October 12 – 15
- TED AI, San Francisco, October
- Dot AI, Paris, October
- AI_dev Japan, Tokyo, October
- MIT AI Conference, New York, October
- TechCrunch Disrupt, San Francisco, October
- GitHub Universe, San Francisco, October
- Open Community Experience, Mainz, October
- GovAI Summit, Arlington, October
- Nerdearla Spain, Madrid, November 13 – 15
- Linux Foundation Legal Summit, California, November
- IEEE World Technology Summit, California, November
- SeaGL, Seattle, November
- The AI Summit, New York, December