Executive Summary
The Open Source Initiative (OSI) has successfully led a global, multi-stakeholder process to define and validate Open Source AI through a collaborative, inclusive, and iterative co-design approach. The resulting Open Source AI Definition (OSAID) v1.0 outlines the essential freedoms—use, study, modify, and share—that AI systems must provide to align with Open Source principles.
Key Outcomes
- The Open Source AI Definition v1.0
- Developed through a rigorous, multi-phase global consultation, this definition transposes the freedoms of Open Source software to AI systems.
- The Definition is supported by a diverse group of stakeholders, including AI developers, deployers, end users, and those impacted by AI systems.
- Initial list of Open Source AI systems
- The following systems successfully passed the validation process: Pythia (EleutherAI), OLMo (AI2), Amber and CrystalCoder (LLM360), and T5 (Google).
- Other systems, such as BLOOM, Starcoder2, and Falcon, would pass with licensing modifications.
- Systems such as Llama 2, Grok, Phi-2, and Mixtral did not meet the required criteria.
- A transparent and inclusive Co-Design Process
- The process engaged over 50 volunteers from nearly 30 countries, with contributions from underrepresented communities, including women, transgender, and nonbinary individuals, and people of color.
- Co-design phases included workshops, webinars, and validation processes, balancing in-person and virtual formats to maximize global accessibility.
- Lessons learned and future governance
- Balancing openness with structured processes emerged as a key challenge. Future iterations will emphasize hybrid consultation methods, clearer timelines, and governance frameworks.
- Expanding the knowledge commons through reusable resources like webinars, reports, and workshops ensures sustained community engagement.
- Next steps: promotion and collaboration
- The OSI will focus on global promotion of the Definition through conferences, webinars, and media outreach in 2025.
- Collaborations with organizations like Hugging Face, Mozilla, and Carnegie Mellon University are underway to refine the Definition’s practical implementation.
The Open Source AI Definition establishes a critical foundation for transparency, innovation, and equitable AI development worldwide. Moving forward, OSI will continue fostering dialogue, refining processes, and supporting stakeholders as they implement and evaluate AI systems against this Definition.
I. The Open Source AI Definition v1.0
Please view the text of the Open Source AI Definition v1.0 on our website and the announcement on our blog.
Initial list of Open Source AI systems
These models passed the Validation phase:
- Pythia (EleutherAI)
- OLMo (AI2)
- Amber and CrystalCoder (LLM360)
- T5 (Google)
A few others were analyzed and would pass if they changed their licenses:
- BLOOM (BigScience)
- Starcoder2 (BigCode)
- Falcon (TII)
Those that have been analyzed and don’t pass because they lack required components:
- Llama 2 (Meta)
- Grok (xAI)
- Phi-2 (Microsoft)
- Mixtral (Mistral)
These are all the systems analyzed to date. Expanding the process to verify that a system conforms to the Open Source AI Definition will be the next exercise.
II. Rationale document
Due to the technical, semantic and economic differences between software and AI systems, it became apparent quite quickly in 2022 that a simple translation of the Open Source Definition (OSD) would not be enough to apply the Open Source freedoms to AI.
We agreed that for AI, society needs at least the same essential freedoms of Open Source to enable AI developers, deployers and end users to enjoy those same benefits: autonomy, transparency, frictionless reuse and collaborative improvement.
The OSI board set a strategic objective to have a definition of Open Source for AI, a completely different domain, hoping to replicate the success of the Open Source Definition. We knew that this new Definition couldn’t be the work of a single individual, as it was for the free software definition and the subsequent Open Source Definition. In 2022 we started a global, multi-stakeholder discussion with the AI, data, and software communities to find how the principles of Open Source apply to AI. Our top objective was to understand the meaning of OSD#2, “Source code: The source code must be the preferred form in which a programmer would modify the program,” for AI.
At the beginning of 2023, we started pitching to community partners the idea of a process similar to the one used to define the GPLv3, to be executed during 2024 with a conclusion in 2025. The response was unanimous concern about the lengthy timeline, with every passing week increasing the risk that Open Source AI would become a generic term with no clear definition, and that EU regulators would come up with their own definition without community input. OSI was pushed to act fast: the term Open Source AI was already being used and abused, and the whole Open Source ecosystem required guidance. Therefore, the board set a deadline to finish the process by October 2024, with two additional constraints: the Open Source AI Definition must be supported by stakeholders that include developers, deployers, and end users of AI as well as subjects (those affected by AI decisions); additionally, it must provide positive examples of AI systems, rooted in current practice, to provide a reference for interested parties.
| System Creator | License Creator | Regulator | Licensee | End User | Subject |
|---|---|---|---|---|---|
| Makes an AI system and/or component that will be studied, used, modified, or shared through an open source license (e.g., ML researcher in academia or industry) | Writes or edits the open source license to be applied to the AI system or component; includes compliance (e.g., IP lawyer) | Writes or edits rules governing licenses and systems (e.g., government policy-maker) | Seeks to study, use, modify, or share an open source AI system (e.g., AI engineer, health researcher, education researcher) | Consumes a system output, but does not seek to study, use, modify, or share the system (e.g., student using a chatbot to write a report, artist creating an image) | Affected upstream or downstream by a system output without interacting with it intentionally; includes advocates for this group (e.g., people with a loan denied, or content creators) |
Stakeholder categories identified by OSI board
In the early months of 2023, OSI staff conducted consultations in formats suited to the expectations and needs of various kinds of stakeholders, finally bringing all their perspectives together in a public context, starting with the launch of the public co-design process at All Things Open in October 2023 and continuing through 2024.
III. Research and co-design process
The co-design method was chosen because a global definition requires a global consultation. Co-design is a methodology for making decisions with diverse stakeholders. Our goal was to design the Open Source AI Definition (OSAID) with the people who would create, deploy, use, and be subject to Open Source AI systems, and to be as global, equitable, and inclusive as possible in that work, giving a place to everyone and special favor to no one. The definition that starts this paragraph actually emerged from the co-design process, suggested during a workshop in Buenos Aires by an Argentinian open source strategist, Maria Cruz. The fact that even our co-design definition was created by a stakeholder is emblematic of the international, collaborative process used to create the OSAID.
This approach was not uncontroversial. Making global tech decisions through global consultation represents a departure from past methods, in which tech experts and activists from the Global North held disproportionate power as compared to the Global Majority (aka Global South) in deciding what is true, right, and best for Open Source. This has continued to be a point of challenge, illustrating that changing culture in this area will be ongoing work.
Co-design is a set of participatory methods that share knowledge and power. Every person who volunteered for a role in the co-design process was given one. We were also careful to ensure all the identified stakeholder groups were represented, further challenging traditional notions of “expertise”.

Also emblematic of the spirit of the OSAID co-design process is the story of Rahmat Akintola, who was featured on the OSI blog. Rahmat is the Program Lead for Women in Machine Learning and Data Science (WiMLDS) in Accra, Ghana. As part of its effort to ensure the inclusion of women of color from the Global South in the OSAID co-design process, Do Big Good, the co-design firm OSI hired in the fall of 2023, conducted focused outreach to this and similar organizations in Sub-Saharan Africa.
Rahmat joined the OSAID co-design process as a member of the OpenCV Workgroup and then volunteered to present the OSAID at the Deep Learning Indaba in Dakar in September, the premier AI/ML conference in Africa. This path from inclusive outreach to workgroup participation to public advocacy, funded by a grant from the Alfred P. Sloan Foundation, is what equitable and global co-design is all about, and it was crucial to achieving a definition that is global in scope.
Among the 50+ co-design volunteers in the process, nearly 30 countries of origin and residence are represented, including participants from Africa, Asia, Europe, and the Americas. We estimate that 31% are OSAI developers, 46% deployers, 90% end users, and nearly all have been subjects of OSAI through upstream or downstream data usage. Over 30% are women, transgender, and nonbinary and over 40% are black, indigenous, and other people of color.
This section describes the co-design phases in the development of the OSAID. The first phase describes OSI’s activities in 2022 through 2023. Phases two through five describe activities in late 2023 through 2024, when Do Big Good was brought in to manage and implement the co-design process.
Phase 1: Preliminary research (Jul 2022 – Dec 2023)
In 2022, the Open Source Initiative started coordinating a global process to sharpen collective knowledge and identify the principles that eventually led to the OSAID. Under the name “Deep Dive: AI”, the OSI mapped the issues of Open Source and AI. This project consisted of a global conversation made of six podcast episodes (with experts Pamela Chestek, Alek Tarkowski, Connor Leahy, David Gray Widder, Mo Zhou, and Bruce Draper) and four online panel discussions (with experts Astor Nummelin Carlberg, David Kanter, Sal Kimmich, Stella Biderman, Alek Tarkowski, Kat Walsh, Luis Villa, Carlos Muñoz Ferrandis, Kit Walsh, Pamela Chestek, Jennifer Lee, Danish Contractor, Adrin Jalali, Chris Albon, Ibrahim Haddad, Mark Surman, and Amy Heineike).
In early 2023, a comprehensive report was published to further socialize the outcomes and inform the next phases of work. The key learning from this initial phase was that the traditional view of Open Source software licensing is insufficient to cover the complexity of AI systems. Key questions emerged for the next phase: What does it mean for an AI system to be Open Source? What policies are needed to both nurture innovation and protect individuals and society as a whole from harm?
In September 2023, the OSI hosted a webinar series to better understand the AI space. Speakers from law, academia, NGOs, enterprise, and the Open Source community shared their thoughts on pressing issues and offered potential solutions in our development and use of AI systems. A total of 18 webinars were held, bringing together 37 experts. A second report was published in late 2023.
Phase 2: Four Freedoms refinement (Oct – Nov, 2023)
In 2023, with the participation of Do Big Good, OSI hosted three in-person co-design workshops in the United States and Africa to determine how the Free Software Foundation’s four freedoms to study, use, modify and share an Open Source system should apply to AI.

- Question: Use, study, modify, share: What should these open source principles mean for artificial intelligence?
- Method: In-person co-design workshops in Monterey, Raleigh, and Addis Ababa where participants drafted and edited the text of the four freedoms for OSAI. The results of that process still appear in the current version of the definition:
- Use: the system for any purpose and without having to ask for permission.
- Study: how the system works and inspect its components.
- Modify: the system for any purpose, including to change its output.
- Share: the system for others to use with or without modifications, for any purpose.
- Workshop Participants: During this phase of the co-design process, participants were not asked to publicly share their names and affiliations. This omission in transparency was remedied in subsequent co-design phases.
- Objective: Transpose the “four freedoms” of the free software definition to AI.
Phase 3: System analysis (Feb – Mar, 2024)
At the end of the second phase, we received stakeholder feedback that the co-design process was exclusionary because it was only happening in-person, and there were many stakeholders who could not attend the workshops (one of the reasons why we reached out to the Alfred P. Sloan Foundation to support a global outreach effort).
We took this feedback into account and, after one more in-person session at AI_dev in San Jose, we shifted to an entirely virtual process for the third phase. Co-design volunteers conducted small group analysis on four systems self-described as open to develop a proposal on which components should be included in the preferred form. This post clarifies that the intention of this phase was to explore avenues to unlock the conversation that got us stuck debating “data”: we needed to get a better sense from AI practitioners of what they need to exercise the four freedoms.

- Question: What components must be open in order for an AI system to be used, studied, modified, and shared?
- Method: One in-person session in San Jose, followed by four virtual workgroups focused on BLOOM, OpenCV, Llama 2, and Pythia, four systems with different approaches to OSAI openness.
- We started with a list of AI system components created by a pre-release of the Model Openness Framework (MOF), a Linux Foundation project.
- In February, workgroup members were invited to vote on whether or not each of the MOF components were required to study, use, modify, and share the system.
- Workgroup members voted using their initials, so it would be transparent which members saw which components. Votes were recorded and tabulated on a public spreadsheet.
- When tabulating, we didn’t notice that the Llama 2 group had a -1 option that subsequent groups lacked. This was an oversight that didn’t impact the result (when the discrepancy was highlighted on the forum in September 2024, we removed the -1 votes and re-tabulated the data, ending up with the same results).
- The purpose of the voting was to give a stakeholder-based signal of component priorities for the preferred form, which would then be commented upon and critiqued publicly in the forum. There has been ample opportunity to comment on the outcomes of the initial voting process. (A minimal sketch of this kind of tally appears after the member lists below.)
- We shared the results of the tabulation for comment on the forum on March 1st. The results were criticized for “wasting time” analyzing Llama, which clearly would never pass as Open Source.
- The recommendation results from the tabulation were:
- Required: Training, validation, and testing code; Inference code; Model architecture; Model parameters; Supporting libraries & tools
- Likely Required: Data preprocessing code
- Maybe Required: Training datasets; Testing datasets; Usage documentation; Research paper
- Likely Not Required: Model card; Evaluation code; Validation datasets; Benchmarking dataset; All other data documentation
- Further down in the thread, we clarified that a line was drawn arbitrarily between “maybe required” and “likely required” to test the hypothesis for the next co-design step: if the component “training dataset” is not required, do we have any clearly non-open-source bycatch (like Llama)?
- We integrated the recommended components in version 0.0.6 on March 10th, which was also shared for public comment.
- Members:
- These and other co-design groups were selected from two sources: those who responded to public calls for participation on the forum or listserv, and focused outreach by Mer Joyce and Kayla Cody-Lushuzi of Do Big Good to bring in excluded groups, such as women, trans, and nonbinary folks; black, indigenous, and other people of color; and people from Asia and the Global South.
- Llama 2 Workgroup
- Bastien Guerry DINUM / France
- Ezequiel Lanza Intel / Argentina
- Roman Shaposhnik Apache Software Foundation / Russia
- Davide Testuggine Meta / Italy
- Jonathan Torres Meta / USA
- Stefano Zacchiroli Polytechnic Institute of Paris / Italy
- Mo Zhou Debian, Johns Hopkins University / China
- Victor Lu independent consultant / USA
- BLOOM Workgroup
- George C. G. Barbosa Fundação Oswaldo Cruz / Brazil
- Daniel Brumund GIZ FAIR Forward – AI for All / Germany
- Danish Contractor BLOOM Model Governance Workgroup / Canada
- Abdoulaye Diack Google / Ghana
- Jaan Li University of Tartu, Phare Health / Estonia
- Jean-Pierre Lorre LINAGORA, OpenLLM / France
- Ofentse Phuti WiMLDS Gaborone / Botswana
- Caleb Fianku Quao Kwame Nkrumah University of Science and Technology, Kumasi / Ghana
- Pythia Workgroup
- Seo-Young Isabelle Hwang Samsung / South Korea
- Cailean Osborne University of Oxford / UK
- Stella Biderman EleutherAI Institute / USA
- Justin Colannino Microsoft / USA
- Hailey Schoelkopf EleutherAI Institute / USA
- Aviya Skowron EleutherAI Institute / Poland
- OpenCV Workgroup
- Rahmat Akintola WiMLDS Accra / Ghana
- Dr. Ignatius Ezeani Lancaster University, UK, Nnamdi Azikiwe University, Nigeria, Masakhane NLP / Nigeria
- Kevin Harerimana CMU Africa / Rwanda
- Satya Mallick OpenCV / USA
- David Manset ITU / France
- Phil Nelson OpenCV / USA
- Tlamelo Makati WiMLDS Gaborone, Technological University Dublin / Botswana
- Minyechil Alehegn Tefera Mizan Tepi University / Ethiopia
- Akosua Twumasi Ghana Health Service / Ghana
- Rasim Sen Oasis Software Technology Ltd. / UK
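For illustration, here is a minimal sketch, in Python, of the kind of tally used to turn workgroup votes into recommendation tiers. The component names come from the MOF list, but the data shape, vote values, and tier cutoffs below are hypothetical; the actual tabulation lived in public spreadsheets, not code.

```python
# A minimal sketch, not OSI's actual tabulation: the votes and tier
# cutoffs below are hypothetical.

# marks[component] = one entry per workgroup member: 1 = required, 0 = not
marks = {
    "Model parameters": [1, 1, 1, 1],
    "Training datasets": [1, 0, 1, 0],
    "Model card": [0, 1, 0, 0],
}

def tier(share: float) -> str:
    """Map a vote share to a recommendation tier (cutoffs are hypothetical)."""
    if share >= 0.75:
        return "Required"
    if share >= 0.5:
        return "Likely Required"
    if share >= 0.25:
        return "Maybe Required"
    return "Likely Not Required"

for component, votes in marks.items():
    share = sum(votes) / len(votes)
    print(f"{component}: {share:.0%} of votes -> {tier(share)}")
```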
Phase 4: System validation (May – Jul, 2024)
In the next phase, we sought to verify which AI systems met the criteria of the OSAID, a requirement of the Board and a common question among stakeholders. Enabled by the results of the previous phase, we tested a working hypothesis: if the training dataset is not required, do we keep Pythia (whose dataset is legally challenged in the US) in the Open Source AI fold while still not catching Grok, Phi-2, or Llama?

Volunteers reviewed 13 AI systems self-described as open, yet the process was difficult. Most volunteers could not find all the documentation necessary to verify that the required components were available to study, use, modify, and share.
We see the difficulty of the validation process as a reason for OSI to continue to certify licenses, as it does for software, rather than trying to certify individual AI systems. This means that the collaboration of system creators is necessary to certify systems, as they’re the best positioned to provide the list of components and their legal terms.
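To illustrate what validators were checking, here is a minimal sketch of a per-system component checklist. The required components follow the Phase 3 recommendation above; the system name and findings are made up, and the real reviews were conducted on public spreadsheets rather than in code.

```python
# A minimal sketch, assuming the "Required" components from the Phase 3
# recommendation; the system and the findings below are hypothetical.
REQUIRED = [
    "Training, validation, and testing code",
    "Inference code",
    "Model architecture",
    "Model parameters",
    "Supporting libraries & tools",
]

def review(system: str, findings: dict[str, bool]) -> None:
    """Report which required components a reviewer could locate and verify."""
    missing = [c for c in REQUIRED if not findings.get(c, False)]
    if missing:
        print(f"{system}: cannot be verified; missing: {', '.join(missing)}")
    else:
        print(f"{system}: all required components located")

# Hypothetical findings for a single system under review:
review("ExampleLM", {
    "Training, validation, and testing code": False,
    "Inference code": True,
    "Model architecture": True,
    "Model parameters": True,
    "Supporting libraries & tools": True,
})
```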
- Question: Which AI systems meet the criteria of the OSAID?
- Method: Through a public call for participation, volunteers signed up to review a total of 13 systems self-described as open (list below). They used versions 0.0.6 through 0.0.8 of the OSAID as references.
- All review spreadsheets were posted publicly to maximize transparency.
- Most of the review work took place in May, 2024.
- Whenever possible, each system was reviewed by at least one person not affiliated with the system. LLM360 is self-certified.
- Most volunteers were not able to complete their reviews or reach conclusions on the openness of the system because of difficulty finding necessary documentation publicly on the internet.
- The results we were able to collect are summarized in the list of Open Source AI systems in Section I above.
- Reviewers:
- 1. Arctic
- Jesús M. Gonzalez-Barahona Universidad Rey Juan Carlos / Spain
- 2. BLOOM
- Danish Contractor BLOOM Model Governance Workgroup / Canada
- Jaan Li University of Tartu, Phare Health / Estonia
- 3. Falcon
- Casey Valk Nutanix / USA
- Jean-Pierre Lorre LINAGORA, OpenLLM / France
- 4. Grok
- Victor Lu independent consultant / USA
- Karsten Wade Open Community Architects / USA
- 5. Llama 2
- Davide Testuggine Meta / Italy
- Jonathan Torres Meta / USA
- Stefano Zacchiroli Polytechnic Institute of Paris / Italy
- Victor Lu independent consultant / USA
- 6. LLM360
- Victor Miller LLM360 / USA
- 7. Phi-2
- Seo-Young Isabelle Hwang Samsung / South Korea
- 8. Mistral
- Mark Collier OpenInfra Foundation / USA
- Jean-Pierre Lorre LINAGORA, OpenLLM / France
- Cailean Osborne University of Oxford / UK
- 9. OLMo
- Amanda Casari Google / USA
- Abdoulaye Diack Google / Ghana
- 10. OpenCV
- Rasim Sen Oasis Software Technology Ltd. / UK
- 11. Pythia
- Seo-Young Isabelle Hwang Samsung / South Korea
- Stella Biderman EleutherAI Institute / USA
- Hailey Schoelkopf EleutherAI Institute / USA
- Aviya Skowron EleutherAI Institute / Poland
- 12. T5
- Jaan Li University of Tartu, Phare Health / Estonia
- 13. Viking
- Merlijn Sebrechts Ghent University / Belgium
Phase 5: Workshop about training data (Sept – Oct, 2024)

Because the OSAID position on training data was the most contentious outcome of the co-design process, we decided to host a workshop specifically to provide recommendations on how training datasets should be designed, licensed, and regulated in open source AI systems.
- Question: How should training datasets be designed, licensed, and regulated in open source AI?
- Method: On October 10th and 11th, we brought together 18 data and OSAI experts from 15 countries for a two-day workshop in Paris to co-design recommendations on OSAI data. Mer Joyce facilitated both days of the workshop. This was our process:
- Preparation – In September, Alek Tarkowski of Open Future wrote a draft of the white paper for participants to comment on before the workshop. From these comments emerged three topic areas (dataset design, licensing, and regulation), as well as the structure of the workshop, which would begin with brainstorming and end with small group development of proposals.
- Day 1 – We collected a broad array of solutions for open, public, obtainable, and unshareable data across the three topic areas, using post-its to record suggestions. The day ended with voting to prioritize these suggestions.
- Day 2 – On the second day, we split into small groups connected to the three thematic areas (design, licensing, regulation) and each group developed specific proposals in these areas, based on the brainstorming and prioritization from the day before. Participants self-documented their proposals and discussion notes.
- Next Steps – Recommendations from the participants have been incorporated into the white paper and shared again for comments in early November. The white paper is being finalized for publication.
- The framing of the discussion in Paris has been published in a blog post. The white paper is complementary to, but not contingent on, the release of the OSAID.
- Workshop Participants:
- Dr. Ignatius Ezeani – Lancaster University, UK, Nnamdi Azikiwe University, Nigeria, Masakhane NLP / Nigeria
- Masayuki Hatta – Surugadai University / Japan
- Aviya Skowron – EleutherAI Institute / Poland
- Stefano Zacchiroli – Polytechnic Institute of Paris / Italy
- Ricardo Mirón – Digital Public Goods Alliance / Mexico
- Kristina Podnar – Data and Trust Alliance / Croatia + USA
- Joana Varon – Coding Rights / Brazil
- Renata Avila – Open Knowledge Foundation / Guatemala
- Alek Tarkowski – Open Future / Poland
- Maximilian Gantz – Mozilla Foundation / Germany
- Stefaan Verhulst – GovLab / USA + Belgium
- Paul Keller – Open Future / Germany
- Thom Vaughn – Common Crawl / UK
- Julie Hunter – LINAGORA / USA
- Deshni Govender – GIZ FAIR Forward AI for All / South Africa
- Ramya Chandrasekhar – CNRS – Center for Internet and Society / India
- Anna Tumadóttir – Creative Commons / Iceland
- Stefano Maffulli – Open Source Initiative / Italy
Stakeholder Feedback
Below are quotes from participants who played a variety of roles in the co-design process:
The co-design process allowed me to see first hand the thought process of people all over the world about what is open source AI. It may never be possible for all the people to agree on the definition. But it is a wonderful start, and I think everyone will agree that the open discussions, seminars, town hall meetings, follow-up surveys, and emails are all very effective and “democratic” 🙂
– Victor Lu, Llama 2 Workgroup Member and System Validator
[What I appreciated about the workshop was] the diversity of attendees’ perspectives, how the conversation was facilitated (prep ahead of time so we could get a running start) and the constructive nature of taking this white paper forward. [I just wish we had] a bit more time… maybe starting earlier on Thursday would have been good. Else, everything was great. Thank you for making good use of time, creating a collaborative and open environment, and representing as much diversity as possible.
– Anonymous Participant, Data in OSAI Workshop
It was a great experience working with the open AI team and contributing to this important initiative. We look forward to seeing the release version and witnessing the impact it will have on the AI community.
– Rasim Sen, OpenCV Workgroup Member and System Validator
During the OSAID process, I had the chance to collaborate with members from various continents and time zones. It was an interesting experience as sometimes I found myself waking up at 2 am in my pajamas for a Zoom call! 😉 Through both synchronous discussions within our working group (WG) meetings and asynchronous conversations on the web forum, I gained valuable insights into diverse collaboration methods.
– Seo-Young Isabelle Hwang, Pythia Workgroup Member and System Validator
In my experience, the co-design process was seamless and straightforward to take part in. Even though the process was virtual, it was transparent and simple to follow at every stage.
– Rahmat Akintola, OpenCV Workgroup Member
The debate over what is or isn’t open source AI often seems like an infinite tug-of-war between those who argue for relatively light-touch requirements (basically, open-weight models) and those who argue for maximal transparency of models and their constituent parts, as well as all the various views and concerns in between these two poles.
While it is healthy to have divergent views in the open source AI community, it’s becoming more and more urgent to build consensus, especially as we now have regulations like the AI Act that introduce requirements and exceptions for the providers of open source AI systems even in the absence of a definition of open source AI systems.
Towards this end, the co-design process has been an excellent way to bring in the diverse views of experts from various corners of the world and through open debate figure out what we can agree on and what not.
Given the high stakes of the open source AI definition, I hope that the co-design process can continue and that we can work towards a definition that works for the community.
– Cailean Osborne, Pythia Workgroup Member
I like everything that has to do with the transparency of Artificial Intelligence algorithms. For my part, I am focused on the transparency of Machine Learning models: trying to decipher the billions of calculations that make them up and explaining them to those who are in the field and those who are not.
In the same way, I value the search for transparency in terms of the data with which these models are trained and the way in which they are obtained, as well as in the design of the code. This is why I highly value the work of the Open Source AI Definition and consider it vitally important to ensure transparency.
– OSAID Presentation Participant, Argentina (translated from Spanish)
IV. Timeline
Below is a list of all the consultation points (meeting dates and locations, podcasts, panels, webinars, and town halls), along with the contributors to each.
Deep Dive: AI Podcasts 2022
- Welcome to Deep Dive: AI (Stefano Maffulli – July, 2022)
- Copyright, selfie monkeys, the hand of God (Pamela Chestek – Aug 16, 2022)
- Solving for AI’s black box problem (Alek Tarkowski – Aug 23, 2022)
- When hackers take on AI: Sci-fi – or the future? (Connor Leahy – Aug 30, 2022)
- Building creative restrictions to curb AI abuse (David Gray Widder – Sep 6, 2022)
- Why Debian won’t distribute AI models any time soon (Mo Zhou – Sep 13, 2022)
- How to secure AI systems (Bruce Draper – Feb 9, 2023)
Deep Dive: AI Panels 2022
- Exploring the business side of AI (Astor Nummelin Carlberg, David Kanter, Sal Kimmich, Stella Biderman, Alek Tarkowski – October 11, 2022)
- Exploring the society side of AI (Kat Walsh, Luis Villa, Carlos Muñoz Ferrandis, Kit Walsh – October 13, 2022)
- Exploring the legal side of AI (Pamela Chestek, Jennifer Lee, Danish Contractor, Adrin Jalali – October 18, 2022)
- Exploring the academia side of AI (Chris Albon, Ibrahim Haddad, Mark Surman, Amy Heineike – October 20, 2022)
Deep Dive: AI Webinars 2023
- Deep Dive: AI Webinar Series (September, 2023)
- The Turing Way Fireside Chat: Who is building Open Source AI? (Jennifer Ding, Arielle Bennett, Anne Steele, Kirstie Whitaker, Marzieh Fadaee, Abinaya Mahendiran, David Gray Widder, Mophat Okinyi)
- Operationalising the SAFE-D principles for Open Source AI (Kirstie Whitaker, David Leslie, Victoria Kwan)
- Commons-based data governance (Alek Tarkowski, Zuzanna Warso)
- Preempting the Risks of Generative AI: Responsible Best Practices for Open-Source AI Initiatives (Monica Lopez)
- Data privacy in AI (Michael Meehan)
- Perspectives on Open Source Regulation in the upcoming EU AI Act (Katharina Koerner)
- Data Cooperatives and Open Source AI (Tarunima Prabhakar, Siddharth Manohar)
- Fairness & Responsibility in LLM-based Recommendation Systems: Ensuring Ethical Use of AI Technology (Rohan Singh Rajput)
- Challenges welcoming AI in openly-developed open source projects (Thierry Carrez, Davanum Srinivas, Diane Mueller)
- Opening up ChatGPT: a case study in operationalizing openness in AI (Andreas Liesenfeld, Mark Dingemanse)
- Open source AI between enablement, transparency and reproducibility (Ivo Emanuilov, Jutta Suksi)
- Federated Learning: A Paradigm Shift for Secure and Private Data Analysis (Dimitris Stripelis)
- Should OpenRAIL licenses be considered OS AI Licenses? (Daniel McDuff, Danish Contractor, Luis Villa, Jenny Lee)
- Copyright — Right Answer for Open Source Code, Wrong Answer for Open Source AI? (McCoy Smith)
- Should we use open source licenses for ML/AI models? (Mary Hardy)
- Covering your bases with IP Indemnity (Justin Dorfman, Tammy Zhu, Samantha Mandell)
- The Ideology of FOSS and AI: What “Open” means to platforms and black box systems (Mike Nolan)
OSAID Conferences and Meetings 2023/2024
June, 2023
- First OSAID Meeting (Jun. 2023 – San Francisco)
July, 2023
- FOSSY (Jul. 13-15, 2023 – Portland)
- Campus Party Brazil (Jul. 25-29, 2023 – Sao Paulo)
- The future of Artificial Intelligence: Sovereignty and Privacy with Open Source (Nick Vidal, Aline Deparis)
- Open Source Congress (Jul. 27-28, 2023 – Geneva)
September, 2023
- Open Source Summit Europe (Sept. 19-21, 2023 – Bilbao)
- Nerdearla (Sept. 26-30, 2023 – Buenos Aires)
October, 2023
- All Things Open (Oct. 15-17, 2023 – Raleigh)
- Latinoware (Oct. 18-20, 2023 – Foz do Iguacu)
- Linux Foundation Member Summit (Oct. 24-26, 2023 – Monterey)
November, 2023
- DPGA Member Meeting (Nov. 14, 2023 – Addis Ababa)
- Workshop: Define “Open AI” (Stefano Maffulli, Nicole Martinelli)
December, 2023
- AI.dev (Dec. 12-13, 2023 – San Jose)
February, 2024
- FOSDEM (Feb. 3-4, 2024 – Brussels)
- Columbia Convening on openness and AI (Feb. 29, 2024 – New York)
April, 2024
- Open Source Summit – North America (April 16, 2024 – Seattle)
- LLW Gothenburg (April 16, 2024 – Gothenburg)
June, 2024
- OW2conf (June 11-12, 2024 – Paris)
- OpenExpo Europe (June 13, 2024 – Spain)
- AI_Dev Europe (June 19-20, 2024 – Paris)
July, 2024
- OSPOs for Good (July 9-10, 2024 – New York)
- What’s Next for Open Source (July 11, 2024 – New York)
- Sustain Africa (July 15, 2024 – Online)
August, 2024
- KubeCon + AI_dev Hong Kong (Aug. 21-23, 2024 – Hong Kong)
- Open Source Congress (Aug. 25-27, 2024 – Beijing)
- Datasets, Privacy, and Copyright (Stefano Maffulli, Donnie Dong)
- The Open Source AI Definition (Stefano Maffulli)
September, 2024
- Deep Learning Indaba (Sept. 1-7, 2024 – Dakar)
- India FOSS (Sept. 7-8, 2024 – Bengaluru)
- Open Source Summit Europe (Sept. 16-18, 2024 – Vienna)
- Nerdearla (Sept. 24-28, 2024 – Buenos Aires)
October, 2024
- Open Forum for AI (Oct. 4, 2024 – Washington DC)
- Open Source AI Definition (Deb Bryant)
- Training Data in OSAI (Oct. 10-11, 2024 – Paris)
- Workshop (Ignatius Ezeani, Masayuki Hatta, Aviya Skowron, Stefano Zacchiroli, Ricardo Torres, Kristina Podnar, Joana Varon, Renata Avila, Alek Tarkowski, Maximilian Gantz, Stefaan Verhulst, Paul Keller, Thom Vaughn, Julie Hunter, Deshni Govender, Ramya Chandrasekhar, Anna Tumadóttir, Stefano Maffulli)
- Open Community Experience (Oct. 22-24, 2024 – Mainz)
- All Things Open (Oct 27-29, 2024 – Raleigh)
November, 2024
- SFSCON (Nov. 8-9, 2024 – Bolzano, Italy)
- Open Source in EU policy (Jordan Maris)
- Digital Public Goods Alliance Annual Members Meeting (Nov. 13-15, 2024 – Singapore)
- The Linux Foundation Member Summit (November 19-21, 2024 – Napa)
- OSI Open Source AI Definition Update and Q&A (Stefano Maffulli)
Co-Design Town Halls 2024
- January, 2024
- February, 2024
- March, 2024
- April, 2024
- May, 2024
- June, 2024
- July, 2024
- August, 2024
- September, 2024
- October, 2024
V. Initial list of supporters (Endorsements)
The list of endorsers announced at the launch of version 1.0 is below. The full and most up-to-date list is available on the OSI website.
Institutional
- Developers
- EleutherAI Institute
- CommonCrawl
- George Washington University OSPO
- LLM360
- LINAGORA
- Women In Machine Learning and Data Science – Accra
- Deployers
- Mozilla Foundation
- Mercado Libre
- SUSE
- Kaiyuanshe
- Eclipse Foundation
- End Users
- Bloomberg
- Open Infrastructure Foundation
- Interministerial Directorate of Digital Affairs (DINUM)
- Nextcloud
- sysarmy
- Subjects
- Digital Public Goods Alliance
- OpenForum Europe
- Academia
- Carnegie Mellon University OSPO
- Georgia Tech University OSPO
- Washington University OSPO
Individuals
- Sayash Kapoor
- Arvind Narayanan
- Percy Liang
- Victor Lu
- Kevin Harerimana
- George C. G. Barbosa
- Dr. Ignatius Ezeani
- Seo-Young Isabelle Hwang
- Cailean Osborne
- Tlamelo Makati
- Stefano Zacchiroli
- Shuji Sado
- Felix Reda
VI. Divergent opinions
As more and more groups express support for the Open Source AI Definition, we want to keep track of the concerns raised by others. Below is a list of issues raised so far, with no added commentary, explanation or judgment:
List of comments received
We received the following comments during the most heated discussions:
- On the availability of training data: All the data used to train an AI system should be openly available, as it’s essential for understanding and improving the model.
- Synthetic data: If releasing the original data is not feasible, providing synthetic data and a clear explanation can be helpful.
- Pre-training dataset distribution: The dataset used for pre-training should also be accessible to ensure transparency and allow for further development.
- Dataset documentation: The documentation for training datasets should be thorough and accurate to address potential issues.
- Versioning: To maintain consistency and reproducibility, versioned data is crucial for training AI systems.
- Reproducibility: The Definition should say that Open Source AI must be reproducible using the original training data, scripts, logs and everything else used by the original developer.
- About the co-design process:
- The co-design process as conducted was not democratic and was ultimately unfair: voting was the wrong method, the selection of volunteers was biased, and the results didn’t show any consensus, among many other issues.
- Some companies reported that they didn’t have the chance to offer an official position, whether of endorsement or of requests for modifications to the text. Although some people contributed to the co-design process as volunteers in their official capacity, the quick pace of development, together with the fully transparent process, didn’t leave company representatives time to escalate comments up the corporate decision chain so they could become official statements.
VII. Press release
The announcement was published on October 28th, 2024 on OSI’s official website.
VIII. Further revisions
In the short term, the OSI will use the forum to collect the experience of AI builders interacting with the Definition. The team will reach out to groups interested in evaluating AI systems for compliance and offer them guidance on how to interpret the wording of the Definition. We have started conversations with Hugging Face, Carnegie Mellon University, Mozilla, and Google; others have expressed interest.
The AI Committee will monitor the conversations and offer suggestions to review the text of the Definition at quarterly intervals.
IX. Lessons learned
The Open Source AI Definition (OSAID) process has been a pioneering initiative, and while it achieved significant milestones, it also provided valuable insights for future efforts. Key lessons learned from the process include the following:
1. Balancing openness with structure
The co-design methodology fostered inclusivity by welcoming diverse stakeholders, yet its openness also posed challenges. Some corporate stakeholders found the process too open-ended, leading to disengagement, while others critiqued the lack of cohesion among co-design activities. The lesson here is to establish a clear blueprint early on, ensuring participants have a shared understanding of the process and its objectives. Introducing mechanisms for greater cross-pollination and coordination across working groups could enhance cohesiveness and engagement.
2. Managing inclusivity and accessibility
Efforts to include voices from the Global South and underrepresented communities were a notable success, exemplified by stories like Rahmat Akintola’s journey from participant to advocate. However, different formats—such as in-person workshops and online forums—were alternately praised and criticized for their accessibility. Future processes should adopt a hybrid approach from the outset, carefully designing inclusive formats that balance accessibility and participation equity. Providing preparatory learning resources beforehand can help level the playing field for all participants.
3. Public feedback and consensus building
The decision to integrate stakeholder feedback only through public discussions increased the transparency of the OSAID process, but it disadvantaged some stakeholders. The rapid pace of the process also occasionally hindered consensus building. The use of voting in one phase was misunderstood, perceived by some as a tool of democratic representation. Others criticized the lack of time to engage their corporate employers to formally present opinions. In future initiatives, a longer timeline with built-in intervals for reflection, and different consensus-building processes adapted to stakeholders’ needs, can foster trust and allow participants to engage at a deeper level.
4. Expanding the knowledge commons
One of the most significant achievements was the creation of reusable resources, including podcasts, webinars, white papers, and recordings of town halls. These materials have contributed substantially to the knowledge commons, setting a benchmark for future projects. This demonstrates the value of documenting and sharing outputs systematically to extend their impact beyond the immediate project.
5. Reflections on governance and maintenance
The co-design process highlighted the need for ongoing governance, education, and maintenance of the OSAID. Establishing a clear governance framework with defined roles for stakeholders, mechanisms for periodic reviews, and strategies for addressing divergent opinions will be critical for the long-term success and credibility of the Definition.
A recurring theme in feedback was the need to use these lessons as a springboard for next steps, ensuring the OSAID remains a living document, reflective of and responsive to the needs of the Open Source AI ecosystem.
X. 2025 follow-up plan
Next year the activities of OSI will shift to promotion and education. In parallel, the OSI will partner with other organizations to continue validating the Open Source AI Definition v1.0 and to record its critical points.
OSI will present the results of the co-design process and version 1.0 at conferences around the world. We will engage volunteers to present the Definition, keeping travel costs and burden to a minimum while growing a community of supporters. An initial list of the top conferences OSI is directly aiming at appears below. OSI’s community manager is already reaching out to the co-design volunteers to identify local opportunities.
Besides the in-person conferences, the OSI will host a webinar/podcast series in the second half of 2025, interviewing AI builders to understand how they’re working in practice with the Open Source AI Definition v1.0.
Additionally, the leadership of OSI will start a media campaign to promote awareness of the OSAID, as well as comment on issues of importance to Open Source. The organization will maintain a robust social media presence, engaging with communities interested in expanding the role of Open Source in society.
Depending on budget availability and allocation, this plan may expand further or shrink.
Events list:
- FOSDEM, Brussels, February 1 – 2
- AI & Big Data Expo, London, February 5 – 6
- SCALE, California, March 6 – 9
- SXSW, Austin, March 7 – 15
- KHIPU, Santiago de Chile, March 10 – 14
- Open Expo Europe, Madrid, May 8
- ODSC East, Boston, May 13 – 15
- PyCon US, Pittsburgh, May 16 – 18
- AI For Good, Geneva, May
- OSS Summit NA, Denver, June 23 – 25
- AI Risk Summit, California, June
- R.AI.SE Summit, Paris, July 8 – 9
- OSPOs 4 Good, New York (United Nations), July
- Ghana Data Science Summit, Ghana, July
- Open Source Congress, August
- Ai4, Las Vegas, August
- DEF CON (AI Village), Las Vegas, August 7 – 10
- OSS Summit Europe, Amsterdam, August 25 – 27
- Deep Learning Indaba, Dakar, September
- Nerdearla Argentina, Buenos Aires, September
- The AI Conference, San Francisco, September
- All Things Open, Raleigh, October 12 – 15
- TED AI, San Francisco, October
- Dot AI, Paris, October
- AI_dev Japan, Tokyo, October
- MIT AI Conference, New York, October
- TechCrunch Disrupt, San Francisco, October
- GitHub Universe, San Francisco, October
- Open Community Experience, Mainz, October
- GovAI Summit, Arlington, October
- Nerdearla Spain, Madrid, November 13 – 15
- Linux Foundation Legal Summit, California, November
- IEEE World Technology Summit, California, November
- SeaGL, Seattle, November
- The AI Summit, New York, December