Championing Open-source Development in Machine Learning
Open-source software development is a cornerstone of modern machine learning research. However, issues around the sustainability of long-lived projects, the reliability of research software, and proper academic acknowledgment of long-term maintenance and contributions are often overlooked. The CODEML workshop at ICML aims to identify and discuss strategies for successful and sustainable open-source development in machine learning, and to propose solutions to the challenges above. The workshop will also serve as a platform for academic and community recognition of the efforts of open-source contributors in the field. We will bring together machine learning researchers, engineers, industry practitioners, and software development experts. The program will feature invited talks, panel discussions with experts, and extended abstract submissions from open-source contributors in machine learning.
Important Dates
Milestone | Date |
---|---|
Submission Deadline | |
Acceptance Notification | June 9, 2025 |
Workshop Date | July 18, 2025 |
Location and Venue
Vancouver Convention Center
1055 Canada Pl, Vancouver, BC V6C 0C3, Canada
Call for Papers
We welcome submissions on open-source software within the context of machine learning research. We encourage all types of contributions, including research papers, position papers, technical reports, and retrospectives.
Suggested topics include:
- Submissions that describe a new open-source machine learning software library
- Submissions that explain new additions, significant bug fixes, or changes to an established library
- Submissions that explore a scientific result across different versions of an established library
- Submissions on the technical setup (e.g. CI, testing) and best practices for reproducibility (a minimal sketch of such a check appears below)
- Proposals for better workflows or incentives for open-source development and maintenance in ML
- Retrospectives on the development and maintenance of mature ML OSS packages
We especially encourage submissions on development practices, mature libraries, and other topics that have received little recognition from traditional academic venues.
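To make the CI/testing bullet above concrete, here is a minimal sketch of the kind of reproducibility check such a setup might run in CI. The toy least-squares model, function names, and tolerance are illustrative choices of ours, not drawn from any particular library or submission:

```python
# Reproducibility checks a CI pipeline might run (e.g. via pytest).
import numpy as np


def train_tiny_model(seed: int) -> np.ndarray:
    """Fit a least-squares line to synthetic data; a stand-in for real training."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=100)
    y = 3.0 * x + rng.normal(scale=0.1, size=100)
    # Closed-form least squares on [x, 1] recovers slope and intercept.
    X = np.stack([x, np.ones_like(x)], axis=1)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def test_training_is_deterministic():
    """The same seed must give bit-identical results across runs."""
    assert np.array_equal(train_tiny_model(seed=0), train_tiny_model(seed=0))


def test_slope_is_recovered():
    """Guard against silent numerical drift, e.g. across library versions."""
    weights = train_tiny_model(seed=0)
    np.testing.assert_allclose(weights[0], 3.0, atol=0.05)
```

Running such tests on every commit catches both nondeterminism and silent numerical drift, the same concerns raised by the topic on scientific results across library versions.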
As our proceedings are non-archival, we will accept work that is under submission to, or has recently been accepted at, other publication venues (e.g. NeurIPS, JMLR OSS Track).
Submission Guidelines
We invite submissions of 4-page workshop papers (excluding references and appendices) that address any of the workshop themes. Submissions should use the ICML style file. We encourage authors to include relevant links to their projects wherever applicable (we recommend using Anonymous4OpenScience to hide identifying details). We discourage lengthy appendices, as reviewers are not required to read them.
Submissions will undergo a double-blind review process for relevance and adherence to ICML’s academic integrity standards. We recognize that it may be impossible for some submissions to be truly anonymous (e.g., a retrospective on a widely used library), so we ask authors to use their best judgment regarding potentially identifying details.
We aim to be inclusive while ensuring high-quality discussions that align with the workshop’s objectives. To that end, papers will be reviewed under the TMLR criteria:
- Correctness: Are the claims made in the submission supported by accurate, convincing and clear evidence?
- Audience: Would some individuals in the workshop’s audience be interested in the findings of this submission?
Accepted submissions will be presented during joint poster sessions and made publicly available as non-archival reports, allowing future submissions to archival conferences or journals.
Camera-Ready Submission
For the camera-ready version of accepted papers, please use this style file, which serves as a drop-in replacement for the ICML style file. Camera-ready papers may have five pages of main content (one page more than the submission limit).
Schedule
Time | Session |
---|---|
09:00 - 09:15 | Opening remarks |
09:15 - 09:45 | Invited talk: |
09:45 - 10:15 | Coffee break |
10:15 - 10:45 | Invited talk: |
10:45 - 11:15 | Invited talk: |
11:15 - 12:00 | Contributed talks / demos |
12:00 - 13:00 | Lunch and discussion |
13:00 - 13:30 | Invited talk: |
13:30 - 14:00 | Invited talk: |
14:00 - 14:15 | Contributed talks / demos |
14:15 - 15:00 | Poster session |
15:00 - 15:30 | Coffee break |
15:30 - 16:00 | Invited talk: |
16:00 - 16:55 | Panel discussion |
16:55 - 17:00 | Closing remarks |
Invited Speakers

Sara Hooker is a renowned ML researcher and a leader in AI fairness and interpretability, currently VP of Research at Cohere. She was previously a research scientist at Google Brain, focusing on training models that are not only accurate but also interpretable, fair, and robust. Sara is the founder and a current advisor of Delta Analytics, a nonprofit organization dedicated to bringing data science expertise to underserved communities. She uses her expertise and outreach to advocate for trustworthy, accessible, and equitable ML practices and to promote open research and collaboration.

Tri Dao is the chief scientist at Together.AI and an incoming assistant professor at Princeton University. He recently completed his Ph.D. at Stanford University, working with Christopher Ré and Stefano Ermon. Tri is a leading expert in machine learning and systems, with a focus on efficient training and long-range context. He has made significant contributions to the development of open-source tools and frameworks, including Mamba and FlashAttention.

Stella Biderman is a researcher at Booz Allen Hamilton and executive director at EleutherAI who specializes in natural language processing, ML interpretability, and AI ethics. She has contributed to the release of several open-source models such as GPT-NeoX, BLOOM, VQGAN-CLIP, and OpenFold. Her current research focuses on mechanistic interpretability and the learning dynamics of large language models. Stella is currently a lead contributor to the Pythia project for transformer interpretability.

Matt Johnson is a senior research scientist at Google, where he works on the development of open-source tools and frameworks for machine learning. He has made numerous contributions to numerical libraries used in machine learning. Matt was a founder and lead contributor of Autograd, a precursor to the widely used JAX library, and is now a key contributor to JAX.

Evan Shelhamer is an incoming assistant professor of computer science at UBC, a faculty member at the Vector Institute, and a senior research scientist at Google DeepMind. He has over ten years' experience in research and development for computer vision and machine learning and is an advocate for DIY science and open-source software. Most notably, Evan served as the lead developer of the Caffe deep learning framework from version 0.1 to 1.0.

Chris Rackauckas is the Director of Modeling and Simulation at Julia Computing, the lead developer of the SciML Open Source Software Organization, Co-PI of the Julia Lab at MIT and Director of Scientific Research at Pumas-AI. He is the lead developer of several major open-source packages within the Julia ecosystem, perhaps most notably DifferentialEquations.jl. Chris's research focuses on scientific machine learning, which aims to integrate domain-specific scientific models with data-driven approaches from machine learning to accelerate simulations.
Organizers

- Columbia University
- Zurich University of Applied Sciences
- Recursion
