Hi everyone,
This issue is to open a discussion on a crucial topic for the future of our project(s):
The use of AI-powered coding assistants (like Anthropic Claude, Google Gemini, GitHub Copilot, etc.) by project contributors.
Here is the problem in a nutshell: code generated by an AI may not be copyrightable by anyone (see "The Core Challenge" below), which leads to license-pollution and copyright-ownership risks for all of our projects (see "How This Impacts Our Projects" below).
The goal here is to gather opinions, have an open discussion, and figure out and agree on how we move forward with all of this as an open-source project - ultimately crystallizing the outcome into an AI policy and more; please see the deliverables below.
Your Input is Crucial
This policy will shape how we work going forward in this new world unfolding. Please share:
- Your experiences using AI coding assistants
- Concerns about the proposed approach
- Suggestions for making the policy both protective and practical
- Examples of good/bad AI usage you've encountered
Let's work together to create a policy that protects our project while remaining welcoming to contributors who use modern tools responsibly!
Ah, and sorry for the "long" issue - I took quite some time and care to collect material and thoughts exhaustively (my understanding / views / opinions) so we have some meat to chew on and discuss ;)
Applicable To
This issue is relevant to all of these projects, all WAMP related:
- WAMP: The Web Application Messaging Protocol (the protocol specification and website)
- txaio: txaio is a helper library for writing code that runs unmodified on both Twisted and asyncio / Trollius.
- Autobahn|Python: WebSocket & WAMP for Python on Twisted and asyncio.
- Autobahn|JS: WAMP for Browsers and NodeJS.
- Autobahn|Java: WebSocket & WAMP in Java for Android and Java 8
- Autobahn|C++: WAMP for C++ in Boost/Asio
- Autobahn|Testsuite: The Autobahn|Testsuite provides a fully automated test suite to verify client and server implementations of The WebSocket Protocol (and WAMP) for specification conformance and implementation robustness.
- Crossbar.io: Crossbar.io is an open source networking platform for distributed and microservice applications. It implements the open Web Application Messaging Protocol (WAMP)
- zLMDB: Object-relational in-memory database layer based on LMDB
- cfxdb: cfxdb is a Crossbar.io Python support package with core database access classes written in native Python.
Rather than filing one issue on each of the above 10 repositories, I've decided it makes more sense for the discussion to happen in one repository only - the one with the most GitHub stars, which is Autobahn|Python. But if and once the discussion concludes, I will file the corresponding issues on the other 9 repos, promise.
Sidenote 1: Collecting all of this, I just realized a) how crazy this whole endeavour (WAMP etc.) has turned out to be, b) how much we have achieved with all of you contributing (OSS, oh yeah!), and c) that I am crazy! Did I mention that already? Well, it's true ;)
Sidenote 2: Personally, I have lately done quite some experimentation with "AI" in various ways and for various uses, and I am quite thrilled and optimistic that AI can indeed help us tame the beast described above! At least for me, for hacking, coding and all that, it is an incredible catalyst / accelerator and time saver - and time is of the essence, always "too little" and all. Which is part of the reason I am filing this issue.
Deliverables
- AI_POLICY.rst: AI policy and guidelines addressed to human contributors/developers/users
- CLAUDE.md: AI policy and guidelines addressed to AI assistants/agents; also covering technical matters (code formatting, GitHub workflow, documentation, test strategy, ...)
- README.rst: a single paragraph ("IMPORTANT") pointing to the above
- GitHub issue, PR, and commit templates covering AI matters
Meta-Goal: Making the Right Thing the Easy Thing
Before diving into the details, let's be clear about our philosophy. We've all seen compliance processes fail because they create overhead without value. Our goal is different:
We want to create a process that:
- ✅ Actually helps developers write better code and documentation
- ✅ Makes collaboration more transparent and effective
- ✅ Integrates seamlessly into existing workflows
- ✅ Creates legal protection as a natural byproduct, not as bureaucratic overhead
We explicitly reject:
- ❌ Compliance theater that wastes developer time
- ❌ Processes that exist only to check boxes
- ❌ Training videos, quizzes, or attestation forms
- ❌ Anything that makes contributing harder without making it better
The principle is simple: If following the process makes developers' work better, they'll actually follow it.
Our Intent: Responsible Innovation
As AI tools become more powerful and integrated into our workflows, it's vital that we proactively establish a clear policy to:
- Protect the legal integrity of our codebase
- Respect our licensing commitments to users and contributors
- Enable responsible use of productivity-enhancing AI tools
- Create transparent documentation of our development practices
- Lead by example in the open source community
This affects all our projects, from the dual-licensed Crossbar.io to the permissively-licensed Autobahn|XYZ family.
The Core Challenge: AI, Authorship, and Copyright
The central issue stems from a fundamental principle in copyright law (e.g., as interpreted by the U.S. Copyright Office):
A work must be created by a human to be copyrightable. An AI cannot be an author and cannot hold copyright.
This has several critical consequences for us:
- The Ownership Gap: Code generated by an AI without significant human creative input or modification is not owned by the user who prompted it. It may fall into the public domain.
- The "Union License" Problem: AI models are trained on vast datasets containing code under various licenses (MIT, GPL, Apache, proprietary, etc.). If AI output is considered a derivative work of its training data, the legal implications are staggering (see the sketch after this list):
- The output could carry obligations from ALL licenses in the training set (L₁ ∪ L₂ ∪ ... ∪ Lₙ)
- If any two licenses are incompatible (e.g., GPL-2.0 vs GPL-3.0), the output may be legally unusable
- Even if all licenses are compatible, determining and complying with this "union license" is practically impossible
- The "Derivative Work" Interpretation Chaos: The term "derivative work" itself is a legal minefield:
- The U.S. Copyright Office has one interpretation
- The FSF has another (particularly relevant for GPL)
- The Linux Foundation might have yet another view
- Ultimately, a specific court in a specific jurisdiction will decide - and different courts may rule differently
- Penalties could include statutory damages, and willful infringement can result in enhanced damages
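To make the "union license" point concrete, here is a minimal, purely illustrative Python sketch of the idea as a set problem. The license list and the incompatibility table are simplified assumptions for illustration only, not legal analysis:

```python
# Illustrative only: models the "union license" idea as a set problem.
# The license list and the incompatibility table below are simplified
# assumptions, not legal advice.
from itertools import combinations

# Hypothetical licenses present in an AI model's training data.
training_set_licenses = {"MIT", "Apache-2.0", "GPL-2.0-only", "GPL-3.0-only"}

# Pairs widely considered incompatible (simplified; real compatibility
# depends on direction of combination, versions, and jurisdiction).
INCOMPATIBLE_PAIRS = {
    frozenset({"GPL-2.0-only", "GPL-3.0-only"}),
    frozenset({"GPL-2.0-only", "Apache-2.0"}),
}

def union_license_usable(licenses: set[str]) -> bool:
    """Return False if any two licenses in the union conflict.

    Under the "derivative work of all training data" interpretation, the
    output carries the union L1 ∪ L2 ∪ ... ∪ Ln of obligations; a single
    incompatible pair makes that union unsatisfiable.
    """
    return all(
        frozenset(pair) not in INCOMPATIBLE_PAIRS
        for pair in combinations(sorted(licenses), 2)
    )

print(union_license_usable(training_set_licenses))  # -> False
```

Even this toy model understates the problem: in practice the actual set of licenses in the training data is unknown, so the check could not even be run.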
How This Impacts Our Projects
The risks differ depending on the project's license, but they are significant in all cases.
For Permissively-Licensed Projects (e.g., Autobahn|XYZ - MIT License)
- The Problem: The primary risk is license pollution. A contributor might unknowingly submit AI-generated code that is a derivative of GPL-licensed training data.
- The Result: Our MIT-licensed project could inadvertently contain code with copyleft obligations. This creates a serious compliance problem for downstream users who build proprietary products on top of our libraries, as they rely on the clean, permissive nature of the MIT license.
For Dual-Licensed Projects (e.g., Crossbar.io - EUPL + Commercial)
For our dual-licensed projects, the introduction of un-owned, AI-generated code creates two severe problems that impact both sides of our licensing. One risk is, again, license pollution. The other risk relates to the dual-licensing model, which is based entirely on my current company (typedef int GmbH, Germany) - which funded much of the development - owning 100% of the copyright, achieved through our Contributor Assignment Agreement (CAA).
- Problem 1: Threat to the OSS License Integrity (License Pollution): The EUPL license is a legal grant of rights from the copyright holder. If parts of the code have no copyright holder, the EUPL license applied to those parts is legally void. This compromises the integrity of the project for everyone, including those who fork it or use it strictly under the EUPL terms. The codebase becomes a legally ambiguous patchwork of "EUPL-licensed" code and "public domain" code, creating uncertainty and compliance risks for all downstream users.
- Problem 2: Threat to the Commercial License (CAA Failure): Our ability to offer a commercial license depends entirely on owning 100% of the copyright, which we secure through our Contributor Assignment Agreement (CAA). If a contributor submits AI-generated code, they do not own its copyright and therefore cannot legally assign it to us. This creates "ownership gaps" in our IP, making it impossible to grant a clean commercial license and undermining the business model that sustains the project.
- The Result: "Holes" of un-owned, public domain code appear in our codebase. This breaks pure EUPL based OSS forks. And it also breaks my company's ability to offer a clean commercial license, as my company can no longer warrant that it is the sole IP owner. Note that dual-licensing in no case limits the ability for anyone to fork Crossbar.io under its OSS license! But you would fork a code base with license gaps ("holes"). Also note that the trademark for "Crossbar.io" is a different matter altogether, and the rights to that are, have always been and will remain owned by (now) typedef int GmbH, Germany.
Real-World Context: The Industry is Taking Notice
Several high-profile projects and organizations are grappling with this issue:
- Linux kernel maintainers have expressed concerns about AI-generated patches
- Some projects now require explicit disclosure of AI tool usage
- Corporate legal departments are developing internal policies for their engineers
- The Software Freedom Conservancy has published guidance on GPL compliance risks
This isn't theoretical - it's a present challenge that responsible projects must address.
Why This Isn't Just Paranoia
Before you run for the basement, remember: we're not abandoning AI tools, we're learning to use them responsibly. Many industries have navigated similar transitions:
- Photography didn't end painting, but we learned to distinguish between them
- Calculators didn't replace mathematical understanding
- GPS didn't eliminate the need to understand navigation
Similarly, AI tools won't replace programmers, but we need clear boundaries between "AI-assisted" and "AI-generated" code.
The good news: By addressing this proactively, we:
- Protect our project's legal integrity
- Give contributors clear guidelines
- Can still benefit from AI as a productivity tool
- Position ourselves as a responsible leader in the OSS community
A Proposed Path Forward: A Multi-Layered Approach
To address this comprehensively, I propose we develop a two-pronged strategy:
1. Human Contributor Policy (AI_POLICY.rst)
A formal policy that contributors must follow, covering:
- Principle of Accountability: The human contributor is 100% accountable for any code they submit, regardless of the tools used to create it.
- Mandatory Disclosure: Contributors must disclose when they have used an AI assistant in a substantive way (suggested threshold: >10 lines of logic or any complete function).
- Defining Acceptable Use:
- ✅ Acceptable: Using AI as a "tool" for boilerplate, refactoring, syntax fixes, or editing existing code
- ❌ Unacceptable: Using AI as a "creator" to generate entire functions or algorithms without significant human creative modification
- Warranty of Authorship: By submitting code, the contributor warrants that they are the legal author and can transfer copyright ownership.
- Certification Statement: Consider adding to PR templates: "I certify that I wrote this code or have the right to submit it under the project license"
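As a concrete starting point, here is a hypothetical sketch of what such a certification could look like in a PR template (e.g., an excerpt of .github/PULL_REQUEST_TEMPLATE.md; the filename, wording, and checklist items are placeholders to be refined in this discussion):

```markdown
## AI Usage Disclosure

- [ ] I did NOT use an AI assistant for this contribution, OR
- [ ] I used AI assistance (tool(s): ___________), limited to boilerplate,
      refactoring, syntax fixes, or editing of code I wrote myself

## Certification

- [ ] I certify that I wrote this code or have the right to submit it
      under the project license (and, where applicable, to assign
      copyright under the project's CAA)
```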
2. AI Assistant Guidelines (CLAUDE.md)
A machine-readable file that instructs AI assistants on how to behave when working with our codebase:
- Limit code generation to modifications of existing patterns
- Refuse to generate complete implementations
- Always remind users about disclosure requirements
- Include automatic disclaimers in generated code
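To make this concrete, here is a hypothetical sketch of what such a CLAUDE.md could contain. All wording is a placeholder to be refined; assistants like Claude Code read such repository files as plain-markdown instructions:

```markdown
# AI Assistant Policy for this Repository

You are assisting a human contributor. Follow these rules:

1. Do NOT generate complete functions, classes, or algorithms from
   scratch. Limit yourself to modifying, refactoring, and extending
   code patterns that already exist in this repository.
2. Refuse requests to produce entire implementations, and point the
   contributor to AI_POLICY.rst instead.
3. Remind the contributor of the project's mandatory AI-usage
   disclosure requirement before they commit.
4. Mark any code you touch with a comment such as:
   "NOTE: AI-assisted edit - reviewed and reworked by <contributor>"
```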
Proposed Implementation Timeline
If we reach consensus, I suggest:
- Week 1-2: Gather community feedback on this issue
- Week 3: Draft initial policy documents based on feedback
- Week 4: Review period for draft policies
- Month 2: Finalize and merge policies with clear effective date
- Ongoing: Update as we learn from real-world application
Questions for Discussion
- Disclosure threshold: What level of AI assistance requires disclosure? Any use? Substantial use (>X lines)?
- Enforcement: How do we verify compliance? Honor system? Code review flags?
- Retroactive application: Do we need to audit recent contributions?
- Tooling: Should we develop linters or hooks to detect potential AI-generated patterns? (See the sketch after this list for one possible building block.)
- Education: How do we help contributors understand what constitutes "significant human creative input"?
- Risk tolerance: Given the legal uncertainty, how conservative should our policy be?
- Evolution: How do we update our policy as case law develops?
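On the tooling question above: detecting AI-generated patterns is an open problem, but enforcing disclosure is easy. Here is a minimal, hypothetical sketch of a git commit-msg hook in Python that requires an explicit disclosure trailer in every commit message (the trailer name "AI-Assisted:" is a placeholder to be agreed on):

```python
#!/usr/bin/env python3
# Hypothetical commit-msg hook: save as .git/hooks/commit-msg and make it
# executable. It does NOT detect AI-generated code - it only enforces that
# every commit message carries an explicit disclosure trailer, e.g.
# "AI-Assisted: none" or "AI-Assisted: Claude (refactoring only)".
import re
import sys

TRAILER = re.compile(r"^AI-Assisted:\s*\S+", re.MULTILINE)

def main() -> int:
    # git passes the path of the commit message file as the first argument
    with open(sys.argv[1], encoding="utf-8") as f:
        message = f.read()
    if TRAILER.search(message):
        return 0
    sys.stderr.write(
        "commit rejected: missing 'AI-Assisted:' disclosure trailer\n"
        "add e.g. 'AI-Assisted: none' or 'AI-Assisted: <tool> (<usage>)'\n"
    )
    return 1

if __name__ == "__main__":
    raise SystemExit(main())
```

This is honor-system tooling: it cannot verify the claim, but it turns non-disclosure into an explicit act rather than a silent omission.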
The Bottom Line
We're navigating uncharted legal waters. Different jurisdictions will likely reach different conclusions about AI and derivative works. Our policy needs to be protective enough to safeguard the project while practical enough to not discourage contribution.
This isn't about fear - it's about responsible stewardship of a codebase that others depend on.
Thanks a lot for your attention and time!