On Thu, Jun 12, 2025 at 1:11 PM Allison Randal <allison@lohutok.net> wrote:
> On 5/30/25 4:01 PM, Eoghan Glynn wrote:
> >
> > Also worth noting the actual choice of models in this small sample
> > suggests not much weight is being put on the current policy's steer
> > towards: "exclusively using compatible licensed inputs to train the
> > model, using a model released as open source, using a model trained
> > exclusively on compatible licensed code".
> >
> > To catch up with emerging practice, I'd suggest that we augment the
> > formal policy with "Assisted-By" as soon as possible. FWIW, I think we
> > should also simplify, shorten, and make the policy more permissive
> > overall (e.g. by removing the language on the choice of model, and
> > pushing the secondary scanning into the code review or merge gating
> > flow). Though at least we should get the Assisted-By option codified soon.
>
> Agreed on simplifications and adding Assisted-By as an option.
>
> But, this is not the time to remove the clear reminder to contributors
> that there is real uncertainty and risk around AI tools and whether
> their outputs can be used in open source software. Tagging the commits
> does mitigate some of the risk (it at least gives us the option to rip
> out that code someday if we have to), but making smart choices about
> which tools you use in the first place is even better.
>
> The world is really only beginning the journey of understanding the
> legal implications of AI generation in the context of copyrighted works,
> for example:
>
> https://www.wired.com/story/disney-universal-sue-midjourney/
>
> Allison


Totally get the concern about the risks of future litigation.

That said, invocation of the existing policy is clearly vanishingly rare, at least across the OpenStack codebases.

Now that could be due to one of several reasons, including:

(a) very low genAI tool usage among our contributors

or:

(b) widespread non-observance of the policy's requirements

IMO both are bad outcomes for the community: (a) cuts us off from the productivity gains these tools can bring, while (b) cuts us off from the future risk mitigation that clearly marked AI-assisted contributions would provide.

FWIW, I suspect that the current wording of the policy could be a factor in either (a) or (b), and that a simplified, lightweight policy is more likely to be widely understood and observed.
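
For illustration, the lightweight option I have in mind is nothing heavier than a standard Git trailer in the commit message, grep-able alongside the Change-Id that Gerrit already requires. Sketching it roughly (the tag name and exact format would still need to be agreed, and the subject line below is just a made-up example):

    Fix flaky retry handling in volume detach

    Assisted-By: <name/version of the AI tool used>
    Change-Id: I...

That keeps the per-commit overhead to a single line, while preserving the option Allison mentions of identifying (and, if we ever have to, ripping out) AI-assisted changes after the fact.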

Cheers,
Eoghan
