On Fri, May 1, 2026 at 1:45 PM Jan Ainali via Wikimedia-l <
wikimedia-l(a)lists.wikimedia.org> wrote:
> ... there is one of our central values I want us to keep held front of
> mind in this moment, and that is to focus on open source and not fall for
> the lure of the proprietary just because it is AI. And here I would like us
> to follow the principles of the Digital Public Goods Alliance (who made us
> so proud when they awarded Wikipedia and Wikidata with their certification
> of being Digital Public Goods) and go even further than the definition from
> the Open Source Initiative definition for open source AI. Their extension
> means that beyond the free license on the model and the code, also the
> dataset used for training should be freely licensed.[1]
>
> This would not only be the ethically right thing to do, it would also
> ensure we aren't dependent on Big Tech when doing our adaptation to the new
> landscape.
>
Hey, Jan! Thanks for raising this. I think it's such an important topic
that it is worth breaking out into a separate thread.
I agree with your bottom line: we can't have truly open knowledge without a
truly open ecosystem (not just an open software stack). That should be the goal we
are always, always striving to get to.
But there are some important wrinkles.
*Our knowledge ecosystem has never been purely open*
Our core web services have always been FOSS from the ground up.
But our knowledge ecosystem is very much not open.
Our tech ecosystem has always been co-dependent on web search generally,
and Google Web Search specifically. Google is how most people find us, and
how most of us find knowledge to put into the encyclopedia. This is not
*good*—it is in fact very bad—but it is, and always has been, our reality.
Mostly we ignore this inconvenient dependency, and mostly that is fine. But
if we’re going to try to see the world as it is, we also have to be honest
about that dependency.
LLMs are not perfect, but *at worst* the reasons they’re bad are the same
reasons Google Web Search (and essentially every other web search, and the
publishing industry too) is bad: controlled by an unaccountable
corporation, hard to audit, subject to all sorts of biases.
Open-weight models still aren’t perfect, but: we can audit them for bias;
we can modify them (within boundaries); we can rebuild them with open
knowledge (Ai2 says hi); we can even run them locally. That’s true even
when they aren’t DPGA-open (or, in many cases, even when they’re not
OSI-open).
And there are still *possibilities* of truly open (training data and
weights) models, about which more in the next point.
*Open has always involved compromise*
New open ecosystems do not just magically spring into existence—they have
always required hard work *and strategic compromise*. The GNU folks had to
compromise for almost two decades, running on proprietary Unices. It took
Mozilla most of a decade to beat IE, and they had to run proprietary
plugins starting on day one to do it. As you’re well aware, open access
publishing is still very much a work in progress two decades in.
All of those things built on each other. If Stallman hadn’t compromised by
building his open compiler on proprietary Unix, Linus doesn’t learn about the
GPL and build a free Linux. If Linus doesn’t build Linux, Netscape doesn’t open
Mozilla. Mozilla used a compromise open license deliberately written to
ensure Netscape could ship proprietary plugins. Etc. Etc. Etc.
We’re only five years or so into the LLM era. It is not very open. I am not
sure what compromises will be made. But we’re probably going to need to
make compromises in order to learn, to gain influence, and to beat back our
competitors. In the best case, that’s going to mean tech like Olmo and
partners like Ai2, but it is also going to mean some compromises, and some
*hope* that our work will inspire the next generation of openness.
*There’s supposed to be a third thing*
I really wanted to have a third thing, but, uh, I’m drawing a blank. So, again:
I have to stress, this is not a call to throw away our principles.
We should absolutely be using every bit of influence and leverage (and
money) we have to push every player in this ecosystem toward as much
openness as possible. But that’s also going to mean getting involved and
building bridges, not sitting on the sidelines. And it’s going to mean
building *practical* bridges, so that (like the Wikipedia Library) we are
sometimes doing deals with entities who don’t share our values. Those compromises
will have to be done vigilantly and carefully. But the world is changing
radically, and fast. So our compromises will have to be done boldly too.
Sincerely—in open and in progress—
Luis