I applaud this idea. Preferably a language family with a large community of practice, 'minority' in the sense of coverage and support by modern tools and scaffolding, not in the sense of limited use.
We used to have a roughly weighted list of major world languages by (spoken, written; primary, secondary) and how well covered they were by wp (articles, contributors). Is there something like that still?
//S
🌍🌏🌎
On Wed., Aug. 5, 2020, 3:19 p.m. C. Scott Ananian, cananian@wikimedia.org wrote:
Sorry I'm coming to this discussion a bit late, but I'd like to underline a slightly different aspect of the concern that Phoebe raised:
It concerns me that, at least in the high-level project proposals I've seen (I haven't been tracking this closely, and haven't read the academic papers) I have not yet seen discussions of ethical data, or how we might think about identifying bias, or even how to recruit contributors and the impact on existing contributors.
Using the terminology of Ibram X. Kendi (and others), I'd put this as: "it's not enough to not be racist, you must actively be *anti-racist*."
Abstract Wikipedia is a "color blind" project. Indeed it is often described as advancing WMF goals by improving the amount of content available for minority languages.
However, it is built on a huge edifice of ML and AI technology which advantages majority languages and the already-powerful.
As Phoebe mentioned, the subtle biases of ML translation toward majority views (selecting the "proper" gender pronoun for someone described as a "doctor" or "professor", say) are well known, and certainly deserve to be foregrounded from the start, as Danny has pledged to do in his response to Phoebe.
But the infrastructure of this project is built this way from the ground up. Language models for European languages are orders of magnitude better than language models for minority languages (if the latter exist at all). The same is true for ontologies and every other constructed abstraction, down to choices of what topics are significant enough to include in an abstract article---but that ground has been ably covered by Kaldari and others. So let me concentrate solely on language models in the remainder (with some parenthetical asides, for which I hope you'll forgive me).
I would like to challenge Abstract Wikipedia not only to be "not racist" or "color blind", but to be actively *antiracist*. That is, instead of passively accepting the status quo wrt language models (& etc), to commit to actively supporting a language model in *at least one* minority language, treating it as a first-class citizen or (better) the *main* output of the project. That means not just looking for "a good enough language model that happens not to be a European language" but *actively developing the language model* so that the Abstract Wikipedia project *from inception* has a positive effect on *at least one* community speaking an underrepresented language with a small Wikipedia. (Again, WLOG this could apply to general AI/ML support for many many minority groups, but I'm sticking with "at least one" and "language model" in order to make this as concrete and actionable as possible.) This of course also means committing to hire a speaker of that non-European language as part of the core team (not just an "and translations" afterthought), committing to foregrounding that language in demonstrations, and doing outreach and community building to the language group in question. (All the mockups I've seen have been in German and English, and have been pitched to an English-speaking audience.)
I don't think it is wise in 2020 to pretend that "colorblind" business as usual will advance the goals of our organization. We need to actively work to ensure this project has effects that *work against* the significant pre-existing biases toward highly-educated speakers of European languages. It is not enough to say that "someday" this "may" have an effect on minority language groups if "somebody" ever gets around to doing it. We must make those investments proactively and with clear intention in order to effect the change we wish to see in the world.
-- C. Scott Ananian