Scott,
thank you for raising this really important issue, and I whole-heartedly
agree. Since I first heard Ibram X. Kendi's argument that one should be not
merely not racist but actively anti-racist, I have thought a lot about it
(I have a long essay trying to sort out my thoughts on that, but I am not
sure my voice is helpful in that conversation). But yes, I agree with the
sentiment and the idea.
Another statement that has deeply influenced my thinking in preparation for
this project is "nothing about us without us", and its implications for the
Abstract Wikipedia project (and how, currently, we are not really achieving
it).
So, in short, yes, I want to commit to both of these as guidelines for how
the project will unfold.
Having a specific, non-European and underrepresented language as a
first-class development target is a great suggestion, and so, I think, is
having someone on the core team with a native-level grasp of that language.
Whether and when we can actually implement this depends on a number of
factors, such as funding, but yes, ensuring such representation is a high
priority for me, and I am very much (and painfully) aware that we are not
fulfilling this promise yet.
For the choice of language, I hope to go through a process similar to the
one we used for Wikidata, where we worked with the Wikipedia communities to
identify potential language communities that would be interested and willing
to work together with us. I am planning for us to have a similar process
within the next few months.
One advantage of the current state is that the focus for the first part of
the project will be solely on the wiki of functions, not yet on the part
that generates natural language, and that the current plan calls for
additional hires when this second part starts. So all of these decisions
and preparations are not blockers during the first part of the project, but
will be so for the second - and obviously I want to have them resolved well
before then.
Also, one correction - we are fortunately not blocked by the availability
of language models in a given language. Since the natural language
generation, as we plan it, is developed by the communities using functions,
we do not need to have a good language model, or in fact, any language
model at all, for the system to work. So we have that going for us.
Finally, as I answered Phoebe, I want to tackle these issues head-on with
a call for discussing the ethical implications of this project. Your
suggestions are good, and will inform our planning and development, but I
am also aware that, in order to have a fuller picture, we need to hear more
voices and figure out how to have these conversations. This will happen
within the next few months.
Thanks again for raising this important issue! I hope my thoughts on this
make sense, and I am happy to work on them further,
Denny
On Wed, Aug 5, 2020 at 11:19 PM Nick Wilson (Quiddity) <
nwilson(a)wikimedia.org> wrote:
On Wed, Aug 5, 2020 at 2:01 PM Samuel Klein <meta.sj(a)gmail.com> wrote:
We used to have a roughly weighted list of major world languages by
(spoken, written; primary, secondary) and how well covered they were by wp
(articles, contributors). Is there something like that still?
I think you might be referring to the links in the 3rd and 4th lines of
https://meta.wikimedia.org/wiki/Template:Lists_of_Wikipedias ?
Looking more closely, it appears that the "speakers per article" listing is
unfortunately a few years out of date, as the column of "Speakers" was
being manually updated from Ethnologue stats (which are now paywalled).
I've started a tangential discussion on the talkpage there, about using
Wikidata instead.
Additionally, none of those links contain the "primary / secondary
language" statistics, for which I think we'd need to cross-reference with
https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers
(https://www.wikidata.org/wiki/Q1394450). Or perhaps Wikidata can resolve it
again, as at least some languages' items include a split of the statistics
for that, e.g. Q150. Let's discuss further onwiki?
And +1 to the overall recommendation from C. Scott. :)
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l(a)lists.wikimedia.org