Scott,
Thank you for raising this really important issue; I wholeheartedly agree. Ever since I heard Ibram X. Kendi's argument that we should be not merely non-racist but actively anti-racist, I have thought a lot about it (I have a long essay trying to sort out my thoughts on that, but I am not sure my voice is helpful in that conversation). But yes, I agree with the sentiment and the idea.
Another statement that has deeply influenced my thinking in preparation for this project is "nothing about us without us", along with its implications for the Abstract Wikipedia project (and how, currently, we are not really achieving it).
So, in short, yes, I want to commit to both of these as guidelines for how the project will unfold.
Having a specific, non-European and underrepresented language as a first-class development target is a great suggestion, as is having someone on the core team with a native-level grasp of that language. Whether and when we can actually implement this depends on a number of factors, such as funding, but yes, ensuring such representation is a high priority for me, and I am very much (and painfully) aware that we are not fulfilling this promise yet.
For the choice of language, I hope to go through a process similar to the one we used for Wikidata, where we worked with the Wikipedia communities to identify language communities that would be interested in and willing to work with us. I am planning for us to have a similar process within the next few months.
One advantage of the current state is that the focus for the first part of the project will be solely on the wiki of functions, not yet on the part that generates natural language, and that the current plan calls for additional hires when this second part starts. So all of these decisions and preparations are not blockers during the first part of the project, but will be so for the second - and obviously I want to have them resolved well before.
Also, one correction - we are fortunately not blocked by the availability of language models in a given language. Since the natural language generation, as we plan it, is developed by the communities using functions, we do not need to have a good language model, or in fact, any language model at all, for the system to work. So we have that going for us.
Finally, as I answered Phoebe, I want to tackle these issues head-on with a call for discussing the ethical implications of this project. Your suggestions are good, and will inform our planning and development, but I am also aware that, in order to have a fuller picture, we need to hear more voices and figure out how to have these conversations. This will happen within the next few months.
Thanks again for raising this important issue! I hope my thoughts on that make sense, and I am happy to work on them further,
Denny
On Wed, Aug 5, 2020 at 11:19 PM Nick Wilson (Quiddity) <nwilson@wikimedia.org> wrote:
On Wed, Aug 5, 2020 at 2:01 PM Samuel Klein <meta.sj@gmail.com> wrote:
We used to have a roughly weighted list of major world languages by (spoken, written; primary, secondary) and how well covered they were by wp (articles, contributors). Is there something like that still?
I think you might be referring to the links in the 3rd and 4th lines of https://meta.wikimedia.org/wiki/Template:Lists_of_Wikipedias ? Looking more closely, it appears that the "speakers per article" listing is unfortunately a few years out of date, as the "Speakers" column was being manually updated from Ethnologue stats (which are now paywalled). I've started a tangential discussion on the talk page there about using Wikidata instead.
Additionally, none of those links contain the "primary / secondary language" statistics, for which I think we'd need to cross-reference with https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers (https://www.wikidata.org/wiki/Q1394450). Or perhaps Wikidata can resolve this as well, since at least some languages' items include that split of the statistics, e.g. Q150. Let's discuss further onwiki?
And +1 to the overall recommendation from C. Scott. :)