The on-wiki version of this is here: https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Focus_languages

Hello all,

The Wikidata team at Wikimedia Deutschland will be working on improvements to the lexicographic data part of Wikidata this year. The Abstract Wikipedia team at the Wikimedia Foundation will be working on the generation of natural language text for baseline Wikipedia articles in the next few years, and on functions in Wikifunctions to work with lexicographic data. For both efforts, it would be beneficial to focus on a small, specific set of languages at first. Participating communities will hopefully find that this project leads to long-term growth in Wikipedia and Wiktionary in and about their language.

Lydia and Denny would like to choose the same focus languages for both teams, as aligning them benefits both projects.

We will be working closely together with the focus communities over the next few years. This means that features will land first in these languages and we will have particularly active feedback channels. We are looking for communities that are open to trying out new things.

The decision on which languages become the focus languages should be made together with the wider communities. In particular, we would like to make the decision with a promising self-selecting community. This worked very well for Wikidata, where the focus projects were self-selected.

We will use English as a demonstration language and two or three other languages as focus languages. We chose English because it is easy to demonstrate to a wide audience and is a working language for both development teams.

For the focus languages, we want to work with an active and enthusiastic community or seed of a community over the next few years on these projects.

To be fully transparent, we have compiled a number of other, more detailed criteria we would like to use to guide our decision, but this assumes that there are communities to choose from. None of these criteria are set in stone, and we are happy to discuss them, remove some if they are not good ideas, or add others if we missed something. Regard this as a strawdog proposal. For example, Mahir Morshed came up with a complementary set of criteria on Phabricator, which we will consider in the selection as well. We will have Q&A office hours for discussion, and are open to comments via wiki or email.

We are thinking of a two-pronged approach:

first, to call for communities to propose themselves to work with us;
second, to look at the data and see which languages would be good candidates.

We don’t want to set too strict a process. We would like the second prong of the approach to go on throughout the whole process to help us come to a good understanding of the options.

For the first prong, we would like the candidate seed groups to describe and nominate themselves on wiki, following a short form. Nominations should be submitted by April 7, and the teams will make the decision by April 14, taking your comments into account. If we notice that self-nominations are not happening, we will try to engage with language communities directly.

It is possible that the two teams will choose different candidates, although we will try to avoid that.

We look forward to hearing what you think of this proposal. Please comment on the talk page on wiki.

Lydia and Denny