Hi,

Erik asked me what can be done to better support small languages, and in particular how we can support more of them more effectively. I have thought about this for a long time, and as far as I am concerned, try as we might, it will not happen as long as we do not bring a clear benefit. In this text I describe how we can provide more value to people of any language. For new and small languages, the emphasis will be on bold, simple strokes that have a big impact.

The key in all this is that we have to connect to what is already there, and that makes search essential. Two of the most important objectives are finding pictures and finding information. Wikidata is the most obvious tool for this because it takes little effort to connect to the information that is already there. Half an hour a day spent labelling items that are in the news will rapidly grow the number of frequently searched terms available in any language.

When people search for something, they either find it or they do not. When a search returns nothing, the subject may still exist under a different spelling or in a different language. So when something is NOT found, we should ask whether the person knows a synonym in their own language or a translation in another language, and then search again with the new term. When something is found after one iteration, we ask whether this item is indeed what they were looking for; one image and the first paragraph of text should suffice. When it is, we add the original search term as a (dirty) label. Adding labels in this way will quickly grow the number of terms that can be searched in a language. Most importantly, we turn a failure into a success, and that success benefits everyone who seeks the same information.
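A minimal sketch of that search loop, assuming a plain in-memory dictionary as a stand-in for Wikidata labels. The names `search` and `search_with_fallback` and the two callbacks (`ask_alternative` for the synonym or translation, `confirm` for the "is this what you meant?" check) are hypothetical; a real tool would query Wikidata and write the label through its API instead.

```python
from typing import Callable, Optional

# Hypothetical stand-in for Wikidata labels:
# (language code, lowercase term) -> item ID such as "Q727".
LabelIndex = dict[tuple[str, str], str]


def search(index: LabelIndex, lang: str, term: str) -> Optional[str]:
    """Return the item whose label in `lang` matches `term`, if any."""
    return index.get((lang, term.strip().lower()))


def search_with_fallback(
    index: LabelIndex,
    lang: str,
    term: str,
    ask_alternative: Callable[[str], Optional[tuple[str, str]]],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Search once; on failure ask for a synonym or translation and retry once.

    When the retry finds an item and the user confirms it, the original
    (failed) term is stored as a "dirty" label in the user's language.
    """
    item = search(index, lang, term)
    if item is not None:
        return item
    alternative = ask_alternative(term)  # (language, term) or None
    if alternative is None:
        return None
    alt_lang, alt_term = alternative
    item = search(index, alt_lang, alt_term)
    if item is not None and confirm(item):
        # The failed search now succeeds for the next person.
        index[(lang, term.strip().lower())] = item
        return item
    return None


if __name__ == "__main__":
    labels = {("en", "amsterdam"): "Q727"}  # toy index with one English label
    found = search_with_fallback(
        labels, "li", "Amsterdam",
        ask_alternative=lambda t: ("en", "Amsterdam"),
        confirm=lambda item: True,
    )
    print(found, labels)
```

The only state change happens after confirmation, so an unconfirmed guess never becomes a label.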

When an item is found in a language, we can provide information in that language in the form of an infobox or a Reasonator page. Obviously, many of the statements will not yet have labels in that language. Those can blink, be shown in another language, or be flagged in some other way, so that the missing labels can be added in the primary language. This approach allows a teacher to select the search terms he is interested in and prepare the information for his students.
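A minimal sketch of such a fallback display, assuming a toy label store keyed by item or property ID. The helpers `pick_label` and `infobox` and the "[add label]" marker (standing in for "blinking") are hypothetical illustrations, not an existing Reasonator or Wikidata feature.

```python
from dataclasses import dataclass

# Hypothetical label store: entity ID -> {language code: label}.
Labels = dict[str, dict[str, str]]


@dataclass
class Shown:
    text: str          # what is displayed
    lang: str          # language the text is actually in ("" for a raw ID)
    needs_label: bool  # True when the primary-language label is missing


def pick_label(labels: Labels, entity: str, primary: str, fallbacks: list[str]) -> Shown:
    """Use the primary language, else the first available fallback, else the ID."""
    by_lang = labels.get(entity, {})
    if primary in by_lang:
        return Shown(by_lang[primary], primary, False)
    for lang in fallbacks:
        if lang in by_lang:
            return Shown(by_lang[lang], lang, True)
    return Shown(entity, "", True)


def infobox(labels: Labels, statements: list[tuple[str, str]],
            primary: str, fallbacks: list[str]) -> list[str]:
    """Render (property, value) pairs, flagging rows that need a label."""
    rows = []
    for prop, value in statements:
        p = pick_label(labels, prop, primary, fallbacks)
        v = pick_label(labels, value, primary, fallbacks)
        flag = " [add label]" if p.needs_label or v.needs_label else ""
        rows.append(f"{p.text}: {v.text}{flag}")
    return rows


if __name__ == "__main__":
    labels = {
        "P17": {"en": "country", "nl": "land"},
        "Q55": {"en": "Netherlands", "nl": "Nederland"},
    }
    # A Frisian ("fy") reader sees Dutch labels, flagged for translation.
    print(infobox(labels, [("P17", "Q55")], "fy", ["nl", "en"]))
```

The flagged rows double as a to-do list: each one is a label a speaker of the primary language could add.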

Another approach is to learn where we fail to provide information. At present we do not know which search terms fail most often, so we lack the means to remedy this in any language. The basis of data-driven user participation is that we KNOW what to ask for and why. When people start to find pictures because of the link Wikidata has with Commons, we need to understand this and see it coming before schoolchildren from all over the world really start hammering our servers.
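A minimal sketch of what "knowing what to ask for" could look like, assuming failed searches are simply tallied per language. The `FailedSearchLog` class and its light normalisation are hypothetical; a real deployment would also have to deal with privacy and volume.

```python
from collections import Counter, defaultdict


class FailedSearchLog:
    """Hypothetical tally of searches that found no item, per language."""

    def __init__(self) -> None:
        self._by_lang: dict[str, Counter] = defaultdict(Counter)

    def record(self, lang: str, term: str) -> None:
        # Light normalisation so "Tsunami" and "tsunami " count as one term.
        self._by_lang[lang][term.strip().lower()] += 1

    def worklist(self, lang: str, n: int = 10) -> list[tuple[str, int]]:
        """The most frequently failed terms in `lang`: what to ask volunteers to label."""
        return self._by_lang[lang].most_common(n)


if __name__ == "__main__":
    log = FailedSearchLog()
    for term in ["Tsunami", "tsunami ", "vulkaan"]:
        log.record("fy", term)
    print(log.worklist("fy"))
```

Sorting by frequency means that half an hour of volunteer labelling goes to the terms that fail most often.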

The objective is to reach the tipping point where we become useful in a language.

I have been asked to join the advisory board of the PanLex Project of The Long Now Foundation, and I have accepted. They are interested in experimenting with one language to see how their content can make a difference in Wikidata, and equally how Wikidata can make a difference for PanLex. My take on their objective is that their work makes no difference if it is not used. In such an experiment, their staff would work on leveraging our data and software, and we would do the same with theirs. In my opinion this will make information useful as explained above.

We have the opportunity to experiment with the Long Now Foundation and, at the same time, to develop tooling that will help all our languages and bring us to the tipping point where Wikidata is useful for all of them.

I also propose changing the criteria for accepting new WMF projects. So far, for a new Wikipedia we have asked for many well-written articles of high quality; in practice we accepted many stub-quality articles. What I propose instead is something like 50 articles of substantial size, complemented by 250 Wikidata items that have labels for all their statements. These 250 items should cover many domains but be chosen for what people are most likely to search for.

I have been pushing and experimenting along these lines. The results are a search tool that uses Wikidata in Wikipedia, a demonstration that Wikidata knows about more items than Wikipedia has articles, visualisation of people and organisms in the "Reasonator", and a growing personal conviction that this is how we can grow any language to the fullest of its potential.

My question is: what do you think? How can we implement this? What more can we do?

Thanks,

Gerard

PS I fear that when children discover they can find pictures in THEIR language, they will bring our servers down. A luxury problem, I am sure :)

PS-2 A big thank you to Magnus Manske and Lydia Pintscher for the wonderful work they do.