I am Gabriel Lee from Hong Kong, and I would like to become a member of
Langcom. I know zh-n, zh-yue-5, en-5, ja-2 and fr-2. Please contact me by
sending a mail to chihonglee777(a)gmail.com. Thanks a lot!
Gabriel Chi Hong Lee
I just read this. They will probably want to include this in their
Wikipedia, and I am sure that it is better to do this in a Wikipedia than
to add it in the Incubator.
My proposal is to allow them a Wikipedia if we can get some assurances that
they will do the work needed for localisation as soon as possible.
Hope you concur.
You may have missed it in the mail I sent yesterday, but I want to propose
to change the requirements for a new Wikipedia.
My proposal is to ask for either the current number of full articles, or 50
full articles plus the complete labelling of a selected set of 250 Wikidata
items.
The rationale is that it takes much less effort to work on Wikidata and it
will quickly amount to providing additional results for other items. An
example: the property "sex" should be in use on all humans, and the values
"male" and "female" should be in use on all of these as well.
Given that we can provide results from Wikidata in a search on Wikipedia
and given that we can provide visualisation for humans and organisms
already, the arguments in favour of this move at this time are quite strong.
Add to this that through Wikidata we can connect to Commons, and we have
all the ingredients for this to quickly have a big impact on the usability
of our projects for a new language.
If it proves that we are in favour of this move, I expect it will require
confirmation from the board. It will also mean that the WMF has to consider
some changes to the official search routines. (I expect this will NOT be a
problem.)
I have been asked by Erik what can be done to better support small
languages and in particular what we can do to support more small languages
more effectively. I have thought about it for a long time and as far as I
am concerned, try as we might it will not happen as long as there is no
clear benefit we bring. In this text I describe how we can provide more
value to people of *any* language. For new and small languages the emphasis
will be on bold and easy strokes that have a big impact.
Key in all this will be that we have to connect to what is already there.
This makes search key. Two of the most important objectives are finding
pictures and finding information. Wikidata provides the most obvious tool
in this because it takes little effort to connect to the information that
is already there. Half an hour a day on labelling items that are in the
news will swell the most often searched terms rapidly in any language.
When people search for something, they either find it or they do not. When
a search is entered and nothing is found, it may exist either under a
different spelling or in a different language. When something is NOT found,
we should ask if the person knows a synonym in his language or a
translation in another language. With the new term we iterate in the
search. When something is found after one iteration, we ask if this item is
indeed what was intended to be found. One image and a first paragraph of
text should suffice. When it does, we add the search item as a (dirty)
label. Adding labels in this way will quickly swell the number of terms
available in a language for a search. Most importantly, we turn a failure
into a success; a success that benefits everyone who searches for the same
information.
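The search loop described above can be sketched in a few lines. Everything here (the `SearchIndex` class, the callbacks for asking the user) is illustrative and not an existing Wikimedia API; it only shows the flow: miss, ask for a synonym or translation, retry, confirm, and store the original term as a "dirty" label.

```python
class SearchIndex:
    """Toy in-memory index: items with labels, looked up case-insensitively."""

    def __init__(self):
        # item id -> set of known labels (any language, possibly "dirty")
        self.labels = {"Q42": {"Douglas Adams"}}
        self.by_label = {"douglas adams": "Q42"}

    def find(self, term):
        return self.by_label.get(term.lower())

    def add_dirty_label(self, item, term):
        # record the failed search term as an unreviewed ("dirty") label
        self.labels[item].add(term)
        self.by_label[term.lower()] = item


def search_with_fallback(index, term, ask_alternative, confirm):
    """Try the term; on a miss, ask the user for a synonym or translation
    and iterate once. On a confirmed hit, store the original term as a
    dirty label so the next person searching for it succeeds directly."""
    item = index.find(term)
    if item is None:
        alternative = ask_alternative(term)  # synonym or translation, or None
        if alternative:
            item = index.find(alternative)
            if item is not None and confirm(item):
                index.add_dirty_label(item, term)
    return item
```

After one such iteration the original failing term resolves directly, which is exactly how a failure becomes a success for everyone who searches for the same thing later.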
When an item is found in a language, we can provide information in that
language in the format of an infobox or a Reasonator page. Obviously many
statements may not have labels in that language yet; these can be
highlighted, or shown in another language, so that they can be added in the
primary language. This approach will ensure that a teacher can select the search
terms he is interested in and prepare the information for his students.
Another approach is to learn where we fail to provide information. We do
not know what search terms fail most often. Consequently we do not have the
tool to remedy this in any language. The basis of data driven user
participation is that we KNOW what to ask for and why. When people start to
find pictures because of the link Wikidata has with Commons, we need to
understand it and see it coming before kids in school from all over the
world really start hammering our servers.
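A minimal sketch of the missing piece named above: if every failed query were simply logged per language, we would know exactly which labels to ask contributors for first. The function names and data shape are assumptions for illustration, not an existing tool.

```python
from collections import Counter

# counter of (language, term) pairs that produced no search result
failed_searches = Counter()


def record_miss(language, term):
    """Log one failed search in one language."""
    failed_searches[(language, term)] += 1


def most_wanted(language, n=10):
    """The terms that failed most often in one language:
    the labels to add first."""
    return [(term, count)
            for (lang, term), count in failed_searches.most_common()
            if lang == language][:n]
```

This is the basis of data-driven participation: the ranked list tells a community not just that labels are missing, but which missing labels would help the most searchers.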
The objective is to reach the tipping point where we become useful in a
language.
I have been asked to become an advisory board member for the PanLex
Project <http://panlex.org> of The Long Now Foundation. I have accepted
this, and what they are interested in is experimenting with one language to
see how their content can make a difference in Wikidata, but equally how
Wikidata can make a difference in PanLex. My take on their objective is
that their work makes no difference if it is not used. An experiment will
see their staff work on leveraging our data and software, and vice versa.
In my opinion this will make information useful as explained above.
We have the opportunity to experiment with the Long Now Foundation and at
the same time develop tooling that will help all our languages and will
help us reach the tipping point where Wikidata is useful for all of them.
I also propose to change the criteria for accepting new WMF projects. So
far we asked for a Wikipedia with many written articles of high quality.
Effectively we accepted many articles of stub quality. What I propose is to
have something like 50 articles of a substantial size and to complement
this with 250 items that have labels for all their statements. These 250
items cover many domains but are optimised for being what people are likely
to search for.
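As a rough sketch of how such a criterion could be checked mechanically: the data shapes and the size threshold below are assumptions for illustration, not the real Wikidata or Wikipedia APIs.

```python
def meets_criteria(articles, items, language,
                   min_articles=50, min_article_chars=2000, min_items=250):
    """Check the proposed acceptance criterion: enough substantial
    articles, plus enough items whose statements are all labelled in
    the new language.

    articles: list of article texts (strings)
    items: list of dicts; item["statement_labels"] is a list of
           {language_code: label} dicts, one per statement part
    """
    substantial = [a for a in articles if len(a) >= min_article_chars]
    fully_labelled = [
        item for item in items
        # every statement's property and value must carry a label
        # in the candidate language
        if all(language in labels for labels in item["statement_labels"])
    ]
    return len(substantial) >= min_articles and len(fully_labelled) >= min_items
```

The point of the sketch is that, unlike "many articles of high quality", both halves of this criterion are objectively countable, so the language committee could verify them with a script rather than by judgment.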
I have been pushing and experimenting along these lines. The result is a
search tool using Wikidata in Wikipedia. A demonstration that Wikidata
knows more items than Wikipedia has articles. Visualisation for people and
organisms in the "Reasonator", and a personal conviction that increasingly
says that this is how we can grow any language to the fullest of its
potential.
My question is: what do you think? How can we implement this? What more
can we do?
PS I fear that when children find that they can find pictures in THEIR
language, they will be able to bring our servers down. A luxury
problem I am sure :)
PS-2 A big thank you to Magnus Manske and Lydia Pintscher for the wonderful
work they do.
When a new language is deemed "eligible", we, the language committee,
inform the chair of the board, who has a week to indicate that the board
does not agree with our recommendation. When the chair of the board does
not reply, a bug is filed in Bugzilla asking for the new language.
At this time I ask formal permission for Ottoman Turkish to be used
*exclusively* for Wikidata and its applications. Ottoman Turkish is a
historic language, but with information in this language we will be able to
include the original name of something in an infobox in any Wikipedia.
Jan-Bart, we are likely to make requests like this for many living
languages in the near future as well, because it is expected that in six
months' time development starts on integrating Wikidata with Commons. This
will make it possible to optimally search for pictures in the native
language of the children of this world. To a large extent this is already
possible for the "big" languages like Dutch. (It only requires a relatively
easy hack for this to happen... Interested?)
In the language committee we have agreed that any ISO 639-3 language may be
eligible for adding labels in Wikidata. This will open up Wikidata as a
source in its own right and it will allow for searching images and
Wikipedia articles in other languages.
There are technical considerations that need to be sorted out, particularly
the method whereby languages are enabled within the Wikimedia Foundation. I
have created a concept for a blog post announcing this decision. It has
already been read by Amir, so it can be safely read on Meta.
One of the reasons for this is an old request by Erik to look into making
it easier for languages to find their way into the WMF. It is likely that
one minority language will get its data into Wikidata when this policy is
implemented.