A new and very large problem domain for Asian language groups to
participate in / re-engage with.
Short version: Importing Wiktionary into Wikidata is now a real thing.
---------- Forwarded message ----------
From: Léa Lacroix <lea.lacroix(a)wikimedia.de>
Date: Wed, May 23, 2018 at 7:33 PM
Subject: [Wikidata] First experiment of lexicographical data is out
To: "Discussion list for the Wikidata project." <wikidata(a)lists.wikimedia.org>
After several years of discussion, and one year of development and
consultation with the communities, the development team has now
released the first version of lexicographical data support on Wikidata.
Since the start of Wikidata in 2012, the multilingual knowledge base
was mainly focused on concepts: Q-items are related to a thing or an
idea, not to the word describing it. Starting now, Wikidata stores a
new type of data: words, phrases and sentences, in many languages,
described in many languages. This information will be stored in new
types of entities, called Lexemes, Forms and Senses. It will allow
editors to describe precisely all words in all languages, and will be
reusable, just like the whole content of Wikidata, by multiple tools
and queries, everything that the community creates to play with words.
Lexicographical data can be reused inside and outside the Wikimedia
projects, and can provide support for Wiktionary.
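
As a rough illustration of how the three entity types relate to each
other, here is a minimal Python sketch. It is not the Wikibase
implementation or its API: the class and field names are assumptions
chosen for readability, and the identifiers ("L-example", "Q-english",
"Q-noun", and so on) are placeholders rather than real Wikidata IDs.
Senses are modelled here for completeness even though the announcement
lists them as not yet supported in the first release.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Form:
    """One inflected form of a word, e.g. the plural."""
    form_id: str
    representations: Dict[str, str]          # language code -> spelling
    grammatical_features: List[str] = field(default_factory=list)


@dataclass
class Sense:
    """One meaning of a word, glossed in any number of languages."""
    sense_id: str
    glosses: Dict[str, str]                  # language code -> gloss


@dataclass
class Lexeme:
    """A word or phrase in one language, with its Forms and Senses."""
    lexeme_id: str
    lemmas: Dict[str, str]                   # language code -> lemma
    language: str                            # placeholder for an Item ID
    lexical_category: str                    # placeholder for an Item ID
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def add_form(self, representations: Dict[str, str],
                 grammatical_features: Sequence[str] = ()) -> Form:
        # Sub-entity IDs are derived from the parent Lexeme ID.
        form = Form(f"{self.lexeme_id}-F{len(self.forms) + 1}",
                    dict(representations), list(grammatical_features))
        self.forms.append(form)
        return form

    def add_sense(self, glosses: Dict[str, str]) -> Sense:
        sense = Sense(f"{self.lexeme_id}-S{len(self.senses) + 1}",
                      dict(glosses))
        self.senses.append(sense)
        return sense


# Hypothetical example: an English noun with two Forms and one Sense.
cat = Lexeme("L-example", {"en": "cat"},
             language="Q-english", lexical_category="Q-noun")
cat.add_form({"en": "cat"}, ["Q-singular"])
cat.add_form({"en": "cats"}, ["Q-plural"])
cat.add_sense({"en": "small domesticated feline",
               "fr": "petit félin domestique"})
print([f.form_id for f in cat.forms])
```

The point of the sketch is only the shape of the data: a Lexeme belongs
to one language, its Forms carry spellings and grammatical features, and
its Senses can be described in many languages.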
The first release
A new namespace and several new entity types have been created in
order to model words and phrases. If you’re new to this project, you
can learn more by looking at the documentation, briefly describing the
data model and the interface. The technical structure is set, but the
editors remain free to model and organize data as they prefer, with
the usual open discussions and community processes that we apply on
Wikidata. Some discussions about new properties to create have already
started: if you want to be involved in the early stage of the project
to shape it, please participate!
Please note that the version that is now deployed is a first
experiment that will be continuously improved in the future. Some
features are missing, and some bugs will certainly occur. Here are the
features that are included in the first release:
- Add, edit and delete Lexemes, Forms, statements, qualifiers, references
- Link between the different entity types (Item to Lexeme, Form to Item, etc.)
- Entity suggestion when adding a property or a value
And the following features will not be included in the first version,
but are planned for the future:
- Find Lexemes and Forms via Special:Search
- RDF support (which also means: the ability to query it with query.wikidata.org)
- Support for Senses
- Merging of Lexemes
- Including the data on other Wikimedia projects, such as Wiktionary
How to try it?
The features described above are now deployed on Wikidata.org. Here
are some suggestions of what you can do to explore this new territory:
- If you’re not familiar with the structure of Lexemes, have a look at
- Look at what already exists. Please note that Special:Search and
  the search bar in the top right corner of pages do not support
  Lexemes yet. We’re working on this.
- Create a new Lexeme with Special:NewLexeme
- If a property that you need is missing, you can suggest it here
- Discuss how to model words and ask questions on Wikidata
- Report bugs or issues that you may encounter: either on the talk page
  or on Phabricator, if you’re comfortable using it (create a task, add
  the tag Lexicographical data, and add Lea_Lacroix_(WMDE) as a
About mass imports and tools
We kindly ask you not to plan any mass import from any source for the
moment. There are several reasons for this. First of all, as
mentioned above, the release is a first version and we need to observe
how our system reacts to manual edits before we start considering
automatic ones. The system may not be ready for large imports at
the beginning. The second reason is legal. Lexicographical data in
Wikidata is released under CC0, and it is the responsibility of each
editor to make sure that the data they add is compatible with CC0.
For more information, you can have a look at the advice of the WMF
Legal team. Finally, we strongly encourage you to talk with the
communities before considering any import from the Wiktionaries.
Wiktionary editors have put a lot of effort over the years into
building definitions, and we should be respectful of this work and
discuss with them to find common solutions for working on
lexicographical data and enjoying the use of it together.
We also suggest that you wait a bit before building tools or scripts
on top of lexicographical data. The interface and its API are
probably going to evolve over the next months, and the system may
not be stable enough to support such tools. We will inform you as soon
as it is possible.
After this first release, improvements will be made on a very
regular basis (new deployments every week). Once you have tried playing
with the new data, feel free to give us feedback. We especially want
to know which features are most important to you, so that we can
decide what to work on next.
- What did you experience while editing lexicographical data? What went
  wrong or was unexpected?
- What bugs or problems did you encounter during the process?
- What are the features that are, in your opinion, the most important?
  Which one should we work on next?
If you’re interested in following the discussions and further
announcements about lexicographical data, I encourage you to follow
Wikidata:Lexicographical data and its talk page, where we will discuss
how to organize and structure data, new features to be added,
ideas for tools and queries, and a lot of other things.
Additional note: with this new kind of data enabled on Wikidata, we
expect some new editors to take an interest in it, edit Lexemes, suggest
properties or ask questions. They may not be familiar with all of our
community processes and our ways of organizing content. They will need
help and support, as well as links to useful resources, to understand
how the Wikidata community works. I hope that we will all be kind and
patient, both with other editors and with the software, which may not
work exactly as we want it to at the beginning :)
Thanks to the people who tested the model and the interface before the
release, who showed support and curiosity about lexicographical data
If you have any questions or ideas, feel free to write on Wikidata
talk:Lexicographical data or contact me.
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge).
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg (district court) under number 23855 Nz.
Recognized as a non-profit organization by the Finanzamt für
Körperschaften I Berlin (tax office), tax number 27/029/42207.