For those of you who haven't looked at WiktionaryZ lately, browse around a bit on http://wiktionaryz.org.
There are now about 50 users with access to our edit functions. For those without access, these two screenshots show the functionality that is currently there:
http://wiktionaryz.org/File:Wz-edit.png Adding and removing translations/synonyms, relations, and attributes.
http://wiktionaryz.org/File:Wz-edit2.png Editing and translating existing definitions.
Some of the functionality for generating these forms is already abstracted in a way which will make it suitable for other Wikidata (wiki-based structured data) applications. WiktionaryZ also uses the existing namespace mechanism of MediaWiki in order to access its relational tables, so that mechanisms like Special:Allpages work to retrieve content.
Our biggest challenge is to define a model that allows us to deal with the complex versioning requirements of relational data in an abstract fashion. This is what much of my personal energy will be focused on in the coming months. Other developers will concentrate on the UI bits, on import/export interfaces, and on adding new lexicological and ontological functionality.
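To make the versioning challenge concrete, here is a minimal sketch of one common approach to versioning relational data: instead of updating rows in place, each change inserts a new revision and marks the old one as superseded. The table and column names are illustrative, not the actual WiktionaryZ schema.

```python
import sqlite3

# Hypothetical versioned table: every change is a new row with a revision
# number; old rows are kept and flagged as no longer current.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE translation (
        expression_id INTEGER,
        language_id   INTEGER,
        spelling      TEXT,
        revision      INTEGER,          -- monotonically increasing version
        is_latest     INTEGER DEFAULT 1
    )
""")

def update_translation(conn, expression_id, language_id, new_spelling):
    """Supersede the current row instead of overwriting it."""
    cur = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM translation "
        "WHERE expression_id=? AND language_id=?",
        (expression_id, language_id))
    latest = cur.fetchone()[0]
    conn.execute(
        "UPDATE translation SET is_latest=0 "
        "WHERE expression_id=? AND language_id=? AND is_latest=1",
        (expression_id, language_id))
    conn.execute(
        "INSERT INTO translation VALUES (?, ?, ?, ?, 1)",
        (expression_id, language_id, new_spelling, latest + 1))

update_translation(conn, 1, 85, "dog")
update_translation(conn, 1, 85, "hound")   # new revision, old one preserved
rows = conn.execute(
    "SELECT spelling, revision, is_latest FROM translation ORDER BY revision"
).fetchall()
print(rows)  # [('dog', 1, 0), ('hound', 2, 1)]
```

The hard part, of course, is doing this abstractly across many interrelated tables (so that a "page history" view can be reconstructed for structured records), which is exactly what the model work aims at.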
We're still working with an old MediaWiki codebase because the namespace manager code which we use is not yet merged into MediaWiki proper. (BRIOOOON ;-) I hope to finish that process in the next couple of weeks. At that point, we will create a new WiktionaryZ branch in SVN so that any changes we make to the main MediaWiki code can be effectively maintained. These changes are currently minimal, and I think we can actually put them in the trunk once they have been made sufficiently abstract and secure (e.g. making it possible to load any PHP class for handling a particular namespace).
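The "load any PHP class for a particular namespace" idea is essentially a handler registry. A rough sketch of the pattern (in Python here, and with invented names; this is not MediaWiki's actual API):

```python
# Hypothetical sketch of a namespace-to-handler registry: each namespace is
# mapped to a class that knows how to fetch its content, so a namespace can
# be backed by relational tables instead of wikitext blobs.
class DefaultHandler:
    """Ordinary wikitext-backed pages."""
    def fetch(self, title):
        return f"wikitext page: {title}"

class RelationalHandler:
    """Pages served from relational tables (WiktionaryZ-style)."""
    def fetch(self, title):
        return f"structured record for: {title}"

# Registry: namespace name -> handler class.
NAMESPACE_HANDLERS = {
    "Main": DefaultHandler,
    "DefinedMeaning": RelationalHandler,   # assumed WiktionaryZ-style namespace
}

def get_page(namespace, title):
    handler_cls = NAMESPACE_HANDLERS.get(namespace, DefaultHandler)
    return handler_cls().fetch(title)

print(get_page("DefinedMeaning", "dog"))  # structured record for: dog
print(get_page("Main", "Dog"))            # wikitext page: Dog
```

The security concern mentioned above is the registry itself: which classes may be registered, and by whom, has to be locked down before something like this belongs in trunk.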
We currently have one more developer besides myself, not counting the ongoing work on multilingual functionality for MediaWiki proper (mlmw branch). Hopefully, we will soon have one or two additional developers, and will also be able to bring some open source volunteers into our team. My own role is increasingly transitioning to that of managing development and defining future strategy, which I am very comfortable with. :-)
For those interested in the project, if all goes well, I will be talking a lot about it at Wikimania's hacking days, as well as the main conference. FWIW, I also hope that we will be able to build strong and meaningful bridges to the Semantic MediaWiki project, which is in many ways complementary to our own effort.
Erik
Thanks for the update, Erik. I am in the process of designing my own Wikidata dataset and have some questions which I'll discuss with you offline.
There are several issues which I've found, though, that are of general concern so I'll post them here. All of this is based upon the WiktionaryZ tarball you released a couple months back, so sorry if some of this is no longer applicable:
1. Wikidata uses a single set of tables for storing all multilingual content: shorttext, translated_content. The problem with this approach is that it will not scale: small multilingual tables with limited ("seed data") content will be forced to do lookups on these gigantic and ever-growing tables, and exporting a single dataset will bog down unnecessarily doing lookups (or even worse, full-table scans!) on these generic tables. I have put a page on Meta, [[m:Multilingual Wikidata]], which discusses an approach for striping tables with multilingual content.
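To illustrate the striping idea as I understand it (table names here are illustrative, not the actual schema): give each dataset its own copy of the multilingual tables, e.g. by table-name prefix, so that lookups and exports for a small dataset only ever touch that dataset's rows.

```python
import sqlite3

# Sketch: per-dataset "stripes" of the multilingual content table, keyed by a
# table-name prefix, instead of one shared translated_content table.
conn = sqlite3.connect(":memory:")

def create_dataset(conn, prefix):
    conn.execute(f"""
        CREATE TABLE {prefix}_translated_content (
            content_id  INTEGER,
            language_id INTEGER,
            text        TEXT
        )
    """)

create_dataset(conn, "wz")       # WiktionaryZ's own (large) stripe
create_dataset(conn, "biomed")   # a small seed-data dataset

conn.execute("INSERT INTO biomed_translated_content VALUES (1, 85, 'aspirin')")

# Exporting the small dataset scans only its own small table, regardless of
# how large the other stripes grow:
rows = conn.execute("SELECT text FROM biomed_translated_content").fetchall()
print(rows)  # [('aspirin',)]
```

See [[m:Multilingual Wikidata]] for the full discussion.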
2. The tables for defining languages and language groups seem to be shared between multilingual MediaWiki (the software) and WiktionaryZ (user data). I think this is not only conceptually incorrect, but also a security hole. See my comments from a while back: [[m:Talk:Ultimate_Wiktionary_data_design#.22Dog_Food.22]]
Anyway, it would be much appreciated if you could post more Wikidata designs as you come up with them. Thanks, and keep up the good work.
wikitech-l@lists.wikimedia.org