Magnus-
Completely ignoring the progress on the according meta page, I went
ahead and created a database feature for the wiki.
That's fine, it probably wouldn't have gone anywhere for the next couple
of months or so anyway. So thanks a lot for taking the time to code this
up and giving us something to chew on. You're aiming for another Magnus
Manske day, aren't you? ;-)
The system works nicely as a proof of concept, but I am wary of using it
as something to build upon, for these reasons:
1) I would like the data tables to be integrated with the new database
scheme, so that regular wiki pages are just another type of data, and new
fields can easily be attached to them.
Treating data as an afterthought will make integration with key wiki
concepts like page histories, diffs, recent changes, watchlists, and so
forth harder. Having a single Data: namespace also strikes me as too
limited. And of course if we intend to make it possible to manage multiple
languages in one database, Wikidata should be no exception to that rule
and use the same conventions as other pages.
My current thinking is that wikidata.org will be used for general
databases that are of possible use to multiple Wikimedia sites, but that
individual Wikimedia sites like Wiktionary may want to make use of some
Wikidata functionality to "enrich" their content. Our software should
allow both, by having editable field definitions for each namespace.
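To illustrate what "editable field definitions for each namespace" could
mean, here is a rough sketch. The namespace names, the field names and the
FieldDef/validate helpers are all made up for illustration; in a real
implementation the definitions would presumably live in the database and be
editable through the wiki rather than sit in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """One field attached to pages in a namespace (hypothetical structure)."""
    name: str
    py_type: type
    required: bool = False


# Hypothetical definitions: a plain namespace with no extra fields, and a
# data namespace with two fields attached to each page.
FIELDS_BY_NAMESPACE = {
    "Main": [],
    "Movie": [FieldDef("title", str, required=True), FieldDef("year", int)],
}


def validate(namespace, values):
    """Return a list of problems; an empty list means the values fit."""
    problems = []
    defs = {f.name: f for f in FIELDS_BY_NAMESPACE.get(namespace, [])}
    # Check every supplied value against the namespace's definitions.
    for name, value in values.items():
        if name not in defs:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, defs[name].py_type):
            problems.append(f"wrong type for {name}")
    # Then check that no required field was left out.
    for f in defs.values():
        if f.required and f.name not in values:
            problems.append(f"missing field: {f.name}")
    return problems
```

With something like this, a regular wiki page is just the case where the
namespace has no extra fields defined.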
The subdomain setup as proposed on Meta should not be taken to indicate
that these should indeed be separate wiki installations. With that I
merely wish to express that we should try to group related purposes, so
that we can get filtered RC, watchlists etc. But that grouping may perhaps
best happen on an individual user level, with a reasonable set of default
groups presented to anons. Again, this shows the importance of integrating
this into our general database scheme, as filtering and aggregation are
much-requested features for RC, watchlists etc. in general.
2) Your present model does not allow for relations between data, which is
essential for anything but the simplest of use cases. Right now you
basically hardcode these relations into the data model using the
"dropdown" declaration (in your example, the movie's color). Where this
isn't possible, you use simple multiline fields (in your example, the
actors). This of course makes it very hard to do proper queries, updates
etc.
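To make the point about relations concrete, here is a minimal sketch of
the movie/actor case with the actors stored as a relation instead of a
multiline field. The table and column names are my own assumptions, and
SQLite (via Python) merely stands in for MySQL so that the example runs
on its own.

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names come
# from the existing code or from the proposal on Meta.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (movie, actor) pair instead of a multiline text field.
    CREATE TABLE movie_actor (
        movie_id INTEGER NOT NULL REFERENCES movie(id),
        actor_id INTEGER NOT NULL REFERENCES actor(id),
        PRIMARY KEY (movie_id, actor_id)
    );
""")
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [(1, "Movie A"), (2, "Movie B")])
conn.executemany("INSERT INTO actor VALUES (?, ?)",
                 [(1, "Actor X"), (2, "Actor Y")])
conn.executemany("INSERT INTO movie_actor VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def movies_with(actor_name):
    """Return the titles of all movies that list the given actor."""
    rows = conn.execute("""
        SELECT m.title FROM movie m
        JOIN movie_actor ma ON ma.movie_id = m.id
        JOIN actor a ON a.id = ma.actor_id
        WHERE a.name = ?
        ORDER BY m.title
    """, (actor_name,))
    return [title for (title,) in rows]


# Renaming an actor is a single-row update that every linked movie sees.
conn.execute("UPDATE actor SET name = ? WHERE id = ?", ("Actor Z", 2))
```

Both the "which movies is this actor in" query and the rename are hard to
do reliably when the actors are just lines of text inside each movie
record.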
3) Storing everything in MEDIUMTEXT fields is probably not a good idea,
but I'd like to do some benchmarks on that. I'm particularly concerned
about queries - our indexing options with BLOB/TEXT columns are very
limited (MySQL can only index a prefix of them), and our problems with
MySQL FULLTEXT indexing over the years have shown that this may not be
the most reliable thing to do. This is why in the
[[m:Wikidata]] scheme I proposed a limited set of data-* tables for
different types.
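As a rough sketch of the per-type tables idea: the names data_integer and
data_string and their columns are placeholders I picked for this example,
not the layout on the Meta page, and SQLite again stands in for MySQL. The
point is only that a natively typed, indexed value column allows ordinary
range queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-type tables: one table per value type, each with an
# ordinary index on (field, value).
conn.executescript("""
    CREATE TABLE data_integer (page_id INTEGER, field TEXT, value INTEGER);
    CREATE TABLE data_string  (page_id INTEGER, field TEXT, value TEXT);
    CREATE INDEX data_integer_lookup ON data_integer (field, value);
    CREATE INDEX data_string_lookup  ON data_string  (field, value);
""")
conn.executemany(
    "INSERT INTO data_integer VALUES (?, ?, ?)",
    [(1, "year", 1927), (2, "year", 1968), (3, "year", 1999)],
)


def pages_in_range(field, low, high):
    """Return ids of pages whose integer field lies in [low, high]."""
    rows = conn.execute(
        "SELECT page_id FROM data_integer "
        "WHERE field = ? AND value BETWEEN ? AND ? ORDER BY page_id",
        (field, low, high),
    )
    return [page_id for (page_id,) in rows]
```

With everything in one text column, the same query would have to compare
strings or cast each value on the fly, which is exactly what I would want
to benchmark.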
- - - - - - -
All the big stuff that is happening right now makes me think that we
should probably start branching towards a "phase 4" (V2.0) soon, while
also kicking out a 1.4 and stabilizing it.
Regards,
Erik