Completely ignoring the progress on the corresponding Meta page, I went ahead and created a database feature for the wiki.
It is in CVS HEAD. To try it, you'll have to run the two CREATE queries (included as a comment in SpecialData.php) and set $wgUseData = true; in your LocalSettings.php.
A new "Data" namespace contains the display/edit forms for entering the data sets. NOTE: This is mono-language only at the moment, but I don't see a problem with either putting many language versions into it or using data namespaces from several databases.
Example [[Data:Movie]]:
{| cellpadding=5
!Field!!width='50%'|Value!!Notes
|-
|Title||((!title/line))||
|-
|Year||((year/number))||
|-
|Tagline||((tagline/line))||
|-
|Plot summary||((plot/multiline))||
|-
|Actors||((actors/multiline))||
|-
|Runtime||((runtime/number))||min
|-
|Country||((country/line))||
|-
|Color||((color/dropdown/Technicolor/B&W))||
|}
This displays similarly to the mock-up I found on Meta. Keys are defined like ((this)), and parameters for the edit screen are separated by "/". Note the multiple options for the ((color)) key, and the ((!title)) key, where the "!" marks the primary key. Saving an entry whose primary key already exists creates a new version of it, so each entry keeps its history. Currently, only one primary key is allowed, but it would be no real problem to change that.
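To make the ((key/type/options)) syntax concrete, here is a rough Python sketch of how such placeholders could be parsed. It is not the code in SpecialData.php; the names FieldDef and parse_fields, and the fallback to a "line" input when no type is given, are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches one ((...)) placeholder and captures the text inside it.
PLACEHOLDER = re.compile(r"\(\(([^()]+)\)\)")


@dataclass
class FieldDef:
    key: str                # field name, e.g. "title"
    widget: str             # edit-screen type, e.g. "line", "number", "dropdown"
    options: List[str] = field(default_factory=list)  # remaining "/" parameters
    primary: bool = False   # True if the key was prefixed with "!"


def parse_fields(page_text: str) -> List[FieldDef]:
    """Extract field definitions from the text of a Data: page (sketch only)."""
    fields = []
    for body in PLACEHOLDER.findall(page_text):
        key, *params = body.split("/")
        primary = key.startswith("!")
        # Assumption: a key written without a type falls back to a one-line input.
        widget = params[0] if params else "line"
        fields.append(FieldDef(key.lstrip("!"), widget, params[1:], primary))
    # Mirrors the current restriction to a single primary key.
    if sum(f.primary for f in fields) > 1:
        raise ValueError("only one primary key is supported")
    return fields
```

For the Color row above, the sketch yields a "dropdown" field whose options are ["Technicolor", "B&W"]; everything after the type is kept as a plain option list so other widget types can interpret it their own way.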
The actual data entry/display is done via Special:Data. At the moment, it can already add data using the above form, preview it, and store it with multiple revisions (as a wiki page does). I have started some minimal work on listing entries and will probably continue later today.
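The revision behaviour described above (same primary key, new version) can be modelled with a small in-memory store. This is a hypothetical Python sketch, not how Special:Data stores rows in MySQL; the class and method names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Record = Dict[str, str]


class DataStore:
    """In-memory sketch of wiki-style revisions for Data entries.

    Saving a record whose primary-key value already exists appends a new
    revision instead of overwriting, so older versions remain as history.
    """

    def __init__(self, primary_key: str):
        self.primary_key = primary_key
        self._revisions: Dict[str, List[Record]] = defaultdict(list)

    def save(self, record: Record) -> int:
        """Store a copy of the record; return its 1-based revision number."""
        key = record[self.primary_key]
        self._revisions[key].append(dict(record))
        return len(self._revisions[key])

    def current(self, key: str) -> Optional[Record]:
        """Latest revision for a primary-key value, or None if unknown."""
        history = self._revisions.get(key)
        return history[-1] if history else None

    def history(self, key: str) -> List[Record]:
        """All revisions for a primary-key value, oldest first."""
        return list(self._revisions.get(key, []))

    def list_keys(self) -> List[str]:
        """Primary-key values of all entries, for a simple listing page."""
        return sorted(self._revisions)
```

Keeping every revision as a full copy is the simplest choice and matches how wiki page text is versioned; a real implementation would keep these rows in the database rather than in memory.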
Now go ahead and start stomping on me for just hacking this without telling anyone first ;-)
Magnus
_______________________________________________
Wikitech-l mailing list
Wikitech-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Magnus,
Would it be possible to create the GEMET glossary database using this tool? How hard would it then be to create an XML import for the GEMET data?
We have the SQL statements for GEMET, we have the XML data, and we have consent to use all of it. It would be an exercise in showing what Wiktionary in a database format, with more additions, would look like.
Stomping, isn't that a way of dancing?? This is SOO timely as far as I am concerned :)
Thanks, GerardM
Gerard Meijssen wrote:
Would it be possible to create the GEMET glossary database using this tool? How hard would it then be to create an XML import for the GEMET data?
Well, it can basically recreate everything that fits into an SQL dump and has a unique key. Table "Xyz" would become [[Data:Xyz]], and each data row would become the first revision of the entry with that primary key.
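The table-to-Data mapping just described could look roughly like this. It is a hypothetical Python sketch: the function names and the mapping from SQL column types to edit widgets are assumptions, not part of the actual feature.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed mapping from SQL column types to edit-screen widgets.
WIDGET_FOR_SQL_TYPE = {"INT": "number", "INTEGER": "number", "TEXT": "multiline"}


def table_to_data_page(columns: Sequence[Tuple[str, str]], primary: str) -> str:
    """Build the body of a Data: page from (column_name, sql_type) pairs."""
    lines = ["{| cellpadding=5", "!Field!!Value"]
    for name, sql_type in columns:
        widget = WIDGET_FOR_SQL_TYPE.get(sql_type.upper(), "line")
        marker = "!" if name == primary else ""
        lines += ["|-", f"|{name}||(({marker}{name}/{widget}))"]
    lines.append("|}")
    return "\n".join(lines)


def rows_to_revisions(
    column_names: Sequence[str], rows: Sequence[Sequence[str]], primary: str
) -> Dict[str, List[Dict[str, str]]]:
    """Turn each SQL row into revision 1 of the entry with that primary-key value."""
    entries: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        record = dict(zip(column_names, row))
        key = record[primary]
        if key in entries:  # the import assumes the key really is unique
            raise ValueError(f"duplicate primary key {key!r}")
        entries[key] = [record]
    return entries
```

Rejecting duplicate keys during import reflects the "has a unique key" precondition: without it, two SQL rows would silently collapse into one entry's history.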
What I can't reproduce, at least at the moment, is SQL queries. But I doubt anyone would volunteer to write an SQL frontend for this ;-) Then again, for plain storage/retrieval it won't be necessary anyway.
I just looked at the GEMET XML data. There's a file for every language, so do you want to
1) make tables [[Data:GEMET_DE]], [[Data:GEMET_EN]] etc., or
2) make a single table [[Data:GEMET]] with different fields for each language?
Both are possible; alas, I have not yet written the retrieval part...
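The two layouts can be compared with a small Python sketch. The input shape (one mapping from concept id to term per language file) and the field names "term" and "term_<lang>" are hypothetical; the real GEMET XML is not modelled here.

```python
from typing import Dict

# Hypothetical input: for each language file, concept id -> term.
PerLanguage = Dict[str, Dict[str, str]]


def layout_per_language(terms: PerLanguage) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Option 1: one Data table per language, e.g. GEMET_DE, GEMET_EN."""
    return {
        f"GEMET_{lang.upper()}": {
            cid: {"id": cid, "term": term} for cid, term in by_id.items()
        }
        for lang, by_id in terms.items()
    }


def layout_single_table(terms: PerLanguage) -> Dict[str, Dict[str, str]]:
    """Option 2: one GEMET table with a term_<lang> field per language."""
    table: Dict[str, Dict[str, str]] = {}
    for lang, by_id in terms.items():
        for cid, term in by_id.items():
            table.setdefault(cid, {"id": cid})[f"term_{lang}"] = term
    return table
```

With the single table, a concept that lacks a translation simply has no field for that language, while the per-language tables repeat the concept id once per language.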
Magnus
The easiest thing is to create the tables as defined in the SQL definitions you will find on Meta. This will reproduce the data as it is at GEMET, while still allowing changes to be made to it (like adding "hypoxia").
Thanks, GerardM
On Sep 24, 2004, at 8:00 AM, Magnus Manske wrote:
Completely ignoring the progress on the corresponding Meta page, I went ahead and created a database feature for the wiki.
Magnus, *please* don't forget to escape strings before putting them in HTML output or SQL commands.
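The escaping advice applies in any language. Here is the same idea in Python using only the standard library (html.escape for output, sqlite3 placeholders for queries); MediaWiki's own PHP helpers are different, and the table and column names below are hypothetical.

```python
import html
import sqlite3
from typing import Optional


def render_value(value: str) -> str:
    """Escape user-supplied text before embedding it in HTML output."""
    return "<td>" + html.escape(value, quote=True) + "</td>"


def find_entry(conn: sqlite3.Connection, table_title: str, key: str) -> Optional[str]:
    """Look up an entry with a parameterized query instead of string pasting."""
    # The driver binds the values; the SQL text itself never contains user input.
    cur = conn.execute(
        "SELECT data FROM entries WHERE table_title = ? AND pkey = ?",
        (table_title, key),
    )
    row = cur.fetchone()
    return row[0] if row else None
```

A key such as O'Brien is then stored and found correctly, and an injection attempt like x' OR '1'='1 is treated as an ordinary (non-matching) value.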
Also, please remember to use $wgRequest rather than accessing $_REQUEST and $_POST directly; this is meant to:
a) provide some limited basic validation
b) provide default values if the item is not set
c) perform some character conversion when necessary
d) validate and normalize UTF-8 text input
e) allow requests to be wrapped in a way that can be tested or sub-requested easily without messing with fifty separate globals
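The benefits listed above can be illustrated with a small wrapper class. This is a hypothetical Python sketch of the idea, not MediaWiki's WebRequest API; it covers defaults, basic validation, UTF-8 checking with NFC normalization, and testability, and it simplifies invalid UTF-8 handling to "fall back to the default".

```python
import re
import unicodedata
from typing import Mapping, Optional


class Request:
    """Hypothetical request wrapper illustrating the points listed above.

    Handlers read parameters through an instance instead of $_POST-style
    globals, so a test can simply build one from a dict of raw bytes.
    """

    def __init__(self, params: Mapping[str, bytes]):
        self._params = dict(params)

    def get_text(self, name: str, default: str = "") -> str:
        raw: Optional[bytes] = self._params.get(name)
        if raw is None:
            return default  # default value when the item is not set
        try:
            text = raw.decode("utf-8")  # reject invalid UTF-8 input
        except UnicodeDecodeError:
            return default
        return unicodedata.normalize("NFC", text)  # one canonical form

    def get_int(self, name: str, default: int = 0) -> int:
        text = self.get_text(name).strip()
        # Basic validation: accept only an optional minus sign and ASCII digits.
        return int(text) if re.fullmatch(r"-?[0-9]+", text) else default
```

Because the parameters are passed to the constructor, a test or an internal sub-request can build its own Request without touching any global state.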
Additionally, there's no need to specify MyISAM as the table type if there's no fulltext index. It's better to use the default (which may not be MyISAM).
-- brion vibber (brion @ pobox.com)
Magnus-
Completely ignoring the progress on the corresponding Meta page, I went ahead and created a database feature for the wiki.
That's fine; it probably wouldn't have gone anywhere for the next couple of months or so anyway. So thanks a lot for taking the time to code this up and giving us something to chew on. You're aiming for another Magnus Manske day, aren't you? ;-)
The system works nicely as a proof of concept, but I am wary of using it as something to build upon, for these reasons:
1) I would like the data tables to be integrated with the new database scheme, so that regular wiki pages are just another type of data, and new fields can easily be attached to them.
Treating data as an afterthought will make integration with key wiki concepts like page histories, diffs, recent changes, watchlists, and so forth harder. Having a single Data: namespace also strikes me as too limited. And of course if we intend to make it possible to manage multiple languages in one database, Wikidata should be no exception to that rule and use the same conventions as other pages.
My current thinking is that wikidata.org will be used for general databases that are of possible use to multiple Wikimedia sites, but that individual Wikimedia sites like Wiktionary may want to make use of some Wikidata functionality to "enrich" their content. Our software should allow both, by having editable field definitions for each namespace.
The subdomain setup as proposed on Meta should not be taken to indicate that these should indeed be separate wiki installations. With that I merely wish to express that we should try to group related purposes, so that we can get filtered RC, watchlists etc. But that grouping may perhaps best happen on an individual user level, with a reasonable set of default groups presented to anons. Again, this shows the importance of integrating this into our general database scheme, as filtering and aggregation are much-requested features for RC, watchlists etc. in general.
2) Your present model does not allow for relations between data, which is essential for anything but the simplest of use cases. Right now you basically hardcode these relations into the data model using the "dropdown" declaration (in your example, the movie's color). Where this isn't possible, you use simple multiline fields (in your example, the actors). This of course makes it very hard to do proper queries, updates etc.
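For contrast, here is a minimal sketch of the relational alternative, using Python's built-in sqlite3 so it runs anywhere. The schema is hypothetical and is not the [[m:Wikidata]] proposal: colour systems and actors become rows in their own tables, linked by ids, so adding a colour system is an insert rather than an edit to the form definition, and "which movies has this actor appeared in?" becomes a plain join.

```python
import sqlite3

# Hypothetical relational layout for the Movie example.
SCHEMA = """
CREATE TABLE color_system (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movie (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    color_id INTEGER REFERENCES color_system(id)
);
CREATE TABLE movie_actor (
    movie_id INTEGER REFERENCES movie(id),
    person_id INTEGER REFERENCES person(id),
    PRIMARY KEY (movie_id, person_id)
);
"""


def movies_with_actor(conn: sqlite3.Connection, actor: str) -> list:
    """A query that is awkward when actors live in a multiline text field."""
    rows = conn.execute(
        """SELECT m.title FROM movie m
           JOIN movie_actor ma ON ma.movie_id = m.id
           JOIN person p ON p.id = ma.person_id
           WHERE p.name = ? ORDER BY m.title""",
        (actor,),
    )
    return [title for (title,) in rows]
```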
3) Storing everything in MEDIUMTEXT fields is probably not a good idea, but I'd like to do some benchmarks on that. I'm particularly concerned about queries - our indexing options with BLOBs are very limited, and our problems with the MySQL FULLTEXT indexing over the years have shown that this may not be the most reliable thing to do. This is why in the [[m:Wikidata]] scheme I proposed a limited set of data-* tables for different types.
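The idea of type-specific value tables can be sketched as follows, again with sqlite3 for portability. The table names data_int and data_text are illustrative assumptions, not the actual tables proposed on [[m:Wikidata]]. Each value goes to the table matching its type, and a (field, value) index lets numeric range queries use the index directly.

```python
import sqlite3
from typing import List, Union

# Hypothetical per-type value tables with (field, value) indexes.
SCHEMA = """
CREATE TABLE data_int  (entry_id INTEGER, field TEXT, value INTEGER);
CREATE TABLE data_text (entry_id INTEGER, field TEXT, value TEXT);
CREATE INDEX data_int_field_value  ON data_int (field, value);
CREATE INDEX data_text_field_value ON data_text (field, value);
"""


def store(conn: sqlite3.Connection, entry_id: int, field: str,
          value: Union[int, str]) -> None:
    """Route each value to the table matching its Python type."""
    # The table name comes from a fixed pair of choices, never from user input.
    table = "data_int" if isinstance(value, int) else "data_text"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (entry_id, field, value))


def entries_between(conn: sqlite3.Connection, field: str,
                    low: int, high: int) -> List[int]:
    """Numeric range query that can use the (field, value) index."""
    rows = conn.execute(
        "SELECT entry_id FROM data_int WHERE field = ? AND value BETWEEN ? AND ? "
        "ORDER BY entry_id",
        (field, low, high),
    )
    return [entry_id for (entry_id,) in rows]
```

One further reason typed columns help: numbers stored as text compare character by character, so "10" sorts before "9", which makes range queries on a text column unreliable.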
- - - - - - -
All the big stuff that is happening right now makes me think that we should probably start branching towards a "phase 4" (V2.0) soon, while also kicking out a 1.4 and stabilizing it.
Regards,
Erik
wikitech-l@lists.wikimedia.org