Hey all,
I've run into some issues with storing revisioned configuration during the creation of the Contest extension and the upload campaigns functionality in the Upload Wizard, as well as in several other extensions. Since this issue appears to be a quite common, this email is intended to outline the problem and kick off discussion about how it can be resolved.
== Problem ==
Many extensions need some kind of admin UI in which non-user specific configuration can be done, ie creating contests or upload campaigns. If this configuration is just stored in some db tables created specially for this purpose, you do not have the ability to revert changes, restore deleted entities, simply compare changes, see who changes what and when, ect. This is quite problematic on big wikis, since someone can accidentally break a lot of functionality by deletion something, and then lack the ability to restore this. Or worse, if access to an account with sufficient privileges is obtained by a malicious person, he can modify configuration without this being logged anywhere. So what we need is to have similar versioning and logging for this kind of configuration as for wiki pages.
== Why wiki pages are not the answer ==
Historically, a few extensions have been storing simple configuration in wiki pages, typically in the MediaWiki namespace. Unfortunately there are several issues with this approach when it comes to more complex configuration, such as a contest. Since the data to be stored is an object, it needs to be serialized to fit into a wiki page. The general consensus here seems to be to use JSON.
Problem 1: querying the data. Since it's not stored in relational form, how are you going to get a list of active contests, or do any of the many other queries that are needed by the extension? One could solve this by not only serializing the data to a wiki page, but on every change, also update a redundant relational copy in the db. This is less nice though, because of the redundancy, and because you'd be looking at diffs of JSON instead of the changes to the single value you care about. So even though it would be a hack, you could go with this approach if not for problem 2.
Problem 2: no fine grained rights whatsoever. Contests should only be viewable by contest admins and contest judges, and only editable by the former. An approach that goes part of the way to providing this is putting each type of object (ie contest, upload campaign) into it's own namespace, and applying the needed rights restrictions on that. Since MediaWiki is not designed to prevent read access, I'd not be to eager to put anything that should not be accessible by non authorized people in there though. And this approach does not work at all when you need more fine grained restrictions, such as people with a certain right only being able to edit part of the configuration object.
And in the end, storing config in wiki pages is a hack, they where not designed to facilitate such usage.
== Request for Discussion ==
How would you go about creating a generic solution for storing revisioned configuration?
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
Hello Jeroen,
On 10 November 2011 09:37, Jeroen De Dauw jeroendedauw@gmail.com wrote:
How would you go about creating a generic solution for storing revisioned configuration?
The term commonly used for this is a database audit trail. Of course, mediawiki already implements this for one table (Page, with Revision as audit table). In general, I think the important thing to keep in mind is that you want a log of what happens - you could even consider dumping it to a text file, for instance. You would need to revert changes manually, but it's still better than nothing.
In general, I think there are three options that are in general use (when storing the data in a database): a temporal database, an oldversion table, or a changes table. The last two can be in two forms: one for the entire database, storing data as JSON or XML, or with one audit table per 'real' table).
A temporal database just adds two columns: valid_from, valid_to, and maybe another to actually store data on who changed data. If data is changed, the old value get valid_to set to the current date & time, and the new value has valid_from set to the current date & time.
An oldversion table essentially is mediawiki is doing for Page objects: you have a list of 'revisions' of the objects with related information. Page/Revision is an example of having one audit table for one real table. The JSON alternative is (in terms of database design) ugly, but it means you don't have twice the number of tables floating around.
The last option is storing 'actions': essentially storing the SQL queries that are run on the table. You would thus have an entry 'INSERT' with certain values, 'UPDATE', and 'DELETE'. Again, you can use this in a nice database form or a serialized form.
On the question of which one is the best to use in this context -- I don't know. In general, I think it makes some sense to use the tools you already have (i.e. store it in page text), but as you noted this is inconvenient if you don't want configuration public.
Anyway, I hope this was sort of useful, at least to help you find more information.
Good luck, and best regards, Merlijn van Deen
wikitech-l@lists.wikimedia.org