I have some experience with this sort of thing, so I thought I would add my 2p to the information pool being shared here.
1) In general, there is no such thing as a universal format. A data mediation format that spans versions is usually an intractable problem to solve. Essentially, if we could find a format that is agnostic to every version of the application, we would simply use that format as the data schema and never worry about data migrations for any version change, because every version would use the same format. But finding such a format nearly always forecloses the possibility of future application innovation: a format fixed today can only describe what the application does today.
2) An existing standard can be settled upon that meets core needs. In this case, the stakeholders identify a standard format that has some level of widespread use and agree to always have the capability to export and import in that format. This is how we individually overcome limits in the applications we use daily: we search for a Save-As format in a source application that we know is accepted by the destination application. The problem is that although such a conversion can be lossy, it is more likely to be gainful, meaning the importing application has to make assumptions in order to fill in missing data that it needs (the first sketch after this list shows the idea). This solution is not ideal, primarily because the importing application may have a data requirement that cannot be determined algorithmically, so human intervention might be required for each unit of data imported. That is not a reasonable solution for even a moderately sized dataset of a few hundred elements.
3) Look-ahead designs are used before features are implemented. In this case, a very heavy-weight design effort attempts to prognosticate the data design well ahead of code implementation. This actually can be done if innovation is buffered and features are queued and agreed upon well in advance. It is about as un-agile as software development gets, however, and as most software engineers know, it is brutally difficult to design something to this level of detail so far ahead of implementation (indeed, in my experience it almost always fails).
4) Create a migration mechanism for each release. This is typically what is done, and the reasons are simple: the source application data formats are well known, the destination data formats are well known, and the only thing needed is an intelligent mapping from one to the other (the second sketch below shows a skeleton). As Lee has pointed out, the problem with this is that it places a burden on the user community to stay abreast of development whenever a migration is required.
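To make the "gainful" problem in 2) concrete, here is a rough Python sketch. The field names and defaults are entirely invented for illustration; the point is only that an importer either assumes a value for missing data or gives up and asks a human:

    # Hypothetical required fields of an importing application, with
    # the assumption each one falls back on when the source format
    # never carried it. None means no safe assumption exists.
    REQUIRED_FIELDS = {
        "title": None,               # cannot be guessed algorithmically
        "text": "",                  # assume an empty page body
        "author": "unknown",         # assume a placeholder author
        "timestamp": "1970-01-01",   # assume the epoch
    }

    def import_record(record):
        """Fill in fields the source format did not provide."""
        result = dict(record)
        for field, default in REQUIRED_FIELDS.items():
            if field not in result:
                if default is None:
                    # No algorithmic answer: a human has to step in,
                    # once per imported record.
                    raise ValueError("manual intervention needed: " + field)
                result[field] = default   # the import "gains" invented data
        return result

Note that a single field with no safe assumption forces human intervention once per record, which is exactly why this approach stops scaling at a few hundred elements.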
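And an equally rough sketch of 4), a per-release migration chain. The version numbers and schema changes are made up; the point is that each release ships one well-understood step, and older data simply walks the chain up to the current version:

    CURRENT_VERSION = 3

    def v1_to_v2(record):
        # Hypothetical change: release 2 introduced namespaces.
        record = dict(record)
        record.setdefault("namespace", 0)
        return record

    def v2_to_v3(record):
        # Hypothetical change: release 3 renamed "body" to "text".
        record = dict(record)
        record["text"] = record.pop("body", "")
        return record

    MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}

    def migrate(record, version):
        """Apply each per-release step until the record is current."""
        while version < CURRENT_VERSION:
            record = MIGRATIONS[version](record)
            version += 1
        return record

Because both ends of every step are fully known, no guessing is needed; the cost is that users must run the chain at each upgrade, which is the burden Lee described.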
I am sure there are other analyses in the solution domain, but the above is off the top of my head. Although certainly not empirical, I conjecture that an industry best practice is to provide 4) as a minimum, and support a collection of widespread formats for 2).
Sorry for rambling on about this, but this problem has been around for a long time in software engineering circles. Comments and criticisms welcome.
Thanks, George
-----Original Message-----
From: mediawiki-l-bounces@Wikimedia.org [mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of Lee Daniel Crocker
Sent: Monday, March 28, 2005 11:26 AM
To: Wikimedia developers
Cc: Mediawiki List
Subject: [Mediawiki-l] Re: [Wikitech-l] Long-term: Wiki import/export format
On Mon, 2005-03-28 at 17:51 +0200, Lars Aronsson wrote:
> It sounds so easy. But would you accept this procedure if it
> requires that Wikipedia is unavailable or read-only for one hour?
> for one day? for one week? The conversion time should be a design
> requirement.
> ...
> Not converting the database is the fastest way to cut conversion
> time. Perhaps you can live with the legacy format? Consider it.
A properly written export shouldn't need to have exclusive access to the database at all. The only thing that would need that is a complete reinstall and import, which is only one application of the format and should be needed very rarely (switching to a wholly new hardware or software base, for example). In those few cases (maybe once every few years or so), Wikipedia being uneditable for a few days would not be such a terrible thing--better than it being down completely because the servers are overwhelmed.
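A rough sketch of what such a lock-free export could look like, using Python's sqlite3 only because it is self-contained (MediaWiki actually runs on MySQL, and the table and column names here are invented): a single streaming SELECT sees one consistent snapshot while other connections keep writing, so the wiki stays editable throughout.

    import sqlite3
    from xml.sax.saxutils import escape

    def export_pages(db_path, out):
        """Stream every page as XML without blocking editors.

        WAL mode lets readers run alongside writers, and the single
        SELECT below sees one consistent snapshot of the database.
        Table and column names are hypothetical.
        """
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        out.write("<pages>\n")
        # Iterating the cursor streams one row at a time, so memory
        # use stays flat no matter how large the wiki is.
        for title, text in conn.execute("SELECT title, text FROM page"):
            out.write("  <page><title>%s</title><text>%s</text></page>\n"
                      % (escape(title), escape(text)))
        out.write("</pages>\n")
        conn.close()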