[Mediawiki-l] Re: [Wikitech-l] Long-term: Wiki import/export format

George Stevens gstevens at guidelightsolutions.com
Tue Mar 29 00:48:24 UTC 2005


I have some experience with this sort of thing, so I thought I would add
my 2p to the information pool being shared here.

1) In general, there is no such thing as a universal format.
Designing a data-mediation format that spans versions is often an
intractable problem.  Essentially, if we could find a format that is
agnostic to every version of the application, then we would simply use
that format as the data schema and never worry about data migrations on
a version change, because every version would use the same format.
Finding such a format nearly always precludes future application
innovation.

2) An existing standard that meets core needs can be settled upon.
In this case, the stakeholders identify a standard format that has some
level of widespread use and agree to always support exporting and
importing in that format.  This is how we individually work around the
limits of the applications we use daily: we look for a Save-As format
in the source application that we know the destination application
accepts.  The problem is that although this can be lossy, it is more
likely to be gainful, meaning that the importing application has to
make assumptions in order to fill in missing data that it might need.
	This solution is not ideal, primarily because the importing
application may require data that cannot be determined algorithmically.
As a result, human intervention might be required for each unit of data
imported, which is not reasonable even for moderately sized datasets of
just a few hundred elements.
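A minimal Python sketch of the "gainful" import described in 2).  The
field names (title, namespace, timestamp, author) and the derivation
rules are hypothetical, chosen only for illustration: fields the
importer can derive are filled in with stated assumptions, and records
missing a field that has no algorithmic rule are set aside for the human
review mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]

# Hypothetical rules for fields the importing application requires but the
# exchange format may omit.  A rule returns an assumed value; None marks a
# field that cannot be determined algorithmically.
DERIVATION_RULES: Dict[str, Optional[Callable[[Record], str]]] = {
    # Assumption: a "Prefix:" in the title names the namespace.
    "namespace": lambda rec: (
        rec["title"].split(":", 1)[0] if ":" in rec["title"] else "Main"
    ),
    # Assumption: a fixed placeholder when no edit time was exported.
    "timestamp": lambda rec: "1970-01-01T00:00:00Z",
    # No rule: only a person can say who wrote the page.
    "author": None,
}


@dataclass
class ImportResult:
    pages: List[Record] = field(default_factory=list)
    # Records paired with the names of fields that still need a human.
    needs_review: List[Tuple[Record, List[str]]] = field(default_factory=list)


def import_records(records: List[Record]) -> ImportResult:
    result = ImportResult()
    for rec in records:
        page = dict(rec)
        unresolved: List[str] = []
        for name, rule in DERIVATION_RULES.items():
            if name in page:
                continue  # the exchange format already supplied this field
            if rule is None:
                unresolved.append(name)
            else:
                page[name] = rule(page)  # gainful: an assumption, not source data
        if unresolved:
            result.needs_review.append((page, unresolved))
        else:
            result.pages.append(page)
    return result
```

The review queue is where the per-record human cost shows up: every rule
that is None turns into manual work for each record that lacks the field.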

3) Look-ahead designs are used before features are implemented.
In this case, a very heavyweight design effort attempts to
prognosticate the data design well ahead of code implementation.  This
can actually be done if innovation is buffered and features are queued
and agreed upon well in advance.  However, this is about as un-agile as
software development gets, and, as most software engineers know, it is
brutally difficult to design something to this level of detail so far
ahead of implementation (indeed, in my experience it almost always
fails).

4) Create a migration mechanism for each release.
This is typically what is done, for simple reasons: the source
application's data formats are well known, and so are the destination's.
The only thing needed is an intelligent mapping from one to the other.
As Lee has pointed out, the problem is that this places a burden on the
user community to keep up with development whenever a migration is
required.
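One common way to organize 4) is to chain small per-release migrations,
each mapping schema version n to n + 1, so that data from any older
release can be brought up to date by applying the steps in order.  The
Python sketch below assumes hypothetical schema versions and field
changes; it is not MediaWiki's actual upgrade mechanism.

```python
from typing import Callable, Dict

Record = dict

# Hypothetical registry: MIGRATIONS[n] upgrades a record from version n to n + 1.
MIGRATIONS: Dict[int, Callable[[Record], Record]] = {}
CURRENT_VERSION = 3


def migration(from_version: int):
    """Register a function as the step from `from_version` to the next version."""
    def register(fn: Callable[[Record], Record]) -> Callable[[Record], Record]:
        MIGRATIONS[from_version] = fn
        return fn
    return register


@migration(1)
def v1_to_v2(rec: Record) -> Record:
    # Hypothetical change: version 2 renamed "body" to "text".
    out = dict(rec)
    out["text"] = out.pop("body")
    return out


@migration(2)
def v2_to_v3(rec: Record) -> Record:
    # Hypothetical change: version 3 splits "Prefix:Title" into namespace and title.
    out = dict(rec)
    ns, sep, rest = out["title"].partition(":")
    out["namespace"], out["title"] = (ns, rest) if sep else ("Main", ns)
    return out


def upgrade(rec: Record, version: int) -> Record:
    """Apply each registered step in turn until the record is current."""
    while version < CURRENT_VERSION:
        step = MIGRATIONS.get(version)
        if step is None:
            raise ValueError(f"no migration registered from version {version}")
        rec = step(rec)
        version += 1
    return rec
```

Because each step only maps one release to the next, a user who skipped
several releases can still upgrade by running the intermediate steps in
sequence, although someone still has to write and test every step.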

I am sure there are other approaches in the solution domain, but the
above is off the top of my head.  Although this is certainly not
empirical, I conjecture that industry best practice is to provide 4) as
a minimum and to support a collection of widespread formats for 2).

Sorry for rambling on about this, but this problem has been around for a
long time in software engineering circles.  Comments and criticisms
welcome.

Thanks,
George


-----Original Message-----
From: mediawiki-l-bounces at Wikimedia.org
[mailto:mediawiki-l-bounces at Wikimedia.org] On Behalf Of Lee Daniel
Crocker
Sent: Monday, March 28, 2005 11:26 AM
To: Wikimedia developers
Cc: Mediawiki List
Subject: [Mediawiki-l] Re: [Wikitech-l] Long-term: Wiki import/export
format

On Mon, 2005-03-28 at 17:51 +0200, Lars Aronsson wrote:

> It sounds so easy.  But would you accept this procedure if it requires
> that Wikipedia is unavailable or read-only for one hour? for one day?
> for one week?  The conversion time should be a design requirement.
> ...
> Not converting the database is the fastest way to cut conversion time.
> Perhaps you can live with the legacy format?  Consider it.

A properly written export shouldn't need to have exclusive access to the
database at all.  The only thing that would need that is a complete
reinstall and import, which is only one application of the format and
should be needed very rarely (switching to a wholly new hardware or
software base, for example).  In those few cases (maybe once every few
years or so), Wikipedia being uneditable for a few days would not be
such a terrible thing; it would be better than being down completely
because the servers are overwhelmed.
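A minimal Python sketch of an export that does not need exclusive
access, assuming a database that supports snapshot reads.  SQLite in WAL
mode stands in for the real backend here, and the page table and its
columns are hypothetical: the export reads inside one transaction, so it
sees a consistent snapshot while another connection keeps committing
edits.

```python
import sqlite3
from typing import Iterator, Tuple


def export_pages(conn: sqlite3.Connection) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) rows from a single consistent snapshot.

    Assumes `conn` was opened with isolation_level=None, so BEGIN and COMMIT
    are issued explicitly here, and that the database is in WAL mode, so
    writers on other connections are not blocked by this reader.
    """
    conn.execute("BEGIN")  # deferred: the snapshot is fixed at the first read
    try:
        for row in conn.execute("SELECT title, text FROM page ORDER BY title"):
            yield row
    finally:
        conn.execute("COMMIT")  # ends the read transaction; nothing was written
```

Edits committed by other connections after the first row is read do not
appear in the export, and those commits proceed without waiting for the
export to finish.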

-- 
Lee Daniel Crocker <lee at piclab.com>  <http://www.piclab.com/lee/>
<http://creativecommons.org/licenses/publicdomain/>
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l at Wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/mediawiki-l




