[Wikipedia-l] WikiDB (was Re: Tabular data in Wikipedia (Wikispecies -> Wikicommons))

Tim Starling ts4294967296 at hotmail.com
Wed Aug 25 03:14:57 UTC 2004

Anthony DiPierro wrote:
> I think there is a growing sentiment that people do not want to fork
> Wikipedia to create Wikispecies.  This makes a lot of sense.  However, there
> is something important that is needed for Wikispecies which Wikipedia does
> not provide: efficient access of tabular data.

I was thinking along similar lines a couple of months ago, when 
considering the needs of Wikiquote, and to a lesser extent Wiktionary. 
The basic plan I came up with was:

* Each record is an arbitrary list of key-value pairs.
* User-designed edit forms and search forms constrain data entry to 
particular fields.
* User-designed display templates are used to format the data and 
integrate it with other parts of the wiki.
* Each field has an index, to allow fast searching and report generation.
* Indexes may be sparse, with most records not containing a given field. 
I believe this can be efficiently handled by creating a separate table 
for each key, on demand at runtime.
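
As an illustration only, the "separate table for each key, created on demand" idea might look roughly like the sketch below. It uses Python and SQLite purely for brevity; the function names (field_table, save_record, find_records), the "field_" table prefix and the restriction on key names are all assumptions made for the example, not part of MediaWiki or of any existing design.

```python
import re
import sqlite3

# Hypothetical rule: keys must be simple identifiers, because they are
# interpolated into table names and cannot be passed as SQL parameters.
_KEY_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def field_table(key):
    """Map a field key to its table name, rejecting unsafe keys."""
    if not _KEY_RE.match(key):
        raise ValueError("invalid field key: %r" % key)
    return "field_" + key


def save_record(db, record_id, fields):
    """Store one record as key-value pairs, one table per key."""
    for key, value in fields.items():
        table = field_table(key)
        # The table and its value index are created the first time a key is seen.
        db.execute(
            "CREATE TABLE IF NOT EXISTS %s "
            "(record_id INTEGER PRIMARY KEY, value TEXT)" % table
        )
        db.execute(
            "CREATE INDEX IF NOT EXISTS %s_value ON %s (value)" % (table, table)
        )
        db.execute(
            "INSERT OR REPLACE INTO %s (record_id, value) VALUES (?, ?)" % table,
            (record_id, value),
        )
    db.commit()


def find_records(db, key, value):
    """Return the ids of records whose field `key` equals `value`."""
    table = field_table(key)
    exists = db.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        return []  # nobody has used this key yet, so nothing can match
    rows = db.execute(
        "SELECT record_id FROM %s WHERE value = ? ORDER BY record_id" % table,
        (value,),
    )
    return [row[0] for row in rows]
```

A record that lacks a given field simply has no row in that field's table, which is how the sketch keeps sparse indexes small. The cost is one insert per field on every save.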

The technical design is subject to constraints such as:
* Robust against frivolous or malicious addition of new fields to a 
small number of records.
* Peer review and the reverting of any change must be easy.
* Capable of efficiently storing a number of different schemas 
(represented by different data entry, search and display forms) in the 
one database.
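
To make the review-and-revert constraint concrete, here is a minimal sketch of one possible approach: keep every revision of a record, and treat a revert as just another save. The class name RecordHistory and its methods are hypothetical, and storing full copies of each revision is an assumption made for simplicity rather than a proposal for the real storage format.

```python
class RecordHistory:
    """Append-only list of revisions for a single record (hypothetical)."""

    def __init__(self):
        # Each revision is a complete dict of key-value pairs.
        self.revisions = []

    def save(self, fields):
        """Append a new revision and return its revision number."""
        self.revisions.append(dict(fields))
        return len(self.revisions) - 1

    def current(self):
        """Return a copy of the latest revision, or {} if there is none."""
        return dict(self.revisions[-1]) if self.revisions else {}

    def diff(self, old, new):
        """Return {key: (old_value, new_value)} for every key that differs."""
        a, b = self.revisions[old], self.revisions[new]
        return {
            key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)
        }

    def revert_to(self, rev):
        """Revert by saving a copy of an old revision as the newest one."""
        return self.save(self.revisions[rev])
```

Because a revert only appends, the unwanted edit stays in the history for reviewers to inspect, and a newly added field shows up in the diff as a key whose old value is None.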

Challenges which still have to be addressed include:
* Possibility of slow insert times.
* Malicious or accidental destruction of fields which are commonly used 
for indexed retrieval -- this may make records hard to find and revert.

Far be it from me to co-opt such a generic term as "WikiDB"; I would 
call this *a* WikiDB module for MediaWiki. The applications for such a 
package would be extraordinarily broad. All that we need now is for me 
(or someone else) to get motivated and write it.

-- Tim Starling

