[Wikipedia-l] WikiDB (was Re: Tabular data in Wikipedia (Wikispecies -> Wikicommons))
Tim Starling
ts4294967296 at hotmail.com
Wed Aug 25 03:14:57 UTC 2004
Anthony DiPierro wrote:
> I think there is a growing sentiment that people do not want to fork
> Wikipedia to create Wikispecies. This makes a lot of sense. However, there
> is something important that is needed for Wikispecies which Wikipedia does
> not provide: efficient access of tabular data.
I was thinking along similar lines a couple of months ago, when
considering the needs of Wikiquote, and to a lesser extent Wiktionary.
The basic plan I came up with was:
* Each record is an arbitrary list of key-value pairs.
* User-designed edit forms and search forms constrain data entry to
particular fields.
* User-designed display templates are used to format the data and
integrate it with other parts of the wiki.
* Each field has an index, to allow fast searching and report generation.
* Indexes may be sparse, with most records not containing a given field.
I believe this can be efficiently handled by creating a separate table
for each key, on demand at runtime.
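To make the table-per-key idea concrete, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module purely for illustration. The class name WikiDB, the field_registry table and the field_N table naming are all hypothetical; nothing here is existing MediaWiki code. Table names are derived from integer ids rather than from the user-supplied key, so arbitrary field names never end up inside SQL text.

```python
import sqlite3


class WikiDB:
    """Hypothetical sketch: records are key-value pairs, one table per key."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        # Registry mapping each user-chosen field name to an integer id.
        self.conn.execute(
            "CREATE TABLE field_registry ("
            "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
        )

    def _table_for(self, key, create=False):
        """Return the table name for a key, optionally creating it on demand."""
        row = self.conn.execute(
            "SELECT id FROM field_registry WHERE name = ?", (key,)
        ).fetchone()
        if row is not None:
            return f"field_{row[0]}"
        if not create:
            return None
        cur = self.conn.execute(
            "INSERT INTO field_registry (name) VALUES (?)", (key,)
        )
        table = f"field_{cur.lastrowid}"
        # Sparse by construction: only records that have this key get a row.
        self.conn.execute(
            f"CREATE TABLE {table} "
            "(record_id INTEGER NOT NULL, value TEXT NOT NULL)"
        )
        # One index per field, for searching and report generation.
        self.conn.execute(f"CREATE INDEX {table}_value ON {table} (value)")
        self.conn.execute(f"CREATE INDEX {table}_record ON {table} (record_id)")
        return table

    def put(self, record_id, pairs):
        """Store a record given as a list of (key, value) pairs."""
        for key, value in pairs:
            table = self._table_for(key, create=True)
            self.conn.execute(
                f"INSERT INTO {table} (record_id, value) VALUES (?, ?)",
                (record_id, value),
            )
        self.conn.commit()

    def find(self, key, value):
        """Return ids of records whose field `key` equals `value`."""
        table = self._table_for(key)
        if table is None:
            return []
        rows = self.conn.execute(
            f"SELECT DISTINCT record_id FROM {table} "
            "WHERE value = ? ORDER BY record_id",
            (value,),
        )
        return [r[0] for r in rows]

    def get(self, record_id):
        """Reassemble a record by visiting every field table."""
        fields = self.conn.execute(
            "SELECT id, name FROM field_registry ORDER BY id"
        ).fetchall()
        pairs = []
        for field_id, name in fields:
            rows = self.conn.execute(
                f"SELECT value FROM field_{field_id} "
                "WHERE record_id = ? ORDER BY rowid",
                (record_id,),
            )
            pairs.extend((name, value) for (value,) in rows)
        return pairs


db = WikiDB()
db.put(1, [("genus", "Panthera"), ("species", "leo")])
db.put(2, [("genus", "Panthera"), ("species", "tigris")])
db.put(3, [("author", "Mark Twain")])   # a different "schema", same database
print(db.find("genus", "Panthera"))     # -> [1, 2]
```

Note that get() in this sketch has to visit every field table to reassemble one record, and put() touches one table per key, so it illustrates the trade-off rather than solving it: lookups by field are cheap, while whole-record reads and inserts get slower as the number of fields grows.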
The technical design is subject to constraints such as:
* Robust against frivolous or malicious addition of new fields to a
small number of records
* Peer review and the reverting of any change must be easy
* Capable of efficiently storing a number of different schemas
(represented by different data entry, search and display forms) in the
one database.
Challenges which still have to be addressed include:
* Possibility of slow insert times
* Malicious or accidental destruction of fields which are commonly used
for indexed retrieval -- this may make records hard to find and revert.
Far be it from me to co-opt such a generic term as "WikiDB"; I would
call this *a* WikiDB module for MediaWiki. The applications for such a
package would be extraordinarily broad. All that we need now is for me
(or someone else) to get motivated and write it.
-- Tim Starling