Wiki editing for databases


Stan Shebs

28 May 2004

10:30 p.m.

Has anybody thought about wiki databases? By these I mean applying the wiki idea to data that is most naturally organized as a database rather than as plain text. In practice, I visualize the edit page as resembling a list of smaller text boxes, and then having one or more display formatters to build a readable page from the raw database entry. The closest I could see online is a TWiki plugin, but it didn't look like it had a special UI.

The reason I'm interested in the general concept is that it seems to have cropped up in several contexts:

1. I've been working on a scheme to handle WP's thousands of bibliographic references. The idea is to have a sort of combination of BibTeX and the Image namespace; each referenced work gets an entry with a name, you fill in fields of the entry, then just mention the name as something like [[Ref:Arnett2001]] in articles and the software puts out formatted author/title/ISBN etc. However, the ability to format consistently depends on the reference's data being stored in database style, while still being available for editors to fix up.

2. Wiktionary. Dictionary entries are database entries, not free text. There should be a popup menu to add/choose the language for which you're writing the definition, a list of definition numbers that can be xref'ed properly, popups for parts of speech, and so on.

3. "Wikistamp". As part of my philatelic obsession, I've built up a database of worldwide postage stamp data. It now includes info on about 150,000 types - about half of all in existence - but the details are often incomplete, and wiki seems like a good way both to publish what exists and to enlist others in filling in, plus links to WP could have info on the stamps' subjects. However, I've only been able to do this singlehandedly because I have custom C code that does extensive validity checking - it knows that "ltolgrn" is a valid shorthand for the color "light olive green" but "grnollt" is not, that "1sh6p" is only valid for UK stamps before 1971, how to apply ranges of defaults, and so forth. A wikified version of this data would need to have the rules continue to be enforced by software.
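To make the validity-checking requirement concrete, here is a minimal Python sketch of the two rules mentioned above. It is not the C code described in the post: the abbreviation tables, the "modifiers first, then hues" grammar, and the names `expand_color` and `denomination_ok` are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation tables; the real shorthand vocabulary is not given in the post.
MODIFIERS = {"lt": "light", "dk": "dark"}
HUES = {"ol": "olive", "grn": "green", "bl": "blue", "red": "red"}

# Assumed shape of a shillings-and-pence denomination such as "1sh6p".
SHILLING_PENCE = re.compile(r"^(\d+)sh(\d+)p$")


def _strip_prefix(text, table):
    """Return (word, remainder) for the first abbreviation that prefixes text, else None."""
    for abbr, word in table.items():
        if text.startswith(abbr):
            return word, text[len(abbr):]
    return None


def expand_color(code):
    """Expand a shorthand such as 'ltolgrn'; return None if it is not valid.

    Assumed grammar: zero or more modifiers, followed by one or more hues.
    """
    words, rest = [], code
    # Consume leading modifiers ("lt", "dk").
    while (hit := _strip_prefix(rest, MODIFIERS)):
        words.append(hit[0])
        rest = hit[1]
    # Everything that remains must be made of hues.
    hue_count = 0
    while rest:
        hit = _strip_prefix(rest, HUES)
        if hit is None:
            return None
        words.append(hit[0])
        rest = hit[1]
        hue_count += 1
    return " ".join(words) if hue_count else None


def denomination_ok(denom, country, year):
    """Accept shillings-and-pence values only for UK stamps before 1971."""
    if SHILLING_PENCE.match(denom):
        return country == "UK" and year < 1971
    return True  # other denomination formats are not checked in this sketch


print(expand_color("ltolgrn"))   # light olive green
print(expand_color("grnollt"))   # None
```

A wiki front end could run checks like these when an edit is saved and reject or flag entries that fail, which is one way the rules could "continue to be enforced by software".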

These kinds of things seem like obvious uses for the application of the wiki approach to database content. Is this just something that people haven't thought of doing before? Are there fundamental obstacles to implementation? (I'm reading the MediaWiki sources now, haven't yet tried to hack on them.)

Stan


Ray Saintonge

29 May

12:41 a.m.

Stan Shebs wrote:

...

Has anybody thought about wiki databases? By these I mean applying the wiki idea to data that is most naturally organized as a database rather than as plain text. In practice, I visualize the edit page as resembling a list of smaller text boxes, and then having one or more display formatters to build a readable page from the raw database entry. The closest I could see online is a TWiki plugin, but it didn't look like it had a special UI.

I'd have to see this in action before I pass judgement.

...

I've been working on a scheme to handle WP's thousands of bibliographic references. The idea is to have a sort of combination of BibTeX and the Image namespace; each referenced work gets an entry with a name, you fill in fields of the entry, then just mention the name as something like [[Ref:Arnett2001]] in articles and the software puts out formatted author/title/ISBN etc. However, the ability to format consistently depends on the reference's data being stored in database style, while still being available for editors to fix up.

The idea is interesting, and the bibliography on many articles is weak to say the least. Developing the data table will be the challenge. There may not be a conflict with Arnett2001 but [[Ref:Smith2001]] could apply to nearly any topic.

...

Wiktionary. Dictionary entries are database entries, not free text. There should be a popup menu to add/choose the language for which you're writing the definition, a list of definition numbers that can be xref'ed properly, popups for parts of speech, and so on.

That's the simplistic view of dictionaries. The definition numbers should be soft numbered to allow for the insertion of additional definitions. Some of it may work. Understanding the definition of a word and all its many connotations is never just a black and white process. One of the Wiktionarians who felt that Shakespeare quotes were out of date for understanding the meanings of words has introduced a list of words appearing in the Sherlock Holmes stories. I can envision eventual links between Wiktionary and Wikisource (perhaps the other projects too) that could give examples of actual usage.

...

"Wikistamp". As part of my philatelic obsession, I've built up a database of worldwide postage stamp data. It now includes info on about 150,000 types - about half of all in existence - but the details are often incomplete, and wiki seems like a good way both to publish what exists and to enlist others in filling in, plus links to WP could have info on the stamps' subjects. However, I've only been able to do this singlehandedly because I have custom C code that does extensive validity checking - it knows that "ltolgrn" is a valid shorthand for the color "light olive green" but "grnollt" is not, that "1sh6p" is only valid for UK stamps before 1971, how to apply ranges of defaults, and so forth. A wikified version of this data would need to have the rules continue to be enforced by software.

This one is a very interesting idea. I've thought about it before. I've thought about it in terms of a whole new cataloging system that could challenge the proprietary and closely protected system that Scott uses in North America. This would involve a three level system that would be suitable for beginning, intermediate and advanced collectors. A Wikibook stamp catalog maybe?

Stan Shebs

11:49 p.m.

Ray Saintonge wrote:

...

Stan Shebs wrote:

...

I've been working on a scheme to handle WP's thousands of bibliographic references. The idea is to have a sort of combination of BibTeX and the Image namespace; each referenced work gets an entry with a name, you fill in fields of the entry, then just mention the name as something like [[Ref:Arnett2001]] in articles and the software puts out formatted author/title/ISBN etc. However, the ability to format consistently depends on the reference's data being stored in database style, while still being available for editors to fix up.

The idea is interesting, and the bibliography on many articles is weak to say the least. Developing the data table will be the challenge. There may not be a conflict with Arnett2001 but [[Ref:Smith2001]] could apply to nearly any topic.

I figure on using something like the BibTeX fields, because they're pretty well-developed. Titling of references is indeed an issue; redirs and disambigs will likely be necessary.
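As a rough illustration of the scheme under discussion, the Python sketch below stores one reference as BibTeX-like fields and expands `[[Ref:Name]]` mentions into a formatted citation. The entry contents are placeholder data, and the function names, the citation layout, and the simple redirect table are assumptions for the example, not an existing MediaWiki feature.

```python
import re

# Placeholder entry with BibTeX-like fields; all values are made up for illustration.
REFS = {
    "Arnett2001": {
        "author": "Arnett, A.",
        "title": "Example Title",
        "year": "2001",
        "isbn": "0-000-00000-0",
    },
}

# Matches [[Ref:Name]] and captures the entry name.
REF_LINK = re.compile(r"\[\[Ref:([^\]|]+)\]\]")


def format_ref(entry):
    """Build one citation string from an entry's fields (one possible display formatter)."""
    parts = [f"{entry['author']} ({entry['year']}).", f"{entry['title']}."]
    if entry.get("isbn"):
        parts.append(f"ISBN {entry['isbn']}.")
    return " ".join(parts)


def expand_refs(text, refs, redirects=None):
    """Replace each [[Ref:Name]] with its formatted citation.

    Unknown names are left untouched so that an editor can still see and fix them.
    The optional redirects mapping is a stand-in for the redirs mentioned above.
    """
    redirects = redirects or {}

    def repl(match):
        name = match.group(1).strip()
        name = redirects.get(name, name)
        entry = refs.get(name)
        return format_ref(entry) if entry else match.group(0)

    return REF_LINK.sub(repl, text)


print(expand_refs("See [[Ref:Arnett2001]]", REFS))
# See Arnett, A. (2001). Example Title. ISBN 0-000-00000-0.
```

Because the fields are stored separately from the formatter, a different `format_ref` could emit another citation style without touching the stored data.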

...

...

Wiktionary. Dictionary entries are database entries, not free text. There should be a popup menu to add/choose the language for which you're writing the definition, a list of definition numbers that can be xref'ed properly, popups for parts of speech, and so on.

That's the simplistic view of dictionaries. The definition numbers should be soft numbered to allow for the insertion of additional definitions. Some of it may work. Understanding the definition of a word and all its many connotations is never just a black and white process. One of the Wiktionarians who felt that Shakespeare quotes were out of date for understanding the meanings of words has introduced a list of words appearing in the Sherlock Holmes stories. I can envision eventual links between Wiktionary and Wikisource (perhaps the other projects too) that could give examples of actual usage.

I didn't mean to downplay the complexity, just wanted to observe that much of the Wiktionary gruntwork with sections and subsections amounts to an attempt to maintain a collection of data tables by hand. That gruntwork takes away from content creation cycles, just as the use of raw html instead of Wikisyntax would soak up a bunch of time unnecessarily. Database format would also allow us to "skin" Wiktionary entries into the more familiar form, where parts of speech are abbreviated, etc.
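A small Python sketch of how structured entries could support both points raised here, soft numbering and skinning. The `Sense` record, its stable `sense_id`, the abbreviation table, and the sample definitions are hypothetical; nothing here reflects how Wiktionary actually stores entries.

```python
from dataclasses import dataclass

# Hypothetical abbreviations for the "more familiar" dictionary skin.
POS_ABBREV = {"noun": "n.", "verb": "v.", "adjective": "adj."}


@dataclass
class Sense:
    sense_id: str  # stable identifier that cross-references point at
    pos: str       # part of speech, spelled out in the stored data
    text: str      # the definition itself


def render(word, senses, abbreviated=False):
    """Render an entry; display numbers are computed from list order at render time."""
    lines = [word]
    for number, sense in enumerate(senses, start=1):
        pos = POS_ABBREV.get(sense.pos, sense.pos) if abbreviated else sense.pos
        lines.append(f"{number}. ({pos}) {sense.text}")
    return "\n".join(lines)


def display_number(senses, sense_id):
    """Resolve a cross-reference to the number currently shown for that sense."""
    for number, sense in enumerate(senses, start=1):
        if sense.sense_id == sense_id:
            return number
    raise KeyError(sense_id)


senses = [
    Sense("stamp-label", "noun", "a small adhesive label showing postage paid"),
    Sense("stamp-press", "verb", "to press a mark onto a surface"),
]
print(render("stamp", senses, abbreviated=True))
```

Inserting a new sense at the front of the list shifts the displayed numbers, but a cross-reference stored as `"stamp-label"` still resolves correctly through `display_number`, which is the sense in which the numbering is "soft".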

...

...

"Wikistamp". As part of my philatelic obsession, I've built up a database of worldwide postage stamp data. It now includes info on about 150,000 types - about half of all in existence - but the details are often incomplete, and wiki seems like a good way both to publish what exists and to enlist others in filling in, plus links to WP could have info on the stamps' subjects. However, I've only been able to do this singlehandedly because I have custom C code that does extensive validity checking - it knows that "ltolgrn" is a valid shorthand for the color "light olive green" but "grnollt" is not, that "1sh6p" is only valid for UK stamps before 1971, how to apply ranges of defaults, and so forth. A wikified version of this data would need to have the rules continue to be enforced by software.

This one is a very interesting idea. I've thought about it before. I've thought about it in terms of a whole new cataloging system that could challenge the proprietary and closely protected system that Scott uses in North America. This would involve a three level system that would be suitable for beginning, intermediate and advanced collectors. A Wikibook stamp catalog maybe?

What I have is indeed a whole new catalog system, built "clean" from multiple sources, not using anybody else's numbering system. I could use it with the current MediaWiki to make a Wikibook stamp catalog in about a day. (The long list of postal entities in WP was just a dump of all the countries in the database.) But stamp data is complex to manage - consider Armenian overprints or 1940s China - and a plain text dump would be corrupted within days of being subjected to the editing process. If you're interested, I can send you more detail; the only people to have looked at it so far are stamp dealers who are bemused by seeing a world catalog / want list that fits on a Palm...

Stan

Nikola Smolenski

6:41 a.m.

On Friday 28 May 2004 22:30, Stan Shebs wrote:

...

Has anybody thought about wiki databases? By these I mean applying the

Yes, me. I am still thinking would it be better to insert it to Wikipedia, or to write everything from scratch.

Stan Shebs

6:08 p.m.

Nikola Smolenski wrote:

...

On Friday 28 May 2004 22:30, Stan Shebs wrote:

...
Has anybody thought about wiki databases? By these I mean applying the

Yes, me. I am still thinking would it be better to insert it to Wikipedia, or to write everything from scratch.

Could you expand on this please? It's not clear to me what you're trying to say.

Stan

Nikola Smolenski

30 May

12:12 a.m.

On Saturday 29 May 2004 18:08, Stan Shebs wrote:

...

Nikola Smolenski wrote:

...
On Friday 28 May 2004 22:30, Stan Shebs wrote:

...
Has anybody thought about wiki databases? By these I mean applying the

Yes, me. I am still thinking would it be better to insert it to Wikipedia, or to write everything from scratch.

Could you expand on this please? It's not clear to me what you're trying to say.

I am thinking about wiki databases. Currently I am thinking whether it would be better to write a wiki database from scratch, or to expand Wikipedia's code with support for wiki databases.

Lars Aronsson

29 May

9:47 p.m.

Stan Shebs wrote:

...

Has anybody thought about wiki databases? By these I mean applying the wiki idea to data that is most naturally organized as a database rather than as plain text. In practice, I visualize the edit page as resembling a list of smaller text boxes, and then having one or more display formatters to build a readable page from the raw database entry.

Already today, a Wikipedia page can contain subsections that are edited independently. What you suggest is in a way an extension of this, with individual database fields being editable. Another way to solve the user interface problem is to mine (parse) database fields out of the plain text. This is already done for [[links]], but could be done for other kinds of syntax as well. I have a wiki with a regular expression that parses dates written in plain text (January 17, 1948) without having to put them in brackets. The dates found by this regular expression are put in a database table that can be used to extract timelines across articles.
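The date-mining approach can be sketched in a few lines of Python. This is not the regular expression from the wiki mentioned above, which is not shown in the thread; the pattern, the function names, and the in-memory list standing in for the database table are assumptions.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Matches dates written as "January 17, 1948" in running text.
DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def extract_dates(title, text):
    """Return (date, page title) rows for every well-formed date found in text."""
    rows = []
    for match in DATE_RE.finditer(text):
        month = MONTHS.index(match.group(1)) + 1
        try:
            found = date(int(match.group(3)), month, int(match.group(2)))
        except ValueError:
            continue  # skip impossible dates such as "February 30, 2000"
        rows.append((found, title))
    return rows


def timeline(pages):
    """Merge the rows from all pages into one chronologically sorted list."""
    rows = []
    for title, text in pages.items():
        rows.extend(extract_dates(title, text))
    return sorted(rows)


pages = {
    "Page A": "Something happened on January 17, 1948 in this article.",
    "Page B": "An earlier event took place on March 3, 1901.",
}
for when, title in timeline(pages):
    print(when.isoformat(), title)
```

The editor keeps writing plain text, and the structured table is derived from it on save, which is the opposite trade-off from editing each database field in its own text box.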

Apart from plain text wiki articles and the database tables you are talking about, a third data structure is the spreadsheet. Each wiki page could be like an Excel sheet with values and formulas in a grid. Wiki links could reference values from cells on the same page or other pages, e.g. [[C,5]] * 17 + [[budget:A,3]].
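One way the cell-reference syntax in that example could be evaluated is sketched below in Python. The storage layout (a dict of pages, each mapping `(column, row)` to a number or a formula string), the function names, and the restriction to the four basic arithmetic operators are assumptions made for the sketch.

```python
import ast
import operator
import re

# [[C,5]] refers to the current page; [[budget:A,3]] names another page.
CELL_REF = re.compile(r"\[\[(?:(\w+):)?([A-Z]+),(\d+)\]\]")

OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _arith(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _arith(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_arith(node.left), _arith(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_arith(node.operand)
    raise ValueError("unsupported expression")


def evaluate(sheets, page, formula, _active=frozenset()):
    """Evaluate a formula on the given page, following cell references recursively."""

    def lookup(match):
        target_page = match.group(1) or page
        cell_key = (match.group(2), int(match.group(3)))
        key = (target_page,) + cell_key
        if key in _active:
            raise ValueError(f"circular reference at {key}")
        cell = sheets[target_page][cell_key]
        if isinstance(cell, str):  # the referenced cell holds a formula of its own
            cell = evaluate(sheets, target_page, cell, _active | {key})
        return f"({cell!r})"

    # Substitute every reference with its numeric value, then do the arithmetic.
    expr = CELL_REF.sub(lookup, formula).strip()
    return _arith(ast.parse(expr, mode="eval"))


sheets = {
    "home": {("C", 5): 2},
    "budget": {("A", 3): 6},
}
print(evaluate(sheets, "home", "[[C,5]] * 17 + [[budget:A,3]]"))  # 40
```

The expression is parsed with `ast` and walked by hand rather than passed to `eval`, so text typed into a wiki page cannot run arbitrary code.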

There are a lot of new ways to try. Too few are doing this.

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se/


wikitech-l@lists.wikimedia.org
