[Foundation-l] data centralization for the benefit of small (and also bigger) projects
Marcus Buck
me at marcusbuck.org
Mon Aug 23 15:55:34 UTC 2010
Several wikis have used bots to increase their article count in the
past. Examples are the Volapük Wikipedia (vo), with 118,000 articles of
which about 117,000 are bot-created stubs, and the Aromanian Wikipedia
(roa-rup), with 61,000 articles at the moment and fewer than 10,000
before the bot run.
Why do they use bots? Because they have a small userbase and want to
cover as many topics as possible with little effort. Most of the
languages that use bots are small languages without much written
literature, especially when it comes to non-fiction reference works.
There are no Aromanian encyclopedias, few or no reference books, no
databases, etc. Aromanians either have to learn and use foreign
languages or they will never be able to get information about places in
China or in America. The bot operator tried to change this by creating
stubs about places in China, America and elsewhere. (Geographic objects
are the easiest way to cover large numbers of topics without much
effort.) But he did a horrible job, with really bad and uninformative
articles. I assume the reason for the bad articles is not any bad
intent but simply a lack of the technical skills needed to program a
more useful bot.
The easiest reaction to this is to just let them do their thing and
not care about it. The second easiest is to run a delete bot and
remove the bad articles because of their negative effects. But
neither method addresses the original motivation of the bot operator:
the wish to have information about a large range of entities available
in the wiki's language.
How can this be addressed?
We need a datawiki. That's not a new proposal; proposals for datawikis
have a long history. But there never was a specific reason not to
implement one; it's just that, until now, nobody has cared about it
enough to implement it.
Here's my idea about it:
When a search does not yield any matching articles on the local wiki,
the software will look up the name in the central datawiki. If the
central datawiki contains a matching entry, this entry will be loaded.
It will contain an instance of a template filled with information about
the entity. E.g.:
{{Town
|name=Fab City
|country=Awesomia
|pop=89042
|lat=42.0
|lon=42.0
|elevation=12
|mayor=Adam Sweet
}}
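The lookup-and-fallback behavior described above is simple to sketch. The following Python snippet is purely illustrative; the function name and the dict-based stand-ins for the two wikis are hypothetical, not an existing MediaWiki API:

```python
def search_with_fallback(title, local_wiki, central_datawiki):
    """Return the local article if it exists; otherwise fall back to
    the central datawiki entry (a template call like {{Town|...}})."""
    page = local_wiki.get(title)
    if page is not None:
        return page
    # No local match: consult the central datawiki instead of
    # presenting the reader with an empty search result.
    return central_datawiki.get(title)  # None means a genuine dead end

# Toy stand-ins: plain dicts mapping titles to wikitext.
local = {"Awesomia": "Local article about Awesomia..."}
central = {"Fab City": "{{Town|name=Fab City|country=Awesomia|pop=89042}}"}

print(search_with_fallback("Awesomia", local, central))  # local article
print(search_with_fallback("Fab City", local, central))  # datawiki entry
```

The point of the fallback is that a search only ever returns nothing when neither the local wiki nor the central datawiki knows the title.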
The software will now look for a template called "Town" on the local
wiki. The local [[Template:Town]] might, for example, look like this:
{| class="infobox"
|-
! Name
|
{{{name|}}}
|-
! Country
|
{{{country|}}}
|-
! Population
|
{{{pop|}}}
|-
! Mayor
|
{{{mayor|}}}
|-
! Elevation
|
{{{elevation|}}} above sea level
|-
! Geographic position
|
{{latlon| {{{lat|}}} | {{{lon|}}} }}
|}
'''{{{name|}}}''' is a place in [[{{countryname| {{{country|}}} }}]] with a population of {{{pop|}}}.
[[Category:{{countryname| {{{country|}}} }}]]
[[Category:Towns]]
Of course this template will be localized in the language of the local
wiki. This information will now be shown to the user who entered the
name in the search. (The above examples are just, well, examples. Real
entries would most likely contain much more data.)
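Merging the datawiki's parameter values into such a localized template is also easy to sketch. Here is a rough, hypothetical renderer that substitutes MediaWiki-style {{{name|default}}} placeholders; real template expansion handles far more (nested templates, parser functions like {{latlon}}, etc.):

```python
import re

def render_template(template_text, params):
    """Replace {{{name|default}}} placeholders with values taken from
    a datawiki entry's parameters (rough sketch, flat placeholders only)."""
    def substitute(match):
        name, _, default = match.group(1).partition("|")
        return params.get(name.strip(), default)
    return re.sub(r"\{\{\{([^{}]*)\}\}\}", substitute, template_text)

# Parameters as they might come from the central {{Town}} entry above.
entry = {"name": "Fab City", "country": "Awesomia", "pop": "89042"}
print(render_template(
    "'''{{{name|}}}''' is a place in {{{country|}}} "
    "with a population of {{{pop|}}}.", entry))
```

A missing parameter simply falls back to the placeholder's default (the text after the pipe), which matches how {{{pop|}}} behaves in wikitext.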
The datawiki can be filled with information about any entity that has a
certain set of recurring features (almost anything that has an infobox
on Wikipedia), especially geographic objects. These objects also have
the advantage that their names are usually international (at least
among Latin-script languages).
The advantages are:
- when the central datawiki is filled with info (most of which can be
bot-extracted from existing Wikipedia infoboxes), every Wikipedia -
however small its userbase may be - has instant access to information
about hundreds of thousands or millions of objects; it just needs to
implement some infobox templates
- this solution also eliminates the problem of outdated information in
infoboxes (a problem even en.wp suffers from). The data only needs
to be updated in one single place instead of in every single Wikipedia
separately
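Bot-extracting such data from existing infoboxes is mechanically straightforward for simple cases. A rough sketch, assuming a flat template call with no nested templates or piped links (real infoboxes would need a proper wikitext parser):

```python
def parse_template_call(wikitext):
    """Split a flat template call like the {{Town|...}} example into
    its template name and a dict of parameter/value pairs.
    Caveat: breaks on nested templates, links containing pipes, etc."""
    inner = wikitext.strip()
    inner = inner.removeprefix("{{").removesuffix("}}")
    parts = [p.strip() for p in inner.split("|")]
    name, params = parts[0], {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        params[key.strip()] = value.strip()
    return name, params

name, params = parse_template_call(
    "{{Town|name=Fab City|country=Awesomia|pop=89042}}")
print(name, params)
```

A bot could run something like this over every article transcluding a known infobox and write the resulting key-value pairs into the central datawiki.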
With the work done by Nikola Smolenski on the Interlanguage extension
(<http://www.mediawiki.org/wiki/Extension:Interlanguage>) it shouldn't
be too hard to implement.
In view of the potential usefulness, I cannot think of any argument
against this in general. The prospect of providing at least basic
information about millions of objects in all the different languages
seems really great to me.
Many native speakers of smaller languages use foreign-language wikis as
their default because the chance that their native wiki has an article
on a given topic is small. If the number of topics for which a search
on the native wiki yields results rises from "some thousands" to
"millions", there is a chance that users will finally accept their
native wiki as their default. The entries will be basic, but if
interwikis (of existing articles not generated from the datawiki) are
included in the info obtained from the datawiki, the more extensive
data is just one click away, whereas an unsuccessful search on the
local wiki (as you get now) is a dead end.
It certainly is worth putting some resources into it.
What do you think?
Marcus Buck
User:Slomox