On 22/12/05, Anthony DiPierro <wikilegal(a)inbox.org> wrote:
> On 12/22/05, David Gerard <fun(a)thingy.apana.org.au> wrote:
> > Anthony DiPierro wrote:
> > > You'd have to spend a whole lot of money to get human editors to pick
> > > the "useful articles". It might pay off in the really long term, but
> > > it'd require a huge investment. And due to the GFDL some other
> > > company could just come along and take the results of that huge
> > > investment and drive you out of business anyway. I'm not at all
> > > surprised no one is doing it.
> >
> > They certainly didn't for de:. Oh, wait ...
> >
> > - d.
>
> You're referring to the producers of the DVD, I assume. I don't know
> a whole lot about that project but I assumed they used some automated
> method to select articles for inclusion (there was a mention of only
> using articles which were last edited by a certain selection of logged
> in users), not that they had someone go through each one by hand.

First two versions were hand-processed, last version was automated.
First edition:
"To produce the CD, a dump of the live Wikipedia had been copied to a
separate server, where a team of seventy Wikipedians vetted the
material, deleting nonsense articles and obvious copyright violations.
Questionable articles were added to a special list, to be reviewed
later. The final CD contained 132,000 articles and 1,200 images."
Second edition:
"The vetting process was similar to the one for the CD described above
and took place on a separate MediaWiki server. The process took about
a week and involved 33 Wikipedians, communicating on IRC. To prevent
duplication, editors would protect every article that they had
reviewed; links to protected articles were shown in green. List of
potential spam or vandalism had been produced ahead of time with SQL
queries. Unacceptable articles were simply deleted on the spot. The
final DVD contained about 205,000 articles, with every article linking
to a list of contributors."
Third edition:
The vetting process for this version was different and did not involve
human intervention. A "white list" of trusted Wikipedians was
assembled, the last 10 days of every article's history were examined,
and the last version edited by a white-listed Wikipedian was chosen
for the DVD. If no such version existed, the last version older than
10 days was used. Articles nominated for cleanup or deletion were not
used.
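
For anyone who wants that third-edition rule spelled out, here is a rough sketch of how I read the description above. It is not the code the DVD producers ran; the names (Revision, pick_revision, flagged) are made up for illustration, and I am assuming "edited by a white-listed Wikipedian" means the revision was saved by someone on the list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Set


@dataclass
class Revision:
    editor: str          # user name of whoever saved this revision
    timestamp: datetime  # when it was saved


def pick_revision(
    history: List[Revision],   # assumed sorted oldest-first
    whitelist: Set[str],       # trusted user names
    now: datetime,
    window_days: int = 10,
    flagged: bool = False,     # nominated for cleanup or deletion
) -> Optional[Revision]:
    """Return the revision to ship, or None if the article is skipped."""
    # Articles nominated for cleanup or deletion are not used at all.
    if flagged:
        return None

    cutoff = now - timedelta(days=window_days)

    # Look at the last `window_days` of history, newest first, and take
    # the most recent revision saved by a white-listed editor.
    for rev in reversed(history):
        if rev.timestamp < cutoff:
            break
        if rev.editor in whitelist:
            return rev

    # Otherwise fall back to the last revision older than the window.
    older = [rev for rev in history if rev.timestamp < cutoff]
    return older[-1] if older else None
```

An article whose entire history falls inside the 10-day window and has no white-listed edit comes back as None here. The description does not say what happened in that case, so that is my guess.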
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk