Thanks for the reactions.
This is getting a bit editorial for this mailing list, and perhaps the
editorial part should move to the project pages. Technically, what we
have is a decent script which eats a list of archived versions of
articles and puts out a cleaned static tree, obeying delete instructions
that can be altered by hand. It is very easy to restore content, or to
run this on another list of articles if you have one.
But anyway, Matthew makes a fair point: I should have thought through
and spelled out our process more exactly. Please bear in mind this is a
motley crew of volunteers, not professional editors. The process was:
chuck everything into a funnel, get a volunteer to read it and throw
irrelevant stuff out; then sort by school topic; then go and get other
articles to fill holes in the curriculum. However, very US-centric
content and fringe content got thrown out (see the list at
http://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia_CD_Selection&…
for discarded articles), including things like baseball players. I
hadn't really noticed many FAs going from the 800 articles I did
personally, but overall this no doubt included FA/GA stuff, plus a tonne
of Pokemon characters. In our defence, the current collection of FAs and
GAs is very skewed.
On the other questions:
1) "How many good articles does Wikipedia have?" I concede I could be
completely wrong. There are a vast number of key school topics (such as
classic novels) where the content is hugely disappointing, and we kept
hitting poor-quality articles when trying to fill holes. We also had a
quick go at comparing with EB articles and were saddened. But Walkerma
thinks 50,000 good articles could be found, and he could be right; it
could be far more.
2) Censorship: let's not get this out of proportion. There were a small
number of articles where we thought content might cause issues. We could
have left out all these articles with no sweat; no one would notice.
There are plenty of places a 15-year-old can go for things not in this
collection. There is plenty of content which an 8-year-old won't
understand. We have taken out a small amount of content to allow the
appeal to widen downwards in schools. You go and get your list of
archived articles, chosen your way, and we will knock off a static copy
of your choice with no section deletes: no problem.
3) I am happy to be guided on citations, but part of the problem is that
the formatting is so variable in Wikipedia itself that we were
struggling with it. WP chooses to nofollow citations, so I guess we all
agree this part of the content is unreliable? Anyway, it's done in so
many different ways that we thought it needed to come out.
BozMo
============
Matthew Brown wrote:
On 5/22/07, Andrew Cates
<andrew(a)catesfamily.org.uk> wrote:
It contains all Good & Featured content
(except adult content).
Not true, unless 'adult content' means not only content deemed
unsuitable for children but also content deemed not interesting, or
content excluded by some other mechanism. Since I could only be bothered
to go through the FA process once, of course I looked to see if "my" FA
was included, and it wasn't.
Which is no problem; it's on a nerdy topic of little general interest,
but this seemed to diverge from what you said, so I thought I'd bring
it up before other people did ;)
-Matt