Thanks for the reactions.
This is getting a bit editorial for this mailing list, and perhaps the editorial discussion should move to the project pages. Technically, what we have is a decent script which eats a list of archived versions of articles and puts out a cleaned static tree, obeying manually alterable delete instructions. It is very easy to restore content, or to run this on another list of articles if you have one.
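For anyone wondering what that amounts to, here is a rough sketch of the idea in Python. It is not the actual script: the function names, the "*" marker for dropping a whole article, the "== Heading ==" section convention and the .txt output layout are all assumptions made for illustration, and it takes article text already in memory rather than fetching archived revisions.

```python
import re
from pathlib import Path

# Hypothetical marker: an article mapped to this value is dropped entirely.
DROP_ARTICLE = "*"

# Matches level-2 headings only ("== Title =="), so "=== Sub ===" lines
# stay inside whatever section they belong to.
HEADING = re.compile(r"^==([^=].*?)==\s*$")


def strip_sections(text, headings_to_delete):
    """Remove every level-2 section whose heading is listed for deletion."""
    kept = []
    skipping = False
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new level-2 heading decides whether we skip from here on.
            skipping = match.group(1).strip() in headings_to_delete
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


def build_static_tree(articles, delete_instructions, out_dir):
    """Write cleaned articles under out_dir and return the paths written.

    articles: iterable of (title, text) pairs (assumed already fetched).
    delete_instructions: dict mapping a title to DROP_ARTICLE or to a
        list of section headings to remove. Editing this dict is the
        "manually alterable" part; removing an entry restores content.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title, text in articles:
        rule = delete_instructions.get(title, [])
        if rule == DROP_ARTICLE:
            continue  # whole article discarded
        cleaned = strip_sections(text, set(rule))
        path = out / (title.replace(" ", "_") + ".txt")
        path.write_text(cleaned, encoding="utf-8")
        written.append(path)
    return written
```

Running it on a different list of articles, or with an empty set of delete instructions, only means changing the two inputs; the tree is rebuilt from scratch each time.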
But anyway, Matthew makes a fair point: I should have thought through exactly what our process was. Please bear in mind this is the work of a motley crew of volunteers, not professional editors. The process was: chuck everything into a funnel, get a volunteer to read it and throw the irrelevant stuff out, then sort by school topic, then go and get other articles to fill holes in the curriculum. However, very US-centric content and fringe content got thrown out (see list at http://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia_CD_Selection&a... for discarded articles), including things like baseball players. I hadn't really noticed many FAs going from the 800 articles I did personally, but overall no doubt this included FA/GA stuff, plus a tonne of Pokemon characters. In our defence, the current collection of FAs and GAs is very skewed.
On the other questions:
1) "How many good articles has Wikipedia?" I concede I could be completely wrong. There are a vast number of key school topics (such as classic novels) where the content is hugely disappointing, and we kept hitting poor-quality articles when trying to fill holes. We also had a quick go at comparing with EB articles and were saddened. But Walkerma thinks 50,000 good articles could be found and he could be right; it could be far more.
2) Censorship: let's not get this out of proportion. There were a small number of articles where we thought content might cause issues. We could have left out all these articles with no sweat; no one would have noticed. There are plenty of places a 15-year-old can go for things not in this collection. There is plenty of content which an 8-year-old won't understand. We have taken out a small amount of content to allow the appeal to widen downwards in schools. You go and get your list of archived articles chosen your way and we will knock off a static copy for your choice, with no section deletes: no problem.
3) I am happy to be guided on citations, but part of the problem is that the formatting is so variable in Wikipedia itself that we were struggling with it. WP chooses to nofollow citations, so I guess we all agree this part of the content is unreliable? Anyway, it's done in so many different ways that we thought it needed to come out.
BozMo
============
Matthew Brown wrote:
On 5/22/07, Andrew Cates andrew@catesfamily.org.uk wrote:
It contains all Good & Featured content (except adult content).
Not true, unless 'adult content' means not only content deemed unsuitable for children but also content deemed not interesting or using some other mechanism. Since I only could be bothered to go through the FA process once, of course I looked to see if "my" FA was included, and it wasn't.
Which is no problem, it's on a nerdy topic of little general interest, but this seemed to diverge from what you said, so I thought I'd bring it up before other people ;)
-Matt