Jimbo wrote:
> What I'd like to find out is whether we have a realistic chance of having a Wikipedia 1.0 release 1 year from now that rivals Britannica. But there's no need to hurry; if it will take 2 years or 5 years, that's how long it will take.
Hm. Is there a machine-readable list of all 75,000 of Britannica's articles somewhere? If we had such a list then we could perform this analysis: cross-check all article titles in en.Wikipedia against Britannica's articles, then exclude every Wikipedia article over some set size limit (2000 bytes, for example). Everything left would be one of three things: mismatched titles (which redirects would fix), articles we have that are under 2000 bytes, or articles Britannica has that we lack. If we really wanted to go crazy, we could compare the size of every matched title and aim to match or surpass each Britannica article in size, but that would require scriptable access to all their articles, which I'm not sure would be possible even if we had a Britannica CD.
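Something like the following could do that first pass, assuming we had Britannica's titles one per line and a tab-separated title/size dump of en.Wikipedia. To be clear, the file names, formats, and the 2000-byte threshold are all made up for illustration; this is a sketch, not a working pipeline.

SIZE_LIMIT = 2000  # bytes; the cutoff suggested above (illustrative)

def load_britannica_titles(path):
    """One Britannica title per line (hypothetical file)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def load_wikipedia_sizes(path):
    """Tab-separated lines: title<TAB>size-in-bytes (hypothetical dump)."""
    sizes = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            title, size = line.rstrip("\n").split("\t")
            sizes[title] = int(size)
    return sizes

def cross_check(britannica, wikipedia_sizes):
    missing = []   # Britannica has it; we have nothing, or the title mismatches
    stubs = []     # we have it, but it is under the size limit
    for title in britannica:
        size = wikipedia_sizes.get(title)
        if size is None:
            missing.append(title)        # absent, or needs a redirect
        elif size < SIZE_LIMIT:
            stubs.append((title, size))  # exists but too short
    return missing, stubs

missing, stubs = cross_check(
    load_britannica_titles("britannica_titles.txt"),
    load_wikipedia_sizes("enwiki_title_sizes.tsv"))
print(len(missing), "titles missing or mismatched")
print(len(stubs), "articles under", SIZE_LIMIT, "bytes")

The "missing" list would still need a human pass to separate genuinely absent articles from mismatched titles that a redirect would fix.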
Then we can have a queue of many thousands of priority articles to work on.
But size is only one part of the puzzle; we also need an approval system to measure quality. We can use the list of 75,000 Britannica articles as a priority list.
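As a sketch of how that queue might be ordered (again hypothetical, reusing the missing/stubs output of the cross-check above): put the articles we lack entirely first, then the existing stubs from shortest to longest, since the shortest ones are furthest from the 2000-byte bar.

def priority_list(missing, stubs):
    # Absent articles first (size 0), then stubs sorted shortest-first.
    return [(title, 0) for title in sorted(missing)] + \
           sorted(stubs, key=lambda pair: pair[1])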
-- Daniel Mayer (aka mav)
Daniel Mayer wrote:
> Hm. Is there a machine-readable list of all 75,000 of Britannica's articles somewhere?
Presumably on the CD version and the online version, and I would imagine they could be matched up to the print version pretty well.
> If we had such a list then we could perform this analysis:
> [details of analysis snipped]
Yes, that's right. If we had computerized data on what Britannica contains, we could use it to guide our work.
> Then we can have a queue of many thousands of priority articles to work on.
> But size is only one part of the puzzle; we also need an approval system to measure quality. We can use the list of 75,000 Britannica articles as a priority list.
That's right. I think that's a good approach.
--Jimbo