Jimmy Wales wrote:
> From these pages, it should be possible to get a list of all their article titles.
> These could be matched up against Wikipedia article titles.
> Then we could ask the hypothetical: suppose Wikipedia just snagged the same 55,000 topics as Columbia? How big would the resulting text be?
I'm taking it!
Just today I downloaded the en.wikipedia.org database dump. I don't have a very fast machine, so decompressing it took a while, and the import into the DB is still running. Does anyone know roughly how long that takes? (It doesn't show a progress meter or any other indication of how far along it is.)
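In the meantime, a crude way to watch the progress is to poll the row count directly. Something like this should work, assuming the dump fills the old `cur` table; the database name and credentials below are placeholders for whatever your setup uses:

#!/usr/bin/perl
# Crude progress check while the import runs: count the rows that
# have arrived in the `cur` table so far.
# Assumes the pre-1.5 schema (article text in `cur`); DB name and
# credentials are placeholders.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=wikidb;host=localhost',
                       'wikiuser', 'secret', { RaiseError => 1 });
my ($rows) = $dbh->selectrow_array('SELECT COUNT(*) FROM cur');
print "Rows imported into cur so far: $rows\n";
$dbh->disconnect;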
But once that is done, the Perl script will be easy.
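Something along these lines is what I have in mind. It's only a sketch for now: the title-list file name, the DB credentials, and the assumption that the article text sits in the old `cur` table are all placeholders until the import finishes and I can check the schema.

#!/usr/bin/perl
# Match a list of Columbia Encyclopedia titles against Wikipedia
# articles and add up the size of the matching wikitext.
# File name, credentials and table layout are assumptions.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=wikidb;host=localhost',
                       'wikiuser', 'secret', { RaiseError => 1 });
my $sth = $dbh->prepare(
    'SELECT LENGTH(cur_text) FROM cur
      WHERE cur_namespace = 0 AND cur_title = ?');

my ($matched, $total_bytes) = (0, 0);
open my $titles, '<', 'columbia_titles.txt'
    or die "Cannot open title list: $!";
while (my $title = <$titles>) {
    chomp $title;
    $title =~ s/ /_/g;          # titles are stored with underscores
    $title = ucfirst $title;    # and with the first letter capitalised
    my ($len) = $dbh->selectrow_array($sth, undef, $title);
    next unless defined $len;   # no article with that exact title
    $matched++;
    $total_bytes += $len;
}
close $titles;
$dbh->disconnect;

print "Matched $matched Columbia titles; ",
      "their combined wikitext is $total_bytes bytes.\n";

Matching on exact titles only will of course miss redirects and spelling variants, so whatever number comes out should be read as a lower bound.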
Timwi