Jimmy Wales wrote:
From these pages, it should be possible to get a list of all their
article titles. These could be matched up against Wikipedia article titles.
Then we could ask the hypothetical: suppose Wikipedia just snagged the
same 55,000 topics as Columbia? How big would the resulting text be?
I'm taking this one on!
Just today I downloaded the en.wikipedia.org database dump. I don't
have a very fast machine, so decompressing took some time, and it's
still busy importing into the DB. Does anyone know approximately how
long that takes? (It doesn't show a progress meter or anything.)
But once that is done, the Perl script will be easy.
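A minimal sketch of what such a matching script might look like (in
Python rather than Perl, purely for illustration; the inline data and
byte counts are made up, not taken from the actual dump): intersect
Columbia's title list with Wikipedia's, then total the size of the
shared articles.

```python
# Hypothetical sketch: match Columbia Encyclopedia titles against
# Wikipedia titles and total the size of the matching articles.
# The example data below is invented for demonstration.

def load_titles(lines):
    """Normalise a list of titles: strip whitespace, lowercase."""
    return {line.strip().lower() for line in lines if line.strip()}

def matched_size(columbia_titles, wikipedia_sizes):
    """wikipedia_sizes maps normalised title -> article length in bytes.
    Returns the shared titles and their combined text size."""
    common = columbia_titles & set(wikipedia_sizes)
    return common, sum(wikipedia_sizes[t] for t in common)

# In practice the two inputs would come from the Columbia index pages
# and from the imported Wikipedia database dump.
columbia = load_titles(["Aachen", "Aardvark", "Zebra"])
wikipedia = {"aachen": 4200, "zebra": 3100, "quark": 2500}
common, total = matched_size(columbia, wikipedia)
print(f"{len(common)} shared topics, {total} bytes")
# → 2 shared topics, 7300 bytes
```

Real title matching would also need to handle redirects and
disambiguation pages, which this sketch ignores.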
Timwi