Hi, I've been thinking about the early history of Wikipedia and about which sorts of topics got written early on. I'm wondering if there is an easy way to find the first N Wikipedia topics (where N is, say, 100,000) in the order they were created.
On 02/22/2011 04:15 PM, Paul Houle wrote:
Hi, I've been thinking about the early history of Wikipedia and about which sorts of topics got written early on. I'm wondering if there is an easy way to find the first N Wikipedia topics (where N is, say, 100,000) in the order they were created.
Start reading from the beginning of the XML dump: http://dumps.wikimedia.org/backup-index.html
Articles appear there mostly in order of creation date, except for those from the first year (2001), where the order is a little more random.
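As a rough sketch of "start reading from the beginning", the dump can be streamed and cut off after the first N pages. The function names here are made up for illustration, and the code assumes the usual export layout of <page> elements containing a <title> and, in newer dumps, an <ns> element (pages without <ns> are kept as-is). It returns titles in dump order only, which per the above is not exactly creation order for 2001.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def titles_in_dump_order(stream):
    """Yield page titles in the order the pages appear in the dump.

    Assumption: pages carrying an <ns> element other than 0 are not
    articles and are skipped; pages with no <ns> element are kept.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = ns = None
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
        if title is not None and ns in (None, "0"):
            yield title
        elem.clear()  # drop the page's children so memory stays bounded


def first_n_titles(stream, n):
    """Return the first n article titles, stopping the parse early."""
    return list(islice(titles_in_dump_order(stream), n))


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real dump file.
    sample = b"""<mediawiki>
    <page><title>First</title><ns>0</ns></page>
    <page><title>Talk:First</title><ns>1</ns></page>
    <page><title>Second</title><ns>0</ns></page>
    </mediawiki>"""
    print(first_n_titles(io.BytesIO(sample), 2))
```

For a real compressed dump, the stream could be opened with bz2.open(path, "rb") from the standard library, where path is wherever the downloaded file was saved.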
Paul Houle wrote:
Hi, I've been thinking about the early history of Wikipedia and about which sorts of topics got written early on. I'm wondering if there is an easy way to find the first N Wikipedia topics (where N is, say, 100,000) in the order they were created.
For wikis that were born in phase3, it's quite easy, as creation order will match the lowest revision ids. Older projects that came from UseModWiki had an import of the then-current data, and the history was imported later, so for those you have no other way than looking at edit times.
For enwiki, you may prefer looking at the recently discovered old backup: http://lists.wikimedia.org/pipermail/foundation-l/2010-December/063088.html
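For the "look at edit time" case, one possible sketch is to take the earliest revision timestamp of each page and sort on that. This assumes a dump that includes the full revision history (a dump with only the current revision would just give last-edit times), and that timestamps are the UTC ISO 8601 strings used in the export format, which sort correctly as plain text. The function name is hypothetical, and the result is only as good as the imported history.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def pages_by_first_edit(stream):
    """Return (earliest_timestamp, title) pairs, oldest first.

    Assumption: each <page> holds a <title> followed by one or more
    <revision> elements, each with a <timestamp> child.
    """
    firsts = []
    title = earliest = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            for field in elem:
                if local(field.tag) == "timestamp" and field.text:
                    if earliest is None or field.text < earliest:
                        earliest = field.text
            elem.clear()  # revision text can be large; release it now
        elif name == "page":
            if title is not None and earliest is not None:
                firsts.append((earliest, title))
            title = earliest = None
            elem.clear()
    firsts.sort()
    return firsts


if __name__ == "__main__":
    # Tiny in-memory stand-in; revisions are deliberately out of order.
    sample = b"""<mediawiki>
    <page><title>Late</title>
      <revision><timestamp>2002-05-01T00:00:00Z</timestamp></revision>
    </page>
    <page><title>Early</title>
      <revision><timestamp>2003-01-01T00:00:00Z</timestamp></revision>
      <revision><timestamp>2001-03-01T00:00:00Z</timestamp></revision>
    </page>
    </mediawiki>"""
    for stamp, page_title in pages_by_first_edit(io.BytesIO(sample)):
        print(stamp, page_title)
```

Taking the first N entries of the returned list then gives the N pages with the oldest known edits. The list of pairs is held in memory, which should be manageable for titles and timestamps alone, but that is an assumption worth checking against the size of the wiki.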
wikitech-l@lists.wikimedia.org