A couple of learnings about article deletions from the ACTRIAL analysis:
1. The logging table does not appear to contain correct page IDs of deleted pages until some time in 2014[1]. If you're looking at historical data and want to combine earlier deletions with other information, following Aaron's lead and using the archive table is probably the way to go.
2. The article namespace doesn't just contain "articles"; it also contains redirects and disambiguation pages. Redirects in particular can affect measurements of the number of pages deleted[2], because there have been instances where substantial numbers of redirects were cleaned up at once. There's no information about redirect status in the archive table, as far as I know, but the log comment can be used to identify a substantial number of such deletions.
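To illustrate the second point, here is a minimal sketch of flagging likely redirect deletions from the log comment. The pattern is my own assumption for illustration (the word "redirect" or an English Wikipedia R-series speedy deletion tag); it is not the set of rules used in deletionreasons.py, and the function name and sample comments are hypothetical.

```python
import re

# Illustrative pattern only (an assumption, not taken from deletionreasons.py):
# match the word "redirect" or an R-series speedy deletion tag such as WP:CSD#R3.
REDIRECT_PATTERN = re.compile(r"\bredirect|WP:CSD#R\d", re.IGNORECASE)


def looks_like_redirect_deletion(log_comment):
    """Return True if the deletion comment suggests the page was a redirect."""
    return bool(REDIRECT_PATTERN.search(log_comment or ""))


# Made-up sample comments in the style of English Wikipedia deletion summaries.
comments = [
    "[[WP:CSD#R3|R3]]: Recently created, implausible redirect",
    "[[WP:CSD#A7|A7]]: No credible indication of importance",
    "Redirect to a deleted or nonexistent page",
    "",
]
flags = [looks_like_redirect_deletion(c) for c in comments]
print(flags)
```

A comment-based filter like this will miss redirect deletions with uninformative comments, so it gives a lower bound rather than an exact count.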
The code I used in our analysis of deletion reasons, which also covers the article namespace, is on GitHub: https://github.com/nettrom/actrial/blob/master/python/deletionreasons.py
Footnotes:
1. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation...
2. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation...
Cheers, Morten
On Fri, 16 Aug 2019 at 05:31, Samuel Klein meta.sj@gmail.com wrote:
Since bug 26122 has been fixed, is there any reason not to use the deletion log instead?
On Thu, Aug 15, 2019 at 10:27 AM Aaron Halfaker aaron.halfaker@gmail.com wrote:
Here's a related bit of work: https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
In this research project, I used a mix of both the deletion log and the archive table to get a sense for when pages were being deleted.
Ultimately, I found that the easiest deletion event to operationalize was to look at the most recent ar_timestamp for a page in the archive table. I could only go back to 2008 with this metric because the archive table didn't exist before then.
The archive table is available in quarry. See https://quarry.wmflabs.org/query/38414 for an example query that gets the timestamp of an article's last revision.
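As a rough sketch of the archive-table approach (not the linked Quarry query itself), the idea is to take the most recent ar_timestamp per page and then count pages per month. The example below runs against a toy in-memory SQLite table so it is self-contained; the rows are made up, and grouping by ar_title is a simplifying assumption that merges repeated deletions of the same title. On Quarry the same SQL shape would run against the real archive table.

```python
import sqlite3

archive_db = sqlite3.connect(":memory:")
# Toy stand-in for MediaWiki's archive table, using only the columns needed here.
archive_db.execute(
    "CREATE TABLE archive (ar_namespace INTEGER, ar_title TEXT, ar_timestamp TEXT)"
)
archive_db.executemany(
    "INSERT INTO archive VALUES (?, ?, ?)",
    [
        (0, "Foo", "20190102030405"),
        (0, "Foo", "20190110120000"),  # last revision of Foo
        (0, "Bar", "20190215080000"),
        (0, "Baz", "20190220090000"),
        (1, "Foo", "20190111000000"),  # talk namespace, excluded below
    ],
)

# MediaWiki timestamps are YYYYMMDDHHMMSS strings, so the first six
# characters give the month and MAX() gives the latest revision.
query = """
SELECT substr(last_ts, 1, 6) AS month, COUNT(*) AS pages
FROM (
    SELECT ar_title, MAX(ar_timestamp) AS last_ts
    FROM archive
    WHERE ar_namespace = 0
    GROUP BY ar_title
) AS last_revisions
GROUP BY month
ORDER BY month
"""
deleted_per_month = archive_db.execute(query).fetchall()
print(deleted_per_month)
```

Note that the last revision timestamp is a proxy for the deletion time, since a page can be deleted well after its last edit.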
The logging table is also in quarry. See https://quarry.wmflabs.org/query/38415 for an example query that gets deletion events.
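For comparison, a minimal sketch of counting deletion events per month from the logging table (again not the linked Quarry query). It assumes deletions are recorded with log_type = 'delete' and log_action = 'delete' and restricts to namespace 0; the in-memory SQLite table and its rows are made up so the example runs on its own.

```python
import sqlite3

logging_db = sqlite3.connect(":memory:")
# Toy stand-in for MediaWiki's logging table, using only the columns needed here.
logging_db.execute(
    "CREATE TABLE logging (log_type TEXT, log_action TEXT, "
    "log_namespace INTEGER, log_title TEXT, log_timestamp TEXT)"
)
logging_db.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?, ?)",
    [
        ("delete", "delete", 0, "Foo", "20190703101500"),
        ("delete", "delete", 0, "Bar", "20190721090000"),
        ("delete", "restore", 0, "Foo", "20190722120000"),  # undeletion, skipped
        ("delete", "delete", 2, "Someone/draft", "20190801000000"),  # user namespace, skipped
        ("delete", "delete", 0, "Baz", "20190815183000"),
        ("move", "move", 0, "Qux", "20190816000000"),  # not a deletion
    ],
)

query = """
SELECT substr(log_timestamp, 1, 6) AS month, COUNT(*) AS deletions
FROM logging
WHERE log_type = 'delete'
  AND log_action = 'delete'
  AND log_namespace = 0
GROUP BY month
ORDER BY month
"""
deletions_per_month = logging_db.execute(query).fetchall()
print(deletions_per_month)
```

This counts deletion events rather than pages, so a page deleted, restored, and deleted again contributes twice, and redirects in namespace 0 are included unless filtered out separately.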
On Tue, Aug 13, 2019 at 2:51 PM Haifeng Zhang haifeng1@andrew.cmu.edu wrote:
Dear all,
Is there an easy way to get the number of articles deleted over time (e.g., month) in Wikipedia?
Can I use Quarry? What tables should I use?
Thanks,
Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
-- Samuel Klein @metasj w:user:sj +1 617 529 4266