A couple of learnings about article deletions from the ACTRIAL analysis:
1. The logging table does not appear to contain correct page IDs of
deleted pages until some time in 2014[1]. If you're looking at historical
data and want to combine earlier deletions with other information,
following Aaron's lead and using the archive table is probably the way to
go.
2. The article namespace doesn't just contain "articles"; it also
contains redirects and disambiguation pages. Redirects in particular can
affect measurements of the number of pages deleted[2], because there have
been cleanups that deleted substantial numbers of redirects at once. As
far as I know, the archive table has no information about redirect
status, but the log comment can be used to identify a substantial number
of such deletions.
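A minimal sketch of flagging redirect deletions by their log comment,
written in Python (the language of the script linked below, though this is
not code from it). The patterns are hypothetical: matching the word
"redirect" and the English Wikipedia redirect speedy-deletion codes R2/R3 is
an assumption, and the example comments are made up. Real comments vary by
wiki and era, so the patterns would need checking against actual data.

```python
import re
from typing import Iterable

# Hypothetical patterns: the word "redirect" anywhere in the comment, or an
# R2/R3 speedy-deletion code (English Wikipedia's redirect criteria).
REDIRECT_PATTERNS = [
    re.compile(r"redirect", re.IGNORECASE),
    re.compile(r"\bR[23]\b"),
]


def is_redirect_deletion(comment: str) -> bool:
    """Return True if a deletion log comment looks like a redirect deletion."""
    return any(p.search(comment) for p in REDIRECT_PATTERNS)


def count_non_redirect_deletions(comments: Iterable[str]) -> int:
    """Count deletions whose log comment does not look redirect-related."""
    return sum(1 for c in comments if not is_redirect_deletion(c))


# Made-up example comments for illustration only.
comments = [
    "[[WP:CSD#R3|R3]]: Recently created, implausible redirect",
    "Deleting redirect to a deleted page",
    "[[WP:CSD#A7|A7]]: No credible indication of importance",
]
print(count_non_redirect_deletions(comments))
```

Anything the patterns miss is still counted as an article deletion, so this
gives an upper bound on non-redirect deletions rather than an exact count.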
The code I used in our analysis of deletion reasons, which also covers the
article namespace, is on GitHub:
https://github.com/nettrom/actrial/blob/master/python/deletionreasons.py
Footnotes:
1. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creatio…
2. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creatio…
Cheers,
Morten
On Fri, 16 Aug 2019 at 05:31, Samuel Klein <meta.sj(a)gmail.com> wrote:
Since bug 26122 has been fixed, is there any reason not to use the
deletion log instead?
On Thu, Aug 15, 2019 at 10:27 AM Aaron Halfaker <aaron.halfaker(a)gmail.com> wrote:
Here's a related bit of work:
https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
In this research project, I used a mix of both the deletion log and the
archive table to get a sense for when pages were being deleted.
Ultimately, I found that the easiest way to operationalize a deletion event
was to look at the most recent ar_timestamp for a page in the archive table.
I could only go back to 2008 with this metric because the archive table
didn't exist before then.
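A minimal sketch of the "most recent ar_timestamp" approach, assuming rows
of (page key, ar_timestamp) have already been fetched from the archive table
and that timestamps are 14-digit YYYYMMDDHHMMSS strings. This is an
illustration, not Aaron's actual code, and the rows are made up. The page
key could be ar_page_id, or (ar_namespace, ar_title) where IDs are missing.
Note that the last archived revision only bounds the deletion time from
below; the page was deleted at some point after it.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def monthly_deletions_from_archive(
    rows: Iterable[Tuple[Hashable, str]],
) -> Counter:
    """Count pages per YYYY-MM by their most recent archived revision.

    rows: (page_key, ar_timestamp) pairs; ar_timestamp is a 14-digit
    YYYYMMDDHHMMSS string.
    """
    latest: Dict[Hashable, str] = {}
    for page_key, ts in rows:
        # Fixed-width 14-digit timestamps sort correctly as plain strings.
        if page_key not in latest or ts > latest[page_key]:
            latest[page_key] = ts
    # Bucket each page by the year and month of its latest archived revision.
    return Counter(f"{ts[:4]}-{ts[4:6]}" for ts in latest.values())


# Made-up rows for illustration only.
rows = [
    (1, "20080115120000"),
    (1, "20080203090000"),
    (2, "20080220000000"),
    (3, "20080301000000"),
]
print(monthly_deletions_from_archive(rows))
```

Keeping only one timestamp per page means a page with many archived
revisions is counted once, in the month of its last revision.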
The archive table is available in Quarry. See
https://quarry.wmflabs.org/query/38414 for an example query that gets the
timestamp of an article's last revision.
The logging table is also in Quarry. See
https://quarry.wmflabs.org/query/38415 for an example query that gets
deletion events.
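A minimal sketch of counting article-namespace deletion events per month
from the logging table, using Python's sqlite3 with a simplified stand-in
table that has only the columns used here. The filter (log_type = 'delete',
log_action = 'delete', log_namespace = 0) is a reasonable starting point
rather than a tested Quarry query; on Quarry the SELECT may need adjustments
(e.g. log_timestamp may be stored as binary there). The rows are made up.

```python
import sqlite3
from typing import List, Tuple

# Simplified stand-in for MediaWiki's logging table (only the columns used here).
SCHEMA = """
CREATE TABLE logging (
    log_type TEXT,
    log_action TEXT,
    log_namespace INTEGER,
    log_title TEXT,
    log_timestamp TEXT
)
"""

# Count article-namespace (namespace 0) deletion events per month (YYYYMM).
MONTHLY_DELETIONS = """
SELECT SUBSTR(log_timestamp, 1, 6) AS month, COUNT(*) AS deletions
FROM logging
WHERE log_type = 'delete'
  AND log_action = 'delete'
  AND log_namespace = 0
GROUP BY month
ORDER BY month
"""

# Made-up rows for illustration only.
EXAMPLE_ROWS = [
    ("delete", "delete", 0, "Example_A", "20190105120000"),
    ("delete", "delete", 0, "Example_B", "20190120080000"),
    ("delete", "restore", 0, "Example_B", "20190121080000"),
    ("delete", "delete", 2, "Example_C", "20190201000000"),
    ("delete", "delete", 0, "Example_D", "20190215000000"),
]


def monthly_deletions(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Return (YYYYMM, number of deletion events) pairs in month order."""
    return conn.execute(MONTHLY_DELETIONS).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO logging VALUES (?, ?, ?, ?, ?)", EXAMPLE_ROWS)
print(monthly_deletions(conn))
```

This counts deletion events, not distinct pages: a page that is deleted,
restored, and deleted again counts twice, and the caveat about page IDs in
the logging table from earlier in the thread still applies.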
On Tue, Aug 13, 2019 at 2:51 PM Haifeng Zhang <haifeng1(a)andrew.cmu.edu> wrote:
Dear all,
Is there an easy way to get the number of articles deleted over time
(e.g., month) in Wikipedia?
Can I use Quarry? What tables should I use?
Thanks,
Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Samuel Klein @metasj w:user:sj +1 617 529 4266