There's no way to search the content of deleted pages. You can start with the `archive` table: you might be able to identify edits that add 'hoax' templates by running a regex match on `archive.ar_comment`.
-Aaron
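A minimal sketch of the regex idea above, assuming the rows have already been pulled from the `archive` table as `(ar_namespace, ar_title, ar_comment)` tuples. The pattern, the function name `hoax_candidates`, and the sample rows are illustrative assumptions, not something taken from the thread; matching on the edit summary only catches edits whose comment mentions the template, so it gives candidate pages rather than a complete list.

```python
import re

# Hypothetical pattern: any edit summary that mentions "hoax" or "hoaxes",
# which also covers summaries quoting the template as {{hoax}}.
HOAX_RE = re.compile(r"\bhoax(es)?\b", re.IGNORECASE)


def hoax_candidates(rows):
    """Return unique (ar_namespace, ar_title) pairs whose summary mentions 'hoax'.

    `rows` is an iterable of (ar_namespace, ar_title, ar_comment) tuples.
    """
    seen = []
    for namespace, title, comment in rows:
        if comment is None:
            continue
        # Database drivers may return binary columns as bytes.
        if isinstance(comment, bytes):
            comment = comment.decode("utf-8", errors="replace")
        if HOAX_RE.search(comment):
            key = (namespace, title)
            if key not in seen:
                seen.append(key)
    return seen


# Invented sample rows, for illustration only.
sample = [
    (0, "Fake_Island", "Tagging with {{hoax}}"),
    (0, "Real_Town", "fix typo"),
    (0, "Fake_Island", b"rm Hoax tag"),
    (0, "Untitled", None),
]
print(hoax_candidates(sample))
```

The resulting titles could then be used as input for looking up the deleted revisions themselves.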
On Thu, Jun 25, 2015 at 5:16 PM, Robert West robert.bob.west@gmail.com wrote:
Thanks, Aaron!
On Thu, Jun 25, 2015 at 3:06 PM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Ahh yes. Sorry for not responding sooner. The best way to get deleted article text is by getting the appropriate permission with a Wikimedia user account and then using that account to hit the web API. E.g.
https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedre...
Looking at this page, it seems I need to supply the title, pageid, or revid of the deleted page (or page with deleted revisions) I'm interested in. However, I don't know yet what pages are relevant to me -- I only know this after having done a pass over the text of *all* deleted revisions. More concretely, my query is basically "all deleted revisions that contain the {{hoax}} template", but I don't know yet which deleted pages have such revisions.
Is there any way of doing this?
Thanks! Bob
The best way to get this permission is to contact Community Advocacy (pbeaudette@wikimedia.org and jalexander@wikimedia.org) to request that they supply you with the "wmf-research" right/group.
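As a rough sketch of what such an API request might look like, the snippet below only builds the query URL. The link in the message is truncated, so the choice of the `prop=deletedrevisions` module and its `drvprop`/`drvlimit` parameters is an assumption, and the function name and page titles are made up for illustration. Actually fetching the content would additionally require an authenticated session for an account with the relevant right, which is not shown here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def deleted_revisions_url(titles, limit=50):
    """Build a query URL for the deleted revisions of the given page titles.

    Assumes the `deletedrevisions` prop module; no request is sent.
    """
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": "|".join(titles),  # the API takes pipe-separated titles
        "drvprop": "ids|timestamp|comment|content",
        "drvlimit": str(limit),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hypothetical titles, for illustration only.
print(deleted_revisions_url(["Fake_Island", "Phantom_Town"]))
```

Note that this still needs page titles up front, which is exactly the limitation raised in the reply above.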
On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia leila@wikimedia.org wrote:
Aaron, any chance you know the answer to this question? I have a vague memory that we talked about deleted pages and their text some time back. This data should live somewhere, right, given that deleted pages can be restored?
Thanks, Leila
On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia leila@wikimedia.org wrote:
Switching to the public list with Bob's permission.
On Wed, Jun 24, 2015 at 1:58 PM, Robert West <robert.bob.west@gmail.com> wrote:
Hi everyone,
I'd like to find all enwiki articles that were ever marked with the {{hoax}} template. Pages with that template mostly end up being deleted, so they're not available in the public revision dumps.
Hence my question: Is there a way of getting access to the full enwiki revision dump including all deleted pages? I don't know yet which deleted articles I'm interested in, but will only know that after having done a pass over the full revision history.
I know that viewing deleted content is problematic (hence I'm sending this request to this internal research list), but I signed an NDA and have access to data on HDFS via stat1002, so there might be a way for me to access that data?
I'm also aware of a list of archived hoaxes, but many shorter-lived hoaxes that got deleted fast are not included there.
Thanks -- any pointers welcome! Bob
-- Up for a little language game? -- http://www.unfun.me
Research-Internal mailing list Research-Internal@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/research-internal
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics