Thanks, Aaron!
On Thu, Jun 25, 2015 at 3:06 PM, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
Ahh yes. Sorry for not responding sooner. The best way to get deleted
article text is by getting the appropriate permission with a Wikimedia user
account and then using that account to hit the web API. E.g.
https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedr…
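A minimal sketch of "hitting the web API" for deleted text, under assumptions not stated in the thread: it uses the `prop=deletedrevisions` query module (MediaWiki 1.25+), which takes titles, page IDs or revision IDs. The help link above is truncated, so this may not be the exact module Aaron meant. Returning `content` should require an account with the deleted-text right. The opener is assumed to already carry a logged-in session cookie; the login step itself is not shown.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable

API = "https://en.wikipedia.org/w/api.php"  # enwiki action API endpoint

def deleted_revisions_url(titles: Iterable[str], api: str = API) -> str:
    """Build a query URL asking for the deleted revisions of the given titles."""
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "drvprop": "ids|timestamp|user|comment|content",  # content needs extra rights
        "format": "json",
        "formatversion": "2",
    }
    return api + "?" + urllib.parse.urlencode(params)

def fetch_json(opener: urllib.request.OpenerDirector, url: str) -> dict:
    """GET the URL through an opener assumed to hold a logged-in session cookie."""
    with opener.open(url) as resp:
        return json.load(resp)

# Hypothetical usage: an opener that keeps cookies across requests.
# opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
# ... log in with opener first (not shown) ...
# data = fetch_json(opener, deleted_revisions_url(["Some deleted page"]))
```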
Looking at this page, it seems I need to supply the title, pageid, or
revid of the deleted page (or page with deleted revisions) I'm
interested in.
However, I don't know yet what pages are relevant to me -- I only know
this after having done a pass over the text of *all* deleted
revisions.
More concretely, my query is basically "all deleted revisions that
contain the {{hoax}} template", but I don't know yet which deleted
pages have such revisions.
Is there any way of doing this?
Thanks!
Bob
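One possible route to a title-free pass, sketched under assumptions: the `list=alldeletedrevisions` module (MediaWiki 1.25+, as far as I know) enumerates deleted revisions without needing titles up front. It should require the deleted-history right to list them and the deleted-text right to read their content. Results are paged with the standard `continue` mechanism. `fetch` is a hypothetical callable that sends the parameters through a logged-in session and returns parsed JSON. The regex matches only literal uses of the template; redirects to it are not caught. Older MediaWiki returns text as `content`, newer versions under `slots.main.content`, and the code accepts both.

```python
import re
from typing import Callable, Dict, Iterator, Set

# Matches {{hoax}}, {{Hoax|date=...}}, {{Template:Hoax}}; NOT redirects or {{hoaxes}}.
HOAX_RE = re.compile(r"\{\{\s*(?:[Tt]emplate\s*:\s*)?[Hh]oax\s*(?:\||\}\})")

Fetch = Callable[[Dict[str, str]], dict]  # hypothetical: params -> parsed JSON

def iter_deleted_pages(fetch: Fetch) -> Iterator[dict]:
    """Yield page objects from list=alldeletedrevisions, following continuation."""
    params = {
        "action": "query",
        "list": "alldeletedrevisions",
        "adrprop": "ids|timestamp|content",
        "adrnamespace": "0",  # articles only
        "adrlimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    cont: Dict[str, str] = {}
    while True:
        data = fetch({**params, **cont})
        yield from data.get("query", {}).get("alldeletedrevisions", [])
        if "continue" not in data:
            break
        cont = data["continue"]  # merge continuation keys into the next request

def revision_text(rev: dict) -> str:
    """Return wikitext from either the legacy or the slots-based layout."""
    if "content" in rev:
        return rev["content"]
    return rev.get("slots", {}).get("main", {}).get("content", "")

def hoax_titles(fetch: Fetch) -> Set[str]:
    """Titles with at least one deleted revision containing the hoax template."""
    found: Set[str] = set()
    for page in iter_deleted_pages(fetch):
        for rev in page.get("revisions", []):
            if HOAX_RE.search(revision_text(rev)):
                found.add(page["title"])
                break
    return found
```

A set is used because the same title can reappear across continuation batches.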
The best way to get this permission is to contact Community Advocacy
(pbeaudette(a)wikimedia.org and jalexander(a)wikimedia.org) to request that they
supply you with the "wmf-research" right/group.
On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <leila(a)wikimedia.org> wrote:
Aaron, any chance you know the answer to this question? I have a vague
memory that we talked about deleted pages and their text some time back.
This data should live somewhere, right, given that deleted pages can be
restored?
Thanks,
Leila
On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <leila(a)wikimedia.org> wrote:
Switching to the public list with Bob's permission.
On Wed, Jun 24, 2015 at 1:58 PM, Robert West <robert.bob.west(a)gmail.com>
wrote:
Hi everyone,
I'd like to find all enwiki articles that were ever marked with the
{{hoax}} template. Pages with that template mostly end up being deleted, so
they're not available in the public revision dumps.
Hence my question:
Is there a way of getting access to the full enwiki revision dump
including all deleted pages?
I don't know yet which deleted articles I'm interested in, but will only
know that after having done a pass over the full revision history.
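A pass over full revision history can be sketched as a streaming scan of a MediaWiki XML export. This assumes the history is available in that format, for example a pages-meta-history dump, which per this thread lacks deleted pages. The scan lists every page with at least one revision containing the template. Only literal uses of the template are matched; redirects are not. The export namespace URI is stripped rather than hard-coded, since it varies by schema version.

```python
import re
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# Matches {{hoax}}, {{Hoax|...}}, {{Template:Hoax}}; NOT redirects or {{hoaxes}}.
HOAX_RE = re.compile(r"\{\{\s*(?:[Tt]emplate\s*:\s*)?[Hh]oax\s*(?:\||\}\})")

def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def pages_with_hoax(stream: BinaryIO) -> Iterator[str]:
    """Stream an XML export and yield titles of pages that ever used the template."""
    title, hit = None, False
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text  # <title> precedes the page's revisions
        elif name == "text":
            if not hit and elem.text and HOAX_RE.search(elem.text):
                hit = True
        elif name == "revision":
            elem.clear()  # free revision text once scanned
        elif name == "page":
            if hit:
                yield title
            title, hit = None, False
            elem.clear()
```

Clearing elements keeps memory roughly bounded per page; the emptied `<page>` shells still stay attached to the root, which is usually acceptable for a sketch like this.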
I know that viewing deleted content is problematic (hence I'm sending
this request to this internal research list), but I signed an NDA and have
access to data on HDFS via stat1002, so there might be a way for me to
access that data?
I'm also aware of a list of archived hoaxes, but many shorter-lived
hoaxes that got deleted fast are not included there.
Thanks -- any pointers welcome!
Bob
--
Up for a little language game? --
http://www.unfun.me
_______________________________________________
Research-Internal mailing list
Research-Internal(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/research-internal
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics