Ah, yes. Sorry for not responding sooner. The best way to get deleted article text is to have the appropriate permission granted to a Wikimedia user account and then use that account to query the web API, e.g. the deletedrevisions query module: https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions

The best way to get this permission is to contact Community Advocacy (pbeaudette@wikimedia.org and jalexander@wikimedia.org) and ask them to grant you the "wmf-research" right/group.
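
For illustration, here is a rough Python sketch of what a request to that module could look like. It only builds the query URL; it does not log in or send anything. The helper name, the example title, and the choice of properties are my own assumptions rather than anything prescribed, and the request would still have to be sent from a logged-in session of an account that holds the right, otherwise the API should refuse it.

```python
from urllib.parse import urlencode

# English Wikipedia's API endpoint, as in the help link above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_deleted_revisions_url(titles, props=("ids", "timestamp", "user", "content"), limit=50):
    """Build a prop=deletedrevisions query URL (hypothetical helper, for illustration).

    titles: page titles whose deleted revisions are wanted.
    props:  values for drvprop; "content" asks for the revision text.
    limit:  value for drvlimit (maximum revisions returned per request).
    """
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": "|".join(titles),   # the API takes multiple values separated by "|"
        "drvprop": "|".join(props),
        "drvlimit": str(limit),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # "Example hoax article" is a made-up title, not a real page.
    print(build_deleted_revisions_url(["Example hoax article"]))
```

Long histories come back in batches, so a real script would also need to follow the continuation values the API returns.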

On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <leila@wikimedia.org> wrote:
Aaron, any chance you know the answer to this question? I have a vague memory that we talked about deleted pages and their text some time back. This data should live somewhere, right, given that deleted pages can be restored?

Thanks,
Leila

On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <leila@wikimedia.org> wrote:
Switching to the public list with Bob's permission.

On Wed, Jun 24, 2015 at 1:58 PM, Robert West <robert.bob.west@gmail.com> wrote:
Hi everyone,

I'd like to find all enwiki articles that were ever marked with the {{hoax}} template. Pages with that template mostly end up being deleted, so they're not available in the public revision dumps.

Hence my question:
Is there a way of getting access to the full enwiki revision dump including all deleted pages?
I don't know yet which deleted articles I'm interested in, but will only know that after having done a pass over the full revision history.

I know that viewing deleted content is problematic (hence I'm sending this request to this internal research list), but I signed an NDA and have access to data on HDFS via stat1002, so there might be a way for me to access that data?

I'm also aware of a list of archived hoaxes, but many shorter-lived hoaxes that got deleted fast are not included there.

Thanks -- any pointers welcome!
Bob


--
Up for a little language game? -- http://www.unfun.me

_______________________________________________
Research-Internal mailing list
Research-Internal@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/research-internal