Aaron, any chance you know the answer to this question? I have a vague memory that we talked about deleted pages and their text some time back. This data should live somewhere, right, given that deleted pages can be restored?

Thanks,
Leila

On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <leila@wikimedia.org> wrote:

Switching to the public list with Bob's permission.

On Wed, Jun 24, 2015 at 1:58 PM, Robert West <robert.bob.west@gmail.com> wrote:

Hi everyone,
I'd like to find all enwiki articles that were ever marked with the {{hoax}} template. Pages with that template mostly end up being deleted, so they're not available in the public revision dumps.

Hence my question: Is there a way of getting access to the full enwiki revision dump including all deleted pages?
I don't yet know which deleted articles I'm interested in; I will only know that after a pass over the full revision history.
I know that viewing deleted content is problematic (hence I'm sending this request to this internal research list), but I signed an NDA and have access to data on HDFS via stat1002, so there might be a way for me to access that data?

I'm also aware of a list of archived hoaxes, but many shorter-lived hoaxes that got deleted fast are not included there.

Thanks -- any pointers welcome!
Bob
--
Up for a little language game? -- http://www.unfun.me
Research-Internal mailing list
Research-Internal@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/research-internal