There is no way to search the content of deleted pages directly.  You can start with the `archive` table, though, which holds the metadata for deleted revisions.  You may be able to identify edits that add 'hoax' templates by performing a regex match on `archive.ar_comment`.
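
A rough sketch of that kind of comment match, in case it helps. It assumes the `ar_comment` values have already been pulled out of the database and decoded to strings; the pattern, the function name, and the sample rows are all made up for illustration, and a real pattern would probably need tuning against actual edit summaries.

```python
import re

# Assumed pattern: any edit summary mentioning "hoax" as a whole word,
# which also covers summaries that quote the template as {{hoax}}.
HOAX_PATTERN = re.compile(r"\bhoax\b", re.IGNORECASE)


def find_hoax_comments(rows):
    """Return the revision ids whose edit summary matches HOAX_PATTERN.

    rows: iterable of (rev_id, comment) pairs, e.g. selected from the
    archive table. Comments may be None or empty.
    """
    return [rev_id for rev_id, comment in rows
            if comment and HOAX_PATTERN.search(comment)]


# Hypothetical sample data, not real archive rows.
sample_rows = [
    (1, "Tagging with {{hoax}}"),
    (2, "fixed typo"),
    (3, "Suspected HOAX, see talk page"),
    (4, None),
]

print(find_hoax_comments(sample_rows))  # prints [1, 3]
```

This only finds edits whose summary mentions the template; edits that add it without saying so in the summary would be missed.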

-Aaron

On Thu, Jun 25, 2015 at 5:16 PM, Robert West <robert.bob.west@gmail.com> wrote:
Thanks, Aaron!

On Thu, Jun 25, 2015 at 3:06 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
> Ahh yes.  Sorry for not responding sooner.  The best way to get deleted
> article text is by getting the appropriate permission with a Wikimedia user
> account and then using that account to hit the web API.  E.g.
> https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions

Looking at this page, it seems I need to supply the title, pageid, or
revid of the deleted page (or page with deleted revisions) I'm
interested in.
However, I don't know yet which pages are relevant to me -- I will only
know that after doing a pass over the text of *all* deleted
revisions.
More concretely, my query is basically "all deleted revisions that
contain the {{hoax}} template", but I don't know yet which deleted
pages have such revisions.
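
To make the constraint concrete, here is a rough sketch of how I understand a
per-page request would be built. The parameter names are from my reading of
the help page, the title is made up, and nothing is actually sent (a real
request would also need an authenticated session with the right to view
deleted text).

```python
from urllib.parse import urlencode, urlparse, parse_qs

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def deleted_revisions_url(title):
    """Build (but do not send) a deletedrevisions query URL for one title."""
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": title,  # the page has to be named up front
        "drvprop": "ids|timestamp|comment|content",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hypothetical title, used only to show the shape of the request.
url = deleted_revisions_url("Example deleted page")
print(url)
print(parse_qs(urlparse(url).query)["titles"])
```

The point is that `titles` has to be filled in before the request can be made,
which is exactly the piece of information I am missing.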

Is there any way of doing this?

Thanks!
Bob

> The best way to get this permission is to contact Community Advocacy
> (pbeaudette@wikimedia.org and jalexander@wikimedia.org) to request that they
> supply you with the "wmf-research" right/group.
>
> On Thu, Jun 25, 2015 at 4:15 PM, Leila Zia <leila@wikimedia.org> wrote:
>>
>> Aaron, any chance you know the answer to this question? I have a vague
>> memory that we talked about deleted pages and their text some time back.
>> This data should live somewhere, right, given that deleted pages can be
>> restored?
>>
>> Thanks,
>> Leila
>>
>> On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia <leila@wikimedia.org> wrote:
>>>
>>> switching to the public list with Bob's permission.
>>>
>>> On Wed, Jun 24, 2015 at 1:58 PM, Robert West <robert.bob.west@gmail.com>
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'd like to find all enwiki articles that were ever marked with the
>>>> {{hoax}} template. Pages with that template mostly end up being deleted, so
>>>> they're not available in the public revision dumps.
>>>>
>>>> Hence my question:
>>>> Is there a way of getting access to the full enwiki revision dump
>>>> including all deleted pages?
>>>> I don't know yet which deleted articles I'm interested in, but will only
>>>> know that after having done a pass over the full revision history.
>>>>
>>>> I know that viewing deleted content is problematic (hence I'm sending
>>>> this request to this internal research list), but I signed an NDA and have
>>>> access to data on HDFS via stat1002, so there might be a way for me to
>>>> access that data?
>>>>
>>>> I'm also aware of a list of archived hoaxes, but many shorter-lived
>>>> hoaxes that got deleted fast are not included there.
>>>>
>>>> Thanks -- any pointers welcome!
>>>> Bob
>>>>
>>>>
>>>> --
>>>> Up for a little language game? -- http://www.unfun.me
>>>>
>>>> _______________________________________________
>>>> Research-Internal mailing list
>>>> Research-Internal@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/research-internal
>>>>
>>>
>>
>
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>


