Thanks Oliver and Aaron. I want to look at the deleted revisions described on the project's meta page [1], which are not in the XML dumps. I know which revisions I want to get the content for. What would you advise? Happy to take this off the list if it gets too specific.
Thanks! Srijan
[1] https://meta.wikimedia.org/wiki/Research:Understanding_hoax_articles_on_Engl...
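For known revision IDs, the API module that covers deleted content is prop=deletedrevisions. Here is a minimal sketch against the current MediaWiki API. It assumes English Wikipedia's endpoint and an account that already holds the rights to view deleted text (the deletedtext right, normally limited to administrators). Login is not shown: `opener` is a hypothetical, already-authenticated urllib opener, and `drvslots` postdates this 2015 thread.

```python
import json
from urllib.parse import urlencode

# Assumption: English Wikipedia; swap in another wiki's api.php as needed.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_deleted_revisions_query(revids):
    """Query parameters asking for the content of deleted revisions by ID."""
    return {
        "action": "query",
        "prop": "deletedrevisions",
        "revids": "|".join(str(r) for r in revids),
        "drvprop": "ids|timestamp|user|comment|content",
        "drvslots": "main",  # content of the main slot (newer MediaWiki versions)
        "format": "json",
        "formatversion": "2",
    }


def extract_deleted_contents(response):
    """Map revid -> wikitext from a formatversion=2 deletedrevisions response."""
    contents = {}
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("deletedrevisions", []):
            # Content is absent if it was hidden or the account lacks the right.
            contents[rev["revid"]] = rev.get("slots", {}).get("main", {}).get("content")
    return contents


def fetch_deleted_revisions(revids, opener):
    """`opener` is a hypothetical urllib OpenerDirector holding a logged-in session."""
    url = API_URL + "?" + urlencode(build_deleted_revisions_query(revids))
    with opener.open(url) as resp:
        return extract_deleted_contents(json.load(resp))
```

Without the required rights, the content fields simply come back missing. In that case, the realistic route is to ask someone who holds those rights, or to go through the Foundation, rather than the public API.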
---------- Forwarded message ---------- From: Aaron Halfaker ahalfaker@wikimedia.org Date: Wed, Jul 29, 2015 at 4:21 PM Subject: Re: [Wiki-research-l] How to read blobs in text table? To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org
That's right. I use the API and the XML dumps if I need text content. If you let me know what type of analysis you are performing, I can advise on the best strategies.
On Wed, Jul 29, 2015 at 6:14 PM, Oliver Keyes okeyes@wikimedia.org wrote:
If we're talking about Wikimedia MediaWiki instances, yes, the API is your only way forward: for performance reasons, the text content is stored on a totally separate set of servers that (to my knowledge) even paid researchers don't get to mess around with. Alternatively, you could take a look at https://dumps.wikimedia.org if slightly outdated information is okay for you.
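Here is a minimal sketch of the API route for non-deleted revisions. It assumes Python 3, English Wikipedia's endpoint and the current action=query / prop=revisions module (`rvslots` was added after this 2015 thread). The User-Agent string is a placeholder you should replace with your own contact details.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia; any wiki's api.php works the same way.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_revisions_query(revids):
    """Query parameters fetching wikitext for specific revision IDs (up to 50 per request)."""
    return {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in revids),
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }


def extract_contents(response):
    """Map revid -> wikitext from a formatversion=2 response."""
    contents = {}
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            # Hidden (revision-deleted) text has no "content" key, so use .get().
            contents[rev["revid"]] = rev.get("slots", {}).get("main", {}).get("content")
    return contents


def fetch_revisions(revids):
    """Call the live API; the User-Agent below is a placeholder."""
    url = API_URL + "?" + urlencode(build_revisions_query(revids))
    req = Request(url, headers={"User-Agent": "research-sketch/0.1 (you@example.org)"})
    with urlopen(req) as resp:
        return extract_contents(json.load(resp))
```

Calling `fetch_revisions([...])` with real revision IDs returns a dict from revision ID to wikitext. For bulk work over many pages, the XML dumps mentioned above are usually the gentler option for the servers.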
On 29 July 2015 at 18:58, Srijan Kumar srijankedia@gmail.com wrote:
Hi!
I want to read the text stored in the text table [1], but the old_text field stores what seems to be a path to the blob. How can I get the content of the blob?
Alternatively, is there any other way to access all text content (including deleted content) without requiring global rights to the API?
Thanks! Srijan
[1] https://www.mediawiki.org/wiki/Manual:Text_table
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
-- Oliver Keyes Research Analyst Wikimedia Foundation