That's right. I use the API and the XML dumps when I need text content. If you let me know what type of analysis you are performing, I can advise on the best strategies.
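For the XML dump route, a minimal sketch of streaming page text out of a dump file might look like the following. It assumes the usual export layout of `<page>` elements containing a `<title>` and a `<revision>` with a `<text>` child; the `SAMPLE` document and the helper names are made up for illustration, and the namespace URI varies between export versions, so the code strips it rather than hard-coding it.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_page_texts(xml_stream):
    """Yield (title, text) for each revision found in a dump-style XML stream.

    Uses iterparse so the whole file never has to fit in memory.
    """
    title = None
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            yield title, elem.text or ""
        elif name == "page":
            # Free the finished page's children before moving on.
            elem.clear()
            title = None


# Tiny hand-written stand-in for a real dump (illustration only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Hello, wiki.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_page_texts(io.BytesIO(SAMPLE)))
print(pages)
```

Dump files are typically distributed compressed; for a `.bz2` file, passing `bz2.open(path, "rb")` as `xml_stream` should work the same way.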
On Wed, Jul 29, 2015 at 6:14 PM, Oliver Keyes okeyes@wikimedia.org wrote:
If we're talking Wikimedia MediaWiki instances, yes, the API is your only way forward - for performance reasons the text content is stored on a totally different set of servers that (to my knowledge) even paid researchers don't get to mess around with. Alternatively, you could take a look at https://dumps.wikimedia.org if slightly outdated information is okay for you.
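As a rough illustration of the API route, the sketch below asks the MediaWiki Action API for the current wikitext of one page. The endpoint (English Wikipedia), the User-Agent string, and the function names are assumptions chosen for the example. It only reaches live revisions; deleted content still needs the extra rights discussed in this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; substitute the wiki you are studying.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, endpoint=API_ENDPOINT):
    """Build an action=query URL requesting the latest revision's content."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def extract_content(response):
    """Return the wikitext from a parsed API response, or None if absent."""
    page = response["query"]["pages"][0]
    revisions = page.get("revisions")
    if not revisions:
        return None  # e.g. the page is missing
    return revisions[0]["slots"]["main"]["content"]


def fetch_current_text(title):
    """Fetch and return the current wikitext of `title` (network call)."""
    request = urllib.request.Request(
        build_query_url(title),
        # Placeholder User-Agent; replace with your own contact details.
        headers={"User-Agent": "wiki-research-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_content(json.load(reply))
```

Calling `fetch_current_text("Main Page")` would perform the request; the URL building and response parsing are kept separate so they can be checked without network access.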
On 29 July 2015 at 18:58, Srijan Kumar srijankedia@gmail.com wrote:
Hi!
I want to read the text stored in the text tables[1], but the old_text field stores it as what seems to be the path to the blob. How can I get the content of the blob?
Alternately, is there any other way to access all text content (including deleted content) without requiring global rights to the API?
Thanks! Srijan
[1] https://www.mediawiki.org/wiki/Manual:Text_table
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
-- Oliver Keyes Research Analyst Wikimedia Foundation