On Wed, Jul 29, 2015 at 4:21 PM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
That's right. I use the API and the XML dumps if I need text content. If you let me know about the type of analysis you are performing, I can advise about the best strategies.
FYI, I've pinged James A. for appropriate access. Any other way you can help in the meantime is appreciated. Just want to make sure you don't spend time on restarting the request for access via James. :-)
Thanks, L
On Wed, Jul 29, 2015 at 6:14 PM, Oliver Keyes okeyes@wikimedia.org wrote:
If we're talking Wikimedia MediaWiki instances, yes, the API is your only way forward. For performance reasons the text content is stored on a totally different set of servers that (to my knowledge) even paid researchers don't get to mess around with. Alternatively, you could take a look at https://dumps.wikimedia.org if slightly outdated information is okay for you.
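For readers following the API route, here is a minimal Python sketch of fetching revision text through the MediaWiki Action API. It assumes English Wikipedia as the endpoint, a current API version that supports `rvslots` and `formatversion=2`, and a placeholder User-Agent string; the function names are made up for illustration. It only covers publicly visible revisions, so deleted content still needs the appropriate rights.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; any Wikimedia wiki exposes /w/api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(rev_ids):
    """Build Action API parameters asking for the text of the given revisions."""
    return {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in rev_ids),
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }


def extract_texts(response):
    """Map revision id -> text from a formatversion=2 JSON response."""
    texts = {}
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            main = rev.get("slots", {}).get("main", {})
            if "content" in main:  # absent when the text is hidden
                texts[rev["revid"]] = main["content"]
    return texts


def fetch_texts(rev_ids, api_url=API_URL):
    """Perform the HTTP request (network access required)."""
    url = api_url + "?" + urllib.parse.urlencode(build_query(rev_ids))
    # Placeholder User-Agent; replace with your own contact details.
    request = urllib.request.Request(
        url, headers={"User-Agent": "wiki-research-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_texts(json.load(resp))
```

Calling `fetch_texts([...])` with a list of revision ids returns a dict keyed by revision id. For bulk analysis the dumps are usually the kinder option for the servers.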
On 29 July 2015 at 18:58, Srijan Kumar srijankedia@gmail.com wrote:
Hi!
I want to read the text stored in the text tables[1], but the old_text field stores it as what seems to be the path to the blob. How can I get the content of the blob?
Alternately, is there any other way to access all text content (including deleted content) without requiring global rights to the API?
Thanks! Srijan
[1] https://www.mediawiki.org/wiki/Manual:Text_table
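As background on why old_text can look like a path: per the manual page in [1], the old_flags column says how old_text should be read, and the "external" flag means old_text holds an address into external storage rather than the text itself. The Python sketch below only classifies a row under that reading. The function names and the sample address are hypothetical, the "gzip" branch assumes raw deflate data, and it cannot fetch an external blob, since that storage is not reachable from outside.

```python
import zlib


def parse_flags(old_flags):
    """Split the comma-separated old_flags value into a set of flag names."""
    return {flag.strip() for flag in old_flags.split(",") if flag.strip()}


def decode_old_text(old_text, old_flags):
    """Classify one text-table row as (kind, value).

    kind is "external" (value is the storage address), "object"
    (serialized PHP object, not handled here), or "text".
    """
    flags = parse_flags(old_flags)
    if "external" in flags:
        # old_text is a pointer, not content; it cannot be resolved here.
        return ("external", old_text.decode("ascii"))
    if "object" in flags:
        return ("object", None)
    data = old_text
    if "gzip" in flags:
        # Assumption: raw deflate stream without a zlib/gzip header.
        data = zlib.decompress(data, -zlib.MAX_WBITS)
    # Assumption: UTF-8; very old rows may use a legacy encoding.
    return ("text", data.decode("utf-8", errors="replace"))


# Hypothetical sample address, for illustration only.
print(decode_old_text(b"DB://cluster5/12345", "utf-8,external"))
```

A row that comes back as "external" is the case described in the question, and for those rows the API or the dumps remain the way to get at the text.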
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
-- Oliver Keyes Research Analyst Wikimedia Foundation