That's right. I use the API and the XML dumps if I need text content. If
you let me know about the type of analysis you are performing, I can advise
about the best strategies.
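
In case it helps, here is a rough sketch of the API route for fetching the current text of a single page. It assumes English Wikipedia as the endpoint and uses a made-up User-Agent string; the helper names (build_query_url, extract_text, fetch_text) are just illustrative. As far as I know, a query like this only returns non-deleted revisions, so it does not cover deleted content.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap in the api.php URL of the wiki you need.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Build an action=query URL asking for the latest revision's content."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def extract_text(response):
    """Pull the wikitext out of a parsed formatversion=2 response, or None."""
    page = response["query"]["pages"][0]
    revisions = page.get("revisions")
    if not revisions:
        # Missing pages come back without a "revisions" key.
        return None
    return revisions[0]["slots"]["main"]["content"]


def fetch_text(title):
    """Fetch the current wikitext of one page over the network."""
    request = urllib.request.Request(
        build_query_url(title),
        # Hypothetical User-Agent; use one that identifies your own project.
        headers={"User-Agent": "wiki-research-example/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

For anything beyond a handful of pages the XML dumps are likely the better fit, since fetching pages one by one like this gets slow quickly.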
On Wed, Jul 29, 2015 at 6:14 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
If we're talking Wikimedia MediaWiki instances, yes, the API is your
only way forward - for performance reasons the text content is stored
on a totally different set of servers that (to my knowledge) even paid
researchers don't get to mess around with. Alternately you could take
a look at https://dumps.wikimedia.org if slightly outdated information
is okay for you.
On 29 July 2015 at 18:58, Srijan Kumar <srijankedia(a)gmail.com> wrote:
Hi!
I want to read the text stored in the text tables[1], but the old_text
field stores it as what seems to be the path to the blob. How can I get
the content of the blob?
Alternately, is there any other way to access all text content (including
deleted content) without requiring global rights to the API?
Thanks!
Srijan
[1] https://www.mediawiki.org/wiki/Manual:Text_table
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Oliver Keyes
Research Analyst
Wikimedia Foundation