On Wed, Jul 29, 2015 at 4:21 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
That's right.  I use the API and the XML dumps if I need text content.  If you let me know about the type of analysis you are performing, I can advise about the best strategies. 

FYI, I've pinged James A. for appropriate access. Any other help you can offer in the meantime is appreciated. I just want to make sure you don't spend time restarting the access request via James. :-)

Thanks,
L
 

On Wed, Jul 29, 2015 at 6:14 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
If we're talking Wikimedia MediaWiki instances, yes, the API is your
only way forward - for performance reasons the text content is stored
on a totally different set of servers that (to my knowledge) even paid
researchers don't get to mess around with. Alternatively, you could take
a look at https://dumps.wikimedia.org if slightly outdated information
is okay for you.
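
For what it's worth, a minimal sketch of pulling the current text of a
single page through the API might look like the following. It is only an
illustration under some assumptions: the English Wikipedia endpoint, the
standard action=query / prop=revisions module, and helper names
(build_query_url, extract_text, fetch_current_text) and a User-Agent
string that are made up for this example. It does not cover deleted
content, which still needs the extra rights mentioned above.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap in the api.php URL of the wiki you need.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, api_url=API_URL):
    """Build a query URL asking for the latest revision's content of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def extract_text(response):
    """Return the wikitext from a parsed API response, or None if absent."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or "revisions" not in pages[0]:
        return None  # e.g. the page is missing or has no visible revision
    return pages[0]["revisions"][0]["slots"]["main"]["content"]


def fetch_current_text(title):
    """Fetch the current wikitext of `title` (performs a network request)."""
    request = urllib.request.Request(
        build_query_url(title),
        # Hypothetical User-Agent; replace with your own tool name and contact.
        headers={"User-Agent": "wiki-research-example/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_text(json.load(resp))
```

Calling fetch_current_text("Main Page") would then return the wikitext
of that page as a string, or None if the page does not exist. For bulk
analysis the dumps are likely the better fit, since fetching pages one
at a time this way gets slow quickly.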

On 29 July 2015 at 18:58, Srijan Kumar <srijankedia@gmail.com> wrote:
> Hi!
>
> I want to read the text stored in the text tables[1], but the old_text field
> stores it as what seems to be the path to the blob. How can I get the
> content of the blob?
>
> Alternately, is there any other way to access all text content (including
> deleted content) without requiring global rights to the API?
>
> Thanks!
> Srijan
>
> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>



--
Oliver Keyes
Research Analyst
Wikimedia Foundation
