Neta,

There are two ways to get revision text.  

1. Query the API.  See https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions  Take special note of the "content" value of the rvprop parameter.  This strategy is good when you want to process only a few revisions.

2. Process the XML dumps.  http://dumps.wikimedia.org/backup-index.html  If you are working in Python, I have some nice utilities for processing the XML dump files.  See http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump  This strategy is good when you want to process the entire history of a wiki.
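
As a rough illustration of both strategies, here is a minimal Python sketch that uses only the standard library.  It is not the mediawiki-utilities API linked above; the function names (build_revision_params, extract_texts, fetch_texts, iter_dump_revisions) and the User-Agent string are made up for this example.  It assumes the API is asked for formatversion=2 JSON with rvslots=main, and that the dump follows the usual <page>/<revision>/<text> export layout.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_params(rev_ids):
    """Query parameters asking the API for the text of specific revisions."""
    return {
        "action": "query",
        "prop": "revisions",
        "revids": "|".join(str(r) for r in rev_ids),
        "rvprop": "ids|timestamp|content",  # "content" returns the text
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }


def extract_texts(response):
    """Map revision id -> wikitext from a decoded API response."""
    texts = {}
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            slots = rev.get("slots")
            if slots:  # skip revisions whose text is not returned
                texts[rev["revid"]] = slots["main"]["content"]
    return texts


def fetch_texts(rev_ids):
    """Strategy 1: fetch a handful of revisions with one HTTP request."""
    url = API_URL + "?" + urllib.parse.urlencode(build_revision_params(rev_ids))
    # Hypothetical User-Agent; replace it with one identifying your project.
    req = urllib.request.Request(url, headers={"User-Agent": "revision-text-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return extract_texts(json.load(resp))


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_dump_revisions(fileobj):
    """Strategy 2: stream (page title, revision id, text) from an XML dump."""
    title = None
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id, text = None, ""
            for child in elem:  # direct children only, so contributor ids are ignored
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # release the revision text once it has been yielded
        elif name == "page":
            elem.clear()
```

Calling fetch_texts([...]) with a few revision IDs covers the first case.  For the second, open the decompressed dump (for example with bz2.open(path, "rb")) and loop over iter_dump_revisions(f); revisions are yielded one at a time, so the full history never has to be held in memory at once.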

-Aaron

On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh <neta.livneh@gmail.com> wrote:
Hi, 

I'm trying to reach the text table (for read-only purposes), but it seems that it is not available to me (it is not listed when I run SHOW TABLES).

Does anybody know why I don't have access and if I can get one? It is crucial for my research as I need to analyse the text.

Thanks,
Neta

On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh <neta.livneh@gmail.com> wrote:
yeah, I do have access - Thanks!
I already used ssh, and also used the quarry tool for smaller quick queries. 

Cheers,
Neta


On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh <neta.livneh@gmail.com> wrote:


On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
Sorry, old thread, but I wanted to point out that http://quarry.wmflabs.org seems like a good tool for this use case.


On Wednesday, December 24, 2014, Leila Zia <leila@wikimedia.org> wrote:
Hi Neta,

On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh <neta.livneh@gmail.com> wrote:

Actually, this is a great opportunity to say that I would love to get you guys involved or at least hear insights from the analytics team regarding the project's direction.

Feel free to keep me in the loop for the latter.

Best,
Leila
 


On Wed, Dec 24, 2014 at 4:39 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Here are the instructions that Christian gave, with some screenshots and discussion: https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Labs

If you're just looking to run a few queries, you might consider http://quarry.wmflabs.org which requires no shell access -- just a Wikimedia sites account.  

-Aaron

On Wed, Dec 24, 2014 at 7:22 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Neta,

On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
> For my project, we will need to run SQL queries on current Wikipedia data
> (mostly the revision history table).
>
> I already have a Gerrit account. Can I get SSH access for running such
> queries?

It sounds like the redacted labs databases would nicely fit your use
case. The easiest way to get access there is to apply for Tool Labs [1].

To get access, please file a request through

  https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request

(Many parts around the WMF are currently getting migrated to
phabricator.wikimedia.org, so if someone knows a phabricator procedure
for that please chime in!)


Once you've got Tool Labs [1] access you can ssh to

  tools-login.wmflabs.org

and running

  sql enwiki

on that host connects you to labsdb's enwiki database and you can run
your queries there (similar for other wikis).

Have fun,
Christian



[1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
has more information and links about Tool Labs.


--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics

