Neta,
There are two ways to get revision text.
1. Query the API. See https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions and take special note of the "content" value of the rvprop parameter. This strategy is good when you want to process only a few revisions.
2. Process the XML dumps: http://dumps.wikimedia.org/backup-index.html If you are working in Python, I have some nice utilities for processing the XML dump files. See http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump This strategy is good when you want to process the entire history of a wiki.
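To make the two strategies concrete, here is a rough sketch of each using only the Python standard library. It is an illustration under some assumptions rather than a reference implementation: the function names are made up for this example, the API call assumes the current response layout (rvslots=main with formatversion=2), and the dump reader uses plain xml.etree.ElementTree instead of the mediawiki-utilities package linked above.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_query(title, limit=5):
    """Build an API URL asking for the latest `limit` revisions of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",  # "content" is what returns the text
        "rvslots": "main",                  # assumption: current slot-based layout
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_revisions(response):
    """Yield (rev_id, text) pairs from a decoded formatversion=2 response."""
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            yield rev["revid"], rev["slots"]["main"]["content"]


def fetch_revisions(title, limit=5):
    """Strategy 1: ask the API. This performs a network request."""
    request = urllib.request.Request(
        build_revision_query(title, limit),
        headers={"User-Agent": "revision-text-example/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(request) as handle:
        return list(extract_revisions(json.load(handle)))


def iter_dump_revisions(source):
    """Strategy 2: stream (page_title, rev_id, text) from an XML dump.

    `source` is a filename or a binary file object (e.g. from bz2.open).
    """
    title = None
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "revision":
            rev_id = text = None
            for child in elem:  # direct children only, so contributor ids are skipped
                child_tag = child.tag.rsplit("}", 1)[-1]
                if child_tag == "id":
                    rev_id = int(child.text)
                elif child_tag == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # release the revision text once the caller is done with it
        elif tag == "page":
            elem.clear()
```

For a handful of pages, something like fetch_revisions("Main Page", limit=2) should be enough. For a full history, the dump files are compressed, so you would pass in a file object opened with bz2.open(path, "rb") and iterate over iter_dump_revisions without loading the whole file into memory.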
-Aaron
On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh neta.livneh@gmail.com wrote:
Hi,
I'm trying to reach the text table (for read-only purposes), but it seems that it is not available to me (it is not in the list when I run SHOW TABLES).
Does anybody know why I don't have access and whether I can get it? It is crucial for my research, as I need to analyse the text.
Thanks, Neta
On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh neta.livneh@gmail.com wrote:
Yeah, I do have access. Thanks! I already used ssh, and also used the Quarry tool for smaller quick queries.
Cheers, Neta
On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh neta.livneh@gmail.com wrote:
On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
Sorry, old thread, but I wanted to point out that http://quarry.wmflabs.org seems like a good tool for this use case.
On Wednesday, December 24, 2014, Leila Zia leila@wikimedia.org wrote:
Hi Neta,
On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh neta.livneh@gmail.com wrote:
Actually, this is a great opportunity to say that I would love to get you guys involved or at least hear insights from the analytics team regarding the project's direction.
Feel free to keep me in the loop for the latter.
Best, Leila
On Wed, Dec 24, 2014 at 4:39 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
> Here's the instructions that Christian gave with some screenshots
> and discussion:
> https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Lab...
>
> If you're just looking to run a few queries, you might consider
> http://quarry.wmflabs.org which requires no shell access -- just a
> Wikimedia sites account.
>
> -Aaron
>
> On Wed, Dec 24, 2014 at 7:22 AM, Christian Aistleitner
> <christian@quelltextlich.at> wrote:
>
>> Hi Neta,
>>
>> On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
>> > For my project, we will need to sql queries on current wikipedia data
>> > (mostly revision history table).
>> >
>> > I already have a Gerrit account. Can I get SSH access for running such
>> > queries?
>>
>> It sounds like the redacted labs databases would nicely fit your use
>> case. The easiest way to get access there is to apply for Tool Labs [1].
>>
>> To get access, please file a request through
>>
>>   https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
>>
>> (Many parts around the WMF are currently getting migrated to
>> phabricator.wikimedia.org, so if someone knows a phabricator procedure
>> for that please chime in!)
>>
>> Once you've got Tool Labs [1] access you can ssh to
>>
>>   tools-login.wmflabs.org
>>
>> and running
>>
>>   sql enwiki
>>
>> on that host connects you to labsdb's enwiki database and you can run
>> your queries there (similar for other wikis).
>>
>> Have fun,
>> Christian
>>
>> [1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
>> has more information and links about Tool Labs.
>>
>> --
>> ---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
>> Companies' registry: 360296y in Linz
>> Christian Aistleitner
>> Kefermarkterstrasze 6a/3   Email: christian@quelltextlich.at
>> 4293 Gutau, Austria        Phone: +43 7946 / 20 5 81
>>                            Fax: +43 7946 / 20 5 81
>>                            Homepage: http://quelltextlich.at/
>> ---------------------------------------------------------------
>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics