Hi everyone,
My name is Neta Livneh and I'm a FOSS OPW intern, working with the Language Engineering team on estimating how many translated pages there are in the different Wikipedias. For my project, we will need to run SQL queries on current Wikipedia data (mostly the revision history table).
I already have a Gerrit account. Can I get SSH access for running such queries?
Thanks, Neta
Hi Neta,
On Wed, Dec 24, 2014 at 11:28:33AM +0200, Neta Livneh wrote:
> For my project, we will need to run SQL queries on current Wikipedia data (mostly the revision history table).
>
> I already have a Gerrit account. Can I get SSH access for running such queries?
It sounds like the redacted labs databases would nicely fit your use case. The easiest way to get access there is to apply for Tool Labs [1].
To get access, please file a request through
https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
(Many parts around the WMF are currently getting migrated to phabricator.wikimedia.org, so if someone knows a phabricator procedure for that please chime in!)
Once you've got Tool Labs [1] access you can ssh to
    tools-login.wmflabs.org
and running
    sql enwiki
on that host connects you to labsdb's enwiki database and you can run your queries there (similar for other wikis).
Have fun, Christian
[1] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs has more information and links about Tool Labs.
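To illustrate the kind of query the `sql enwiki` prompt is meant for, here is a minimal sketch in Python. It is only a stand-in: an in-memory SQLite database plays the role of the labsdb replica (which is not SQLite), the sample rows are invented, and the helper name `revisions_per_page` is hypothetical. The column names `rev_id`, `rev_page` and `rev_timestamp` follow MediaWiki's `revision` table.

```python
import sqlite3

# Toy stand-in for a wiki replica. The column names follow MediaWiki's
# `revision` table; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO revision (rev_id, rev_page, rev_timestamp) VALUES (?, ?, ?)",
    [
        (1, 10, "20141201000000"),
        (2, 10, "20141210120000"),
        (3, 11, "20141224092833"),
    ],
)


def revisions_per_page(connection):
    """Return (page_id, revision_count) pairs, busiest pages first."""
    query = (
        "SELECT rev_page, COUNT(*) AS revisions "
        "FROM revision GROUP BY rev_page "
        "ORDER BY revisions DESC, rev_page ASC"
    )
    return connection.execute(query).fetchall()


print(revisions_per_page(conn))
```

On the real host, the SELECT statement alone would be typed at the `sql enwiki` prompt; the SQLite setup exists only so the sketch runs anywhere.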
Here are the instructions that Christian gave, with some screenshots and discussion: https://meta.wikimedia.org/wiki/Research:Labs2/Getting_started_with_Tool_Lab...
If you're just looking to run a few queries, you might consider http://quarry.wmflabs.org which requires no shell access -- just a Wikimedia sites account.
-Aaron
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks guys, I filed a request.
I think I will be working constantly with the database so it will require shell access.
Actually, this is a great opportunity to say that I would love to get you guys involved, or at least hear insights from the analytics team regarding the project's direction. I will probably check in again in a month or so, when we have a better understanding of where my project is headed. Until then, here is a link to the project in Phabricator: article-translation-metrics https://phabricator.wikimedia.org/tag/article-translation-metrics/
Merry Christmas,
Neta neta.livneh@gmail.com
Access to the production cluster, especially to databases that contain all kinds of user information, is an extremely big deal, so it's unlikely that this is going to happen, especially for a small one-time project.
Max,
I imagine Neta was asking for analytics-store access, so not /really/ the production cluster ;p. But yes: generally speaking, "I need access to some data for... something" is not likely to lead to access to DBs containing sensitive information.
On 24 December 2014 at 10:28, Max Semenik <maxsem.wiki@gmail.com> wrote:
> Access to the production cluster, especially to databases that contain all kinds of user information, is an extremely big deal, so it's unlikely that this is going to happen, especially for a small one-time project.
As Oliver assumed, I will be using the publicly accessible data that can be found in the dumps, so I was asking for analytics-store access. Oliver, you are right, I should have elaborated on my intentions in my original mail. I asked for access because we will work with the revision history tables of the large Wikipedias, and it will be impossible for me to work on them locally.
BTW, I'm not sure it is such a small project :) Neta
On Wed, Dec 24, 2014 at 5:32 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
> Max,
>
> I imagine Neta was asking for analytics-store access, so not /really/ the production cluster ;p. But yes: generally speaking, "I need access to some data for... something" is not likely to lead to access to DBs containing sensitive information.
Hi Neta,
On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh <neta.livneh@gmail.com> wrote:
> Actually, this is a great opportunity to say that I would love to get you guys involved or at least hear insights from the analytics team regarding the project's direction.
Feel free to keep me in the loop for the latter.
Best, Leila
Sorry, old thread, but I wanted to point out that http://quarry.wmflabs.org seems like a good tool for this use case.
On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
> Sorry, old thread, but I wanted to point out that http://quarry.wmflabs.org seems like a good tool for this use case.
Yeah, I do have access. Thanks! I have already used SSH, and have also used the Quarry tool for smaller, quick queries.
Cheers, Neta
Hi,
I'm trying to reach the text table (for read-only purposes), but it seems that it is not available to me (it is not in the list when I run SHOW TABLES).
Does anybody know why I don't have access, and whether I can get it? It is crucial for my research, as I need to analyse the text.
Thanks, Neta
Neta,
There are two ways to get revision text.
1. Query the API. See https://en.wikipedia.org/w/api.php?action=help&modules=query%2Brevisions and take special note of the "content" value of the rvprop parameter. This strategy is good when you want to process only a few revisions.
2. Process the XML dumps. See http://dumps.wikimedia.org/backup-index.html for the files. If you are working in Python, I have some nice utilities for processing the XML dump files; see http://pythonhosted.org/mediawiki-utilities/core/xml_dump.html#mw-xml-dump for details. This strategy is good when you want to process the entire history of a wiki.
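As a rough illustration of the first strategy, the sketch below only builds the request URL and does not contact the network. The helper name `revisions_query_url` and the page title "Translation" are hypothetical, and the `rvslots` parameter is an assumption about newer MediaWiki versions rather than something stated in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, limit=5):
    """Build a query URL asking for recent revisions of one page, with text."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # "content" in rvprop is what makes the API return the revision text.
        "rvprop": "ids|timestamp|content",
        # Assumption: newer MediaWiki versions expect a slot to be named.
        "rvslots": "main",
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = revisions_query_url("Translation")
print(url)
```

Fetching that URL with any HTTP client should return JSON containing the requested revisions; that step is left out here so the sketch stays runnable offline.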
-Aaron
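A minimal sketch of strategy 1 (the API route), using only the Python standard library. The only detail taken from the message above is the "content" value of rvprop; the other parameters (rvslots, rvlimit, formatversion), the response layout assumed in extract_revisions, and the helper names are assumptions based on the API help page linked above and should be checked against it before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, limit=5):
    """Build a query URL asking for the latest `limit` revisions of `title`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # "content" is the rvprop value that returns the revision text.
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",        # assumption: current API slot parameter
        "rvlimit": str(limit),
        "format": "json",
        "formatversion": "2",     # assumption: pages returned as a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_revisions(response):
    """Return (revid, text) pairs from a decoded JSON response.

    Assumes the formatversion=2 layout with the text stored under
    revision["slots"]["main"]["content"]; text is None if absent.
    """
    results = []
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            text = rev.get("slots", {}).get("main", {}).get("content")
            results.append((rev.get("revid"), text))
    return results


def fetch_revisions(title, limit=5):
    """Perform the HTTP request (needs network access)."""
    request = urllib.request.Request(
        build_revisions_url(title, limit),
        # Hypothetical User-Agent string; replace with your own contact info.
        headers={"User-Agent": "revision-text-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as resp:
        return extract_revisions(json.load(resp))
```

Calling fetch_revisions("Some title", limit=3) would issue one request; for more than a handful of pages, the dump route in strategy 2 is likely the better fit.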
On Sun, Jan 25, 2015 at 2:24 PM, Neta Livneh neta.livneh@gmail.com wrote:
Hi,
I'm trying to reach the text table (for read-only purposes), but it seems that it is not available to me (it is not in the list when I run SHOW TABLES).
Does anybody know why I don't have access and whether I can get it? It is crucial for my research as I need to analyse the text.
Thanks, Neta
On Thu, Jan 15, 2015 at 7:36 PM, Neta Livneh neta.livneh@gmail.com wrote:
yeah, I do have access - Thanks! I already used ssh, and also used the quarry tool for smaller quick queries.
Cheers, Neta
On Thu, Jan 15, 2015 at 7:35 PM, Neta Livneh neta.livneh@gmail.com wrote:
On Thu, Jan 15, 2015 at 4:42 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
Sorry, old thread, but I wanted to point out that http://quarry.wmflabs.org seems like a good tool for this use case.
On Wednesday, December 24, 2014, Leila Zia leila@wikimedia.org wrote:
Hi Neta,
On Wed, Dec 24, 2014 at 7:19 AM, Neta Livneh neta.livneh@gmail.com wrote:
Actually, this is a great opportunity to say that I would love to get you guys involved or at least hear insights from the analytics team regarding the project's direction.
Feel free to keep me in the loop for the latter.
Best, Leila
Yup. For context: because of the scale of Wikimedia's MediaWiki instances, we actually store revision contents in their own cluster, not in the pertinent field within the MediaWiki database schema - that field instead acts as a pointer to where the content really lives. One of the consequences of this is that even the R&D analysts don't have direct access :/. If you're working in Python, I'd thoroughly recommend Aaron's proposed utility; it's probably my favourite way to process the dumps.
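A rough sketch of the dump-processing route, for readers who want to see the shape of it. This is not the mediawiki-utilities API recommended above; it is a standard-library stand-in (xml.etree.ElementTree.iterparse) that streams revisions from a dump-like file. The sample XML is hand-written for illustration, and the element names (page, title, revision, id, text) are assumed to match the export format of the real dumps.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed shape of a MediaWiki XML export.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <contributor><username>A</username><id>99</id></contributor>
      <text>first version</text>
    </revision>
    <revision>
      <id>11</id>
      <text>second version</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield (page_title, revision_id, text) while streaming the file."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id, text = None, ""
            # Only direct children, so the contributor's <id> is ignored.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            yield title, rev_id, text
            elem.clear()  # drop the revision text once it has been handled
        elif name == "page":
            elem.clear()


if __name__ == "__main__":
    for title, rev_id, text in iter_revisions(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))):
        print(title, rev_id, len(text))
```

For a real dump, the in-memory sample would be replaced by an open file object, for example one returned by bz2.open on a downloaded history file (the file name is up to you). A full-history dump is very large, so a purpose-built library such as the one Aaron mentions is probably the more practical choice.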
Thanks Aaron and Oliver!
Strategy 2 sounds like the right way to go.
By the way, I wrote a document [1] that describes the features to look for when trying to estimate whether a page was translated or originally written. Your comments would be highly appreciated.
[1] How_to_detect_translated_articles https://www.mediawiki.org/w/index.php?title=Wikipedia_article_translation_metrics/How_to_detect_translated_articles&redirect=no
Cheers, Neta
On Mon, Jan 26, 2015 at 5:23 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Yup. For context: because of the scale of Wikimedia's MediaWiki instances, we actually store revision contents in their own cluster, not in the pertinent field within the MediaWiki database schema - that field instead acts as a pointer to where the content really lives. One of the consequences of this is that even the R&D analysts don't have direct access :/. If you're working in Python, I'd thoroughly recommend Aaron's proposed utility; it's probably my favourite way to process the dumps.
-- Oliver Keyes Research Analyst Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics