Hi Andrew,

Thanks for the advice. Quick follow-up question: what kind of access/account do I need for yarn.wikimedia.org? Neither my MediaWiki account nor my WikiTech account works.

Greetings,

Adrian


On 08/12/2017 09:58 PM, Andrew Otto wrote:
I have only 2 comments:

1.  Please nice any heavy, long-running local processes so that others can continue to use the machine (a rough sketch of what I mean follows below).

2. For large data, consider using the Hadoop cluster!  I think you are getting your data from the webrequest logs in Hadoop anyway, so you might as well continue to do the processing there, no?  If you do, you shouldn’t have to worry (too much) about resource contention: https://yarn.wikimedia.org/cluster/scheduler (see the second sketch below).
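
For item 1, here is a rough sketch of what I mean by nicing a job (the perl script name is only a placeholder):

    import os
    import subprocess

    # Lower this process's own scheduling priority so interactive users
    # on stat1005 aren't starved (higher niceness = lower priority).
    os.nice(10)

    # Or start an external long-running job at low priority via the
    # standard `nice` utility; the script name here is just illustrative.
    subprocess.run(["nice", "-n", "10", "perl", "count_requests.pl"])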
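
And for item 2, a rough PySpark sketch of doing the processing in the cluster instead of on stat1005 (the table, partition, and column names below are assumptions on my part; adjust them to the actual schema):

    from pyspark.sql import SparkSession

    # Build (or reuse) a Spark session; the work then runs in YARN,
    # not on the stat machine itself.
    spark = SparkSession.builder.appName("wdqs-requests-sketch").getOrCreate()

    # Pull one day of webrequest data and aggregate it in the cluster.
    df = spark.sql("""
        SELECT uri_path, COUNT(*) AS hits
        FROM wmf.webrequest
        WHERE year = 2017 AND month = 8 AND day = 12
        GROUP BY uri_path
    """)
    df.show(20)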

:)

- Andrew Otto
  Systems Engineer, WMF




On Sat, Aug 12, 2017 at 2:20 PM, Erik Zachte <ezachte@wikimedia.org> wrote:

I will soon start the two Wikistats jobs, which run for several weeks each month.

They might use two cores each, one for unzip, one for perl.

How many cores are there anyway?

 

Cheers,

Erik

 

From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Adrian Bielefeldt
Sent: Saturday, August 12, 2017 19:44
To: analytics@lists.wikimedia.org
Subject: [Analytics] Resources stat1005

 

Hello everyone,

I wanted to ask about resource allocation on stat1005. We need quite a bit, since we process every entry in wdqs_extract, and I was wondering how many cores and how much memory we can use without conflicting with anyone else.

Greetings,

Adrian


_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



