Thank you very much, Luca!
To make this nice documentation easier to discover, I moved it to
Analytics/Systems/Clients
<https://wikitech.wikimedia.org/wiki/Analytics/Systems/Clients> along
with the other information on the clients from Analytics/Data access.
On Tue, 18 Feb 2020 at 17:11, Isaac Johnson <isaac(a)wikimedia.org> wrote:
Thanks for pulling together these directions, Luca! I did a little
clean-up and will try to remember to do so more routinely.
Adding to what Diego said, I also started using stat1007 because it has
the most access to resources (dumps, Hadoop, MariaDB), and then my virtual
environments, config files, etc. are there and so I tend to do all of my
work on stat1007 even when the other stat machines might work for other
projects. Putting the GPU on stat1005 helped me diversify a little but I'm
very excited to hear that the stat machines will be more standardized so it
matters less which machine I choose. While I have no desire to be spread
out across the machines (a few projects on stat1004, a few on stat1005,
etc.) because then I'll certainly lose track of where different projects
are, I would be open to trying to choose another host as my "main"
workspace.
Best,
Isaac
On Tue, Feb 18, 2020 at 10:53 AM Andrew Otto <otto(a)wikimedia.org> wrote:
I added a 'GPU?' column too. :) THANKS LUCA!
On Tue, Feb 18, 2020 at 11:51 AM Luca Toscano <ltoscano(a)wikimedia.org>
wrote:
> Hey Diego,
>
> added a section at the end of the page with the info requested, let me
> know if anything is missing :)
>
> Luca
>
> On Tue, 18 Feb 2020 at 17:37, Diego Saez-Trumper <diego(a)wikimedia.org>
> wrote:
>
>> Thanks for this Luca.
>>
>> I tend to use stat1007 because I know that machine has a lot of
>> RAM/CPU and HDFS access. For the other stat hosts, I'm not sure which of
>> them have what resources (I know at least one of them doesn't have HDFS
>> access). Is there a table where I can look at a summary of resources per
>> machine?
>>
>> Thanks again.
>>
>> On Tue, Feb 18, 2020 at 8:53 AM Luca Toscano <ltoscano(a)wikimedia.org>
>> wrote:
>>
>>> Hi everybody!
>>>
>>> I created the following doc:
>>>
>>> https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Analytics_Client_No…
>>>
>>> It contains two FAQs:
>>> - How do I ensure that there is enough space on disk before storing
>>> big datasets/files?
>>> - How do I check the space used by my files/data on stat/notebook
>>> hosts?
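[Illustrative note, not part of the original message.] The two questions above can also be approached from Python using only the standard library. This is a minimal sketch, not the procedure described on the linked page: the function names, the `reserve_fraction` safety margin, and the paths are hypothetical choices made for this example.

```python
import os
import shutil


def free_bytes(path="."):
    """Return the free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def dir_size_bytes(root):
    """Sum the apparent sizes of regular files under `root`.

    Symlinks are skipped, and files that vanish or cannot be read are
    ignored. This is apparent size, so it can differ from block-based
    usage reports.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass
    return total


def fits(path, needed_bytes, reserve_fraction=0.1):
    """Check whether `needed_bytes` fits while leaving a reserve free.

    `reserve_fraction` is an assumed margin (a fraction of the total
    filesystem size) left untouched for other users of the host.
    """
    usage = shutil.disk_usage(path)
    return needed_bytes <= usage.free - reserve_fraction * usage.total
```

For example, `fits("/some/target/dir", 50 * 1024**3)` would check whether a 50 GiB dataset fits on that filesystem with the assumed margin, and `dir_size_bytes(os.path.expanduser("~"))` gives a rough total for a home directory.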
>>>
>>> Please read them and let me know if anything is unclear or
>>> missing. We have plenty of space on stat100X hosts, but for some reason
>>> we tend to cluster on single machines like stat1007 and end up fighting
>>> for resources.
>>>
>>> On a related note, we are going to work on unifying stat/notebook
>>> puppet configs in https://phabricator.wikimedia.org/T243934, so
>>> eventually all Analytics clients will be exactly the same.
>>>
>>> Thanks!
>>>
>>> Luca (on behalf of the Analytics team)
>>>
>>>
>>> _______________________________________________
>>> Research-Internal mailing list
>>> Research-Internal(a)lists.wikimedia.org
>>>
https://lists.wikimedia.org/mailman/listinfo/research-internal
>>>
>> _______________________________________________
>> Analytics mailing list
>> Analytics(a)lists.wikimedia.org
>>
https://lists.wikimedia.org/mailman/listinfo/analytics
>>
--
Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation