Hi everybody,
as part of https://phabricator.wikimedia.org/T201165 the Analytics team would like to make it clear to everybody that the home directories on the stat/notebook nodes are not backed up periodically. They do sit on a software RAID configuration spanning multiple disks, so we are resilient to a disk failure, but it might still happen, even if it is unlikely, that a host loses all its data. Please keep this in mind when working on important projects and/or handling data that you care about.
I just added a warning to https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_clients. If you have really important data that is too big to back up, keep in mind that you can use your home directory (/user/your-username) on HDFS, which replicates data three times across multiple nodes.
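For anyone who wants to script the copy, here is a minimal sketch of how a file could be pushed from a stat/notebook host into the HDFS home directory. It assumes that the standard `hdfs` command line client is available on the host, that your shell username matches your HDFS username, and that a `backup` subdirectory is acceptable; the function names and the `backup` directory are hypothetical, chosen only for illustration.

```python
import getpass
import posixpath
import subprocess


def hdfs_put_command(local_path, username=None, subdir="backup"):
    """Build (but do not run) the command that copies local_path to HDFS.

    The destination is /user/<username>/<subdir>; the "backup" default is
    an assumption made for this example, not an agreed convention.
    """
    user = username or getpass.getuser()
    dest_dir = posixpath.join("/user", user, subdir)
    return ["hdfs", "dfs", "-put", local_path, dest_dir]


def copy_to_hdfs(local_path, username=None, subdir="backup"):
    """Create the destination directory if needed, then upload the file."""
    cmd = hdfs_put_command(local_path, username, subdir)
    dest_dir = cmd[-1]
    # -mkdir -p succeeds even when the directory already exists.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", dest_dir], check=True)
    # -put refuses to overwrite an existing file unless -f is added.
    subprocess.run(cmd, check=True)
```

Calling `copy_to_hdfs("results.tsv")` would then place the file under `/user/your-username/backup`. Any data handling rules that apply to the original file apply to the copy as well.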
Please let us know in the aforementioned task if you have comments, suggestions, etc.
Thanks in advance!
Luca (on behalf of the Analytics team)
Hi Luca,
Thanks for the heads up. Isaac is coordinating a response from the Research side.
I have one question for you: as you allow/encourage more copies of the files to exist, what mechanism would you like to put in place to reduce the chances of PII being copied into new folders, which will then be even harder (for your team) to keep track of? Having an explicit process/understanding about this would be very helpful.
Thanks, Leila
On Thu, Jul 4, 2019 at 3:14 AM Luca Toscano ltoscano@wikimedia.org wrote: