I second Leila's question. The issue of how we flag PII data and ensure
it's appropriately scrubbed came up in our team meeting yesterday. We're
discussing team practices for data/project backups tomorrow and plan to
come out with some proposals, at least for the short term.
Are there any existing processes or guidelines I should be aware of?
Kate Zimmerman (she/they)
Head of Product Analytics
On Wed, Jul 10, 2019 at 9:00 AM Leila Zia <leila(a)wikimedia.org> wrote:
Thanks for the heads up. Isaac is coordinating a response from the
I have one question for you: As you allow/encourage more copies of
the files to exist, what mechanism would you like to put in place
to reduce the chances of PII being copied into new folders that will
then be even harder (for your team) to keep track of? Having an
explicit process/understanding about this will be very helpful.
On Thu, Jul 4, 2019 at 3:14 AM Luca Toscano <ltoscano(a)wikimedia.org> wrote:
thought to reach out to everybody to make it clear that all the home
directories on the stat/notebook nodes are not backed up periodically.
run on a software RAID configuration spanning multiple disks of [...]
we are resilient to a disk failure, but even if unlikely, it might happen
that a host could lose all its data. Please keep this in mind when working
on important projects and/or handling important data that you care about.
I just added a warning to
If you have really important data that is too big to back up, keep in
mind that you can use your home directory (/user/your-username) on HDFS,
which replicates data three times across multiple
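
For anyone who wants to script the HDFS option, here is a minimal Python
sketch. It assumes the `hdfs` command-line client is available on the
stat/notebook host. The function names, the `backups` subdirectory and the
`dry_run` flag are made up for illustration and are not part of any
Analytics tooling. By default it only prints the commands it would run.

```python
import shlex
import subprocess
from typing import List


def hdfs_home(username: str) -> str:
    """Return the HDFS home directory for a user, e.g. /user/alice."""
    return f"/user/{username}"


def build_backup_commands(local_path: str, username: str,
                          subdir: str = "backups") -> List[List[str]]:
    """Build the hdfs commands that copy local_path into the user's HDFS home.

    The 'backups' subdirectory is a hypothetical convention, not a standard.
    """
    dest = f"{hdfs_home(username)}/{subdir}"
    return [
        # Create the destination directory if it does not exist yet.
        ["hdfs", "dfs", "-mkdir", "-p", dest],
        # Copy the local file or directory into it.
        ["hdfs", "dfs", "-put", local_path, dest],
    ]


def backup_to_hdfs(local_path: str, username: str,
                   subdir: str = "backups",
                   dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first error."""
    commands = build_backup_commands(local_path, username, subdir)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `backup_to_hdfs("results.tsv", "your-username")` prints the two
commands without touching HDFS; pass `dry_run=False` to actually run them.
Note that `-put` refuses to overwrite an existing file unless `-f` is added.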
Please let us know if you have comments/suggestions/etc. in the
Thanks in advance!
Luca (on behalf of the Analytics team)
Wiki-research-l mailing list