Hi,
Admin stuff: We are very aware of the sensitivity of the data. We've got
a group working on improving privacy protection in data sharing
(http://privacytools.seas.harvard.edu/), which is a tacit
acknowledgement that data sharing as it exists often doesn't protect
privacy. With that said, we've designed our workflow so that all the
analysis happens on Wikimedia's servers, and everyone who needs access
to that analysis right now has access to stat1002, so for the
foreseeable future everything is staying at Wikimedia. I think it's a
bit premature to talk about formats for publishing because we don't know
what our findings are going to look like and many of these sensitivities
depend on the social, political, numerical, etc. contexts. Once we reach
that stage we'll absolutely make sure everyone is comfortable with what
we'd be publishing.
Tech stuff: This is incredibly helpful - thank you both very much. This
is exactly what I was looking for, and way easier than starting from
scratch.
- Justin