On Sep 16, 2010, at 6:44 AM, Bod Notbod bodnotbod@gmail.com wrote:
On Thu, Sep 16, 2010 at 11:14 AM, Aude aude.wiki@gmail.com wrote:
Surely there are third parties with such experience and an interest in this. [...] Surely Google has, or should have, a copy?
It would be interesting to know what Google has. I recently began a new article and was stunned to see that Google had indexed it, ranked it highly, and (IIRC) cached it within the day.
I see new articles & edits appear in Google searches almost immediately.
I'm not technical, so I speak from ignorance, but I imagine they wouldn't have article histories.
Probably. If I remember correctly, WMF gets some modest income from Google (& others?) for providing priority feeds of recent changes (as feeds or through the API), whereas normal users have API limits. Please clarify if someone knows better!
The notion that Wikipedia was currently vulnerable to data loss had honestly never occurred to me; I thought that the reference sites that use our content meant that back-ups were ubiquitous. You've all given me the fear.
I don't fear anything bad, but I am concerned.
But suppose (very, very, very unlikely) there was some massive scandal and fundraising dried up, or some massive lawsuit or other scenario, and WMF ceased to exist? Not impossible? (What's the reserve? How long can WMF survive if fundraising dried up today?)
Distributed mirrors and database dumps are, in my view, a fundamental top priority, providing peace of mind, right along with keeping servers running. All the other WMF staff programs (awesome as they are!) are far secondary.
It would also be cool to see more innovative uses of Wikipedia content, made possible with good dumps.
@aude
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
I entirely agree that full, distributed backups of all content in Wikimedia projects are a top priority.
This shouldn't only include the publicly available dumps, but also a regular secure off-site backup of "Wikimedia in a box" (essentially everything needed to restore a fully operating network of sites -- all data, software, documentation). This is already part of our operations planning, but it doesn't exist yet.
For privacy reasons, we can't back up all data everywhere (e.g. user account information) -- it might be worth thinking about longer term strategies for portability of that data (e.g. a group of unaffiliated entrusted individuals who hold encryption keys). But, for the publicly available dumps, I don't see a list of mirrors prominently linked from http://dumps.wikimedia.org/backup-index.html -- I think starting a page at http://meta.wikimedia.org/wiki/Data_dumps/Mirrors with mirroring instructions (if such a page doesn't already exist somewhere), prominently highlighting it at dumps.wikimedia.org, and spreading the word would be a good start. We are already generating MD5s, so it shouldn't be hard for engaged community members to help with standard/policy setting, verification of mirror status, etc.
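As a rough illustration of the mirror verification mentioned above, here is a minimal Python sketch of how a volunteer might check a mirrored directory against published MD5s. It assumes the checksums are available as text in the common md5sum layout (one "<hex digest>  <file name>" pair per line); that layout, the function names, and any file names used with it are assumptions for illustration, not a description of how the dumps are actually published.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse md5sum-style lines ('<hex digest>  <file name>') into a dict.

    The md5sum layout is an assumption about the checksum list's format.
    """
    checksums = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary-mode entries with a leading '*'.
            checksums[name.strip().lstrip("*")] = digest.lower()
    return checksums


def verify_mirror(directory, checksum_text):
    """Map each listed file name to 'ok', 'mismatch', or 'missing'."""
    status = {}
    for name, expected in parse_checksums(checksum_text).items():
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        elif md5_of_file(path) == expected:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status
```

A mirror operator could run verify_mirror over a local copy after each sync and report any "mismatch" or "missing" entries. MD5 is enough to catch truncated or corrupted transfers, though it is not a defence against deliberate tampering.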
On Sep 16, 2010, at 12:58 PM, Erik Moeller erik@wikimedia.org wrote:
Thank you Erik!
@aude
-- Erik Möller Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
On 16 September 2010 17:58, Erik Moeller erik@wikimedia.org wrote:
Surely dumps would be a natural for the Internet Archive and the Library of Congress.
- d.
2010/9/16 David Gerard dgerard@gmail.com:
Surely dumps would be a natural for the Internet Archive and the Library of Congress.
As Tomasz noted in [1], we're already talking to the LOC about keeping mirrors. But lots of copies keep stuff safe, and it's something that the community can easily help with, by creating clear instructions for mirroring and reaching out to the kinds of organizations that would happily pull regular copies.
[1] http://lists.wikimedia.org/pipermail/wikitech-l/2010-September/049433.html
On 16 September 2010 22:16, Erik Moeller erik@wikimedia.org wrote:
Absolutely, I meant as well!
- d.