Planning for the full redesign is set to kick off in early 2016. When I say "full redesign" I mean it, so let's not talk small stuff about whether JSON or XML is a better format; let's talk about a design that will allow us to plug in various formats easily, for example.
Starting point for discussion here:
https://phabricator.wikimedia.org/T114019
See the last link on the description.
What we can start on now: are these the right components to break the dumps into? Can we indeed cover all sorts of dumps this way? What existing software can we re-use? What sort of grid computing or other package can we use for the "black box" in the diagram? Etc. etc. We want to get as much of this hashed out ahead of time as we can, so that we get the maximum out of the session.
Ariel
I would like to participate in this from a community perspective. I work in the BT backup team for my day job, so I have some experience in such matters. However, can I ask that some serious consideration be put into the retention period information I mentioned a while ago? If articles are deleted for good reason, then it is important to know when the backups of that information will expire as well.
Colin
On 29 Oct 2015, at 16:13, "Ariel T. Glenn" aglenn@wikimedia.org wrote:
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Hello,
The retention period is virtually indefinite. All the data available for download are archived to the Internet Archive [1], so the dumps remain available for as long as the Internet Archive is alive. The dumps are moved there to free up space on the downloads server for the generation of newer dumps.
You can search for any dump of any wiki from 2014 onwards. There are a couple of enwiki dumps that date even further back as well.
Hope this helps!
[1]: https://archive.org/details/wikimediadownloads
On 30 Oct 2015, at 18:52, Colin Johnston colinj@gt86car.org.uk wrote: