Hi all
I'm happy to let you know that new hardware has been ordered by Wikimedia Deutschland and will probably arrive in about two weeks. We will get two new systems:
* A more powerful web server, to replace hemlock: Sun Fire X4150, 2x Quad-Core Xeon, 8GB RAM, 2x73GB SAS HDD. The current web server only has two cores.
* Another database server, to be used for S1 (English Wikipedia), so S1 and S3 no longer have to share a server: Sun Fire X4250, 2x Quad-Core Xeon, 32GB RAM, 16x146GB SAS RAID.
This should improve performance and give us some headroom for growth. Once the new servers arrive, S3 will be re-imported too, so we will have live data again.
Any ideas for names? To stay with the nightshade theme, how about Jurubeba and Erubia? Or perhaps we go the "witches' weed" way, with Datura and Mandrake? Henbane is taken, I think. Amanita sounds nice, too :)
A third server has been ordered, which will also be installed in Amsterdam, but will not be part of the toolserver cluster. It's a storage server (X4540, 24TB RAID) that will keep a live backup of all media files.
Cheers, Daniel
Daniel Kinzler wrote:
Any ideas for names?
What about "pipewrench" and "screwdriver". Those are /tool/ servers, right? :)
Cheers, Arne -- Arne Nordmann (GPG 0x55EA6EDC)
The toolserver servers have traditionally been named after poisonous plants. I like Amanita; Erubia is nice too. Not sure about Mandrake :-D
HM
I would like erubia, snakeroot, daphne, or tephrosia. (Just from looking at an online list of poisonous plants.)
X!/Soxred93
On Jan 16, 2009, at 6:09 AM, Happy-melon wrote:
The toolserver servers have traditionally been named after poisonous plants. I like Amanita; Erubia is nice too. Not sure about Mandrake :-D
HM
From: "Arne Nordmann" mail@arne-nordmann.de Sent: Friday, January 16, 2009 10:36 AM To: toolserver-l@lists.wikimedia.org Subject: Re: [Toolserver-l] New hardware ordered
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Daniel Kinzler schrieb:
Any ideas for names?
What about "pipewrench" and "screwdriver". Those are /tool/ servers, right? :)
Cheers, Arne
Arne Nordmann (GPG 0x55EA6EDC) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFJcGNJRawDj1XqbtwRAmEsAJ91oeRU4aGx6XtR8dunLh/bhdFb9wCeIqVP 2NIe1Puz5wk6ZjLknrjROec= =FIrB -----END PGP SIGNATURE-----
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
I'm intrigued by the third server, which will keep copies of all media files. Right now, as I understand it, copies of images and media files on Commons and elsewhere are not backed up. In September, 496 files were lost from Commons, and the developers were asking around whether anyone had backup copies and asking uploaders to re-upload. (http://www.nabble.com/Massive-image-loss-td19328360.html)
Will the new media file backup server address this sort of problem? Or, being a live backup, will it suffer from the same issues/mistakes that affect the main Wikimedia servers?
-Aude
Aude wrote:
Will the new media file backup server address this sort of problem? Or, being a live backup, will it suffer from the same issues/mistakes that affect the main Wikimedia servers?
I have been thinking about this too. The current idea is indeed to have a live mirror (based on ZFS replication, I think), so any accidental deletion will be mirrored too. It only protects against data loss from hardware failure (think: fire in the data center).
In order to protect against accidental deletions, an intentionally lagged mirror would be useful. But I have no idea how to implement something like that.
A more conventional solution would be to have two more copies of the files, on the same server, which are synced every, say, 24 hours: backup a -> backup b, live mirror -> backup a. But this would require three times the space. Considering we have 5TB worth of media files currently (does this include thumbnails?), and the new server will have 24TB of space, this could work for a while. But taking into account exponential growth, it wouldn't last long.
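Concretely, the 24-hour rotation could be a cron job along these lines (just a sketch with made-up paths; it assumes plain rsync, nothing ZFS-specific):

import subprocess

# Hypothetical directory names on the storage server; "live" is the ZFS
# replica, a and b are the two extra copies described above.
LIVE = "/export/live/"
BACKUP_A = "/export/backup-a/"
BACKUP_B = "/export/backup-b/"

def rotate():
    # Copy the older backup first, so backup a's previous state survives
    # in backup b before a is refreshed from the live mirror.
    subprocess.run(["rsync", "-a", "--delete", BACKUP_A, BACKUP_B], check=True)
    subprocess.run(["rsync", "-a", "--delete", LIVE, BACKUP_A], check=True)

if __name__ == "__main__":
    rotate()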
Tripling space requirements seems a bit of overkill. Maybe there's a smarter solution. Ideas?
--daniel
On Sat, Jan 17, 2009 at 10:30 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Tripling space requirements seems a bit of overkill. Maybe there's a smarter solution. Ideas?
ZFS snapshots?
Bryan Tong Minh wrote:
ZFS snapshots?
I'm clueless. Tell me more... how does it work? How much space does it require?
-- daniel
On Sat, Jan 17, 2009 at 11:01 AM, Daniel Kinzler daniel@brightbyte.de wrote:
I'm clueless. Tell me more... how does it work? How much space does it require?
Snapshots take virtually no space when created. They start to consume space as soon as a file that is present in a snapshot is changed in the real filesystem. Undoubtedly river would know more about it.
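For example (hypothetical pool name; untested, but these are the standard zfs commands):

import datetime
import subprocess

# Hypothetical pool/filesystem name for the media replica.
FS = "tank/media"

def take_snapshot():
    # Creating a snapshot is near-instant and initially consumes no space;
    # blocks are only duplicated once the live filesystem changes or
    # deletes files the snapshot still references (copy-on-write).
    name = datetime.datetime.now().strftime("%Y%m%d-%H%M")
    subprocess.run(["zfs", "snapshot", FS + "@" + name], check=True)

def show_snapshot_usage():
    # The USED column shows the space each snapshot holds exclusively.
    subprocess.run(["zfs", "list", "-t", "snapshot", "-r", FS], check=True)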
Bryan
Bryan Tong Minh wrote:
ZFS snapshots?
Hm, to quote the relevant section from wikipedia:
An advantage of copy-on-write is that when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are created very quickly, since all the data composing the snapshot is already stored; they are also space efficient, since any unchanged data is shared among the file system and its snapshots.
Writeable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist.
So... this means we can have the two-backup-stages solution I suggested without wasting space, because the unchanged data is shared between the copies? That would be perfect!
-- daniel
On Saturday 17 January 2009 05:04:37 Daniel Kinzler wrote:
So... this means we can have the two-backup-stages solution I suggested without wasting space, because the unchanged data is shared between the copies? That would be perfect!
Actually it would be n-backup-stages. Snapshots need never be deleted. You can have millions of them and they don't use up much space at all ...
Cobi Carter wrote:
Actually it would be n-backup-stages. Snapshots need never be deleted. You can have millions of them and they don't use up much space at all ...
sounds too good to be true :)
-- daniel
Bryan Tong Minh:
ZFS snapshots?
we already use snapshots on the primary image server (ms1) for this reason. it's probably pointless to duplicate it on the third replica.
- river.
On Sat, Jan 17, 2009 at 10:30 AM, Daniel Kinzler wrote:
I have been thinking about this too. The current idea is indeed to have a live mirror (based on ZFS replication, I think), so any accidental deletion will be mirrored too. It only protects against data loss from hardware failure (think: fire in the data center).
Why not disable deletions on the mirror entirely, so nothing can be deleted?
Marco
Marco Schuster wrote:
Why not disable deletions on the mirror entirely, so nothing can be deleted?
Marco
Replacing deletion by moving the file to some different location would be nice. I have no idea if it's possible to hook into ZFS that way; I'm not in the business of writing file systems :)
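One way to approximate it without touching the filesystem would be at the replication layer: rsync can divert would-be deletions into a separate directory instead of unlinking them. A sketch, with made-up paths:

import datetime
import subprocess

# Made-up paths. With --delete plus --backup/--backup-dir, files that
# disappeared from the source are moved into a dated "attic" directory
# instead of being unlinked on the mirror.
SRC = "/export/live/"
DST = "/export/mirror/"

def sync_keeping_deleted_files():
    attic = "/export/attic/" + datetime.date.today().isoformat()
    subprocess.run(
        ["rsync", "-a", "--delete", "--backup",
         "--backup-dir=" + attic, SRC, DST],
        check=True,
    )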
-- daniel
Daniel Kinzler wrote:
Tripling space requirements seems a bit of overkill. Maybe there's a smarter solution. Ideas?
It seems worth mentioning how I am currently replicating Commons files.
First, there's a bot watching file uploads to scan them, so all files are usually already on the box. Then I run a script to make quasi-snapshots of Commons. They aren't real snapshots: since I use the API, they don't capture an exact point in time. The toolserver doesn't have that problem; as it keeps a Commons DB copy, it can directly query a snapshot of the image table.
For each image, the script looks for a copy in previous snapshots as well as the uploads copy (verifying by the hash). Only a few images are not found and thus need to be downloaded. All others are hardlinked.
As each download goes into a different folder, I get snapshots of different points in time. Deleted images are simply not hardlinked. The system uses XFS, but the script doesn't require special abilities from the filesystem beyond typical Unix hardlinks, although a filesystem without a fixed number of inodes is strongly encouraged.
You may spend a few GB per snapshot on inodes (1GB per 4M files, given an inode size of 256 bytes) and some for folder contents, but that's completely acceptable, as the size of the new files found per snapshot is an order of magnitude greater.
Some caveats: the oldimage table has 'unexpected' entries. Don't make assumptions such as "a filename can't appear twice" or "there will always be a file".
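In outline, the per-image step is something like this (a simplified sketch; the names are made up, and the real script differs in details):

import hashlib
import os
import urllib.request

def file_sha1(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# filename/sha1/url would come from the image table (or the API).
def place_image(filename, sha1, url, snap_dir, prev_snap_dirs):
    dest = os.path.join(snap_dir, filename)
    for prev in prev_snap_dirs:
        candidate = os.path.join(prev, filename)
        if os.path.isfile(candidate) and file_sha1(candidate) == sha1:
            os.link(candidate, dest)  # hardlink: costs one inode, no data
            return
    urllib.request.urlretrieve(url, dest)  # genuinely new: download once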
Of course, the code is available. If I can be of help... just ask :)
Yours, Platonides
On Sat, Jan 17, 2009 at 4:30 AM, Daniel Kinzler daniel@brightbyte.de wrote:
live mirror -> backup a. But this would require three times the space.
What? No.
Hardlinks, my friend, hardlinks. Which is what I did here when things were being rsync pushed. You'll still have a risk of corruption, but unlinks won't be a problem.
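The usual trick is rsync's --link-dest: every run builds a new dated tree and hardlinks whatever is unchanged against the previous one. A sketch with made-up paths (not the exact setup we had):

import datetime
import os
import subprocess

# Made-up paths. Upstream unlinks never remove data from older trees,
# since each snapshot holds its own hardlinks to the blocks.
SRC = "/export/live/"
ROOT = "/backup/uploads"

def snapshot():
    dest = os.path.join(ROOT, datetime.date.today().isoformat())
    latest = os.path.join(ROOT, "latest")
    # --link-dest hardlinks files that are identical to the previous
    # snapshot, so each run only stores new or changed files.
    subprocess.run(
        ["rsync", "-a", "--link-dest=" + latest, SRC, dest + "/"],
        check=True,
    )
    tmp = latest + ".tmp"
    os.symlink(dest, tmp)
    os.replace(tmp, latest)  # atomically repoint "latest" to the new tree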
I do have a copy of all the images now, through a kludgey process that Platonides created and operates (and I had most of the 400 lost files).
(And I had an offer from RobH to restart the rsync replication, though I hadn't taken him up on it because I'm unsure whether I have enough free space.)
(There are now some discussions about making mirrors of uploads, so I'm bringing up this old thread)
On 16/01/09 10:50, Daniel Kinzler wrote:
A third server has been ordered, which will also be installed in Amsterdam, but will not be part of the toolserver cluster. It's a storage server (X4540, 24TB RAID) that will keep a live backup of all media files.
On 17/01/09 06:06, Aude wrote:
Will the new media file backup server address this sort of problem? Or, being a live backup, will it suffer from the same issues/mistakes that affect the main Wikimedia servers?
What's the status of this server? It doesn't seem to match any on https://wiki.toolserver.org/view/Servers. Is the toolserver cluster currently syncing Wikimedia uploads? How is it doing so?