Daniel Kinzler wrote:
A more conventional solution would be to have two more copies of the files, on
the same server, which are synced every, say, 24 hours: backup a -> backup b,
live mirror -> backup a. But this would require three times the space. Considering
we have 5TB worth of media files currently (does this include thumbnails?), and
the new server will have 24TB of space, this could work for a while. But taking
into account exponential growth, it wouldn't last long.
Tripling space requirements seems a bit of overkill. Maybe there's a smarter
way to do this.
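For reference, that rotation would amount to a daily cron job along these
lines (just a rough Python sketch; the paths and the choice of rsync are my
assumptions, not part of the proposal):

    import subprocess

    # Hypothetical paths for the three copies on the same server.
    LIVE = "/srv/media/live-mirror/"
    BACKUP_A = "/srv/media/backup-a/"
    BACKUP_B = "/srv/media/backup-b/"

    def rotate():
        # Sync the oldest copy first, so backup b lags the live mirror by
        # up to 48 hours and backup a by up to 24 hours.
        subprocess.run(["rsync", "-a", "--delete", BACKUP_A, BACKUP_B], check=True)
        subprocess.run(["rsync", "-a", "--delete", LIVE, BACKUP_A], check=True)

    if __name__ == "__main__":
        rotate()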
Seems worth mentioning how I am currently replicating commons files.
First, there's a bot watching file uploads in order to scan them, so all
files are usually already on the box.
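The watcher is more or less equivalent to polling the upload log through the
API. A simplified Python sketch (not the actual bot, and the parameters are
my own choice):

    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def recent_uploads(limit=50):
        """Titles of the most recently uploaded files on Commons."""
        params = {
            "action": "query",
            "list": "logevents",
            "letype": "upload",
            "lelimit": limit,
            "format": "json",
        }
        data = requests.get(API, params=params).json()
        return [event["title"] for event in data["query"]["logevents"]]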
Then I run a script to make quasi-snapshots of Commons. They aren't real
snapshots, since I go through the API and thus don't get an exact point in
time. The Toolserver doesn't have that problem: as it keeps a copy of the
Commons db, it can directly query a snapshot of the image table.
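The quasi-snapshot is basically a paged walk over the full file list,
something like the sketch below (the API variant; on the Toolserver the same
list comes straight from the image table instead):

    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def list_all_files():
        """Yield (name, sha1) for every file, following API continuation."""
        params = {
            "action": "query",
            "list": "allimages",
            "aiprop": "sha1",
            "ailimit": "500",
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            for image in data["query"]["allimages"]:
                yield image["name"], image["sha1"]
            if "continue" not in data:
                break
            params.update(data["continue"])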
For each image, the script looks for a copy in the previous snapshots as
well as in the uploads copy (verifying it by the hash). Only a few images
are not found and thus need to be downloaded; all the others are hardlinked.
As each run downloads into a different folder, I get snapshots of different
points in time. Deleted images are simply not hardlinked.
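The per-image step is roughly the following (a sketch; the real script is
more involved, and the names here are made up):

    import hashlib
    import os

    def sha1_of(path):
        """SHA-1 of a file, read in chunks so big media files don't fill RAM."""
        digest = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def place_image(name, expected_sha1, snapshot_dir, older_snapshots,
                    uploads_dir, download):
        """Hardlink a verified existing copy into the new snapshot, or fetch it."""
        target = os.path.join(snapshot_dir, name)
        for source_dir in older_snapshots + [uploads_dir]:
            candidate = os.path.join(source_dir, name)
            if os.path.isfile(candidate) and sha1_of(candidate) == expected_sha1:
                os.link(candidate, target)  # share the data, just add a new link
                return
        download(name, target)  # only the few images not found anywhere locally

An unchanged file therefore costs only a new directory entry in the latest
snapshot, not another copy of its data.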
The system uses XFS, but the script doesn't require any special filesystem
features beyond ordinary unix hardlinks, although a filesystem that doesn't
fix the number of inodes at creation time is strongly encouraged.
You may spend some GB per snapshot on inodes (1GB per 4M files, given an
inode size of 256 bytes) and some more on folder contents, but that's
completely acceptable, as the size of the new files you find per snapshot is
an order of magnitude bigger.
Some caveats: the oldimage table has 'unexpected' entries. Don't make
assumptions such as "a filename can't appear twice" or "there will always ..."
Of course, the code is available. If I can be of help... just ask :)