Mark Bergsma wrote:
Hi Marcin,
Thank you for your answers re OS/upgrades/kernel - 100% agreed.
(Partitioning): we know that it's traditional to separate /usr, /var etc., but we have found that this usually has very little use in practice and is more often a nuisance. These days we put everything in one large enough / and only split off data partitions on servers where it matters. Of course your databases should be running off a dedicated partition, but for the rest there is probably no real need. If you think otherwise and have good arguments, we can certainly change it. We do tend to use LVM for everything non-root in those cases.
Please excuse my 1995-era UNIX thinking :)
The same holds for the RAID setup: on our databases and big storage systems, most often we just run everything off the same big RAID-10 array. It's more convenient and flexible, and if well configured, the rest of the OS is not hitting that array much at all. If you feel there is a need, we can of course change it - but we'll need to reinstall the OS. A different RAID level would be totally fine as well - this is very much dependent on your needs. I picked RAID-10 as neither Aevar nor Katie knew what was necessary, and RAID-10 tends to be the best choice for databases and high-performance I/O systems.
The issue is not about separating the OS from the rest; it's about testing how we can split two different usage patterns across the databases we might have.
The best solution would be to have an extra pair of small drives in RAID-1 so that we can check whether 2x or 3x RAID-10 actually changes anything in the picture. I am somehow not confident that the extN filesystems handle this optimally.
As soon as we confirm that we will not run out of space by removing two drives from the RAID-10, I would definitely go for a reinstall on a separate RAID-1 pair (taken out of the current RAID-10 if we have nothing small available).
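Since the space question decides this, here is the arithmetic I have in mind, as a small sketch - the drive size and count below are assumptions, not ptolemy's actual configuration:

```python
# Back-of-envelope check: usable space before and after pulling one
# mirrored pair out of the RAID-10 to build a separate RAID-1.
# Both constants are assumptions -- substitute ptolemy's real values.
DRIVE_SIZE_GB = 300   # assumed per-drive capacity
TOTAL_DRIVES = 6      # assumed current RAID-10 member count

def raid10_usable_gb(drives: int, size_gb: int) -> int:
    """RAID-10 mirrors every block once, so usable space is half the raw total."""
    assert drives % 2 == 0 and drives >= 4, "RAID-10 needs an even drive count >= 4"
    return drives // 2 * size_gb

current = raid10_usable_gb(TOTAL_DRIVES, DRIVE_SIZE_GB)
shrunk = raid10_usable_gb(TOTAL_DRIVES - 2, DRIVE_SIZE_GB)
raid1_pair = DRIVE_SIZE_GB  # a RAID-1 pair yields one drive's worth of space

print(f"RAID-10 today:          {current} GB usable")
print(f"RAID-10 minus one pair: {shrunk} GB usable")
print(f"separate RAID-1 pair:   {raid1_pair} GB usable")
```

If the PostgreSQL data fits comfortably in the shrunk figure, we are safe to split.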
Serial console/LOM access cannot easily be handed out, but it should not usually be necessary either. In the unlikely event that the system becomes unmanageable in-band, just contact us directly (ask on #wikimedia-tech, for example) and we'll restore it quickly.
If you handle the whole OS/hardware part - fine with me. One trouble less. :)
(re multicast from the other email)
However, switches/routers handle multicast traffic specially, have group/port membership limits for it, and we've also run into several bugs. So before you start using it heavily, I'd like to know what it's for. :) With only 2 servers communicating, would unicast not be a better idea?
Spread (the tool I am thinking of) basically requires either broadcast or multicast. The choice is yours :) Should (a) this model prove workable and (b) we quickly find out that we need to grow a farm of rendering servers (hopefully not), you might very well decide that WMF needs to carry multicast traffic across the Atlantic, for example. For now, we are just a little family of a few boxes in the Netherlands.
This is not something to even *think* about now - I would like to see how it works with our 2 or 3 servers (yes, including Cassini *for now* - see below), so multicast would certainly be an advantage.
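For what it's worth, the host side of this is simple. A minimal Python sketch of a listener joining a group is below - the group address and port are placeholders I made up, and Spread brings its own daemon and protocol on top; this only shows the underlying group membership that the switches end up tracking:

```python
import socket
import struct

# Hypothetical site-local multicast group and port, purely for
# illustration -- not an address assignment from our setup.
MCAST_GRP = '239.192.0.1'
MCAST_PORT = 4803

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', MCAST_PORT))

# IP_ADD_MEMBERSHIP makes the kernel send an IGMP join; this per-group
# state is what IGMP-snooping switches track per port, and where the
# membership limits Mark mentions come into play.
mreq = struct.pack('4sl', socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    data, addr = sock.recvfrom(1500)
    print(addr, data)
```

A sender is just an ordinary UDP socket writing to the group address, so with 2-3 boxes on one LAN the switch-side cost really is a single group.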
I will probably get back to you re virtual IP addresses anyway, once my ideas mature and are ready to be put into action.
(re-arranged order below)
I really want to stress that these systems need to be *separate*, they cannot be used together at all. Ideally there is no traffic between those servers at all, except in the form of cassini generating visitor traffic like the rest of the Internet. Cassini is meant for playing around where lots of people have access, the other two are (in the end) really meant for production use with limited access.
We are now in the middle of an internal discussion about the future role of Cassini. It has been raised (and I share this view) that we might not really need another toolserver box (we already have one underutilized Sun and one Linux box), and that remote access to the databases and rendering infrastructure from the existing toolservers might be enough.
As I prefer to build this architecture from the bottom up (i.e. ptolemy first, rendering later, user access at the end), we still need to find out what the exact role of Cassini will be.
Stable operation is simply not possible when arbitrary users can do arbitrary things on a system, and that's why we intended these systems to be very isolated from the start.
One of my ideas (this is only mine, and other project members might certainly disagree) would be to have Cassini as the box that runs newer/experimental versions of the production stuff from ortelius/ptolemy. This can still benefit toolserver users (so that they have the infrastructure to test their stylesheets, for example), but it would definitely be more controlled than "playing around a lot". It can be very useful to share some functions with ortelius *before we go into production*, just to test the feasibility of the distributed rendering engine I am envisioning. This might mean that Cassini will be much more closely coupled with ptolemy/ortelius than with users and their stuff.
*I* would rather have another box coupled with the two *now* to test our load distribution concepts than another toolserver. Daniel, feel free to bash me for that :)
So, from a WMF perspective, I would rather promote Cassini to be treated as an almost-production box for now (as ptolemy is), under the same administration processes we have for WMF, *until* the rendering infrastructure is ironed out and goes live. After that it can be a perfect staging box for testing updates to the WMF production environment - with a software setup that can be promoted to the production boxes once tested.
Cassini is also managed by WMDE / Toolserver; ptolemy and ortelius are Wikimedia Foundation managed. So I'm afraid that we really cannot use those servers in one resource pool...
Having said the above: nothing will change with Cassini without prior written consent from Wikimedia Deutschland. That's why we are trying to work together to iron out a final architecture.
If those separate clusters do not have enough resources/space to do what we need, I think we should look into buying more hardware. That is really not impossible. :)
Before we do that, I'd like to check how far we can max out what we have. I'd also like to know, for example, whether I need more smaller machines or just one big one, and what exactly our storage requirements are (thinking about i18n-zed tiles, for example; see the sketch below). I think we should be prepared for higher demand than OSM currently has - that's where my concerns come from. I'd like to avoid unnecessary duplication of infrastructure where we could simply have more power. Maps are in many ways different from the casual PHP/MediaWiki bot stuff run on the Toolserver - we have much more power to control the environment (like putting users' rendering requests at the lowest priority).
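To make the storage question concrete, this is the kind of estimate I mean - a rough sketch where every constant is a guess of mine to be replaced with measured numbers:

```python
# Rough storage estimate for pre-rendered map tiles, including
# internationalized ("i18n-zed") layers. All constants are assumptions.
TILE_SIZE_KB = 15    # assumed average size of one rendered PNG tile
MAX_ZOOM = 17        # assumed deepest pre-rendered zoom level
LANGUAGES = 10       # assumed number of localized tile sets
COVERAGE = 0.01      # assumed fraction of tiles ever rendered and kept

# A slippy map has 4**z tiles at zoom level z.
tiles_per_layer = sum(4 ** z for z in range(MAX_ZOOM + 1))
stored_tiles = tiles_per_layer * COVERAGE * LANGUAGES
size_tb = stored_tiles * TILE_SIZE_KB / 1024 ** 3

print(f"{tiles_per_layer:,} tiles per full layer up to z{MAX_ZOOM}")
print(f"~{size_tb:.1f} TB for {LANGUAGES} languages at {COVERAGE:.0%} coverage")
```

Even with fairly conservative guesses this lands in the terabyte range, which is exactly why I want the requirements pinned down before we buy or duplicate anything.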
To sum up:
(1) we will be working on the architecture with the goal of making Cassini work as optimally as possible for the project
(2) as soon as we find out how much PostgreSQL space we need, I will ask you to reinstall ptolemy for us
(3) at least a multicast group would be fine for now