The French donation is a 4U 6CPU Pentium III-750 system with 2GB of RAM and three or four 9GB SCSI drives in a RAID setup; and three 1U Celerons. 2GB for the 4U because it comes with one but the donor is upgrading it to two for us. The other equipment mentioned was for us to consider and the donor would then allocate it among all of the projects which were competing for it. We didn't get it all.
On the technical side, it was decided that the Celerons just aren't suitable for the US site because the US Squids require roughly comparable specifications and the Celerons aren't really fast enough to be page builders, given the shipping costs to get them to the US. So, European Squid caches is the role they best fit. They will need a fairly cheap RAM upgrade to be Squids. These squids will speed things up for those in France and possibly more of Europe who are not logged in, by avoiding the sometimes slow or broken transatlantic links, and for everyone by taking some of the load from the US-based Squids.
The 4U 6CPU is best for the US side. That's the main page building site and putting it there will help everyone who's logged in or getting a non-cached page. Because peak load times are different in different parts of the world, the same amount of resources for the central site delivers more benefit than dedicating them to any one place. Also, the lag from locking and unlocking the database records across a slow link (the internet instead of a LAN) would hurt performance for everyone, by keeping locks for longer than necessary.
The donor requested that the celerons be named chloe, bleuenn and ennael. That seems like a reasonable request to accept, since we don't yet have a naming convention for remote or donated equipment. I doubt that a corporate donor would want their name associated with problem reports, so I expect that we could discourage corporate names if we wanted to.
user_Jamesday wrote:
The French donation is a 4U 6CPU Pentium III-750 system with 2GB of RAM and three or four 9GB SCSI drives in a RAID setup; and three 1U Celerons. 2GB for the 4U because it comes with one but the donor is upgrading it to two for us. The other equipment mentioned was for us to consider and the donor would then allocate it among all of the projects which were competing for it. We didn't get it all.
On the technical side, it was decided that the Celerons just aren't suitable for the US site because the US Squids require roughly comparable specifications and the Celerons aren't really fast enough to be page builders, given the shipping costs to get them to the US. So, European Squid caches is the role they best fit. They will need a fairly cheap RAM upgrade to be Squids. These squids will speed things up for those in France and possibly more of Europe who are not logged in, by avoiding the sometimes slow or broken transatlantic links, and for everyone by taking some of the load from the US-based Squids.
That's excellent! Is there currently a colo in Europe, or someone willing to host one? If there isn't one, perhaps we should consider where is a) cheapest to buy colocation (or find a donor?) b) cheapest to buy bandwidth (or find a donor?) c) best-connected to the rest of Europe?
At the moment, I would imagine the best candidates to be London or Amsterdam.
-- Neil
Hi,
On Wednesday 21 July 2004 17:03, Neil Harris wrote:
That's excellent! Is there currently a colo in Europe, or someone willing to host one? If there isn't one, perhaps we should consider where is a) cheapest to buy colocation (or find a donor?) b) cheapest to buy bandwidth (or find a donor?) c) best-connected to the rest of Europe?
At the moment, I would imagine the best candidates to be London or Amsterdam.
-- Neil
We already found a sponsor willing to host the 3 Celerons for free in France.
Yann
Neil Harris wrote:
That's excellent! Is there currently a colo in Europe, or someone willing to host one? If there isn't one, perhaps we should consider where is a) cheapest to buy colocation (or find a donor?) b) cheapest to buy bandwidth (or find a donor?) c) best-connected to the rest of Europe?
At the moment, I would imagine the best candidates to be London or Amsterdam.
-- Neil
On point c): the biggest European countries, internet-wise, are really well connected. Just restrict the choice to DE / FR / NL / UK. Other countries (e.g. ES, CZ, EE) might be less well connected.
Also, a lot of ISPs have their transatlantic links landing in London, while NL is really the central hub of the European internet.
Ashar Voultoiz wrote:
Neil Harris wrote:
That's excellent! Is there currently a colo in Europe, or someone willing to host one? If there isn't one, perhaps we should consider where is a) cheapest to buy colocation (or find a donor?) b) cheapest to buy bandwidth (or find a donor?) c) best-connected to the rest of Europe?
At the moment, I would imagine the best candidates to be London or Amsterdam.
-- Neil
On point c): the biggest European countries, internet-wise, are really well connected. Just restrict the choice to DE / FR / NL / UK. Other countries (e.g. ES, CZ, EE) might be less well connected.
Also, a lot of ISPs have their transatlantic links landing in London, while NL is really the central hub of the European internet.
Hence my choice of London and NL. There's plenty of capacity between the two, as well.
-- Neil
On point c): the biggest European countries, internet-wise, are really well connected. Just restrict the choice to DE / FR / NL / UK. Other countries (e.g. ES, CZ, EE) might be less well connected.
Also, a lot of ISPs have their transatlantic links landing in London, while NL is really the central hub of the European internet.
Hence my choice of London and NL. There's plenty of capacity between the two, as well.
-- Neil
Perhaps. But hosting servers is one thing. Having human technical help now and then is necessary as well. There are people to do this in Paris.
Anthere wrote:
On point c): the biggest European countries, internet-wise, are really well connected. Just restrict the choice to DE / FR / NL / UK. Other countries (e.g. ES, CZ, EE) might be less well connected.
Also, a lot of ISPs have their transatlantic links landing in London, while NL is really the central hub of the European internet.
Hence my choice of London and NL. There's plenty of capacity between the two, as well.
-- Neil
Perhaps. But hosting servers is one thing. Having human technical help now and then is necessary as well. There are people to do this in Paris.
Then Paris is fine by me! Please don't take my comments as denigrating Paris; it's on the same pan-European fibre rings as most of the other big western EU cities.
-- Neil
Neil Harris wrote:
Anthere wrote:
On point c): the biggest European countries, internet-wise, are really well connected. Just restrict the choice to DE / FR / NL / UK. Other countries (e.g. ES, CZ, EE) might be less well connected.
Also, a lot of ISPs have their transatlantic links landing in London, while NL is really the central hub of the European internet.
Hence my choice of London and NL. There's plenty of capacity between the two, as well.
-- Neil
Perhaps. But hosting servers is one thing. Having human technical help now and then is necessary as well. There are people to do this in Paris.
Then Paris is fine by me! Please don't take my comments as denigrating Paris; it's on the same pan-European fibre rings as most of the other big western EU cities.
-- Neil
I was certainly not taking them as denigrating :-) just pointing out the most logical choice :-)
I would *not* recommend my city as a hosting place... the net is regularly down :-)
Ashar Voultoiz wrote:
Neil Harris wrote:
That's excellent! Is there currently a colo in Europe, or someone willing to host one? If there isn't one, perhaps we should consider where is
At the moment, I would imagine the best candidates to be London or Amsterdam.
On point c): the biggest European countries, internet-wise, are really well connected. Just restrict the choice to DE / FR / NL / UK. Other countries (e.g. ES, CZ, EE) might be less well connected.
Also, a lot of ISPs have their transatlantic links landing in London, while NL is really the central hub of the European internet.
If the idea of European colo were being pursued, one in the UK would not be recommended. Doing so would tend to reinforce the view that this is an English language dominated venture.
Ec
On Wed, 21 Jul 2004 10:26:46 -0400, user_Jamesday user_jamesday@myrealbox.com wrote:
The 4U 6CPU is best for the US side. That's the main page building site and putting it there will help everyone who's logged in or getting a non-cached page. Because peak load times are different in different parts of the world, the same amount of resources for the central site delivers more benefit than dedicating them to any one place.
Yes, resource sharing is the most efficient scheme overall, but there are other reasons to consider splitting off the European Wikis onto their own dedicated hardware:
1) Politics/Psychology - Europeans might be more willing to donate if they knew that their contributions were going straight to the Euro servers, rather than contributing to "all" Wikis (which really means contributing to the English Wikis more than anything else).
2) Design Flexibility - Having a completely separate setup on the other side of the pond would allow (at least in theory) two completely different configurations. This could be useful for testing and comparing different architectures in the future.
3) Redundancy - What happens if Something Awful (tm) happens in Florida? Although we have an army of volunteers making regular off-site backups of the DB, it would still be nice to have an already-up-and-running duplicate site in place.
I'm sure there are more reasons.... although it's certainly an open question whether they outweigh the benefits of resource sharing. Thoughts?
-Bill Clark
Bill Clark wrote:
Europeans might be more willing to donate if
they knew that their contributions were going straight to the Euro servers, rather than contributing to "all" Wikis (which really means contributing to the English Wikis more than anything else).
Thoughts?
-Bill Clark
I do not quite get this argument. Florida is not the English wiki. It is OUR wiki. Improving our hardware situation helps all of us. When the servers are down, we are all down. When they are up and happy, we are all happy. What is your argument for saying it *helps* the English wikis more than others? Please explain
On Wed, 21 Jul 2004 20:01:49 +0200, Anthere anthere9@yahoo.com wrote:
I do not quite get this argument. Florida is not the English wiki. It is OUR wiki. Improving our hardware situation helps all of us. When the servers are down, we are all down. When they are up and happy, we are all happy. What is your argument for saying it *helps* the English wikis more than others? Please explain
The English Wikis use a greater portion of the available resources. That means the new servers would see a similar proportion of their resources used for serving up English pages.
Or, put another way, an across-the-board 10% increase in the number of pages served translates into a far larger number for the English Wikis than anyone else.
I don't really agree that it's a very good argument (after all, everybody would still see the same percentage performance increase) but it does mean that the English Wikipedia will see more absolute benefit from the new servers than anyone else.
-Bill Clark
Hi,
On Wednesday 21 July 2004 20:23, Bill Clark wrote:
On Wed, 21 Jul 2004 20:01:49 +0200, Anthere anthere9@yahoo.com wrote:
I do not quite get this argument. Florida is not the English wiki. It is OUR wiki. Improving our hardware situation helps all of us. When the servers are down, we are all down. When they are up and happy, we are all happy. What is your argument for saying it *helps* the English wikis more than others? Please explain
The English Wikis use a greater portion of the available resources. That means the new servers would see a similar proportion of their resources used for serving up English pages.
Or, put another way, an across-the-board 10% increase in the number of pages served translates into a far larger number for the English Wikis than anyone else.
I don't really agree that it's a very good argument (after all, everybody would still see the same percentage performance increase) but it does mean that the English Wikipedia will see more absolute benefit from the new servers than anyone else.
-Bill Clark
I think there is a misunderstanding because the European proxies will mirror the WHOLE of the Wikimedia projects: all projects in all languages, not only some European languages. And because many European people also read and contribute to English projects, even if English is not their mother tongue.
Yann
I think there is a misunderstanding because the European proxies will mirror the WHOLE of the Wikimedia projects: all projects in all languages, not only some European languages. And because many European people also read and contribute to English projects, even if English is not their mother tongue.
That sounds excellent, but isn't that technically difficult? Maybe I've missed some discussion of how to do this.
It's easy to fix it so that fr, es, de, are all routed through the European squid proxies: we just point the dns entries at the new ip numbers for the French squid cluster, no problem.
It's much harder (right?) to identify where a user is coming from and route them through the proxy that's best for them. This is what Akamai and companies like it charge big bucks for doing.
Or, is there a free (or cheap) way to approximate that?
--Jimbo
On Wed, 21 Jul 2004 14:11:37 -0700, Jimmy (Jimbo) Wales jwales@wikia.com wrote:
It's much harder (right?) to identify where a user is coming from and route them through the proxy that's best for them. This is what Akamai and companies like it charge big bucks for doing.
Or, is there a free (or cheap) way to approximate that?
I believe BIND is supposed to have support for topology-based answers to queries, but I'm not sure how well it works (I switched over to djbdns some time ago... and djbdns definitely doesn't support such fancy features).
Normally it would be a pain to (efficiently) determine location based on IP, but since the distinction between ARIN and RIPE addresses should be easier to determine, it might be possible to set up such mappings.
I'll look into this some and report back on it.
-Bill Clark
Normally it would be a pain to (efficiently) determine location based on IP, but since the distinction between ARIN and RIPE addresses
A few years ago, I took the databases of RIPE and the APNIC (ARIN's were unavailable) and, using a little bit of C and Perl, converted them into multiple-terminal binary decision diagrams mapping each IP address to a country. The resulting database fit on a floppy disk. Checking one IP address needed at most 32 pointer dereferences in a table that easily fits into a small part of the RAM of modern computers.
I haven't updated the database since then but this may be done by downloading tables from RIPE and APNIC and processing them.
Regards
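The range-lookup step David describes can be approximated without the BDD encoding: keep (start, end, country) ranges sorted by start address and binary-search them. A minimal Python sketch, assuming the ranges have already been extracted from the registry dumps; the sample entries below are copied from Bill's list elsewhere in the thread and are illustrative only.

```python
import bisect

# Illustrative sample only: (start, end, country) ranges as 32-bit
# integers, as they might be extracted from the registry dumps.
# Ranges are sorted by start address and assumed non-overlapping.
RANGES = [
    (0x04000000, 0x04FFFFFF, "US"),  # 4.0.0.0    - 4.255.255.255
    (0x53E20000, 0x53E3FFFF, "SE"),  # 83.226.0.0 - 83.227.255.255
    (0x54800000, 0x5487FFFF, "DE"),  # 84.128.0.0 - 84.135.255.255
]
STARTS = [r[0] for r in RANGES]

def ip_to_int(ip):
    """Convert dotted-quad notation to a 32-bit integer."""
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def country_of(ip):
    """Binary-search the sorted range table; None if the IP is unknown."""
    n = ip_to_int(ip)
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```

With the full registry tables loaded, each lookup is a handful of comparisons, in the same spirit as the 32-dereference bound David mentions.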
On Wednesday 21 July 2004 23:11, Jimmy (Jimbo) Wales wrote:
I think there is a misunderstanding because the European proxies will mirror the WHOLE of the Wikimedia projects: all projects in all languages, not only some European languages. And because many European people also read and contribute to English projects, even if English is not their mother tongue.
That sounds excellent, but isn't that technically difficult? Maybe I've missed some discussion of how to do this.
I was referring to point 3 in this: http://mail.wikipedia.org/pipermail/wikitech-l/2004-July/011452.html
--Jimbo
Yann
Jimmy (Jimbo) Wales wrote:
I think there is a misunderstanding because the European proxies will mirror the WHOLE of the Wikimedia projects: all projects in all languages, not only some European languages. And because many European people also read and contribute to English projects, even if English is not their mother tongue.
That sounds excellent, but isn't that technically difficult? Maybe I've missed some discussion of how to do this.
It's easy to fix it so that fr, es, de, are all routed through the European squid proxies: we just point the dns entries at the new ip numbers for the French squid cluster, no problem.
It's much harder (right?) to identify where a user is coming from and route them through the proxy that's best for them. This is what Akamai and companies like it charge big bucks for doing.
Or, is there a free (or cheap) way to approximate that?
--Jimbo
== IP --> location mapping ==
The key thing here is that we don't need to be very accurate, providing we are reasonably consistent.
The simplest way to handle things is to assume that all RIPE-registered IP addresses are somewhere in Europe, and all other addresses (ARIN, APNIC, LACNIC...) are somewhere else "nearer" the US. Then we can use Wikimedia's own DNS servers to serve different lists of server IP addresses depending on the source network of the request.
IANA allocate /8 blocks of IP addresses to the regional registries, and maintain a helpful up-to-date machine-readable list at http://www.iana.org/assignments/ipv4-address-space
That will probably get you 80%-90% of the goodness of a commercial solution.
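Neil's "RIPE means Europe, everything else goes to Florida" rule can be sketched in a few lines. The table here is a tiny hand-copied excerpt of the IANA /8 list, for illustration only; a real job would parse the full file from the URL above.

```python
# Classify clients by the registry holding their /8, per the IANA
# allocation list. This dict is a small hand-copied excerpt for
# illustration; a real deployment would parse the full file from
# www.iana.org/assignments/ipv4-address-space.
IANA_SLASH8 = {
    62: "RIPE", 80: "RIPE", 81: "RIPE", 82: "RIPE",
    83: "RIPE", 84: "RIPE", 85: "RIPE",
    4: "ARIN", 12: "ARIN", 64: "ARIN", 65: "ARIN",
    210: "APNIC", 211: "APNIC",
}

def squid_cluster(ip):
    """Which squid cluster to hand this client: 'europe' or 'us'."""
    first_octet = int(ip.split(".", 1)[0])
    registry = IANA_SLASH8.get(first_octet, "ARIN")  # unknown -> US side
    return "europe" if registry == "RIPE" else "us"
```

As Neil says, this is deliberately coarse: it only has to be reasonably consistent, not accurate.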
For another alternative, you might want to look at http://countries.nerd.dk/ which seems to be based on more fine-grained inspection of registry data.
== Serving the DNS responses ==
I'm still not sure about what the best way is to do this. Running a couple of instances of BIND on different ports on the authoritative servers for wikipedia.org and doing reverse-NAT based on source addresses for packets that arrive on port 53 is a crude way of doing things, but it's the first that comes to mind.
Another is to hack an existing small DNS server implementation: unless someone has done it already?
-- Neil
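For reference, BIND 9's views can express the split directly, without the NAT trick. A hypothetical named.conf fragment, with all acl contents, view names, addresses, and file names invented; the acl would really be generated from the registry data:

```
// Hypothetical named.conf fragment: European source addresses get the
// zone file listing the Paris squids, everyone else gets the Florida
// addresses. Views are matched in order, so "europe" must come first.
acl "europe" { 62.0.0.0/8; 80.0.0.0/12; 84.0.0.0/14; };  // generated list, not hand-kept

view "europe" {
    match-clients { "europe"; };
    zone "wikipedia.org" { type master; file "wikipedia.org.eu"; };
};

view "everyone-else" {
    match-clients { any; };
    zone "wikipedia.org" { type master; file "wikipedia.org.us"; };
};
```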
On Wed, 21 Jul 2004, Jimmy (Jimbo) Wales wrote: ...
That sounds excellent, but isn't that technically difficult? ...
...
It's much harder (right?) to identify where a user is coming from and route them through the proxy that's best for them. This is what Akamai and companies like it charge big bucks for doing.
Ok. Others have commented on this (and at least one has done some homework), but as one who has done this sort of thing (*is* doing this sort of thing) and knows first hand how Akamai's magic works, I'm gonna unmuddy the waters a bit...
First, difficulty is open to discussion. It's rather simple to configure "views" within BIND. It becomes a management hassle on the scale being proposed, but it's still very much doable.
Identifying the physical location of a user will never be 100% accurate. And technically, it doesn't matter where they *physically* are; the IP-layer path between them and any farm(s) is the only important part. One can assume IANA address delegations are going to fall within the correct region. However, we all know this is not always the case. It'll have to do. (The more accurate alternative(s) are a nightmare of complexity.)
Akamai's foo (and yes, I'm literally wearing an Akamai hat *grin*) is based mostly on the localized caches. Each ISP hosting a cache has reported their announced address space to Akamai which is used to aim requests to the local cache(s). If your IP isn't covered by a local cache, you'll be aimed to a "close" public cache. For example, if you ask one of BTI's dns servers for an akadns host, you'll get back a bti address for one of the cache servers on the edge of their network. (because akamai will see the request coming from within bti's network.)
===
In short, this can be done. It'll take a bit of tweaking to get a happy balance. And it'll require some maintenance, but what doesn't around here :-)
--Ricky
PS: For my money, grab that 16GB beast and beg nforce.nl to host it for us. It'd make a nice memcache and fileserver.
On Thu, 22 Jul 2004 14:01:21 -0400 (EDT), Ricky Beam jfbeam@bluetronic.net wrote:
First, difficulty is open to discussion. It's rather simple to configure "views" within BIND. It becomes a management hassle on the scale being proposed, but it's still very much doable.
user_Jamesday pointed out another solution using the Supersparrow libraries, which I initially pooh-poohed as potential vaporware but upon further investigation find to be rather interesting.
They have a patch for BIND9 that allows some BGP-type magic to happen as a replacement for views, which should improve the capabilities of such a system tremendously.
I got it mostly installed (their patch for BIND9 was broken, but mostly easily fixed, although I'm still having a few problems I probably won't have the time to sort out until after this weekend... going to a wedding).
They also have an implementation that uses the Dents nameserver, but I'm not familiar with that so I don't know if that's useful at all.
-Bill Clark
Yann Forget wrote:
I think there is a misunderstanding because the European proxies will mirror the WHOLE of the Wikimedia projects: all projects in all languages, not only some European languages. And because many European people also read and contribute to English projects, even if English is not their mother tongue.
Yann
Mmmm, are you sure this is the plan, and that it will be globally efficient?
I think the best would be to use these squids for the wikis in languages mostly spoken in Europe, and maybe the Near/Middle East and Africa, and NOT use the Florida squids anymore for these languages.
But as far as I know, nothing has been decided on this yet.
-- Looxix
Luc Van Oostenryck wrote:
I think the best would be to use these squids for the wikis in languages mostly spoken in Europe, and maybe the Near/Middle East and Africa, and NOT use the Florida squids anymore for these languages.
I agree with Luc on this, but I freely admit that to me this is the only option realistically possible anyway, unless we have some way that I have not learned about to figure out where people are just by their ip number.
--Jimbo
On Wed, 21 Jul 2004 14:49:53 -0700, Jimmy (Jimbo) Wales jwales@wikia.com wrote:
I agree with Luc on this, but I freely admit that to me this is the only option realistically possible anyway, unless we have some way that I have not learned about to figure out where people are just by their ip number.
68.32.0.0 - 68.63.255.255 = US
68.20.0.0 - 68.23.255.255 = US
4.0.0.0 - 4.255.255.255 = US
68.64.0.0 - 68.71.255.255 = US
210.10.0.0 - 210.10.127.255 = AU
12.0.0.0 - 12.255.255.255 = US
195.224.0.0 - 195.224.255.255 = UK
211.10.20.0 - 211.10.20.255 = JP
211.13.128.0 - 211.13.159.255 = JP
35.0.0.0 - 35.255.255.255 = US
64.0.0.0 - 64.3.255.255 = US
65.0.0.0 - 65.6.255.255 = US
67.43.144.0 - 67.43.159.255 = US
67.43.160.0 - 67.43.175.255 = US
68.96.0.0 - 68.111.255.255 = US
69.0.128.0 - 69.0.255.255 = US
69.132.0.0 - 69.135.255.255 = US
69.30.192.0 - 69.30.223.255 = US
83.226.0.0 - 83.227.255.255 = SE
84.128.0.0 - 84.135.255.255 = DE
84.64.0.0 - 84.71.255.255 = GB
It's fairly straightforward to get more information like this. The ARIN/RIPE (and APNIC for Asia) breakdown is fairly clean. Things only get messy once you're inside a particular range (trying to figure out how the ARIN blocks break down, or even how AT&T distributes its blocks geographically, is a total nightmare -- but not impossible... that's basically what Akamai does for all that money).
It looks like the "sortlist" option in BIND might do what's required... but a (perhaps) better way occurred to me as well -- do source-based NAT before requests reach the nameservers.
It's simple to set up two completely different nameservers that return different RRsets (I do this all the time so that machines on my internal networks use internal IPs for machines, and outsiders get the outside addresses). We could simply do the same thing by configuring a router to forward requests from a RIPE block to one nameserver (which returns the European address) and to forward requests from everywhere else to the other nameserver (which would return the Florida addresses).
I think the routing-based magic would be preferable to a solution in BIND because I trust routers more than I trust BIND.
-Bill Clark
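The router-level variant Bill describes could be sketched on a Linux box doing source-based DNAT in front of the two nameservers. All addresses here are made up, and only UDP is shown (a real setup would also need TCP port 53):

```
# Queries from (approximately) European source blocks go to the
# nameserver that answers with the European squid addresses; everything
# else falls through to the one answering with the Florida addresses.
iptables -t nat -A PREROUTING -p udp --dport 53 -s 62.0.0.0/8 \
    -j DNAT --to-destination 10.0.0.2    # "European" nameserver
iptables -t nat -A PREROUTING -p udp --dport 53 \
    -j DNAT --to-destination 10.0.0.1    # default: "US" nameserver
```

One `-s` rule per European block would be needed, generated from the same registry data as above.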
On Wed, 21 Jul 2004 18:07:05 -0400, Bill Clark wclarkxoom@gmail.com wrote:
It's fairly straightforward to get more information like this. The ARIN/RIPE (and APNIC for Asia) breakdown is fairly clean.
Here's the full breakdown:
http://spamid.servebeer.com:8081/utils/include/spamid/registry_mappings.jsp
-Bill Clark
On Wed, 21 Jul 2004 18:13:05 -0400, Bill Clark wclarkxoom@gmail.com wrote:
Here's the full breakdown:
http://spamid.servebeer.com:8081/utils/include/spamid/registry_mappings.jsp
...and here's how to implement the proposed geographic load balancing in BIND:
http://sysadmin.oreilly.com/news/views_0501.html
-Bill Clark
On Wed, 21 Jul 2004 18:45:18 -0400, Bill Clark wclarkxoom@gmail.com wrote:
...and here's how to implement the proposed geographic load balancing in BIND:
(Oh, and for the record, djbdns CAN do this as well, I'd just never used that feature before.)
-Bill Clark
Bill Clark wrote:
On Wed, 21 Jul 2004 18:13:05 -0400, Bill Clark wclarkxoom@gmail.com wrote:
Here's the full breakdown:
http://spamid.servebeer.com:8081/utils/include/spamid/registry_mappings.jsp
...and here's how to implement the proposed geographic load balancing in BIND:
http://sysadmin.oreilly.com/news/views_0501.html
-Bill Clark
Ah. That's nifty. Now all we need is a "master" global zone file, with a perl or python cron job that uses the geo-data to munch this into a set of sorted zone files, one for each geographic region, as well as a matching bind.conf with the views information in it, and kicks bind to restart it every day to keep up to date with changes in the reference file.
-- Neil
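The cron job Neil describes might look roughly like this sketch: one master record set in, one zone-file body per region out, plus the matching views fragment. Record names, TTLs, addresses, and file naming are all invented for illustration.

```python
# Sketch of the zone-generation cron job: master record set in, one set
# of A records per region out, plus a matching named.conf views
# fragment. Names, addresses and file naming are invented.

# master record set: name -> {region: address}
MASTER = {
    "en.wikipedia.org.": {"europe": "145.97.39.1", "default": "207.142.131.1"},
    "fr.wikipedia.org.": {"europe": "145.97.39.1", "default": "207.142.131.1"},
}

def zone_records(region):
    """Render the A records for one region, falling back to the default."""
    lines = []
    for name, addrs in sorted(MASTER.items()):
        addr = addrs.get(region, addrs["default"])
        lines.append("%s\t300\tIN\tA\t%s" % (name, addr))
    return "\n".join(lines) + "\n"

def views_fragment(regions):
    """Render the matching named.conf view blocks, in match order."""
    blocks = []
    for region in regions:
        clients = '"%s"' % region if region != "default" else "any"
        blocks.append(
            'view "%s" {\n'
            '    match-clients { %s; };\n'
            '    zone "wikipedia.org" { type master; file "wikipedia.org.%s"; };\n'
            '};' % (region, clients, region)
        )
    return "\n".join(blocks) + "\n"
```

Run daily from cron, writing the files and reloading bind, this would keep the per-region answers in sync with one master list.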
In a discussion with French developers, it was apparent that the proposed colocation choice implied that, so as to minimize bandwidth to the US, the logged-in users should *not* be routed through the European squid proxies, but should be redirected to the US proxies.
The simplest solution for this would be to have two sets of hostnames:
language.wikipedia.org for non-logged-in users (where language = en, fr, de, jp, whatever...)
language.logged.wikipedia.org for logged-in users
The former would be redirected to the latter when they log-in, and the latter to the former at log-out.
Only the former would be subject to DNS answers depending on the geographical region.
This seems a very easy solution to implement.
On Sat, 24 Jul 2004 23:51:42 +0200, David Monniaux david.monniaux@ens.fr wrote:
In a discussion with French developers, it was apparent that the proposed colocation choice implied that, so as to minimize bandwidth to the US, the logged-in users should *not* be routed through the European squid proxies, but should be redirected to the US proxies.
The simplest solution for this would be to have two sets of hostnames:
language.wikipedia.org for non-logged-in users (where language = en, fr, de, jp, whatever...)
language.logged.wikipedia.org for logged-in users
Noooo, please, that would look ugly as hell and it would piss off logged in users (myself included :)
On Wed, 21 Jul 2004 14:49:53 -0700, Jimmy (Jimbo) Wales jwales@wikia.com wrote:
Luc Van Oostenryck wrote:
I think the best would be to use these squids for the wikis in languages mostly spoken in Europe, and maybe the Near/Middle East and Africa, and NOT use the Florida squids anymore for these languages.
I agree with Luc on this, but I freely admit that to me this is the only option realistically possible anyway, unless we have some way that I have not learned about to figure out where people are just by their ip number.
In my shop I regularly see German people carrying IPs from other countries, depending on their service provider. AOL, for instance, assigns US-registered IPs to whoever they see fit.
--Manfred
--Jimbo
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Bill Clark wrote:
Yes, resource sharing is the most efficient scheme overall, but there are other reasons to consider splitting off the European Wikis onto their own dedicated hardware:
- Politics/Psychology - Europeans might be more willing to donate if
they knew that their contributions were going straight to the Euro servers, rather than contributing to "all" Wikis (which really means contributing to the English Wikis more than anything else).
Although I do think that under the general heading "Politics/Psychology" there is something to be said for European hardware, I disagree with the way that you put it, in two major respects:
a. "contributions... going straight to the Euro servers, rather than contributing to 'all' wikis" -- this is a mindset that I want to strongly discourage, the mindset of nationalism or regionalism. We are a global project, and I don't want people to start thinking of "our" wikipedia versus "their" wikipedia.
b. "which really means contributing to the English Wikis more than anything else" -- if someone contributes to fr wikipedia, it is just not true that because the servers are in America, this amounts to contributing to en more than anything else.
What I would say is that it enhances the global nature of our mission if some servers are in Europe (and Asia) *whenever technical matters warrant it*. But the project itself is not and must not become a series of disconnected regional projects operating independently and perhaps in conflict.
- Design Flexibility - Having a completely separate setup on the
other side of the pond would allow (at least in theory) two completely different configurations. This could be useful for testing and comparing different architectures in the future.
But geographical remoteness is *less* flexible in this regard. For any N servers, we are more flexible with them in one location rather than 2, because servers could be pulled or added to a test cluster with a different architecture as we see fit.
- Redundancy - What happens if Something Awful (tm) happens in
Florida? Although we have an army of volunteers making regular off-site backups of the DB, it would still be nice to have an already-up-and-running duplicate site in place.
Yes, it would be nice, to be sure, but there are a couple of things to realize:
First, in terms of having redundancy, it makes sense to first look at the most likely points of failure. Since we are colocated in an excellent professional facility with tons of redundancy, the chances of the colo itself going down are very low. I lie awake at nights worrying about zwinger, not about the facility itself.
Second, it certainly "would be nice" to have a fully redundant duplicate site somewhere, but the cost would be exorbitant compared to the likelihood of ever needing it.
---
I fully support using these celerons as squids in Europe, because it might have some tiny performance benefits, and because it doesn't lead down a path of complexity, and because I think it is a nice political gesture.
--Jimbo
On Wed, 21 Jul 2004 11:34:41 -0700, Jimmy (Jimbo) Wales jwales@wikia.com wrote:
a. "contributions... going straight to the Euro servers, rather than contributing to 'all' wikis" -- this is a mindset that I want to strongly discourage, the mindset of nationalism or regionalism. We are a global project, and I don't want people to start thinking of "our" wikipedia versus "their" wikipedia.
I'd just assumed that mindset was already there, and thought that it might impact the willingness of some to donate time and/or resources. However, I see your point about wanting to discourage that attitude, and so I guess that should trump other concerns.
b. "which really means contributing to the English Wikis more than anything else" -- if someone contributes to fr wikipedia, it is just not true that because the servers are in America, this amounts to contributing to en more than anything else.
Sure it's true: Those new servers (except the Euro squid boxes) are going to spend more cycles serving up English pages than any other kind, just like all the other servers.
Granted, if they weren't serving up English pages then many of those cycles would be wasted anyway -- but some people would probably rather see some cycles wasted if it meant slightly better performance for the pages in their own language.
That's the same attitude you've already said you want to strongly discourage, though.
But geographical remoteness is *less* flexible in this regard. For any N servers, we are more flexible with them in one location rather than 2, because servers could be pulled or added to a test cluster with a different architecture as we see fit.
I guess I was arguing more for using different clusters for different Wikis, and you're right that this doesn't really have anything directly to do with geography.
Truth be told, I've always been an opponent of centralization in any form. I just don't trust the idea of having all of the machines in the same location, watched over by the same engineers, configured the same way, etc. Although it might be easier to reconfigure clusters if they're in the same location, it means that the same people with the same biases would be doing so.
So I guess my point was that geographic diversity would also imply procedural and cultural diversity in the operation and configuration of the equipment, and THAT could provide for more flexibility in testing different ways of doing things. Maybe the French systems administrators and network engineers would have a different way of approaching a problem that wouldn't ever be tried if all of the servers remained in Florida.
First, in terms of having redundancy, it makes sense to first look at the most likely points of failure. Since we are colocated in an excellent professional facility with tons of redundancy, the chances of the colo itself going down are very low. I lie awake at night worrying about zwinger, not about the facility itself.
I just worry about the eggs-in-one-basket scenarios: airplane crashes, hurricanes and earthquakes (yes, I know it's Florida :) and things of that nature. Like I said, I'm fundamentally opposed to centralization (and probably more than a bit paranoid).
If nobody else really sees much benefit to having an independent datacenter setup, then I guess I'll stop trying to think up reasons to justify it. :)
-Bill Clark
Bill Clark wrote:
Maybe the French systems administrators and network engineers would have a different way of approaching a problem that wouldn't ever be tried if all of the servers remained in Florida.
I find it difficult to imagine why this would be the case.
At least among the developers, I don't know of *any* "nationalistic chauvinism" that would prevent the acceptance and testing of any ideas based on geography. Indeed, I think the majority of developers are European, anyway.
--Jimbo
On Wed, 21 Jul 2004 13:28:11 -0700, Jimmy (Jimbo) Wales jwales@wikia.com wrote:
Bill Clark wrote:
Maybe the French systems administrators and network engineers would have a different way of approaching a problem that wouldn't ever be tried if all of the servers remained in Florida.
I find it difficult to imagine why this would be the case.
Different local policies. I've had servers in many different datacenters at different times, and no two places seem to do things the same way. The same goes for engineering teams.
This is part of the reason a lot of large companies set up R&D shops that are geographically and culturally distinct from their main engineering teams. You don't want everyone thinking the same, because it invariably leads to stagnation. If everyone responsible for physical management of the clusters is in the same place, they're likely to end up thinking the same way (which is a good thing too, at least in terms of everyone getting along).
I'm probably wrong in this instance, though. There are enough people who can simply download the software and build their own custom clusters if they wanted, so this situation isn't really comparable to an in-house R&D team. In a sense, we ALREADY have many, many independent R&D shops, each with their own hardware.
(Sorry, I correct and/or contradict myself all the time, because I like brainstorming ideas more than I do editing them... which ties in with my whole anti-centralization philosophy -- I don't want to unilaterally censor any ideas that pop into my head, but would rather voice them even if I have my doubts about their viability.)
I'll cheerfully withdraw my suggestion that there's any significant benefit to keeping the servers separate, in this case. :)
-Bill Clark
wikitech-l@lists.wikimedia.org