Hello colleagues and shareholders (community :)!
It has been a while since my last review of operations (aka hosting report) - so I will try to give an overview of some of the things we've been doing =) First of all, I'd like to thank Mr. Moore for his fabulous law. It allowed Wikipedia to stay alive - even though we had to grow again in all directions.
We still have Septembers. It is a nice name for the recurring pattern that delivers shock and awe to us - after a period of stable usage, every autumn the number of users suddenly jumps and stays up there, letting us think we've finally reached some saturation and will never grow more. Until next September.

We still have world events. People rush to us to read about conflicts and tragedies, joys and celebrations. Sometimes because we have had the information for ages, sometimes because it all matured in seconds or minutes. Nowhere else can a document require that much concurrent collaboration, and nowhere else can it provide as much value immediately.

We still have history. From day one of the project, we can see people getting into dramas, discussing, evolving and revolving every idea on the site. Every edit stays there - accumulating not only the final pieces of information, but the whole process of assembling the content.

We still advance. Tools to facilitate the community get more complex, and we are growing an ecosystem of tools and processes inside and outside the core software and platform. Users are the actual developers of the project; the core technology just lags behind, assisting.
Our operation becomes more and more demanding - and that's quite a bit of work to handle.
Ok, enough of such poetic introduction :)
== Growth ==
Over the second half of 2006, traffic and requests to our cluster doubled (actually, that happened in just a few months). Over 2007, traffic and requests to our cluster doubled again.
Pics: http://www.nedworks.org/~mark/reqstats/trafficstats-yearly.png http://www.nedworks.org/~mark/reqstats/reqstats-yearly.png
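By the way, doubling in a year works out to roughly 6% compound growth per month - just an average, of course, since the real curve is lumpy with those autumn jumps. A quick sketch:

```python
# Implied average monthly growth rate if traffic doubles over 12 months.
# Real growth was uneven (the "September" jumps), so this is only an average.
monthly_rate = 2 ** (1 / 12) - 1
print(f"~{monthly_rate:.1%} per month")  # ~5.9% per month
```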
== Hardware expansion ==
Back in September 2006 we had quite a huge load increase, and we went for a capacity expansion, which included:
* 20 new Squid servers ($66k)
* 2 storage servers ($24k)
* 60 application servers ($232k)
The German chapter additionally assisted with purchasing 15 Squid servers for the Amsterdam facility in November.
Later, in January 2007, we added 6 more database servers (for $39k), three additional application servers for auxiliary tasks (such as mail), and some network and datacenter gear.
The growth over autumn/winter led us to a quite big ($240k) capacity expansion back in March, which included:
* 36 very capable 8-core application servers (thank you, Moore, yet again :) - that was around $120k
* 20 Squid servers for the Tampa facility
* A router for the Amsterdam facility
* Additional networking gear (switches, linecards, etc.) for Tampa
The only serious capacity increase afterwards was another 'German' (thanks yet again, Verein) batch of 15 Squid servers for Amsterdam in December 2007.
We do plan to improve our database and storage servers soon - that would add to the stability of our dump building and processing, and provide better support for various batch jobs.
We have been especially pushy about exploiting warranties on all servers, and nearly all machines ever purchased are in working state, handling one kind of workload or another. All the veterans of 2005 are still running at amazing speeds, doing important jobs :) Rob joining to help us with datacenter operations has given us really nice turnarounds on pretty much every datacenter task - volunteer remote hands tended to become unavailable at critical moments. Oh, and look how tidy the cabling is: http://flickr.com/photos/midom/2134991985/ !
== Networking ==
This has been mainly in Mark's and River's capable hands - we underwent a transition from hosting customer to internet service provider (or at least an equal peer to ISPs) ourselves. We have our own independent autonomous systems both in Europe and the US - allowing us to pick the best available connectivity options, resolve routing glitches, and get free traffic peering at internet exchanges. That provides quite a lot of flexibility, of course, at the cost of more work and skills required.
This is also part of an overall well-managed, powerful datacenter strategy. Instead of low-efficiency small datacenters scattered around the world, a core facility like the one in Amsterdam provides high availability and close proximity to major Internet hubs and carriers, and is generally at the center of the region's inter-tubes. Though it would be possible to reach out into multiple donated hosting places, that would just lead to slower service for our users, and someone would still have to pay for the bandwidth. As we are pushing nearly 4 Gbps of traffic, there are not many donors who wouldn't feel such traffic.
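For a rough sense of scale, here's a back-of-the-envelope calculation of what 4 Gbps means in monthly transfer (the figure is likely closer to peak than to average, so treat this as an upper bound):

```python
# Monthly data transfer implied by a sustained 4 Gbps of traffic.
# 4 Gbps is likely a peak figure, so this is an upper bound.
gbps = 4
seconds_per_month = 30 * 24 * 3600               # 2,592,000 seconds
gigabytes = gbps / 8 * seconds_per_month         # Gbit/s -> GB
print(f"~{gigabytes / 1000:,.0f} TB per month")  # ~1,296 TB per month
```

Not many donated hosting arrangements could quietly absorb that.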
== Software ==
There has been a lot of overall engineering effort, often behind the scenes. Various bits had to be rewritten to act properly on user activity. The most prominent example of such work is Tim's rewrite of the parser to handle huge template hierarchies more efficiently. In the perfect case, users will not see any visible change, except multiple-factor performance improvements on expensive operations.

In the past year, lots of activity - how people use customized software: bots, JavaScript extensions, etc. - has changed our performance profile, and nowadays much of the performance work at the backend is handling various fresh activities - and anomalies. One of the core activities was polishing the caching of our content, so our application layer can concentrate on the most important process - collaboration - instead of content delivery.

Lots and lots of small things have been added or fixed - though some developments were quite demanding, like multimedia integration, which was challenging due to our freedom requirements. Still, there was constant tradeoff management: not every feature is worth the performance sacrifice and costs, but on the other hand, having the best possible software for collaboration is also important :)

Introducing new features, or migrating them from outside into the core platform, has always been a serious engineering effort. Besides, there is quite a lot of communication involved - explaining how things have to be built so they don't collapse on the live site, discussing security implications, changes in usage patterns, ... Of course, MediaWiki is still one of the most actively developed pieces of web software - and here Brion and Tim lead the volunteers, as well as spend their days and nights in the code.
Across the overall stack, we have worked at every layer - tuning kernels for our high-performance networking, experimenting with database software (some servers are running our own fork of MySQL, based on the Google changes), perfecting Squid, our web caching software (Mark and Tim ended up on the authors list), and digging into problems and peculiarities of the PHP engine. Quite a lot of the problems we hit are very huge-site-specific, and even when other huge shops hit them, we're the ones who are always free to release our changes and fixes. Still, colleagues from other shops are willing to assist us too :)
There were lots of tiny architecture tweaks that allowed us to use resources more efficiently, but none of them were major - pure engineering all the time. It seems that lately we have stabilized a lot of how Wikipedia works - and it seems to work quite fluently. Of course, one must mention Jens' keen eye, taking care of various especially important but easily overlooked things.
River has dedicated a lot of attention to supporting the community tools infrastructure at the Toolserver - and also to maintaining off-site copies of projects.
The site doesn't fall down the very minute nobody is looking at it, and that is quite an improvement over the years :)
== Notes ==
People have been discussing whether running a popular site is really the mission of WMF. Well, users created a magnificent resource, we try to support it, and we do what we can. Thanks to everyone involved - though it has been a far less stressful ride than in previous years, still, nice work ;-)
== More reading ==
May hurt your eyes: https://wikitech.leuksman.com/view/Server_admin_log
Platform description: http://dammit.lt/uc/workbook2007.pdf
== Disclaimer ==
Some numbers may be wrong, as this review was based not on an audit but on vague memories :)
On 04/01/2008, Domas Mituzas midom.lists@gmail.com wrote:
Submitted to Slashdot, please vote per your feelings on Slashdotting dammit.lt ;-p
http://slashdot.org/firehose.pl?op=view&id=450788
- d.
Hi,
I e-mailed this before to some individuals, but got no answers.
No doubt Amsterdam has good peering, but the American ASN (14907) is listed only at TampaIX, which is poorly connected.
I suggest you ask for a donation of, or buy, transport from Tampa to Miami (NOTA) and/or Atlanta (56 Marietta St). Both places are much better connected. From my POV (South America), Miami would be the best.
Here (Brazil), data needs to go first to Washington/NYC and then to Tampa, returning by the same path. In some cases (the Telefonica telco), data goes to the Amsterdam datacenter instead. With a lot of people using already-laggy dial-up connections, that's not a good thing.
I suggest you ask for a donation (network and Miami/Atlanta rackspace) from Sago Networks; they recently launched their fiber backbone, which should have plenty of free space: http://www.techlinks.net/CommunityAnnouncements/tabid/55/articleType/Article...
If you decide to go the buy route, I estimate the cost of a 2.5 Gbps wave to Miami or Atlanta would be around $5k monthly (plus cross-connect fees, taxes, rackspace, remote hands).
I think Cogent 1 Gbps IP transport (to anywhere they have a POP) is $10k monthly (plus the same extra costs).
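To compare the two options per gigabit of capacity (using only the rough monthly figures above; cross-connects, taxes, rackspace and remote hands excluded):

```python
# Per-Gbps monthly cost of the two transport options mentioned above.
# Figures are the rough estimates from this mail; extra costs excluded.
options = {
    "2.5G wave to Miami/Atlanta": (5_000, 2.5),   # ($/month, Gbps)
    "Cogent 1G IP transport":     (10_000, 1.0),
}
for name, (monthly_cost, capacity_gbps) in options.items():
    print(f"{name}: ${monthly_cost / capacity_gbps:,.0f} per Gbps/month")
```

So the wave is much cheaper per gigabit - if you can fill it.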
On Jan 3, 2008 11:07 PM, Domas Mituzas midom.lists@gmail.com wrote:
Fernando,
No doubt Amsterdam have good peering, but the American ASN (14907) is listed only on TampaIX, which is a poorly connected.
Thanks for your suggestions - they have been floating around for a while, but there are many issues to resolve. First of all, peering traditions in the US are completely different from Europe or anywhere else - especially if you're not a huge telco.
I suggest you ask a donation or buy transport from Tampa to Miami (Nota) and/or Atlanta (56 Marietta St). Both places are much better connected. From my POV (South America) Miami would be the best.
Getting into those locations doesn't mean all South American providers will immediately peer with us (especially the large telcos).
I suggest you ask a donation (network and Miami/Atlanta rackspace) to Sago Networks, they recently released their fiber backbone which should have plenty of free space: http://www.techlinks.net/CommunityAnnouncements/tabid/55/articleType/ArticleView/articleId/181087/Sago-Networks-Completes-Private-Fiber-Network-from-Tampa-to-Atlanta-and-Miami.aspx
We've done really good research on all the companies doing any networking activity in the Tampa area. Sure, if there's a way for us to do things more efficiently, we will do so eventually :-)
If you decide to go the buy route I estimate the cost of a 2.5gig wave to Miami or Atlanta would be around 5k monthly (plus cross-connect fees, taxes, rackspace, remote hands).
There are more issues involved - like our routing capacity, network engineering hours, etc. I'm not in any position of authority to speak about this, but really, if there are good things to do, we generally do them :)
BR,
I'm aware the peering agreements of big US companies (tier 1) are harsh, but Wikipedia is not commercial and has a lot of respect.
Google and Akamai got good peering agreements in the past because people respected them. Wikipedia shouldn't put itself in the same position as commercial companies when thinking about peering.
AT&T is said to be more open to peering than the other Tier 1s, Cogent peers with anyone, and Global Crossing is your friend. There are also education networks and Latin American/Canadian telcos. But any of those will peer only where they have backbone capacity; near Tampa, that would be Atlanta for US companies and Miami for Latin American ones. Go to the next NANOG and let people meet you.
For Brazilian companies at Miami you should have no problem peering with 8167 and 7738. You already have an open channel with 12956, and a lot of people here use Global Crossing. I don't have any idea about 4230's policy, but they're at Miami as well.
I don't know about the other Latin American companies, but they should also be open to peering, because they usually buy their traffic.
About the other costs: have you already asked for donations or discounts from Sago, or any other company located at NOTA or Marietta? If you have, I won't use that argument anymore.
When I mentioned Sago rackspace, that was about space for network gear at Miami and Atlanta. I didn't mean you should move your servers from their current location.
Best regards, Fernando.
Is there serious interest in looking into peering?
I have multiple resources which could help with this - I've been on the NANOG mailing list since it was formed, and I know a bunch of peering and transit experts, some of whom own or founded ISPs.
On Jan 11, 2008 4:27 PM, George Herbert george.herbert@gmail.com wrote:
Wikimedia already has over 100 peers. WMF networking strategy is well under control.
awesome summary :)
wikitech-l@lists.wikimedia.org