I found out about this list a few days ago, and I've read back through some of the archives. I have a few comments.
Why are the list archives private? Why isn't it listed on the mail.wikipedia.org index page? Why isn't it in gmane? Why have I never heard of it before? Gregory Maxwell has been whinging that nobody is listening to him on a list that nobody can read.
Kate can be a bit secretive at times, and this was at one time her pet project, but now that she seems to have abandoned it, maybe it's time to change the structure.
Neither the e.V. nor Kate made any particular attempt to involve the other Wikimedia system administrators in this project from its conception. I was certainly sceptical about zedler's value as a tool server compared to the use we could have made of it as part of the core cluster. I've now heard about one project that I'm interested in, and I have an open mind about the rest, but you still have to make the case. Specifically: how does your project benefit Wikipedia? Why should I support it?
Daniel Kinzler wrote:
Yesterday, Kate told me that the problem with replication from the Asian cluster is that MySQL can only connect to one replication master. I have googled a bit, and it appears that that is not true (at least for MySQL 5.1): http://dev.mysql.com/doc/refman/5.1/en/replication-intro.html says:
Multiple-master replication is possible, but raises issues not present in single-master replication. See Section 6.15, “Auto-Increment in Multiple-Master Replication”.
Multiple-master replication in this context could more aptly be called circular replication. This is where you have, say, 3 servers: A replicating B, B replicating C, C replicating A. Then you can write to any of the three servers, and the writes will be propagated to the other two. This is quite useless for the toolserver, where we have 5 masters which will never replicate from each other in a circle.
It should be possible to set up 5 MySQL instances and have each of them replicating from a different master. Is anyone volunteering to set up those instances? Maybe we need to give root access to someone who actually cares about this stuff.
It would be easier if we had a VLAN, so that we didn't have to set up 5 ssh tunnels. Does anyone know anything about VLANs? Does anyone care enough about this project to research it?
Regarding Daniel's WikiProxy: I have reviewed the code, and I have the following comments:
* use curl, not file_get_contents()
* With curl you can set a short timeout, with file_get_contents() it will be 3 minutes. Set a timeout of a few seconds, and then use exponential backoff. Requests get lost sometimes, retries help.
* Tell curl to proxy the request via rr.pmtpa.wikimedia.org:80. This will skip the knams squid cluster and save a few milliseconds.
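To make the timeout and backoff points concrete, here is a rough sketch. It is in Python with urllib rather than PHP with curl, so it is not a patch for WikiProxy, only an illustration of the pattern. The function names, the 3 second timeout, the 0.5 second base delay and the 4 attempts are all assumptions picked for the example.

```python
import time
import urllib.request


def fetch_once(url, timeout, proxy=None):
    """One HTTP GET with a short timeout, optionally through an HTTP proxy."""
    handlers = []
    if proxy:
        # e.g. proxy="http://rr.pmtpa.wikimedia.org:80", the host named above
        handlers.append(urllib.request.ProxyHandler({"http": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_backoff(url, fetch=fetch_once, attempts=4, timeout=3.0,
                       base_delay=0.5, sleep=time.sleep):
    """Retry a failed fetch, doubling the wait after each failure.

    All defaults here are illustrative assumptions, not tuned values.
    """
    for attempt in range(attempts):
        try:
            return fetch(url, timeout)
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            if attempt == attempts - 1:
                raise  # out of retries, let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

The fetch and sleep arguments are only there so the retry logic can be exercised without touching the network.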
For applications using it: if it's too slow, use a few parallel threads. Anything up to about 5 requests per second should be OK.
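Something like the following would do for a client, again as a Python sketch rather than anything that exists today. The RateLimiter class, fetch_all and the worker count of 4 are made up for the example; the cap of 5 requests per second is the figure above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hand out start slots so at most `rate` calls begin per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._clock = clock
        self._sleep = sleep
        self._next = clock()

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots meanwhile.
        with self._lock:
            now = self._clock()
            slot = max(self._next, now)
            self._next = slot + self._interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)


def fetch_all(items, fetch_one, workers=4, rate=5.0):
    """Run fetch_one over items on a few threads, capped at `rate` per second.

    fetch_one is whatever function performs a single request (hypothetical).
    Results come back in the same order as items.
    """
    limiter = RateLimiter(rate)

    def task(item):
        limiter.wait()
        return fetch_one(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, items))
```

The threads hide per-request latency, while the limiter keeps the total request rate under the cap no matter how many workers you start.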
Who here needs more than 5 requests per second? Who needs a latency of less than a few hundred milliseconds? What exactly do you want full text replication for?
-- Tim Starling