Today^W Yesterday, I was asked about some file numbers, which involved traversing subcategories, an "inefficient" problem. It seemed a good problem for comparing Toolserver and Labs. And the Toolserver DB sucks:
willow: 31m5.157s (user 0m4.038s)
labs:   0m4.271s (user 2.488)
Toolserver was *436 times slower*.
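For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratio from the two timings quoted above. The helper name `parse_time` is made up for this example, and it assumes the bare "2.488" in the labs figure is in seconds.

```python
import re


def parse_time(text: str) -> float:
    """Convert a `time`-style duration such as '31m5.157s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s?", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Wall-clock and user-CPU figures copied from the post.
willow_real = parse_time("31m5.157s")
willow_user = parse_time("0m4.038s")
labs_real = parse_time("0m4.271s")
labs_user = parse_time("2.488")  # assumption: plain seconds

# How many times longer the willow run took in wall-clock time.
ratio = willow_real / labs_real

# Share of the willow run that was not spent in the script's own user CPU time.
willow_non_user_share = 1 - willow_user / willow_real

print(f"willow: {willow_real:.3f}s, labs: {labs_real:.3f}s, ratio: {ratio:.1f}x")
print(f"willow time outside user CPU: {willow_non_user_share:.1%}")
```

The user times are close to each other (about 4.0s vs 2.5s), so nearly all of the gap is time the script spent outside its own user CPU time, most likely waiting on the database.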
Surely, the Labs server has better hardware than the one on TS. I don't know how many scripts were hitting the TS DB, while the Labs one would be almost idle. Still, it seems a really big gap. Do we have something configured wrongly? Did MariaDB somehow massively improve over MySQL? Are some parameters too small? Or is the problem just that the MySQL servers are underprovisioned with RAM?
(anonymous) wrote:
[...]
IIRC, the replicated databases on Labs are hosted on SSDs, so it's not really fair to compare them :-). A better benchmark would probably be user databases on Toolserver against tools-db on Labs. The latter (which uses different credentials than the replicated databases) is on a VM with storage on a (IIRC spinning) NFS server. But that would of course neglect that the Toolserver databases have to cope with replication as well, while tools-db only holds the user databases. So I don't think an adequate comparison can be made.
Tim
The problem starts with the server's ability to categorize each discrepancy. If it can't, it dumps without hesitating. Overloading is one thing the server won't do to itself, on either the negative or the positive side. When server programs start having issues, their discrepancies affect the server's log, which in turn affects the server's progress in its own job. If you have one server, give it 2 input and output nodes ONLY; otherwise all your programs will back the server DB up.
On May 23, 2013 5:40 PM, "Platonides" platonides@gmail.com wrote:
[...]
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
On Fri, 24 May 2013 00:42:22 +0200 Platonides platonides@gmail.com wrote:
[...]
Almost nobody is using the replicated Labs DB yet so it's not really a surprise. Wait half a year or so then try again. I expect the Labs DB to be faster still because of better hardware, but probably not *that* much faster.
BTW: If you're doing recursive traversal of categories, you may be interested in CatGraph: http://tools.wmflabs.org/render-tests/catgraph/ Ask me if you want to know more about it, either at this address or as JohannesK_WMDE on freenode. :)
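For readers who have not written such a traversal, here is a minimal sketch of the general idea, run against a toy in-memory graph. It is not CatGraph and not the script from the original post; the function names and the sample data are invented for illustration. On the replicas, each child lookup would presumably be a query against the categorylinks table, which is where the round trips pile up.

```python
from collections import deque


def collect_subcategories(root, children):
    """Breadth-first walk returning the root plus every category below it.

    `children` maps a category name to its direct subcategories.
    The `seen` set guards against cycles, which category graphs can contain.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for child in children.get(category, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def count_distinct_files(root, children, files_in):
    """Count files under `root`, counting a file once even if it sits in several categories."""
    files = set()
    for category in collect_subcategories(root, children):
        files |= files_in.get(category, set())
    return len(files)


# Toy data (hypothetical): note the cycle D -> A and the shared file f3.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
files_in = {"A": {"f1"}, "B": {"f2", "f3"}, "C": {"f3"}, "D": {"f4"}}

print(sorted(collect_subcategories("A", children)))
print(count_distinct_files("A", children, files_in))
```

Collecting file names into a set rather than summing per-category counts avoids counting a file twice when it appears in more than one subcategory.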
Yesterday many were playing with big SSD DB queries. :) https://gist.github.com/brion/5652302 https://twitter.com/mdammers/status/338652420362092544 Thanks, Platonides, for this post; it's the kind of stuff we need: reasons and examples of why [some] people may *want* to use Labs [too], leading to an orderly, trauma-free increase in willing users.
Nemo
"Federico Leva (Nemo)" nemowiki@gmail.com wrote:
Yesterday many were playing with big SSD DB queries. :) https://gist.github.com/brion/5652302 https://twitter.com/mdammers/status/338652420362092544 [...]
Doesn't show at http://ganglia.wmflabs.org/latest/?r=hour&cs=&ce=&m=load_one&... (click for week graphs).
Tim