Sorry for my complete ignorance, but because I am unsure whether this is a bug in MediaWiki or simply a need to poke the sysadmins to look at the s3 cluster more often, I am reporting it here.
A n00b at Portuguese Wikisource decided, for unknown reasons, to insert a Brazilian slang greeting into the second most used template on that wiki. An admin with poor English (me) reverted it [1]. All of this happened 9 days ago, but since then the job queue has been floating between ~1,080 and ~2,400. What is happening?
[1] - http://pt.wikisource.org/w/index.php?title=Predefini%C3%A7%C3%A3o:Hino&a...
What was the queue like before? Those don't seem like particularly large values (not that I know how big pt.wikisource is, so it's a little hard to judge). If it's been going both up and down, then I doubt it's a result of an event 9 days ago. There have probably been other edits to templates.
PS I've just looked at the recent changes and there haven't been any template edits since the one you mention. Do you transclude pages from other namespaces (userboxes in the user namespace, perhaps)?
On 7/31/07, Thomas Dalton thomas.dalton@gmail.com wrote:
What was the queue like before? Those don't seem like particularly large values (not that I know how big pt.wikisource is, so it's a little hard to judge). If it's been going both up and down, then I doubt it's a result of an event 9 days ago. There have probably been other edits to templates.
These things aren't uncommon. If you go to [[Special:Statistics]] on English Wikisource (on the s2 cluster) and refresh it a few times, the job queue sometimes shows only "5" and sometimes "6,569" (for me, the HTML source shows srv107, srv121, srv149 for the 6,569 value and srv83, srv40, srv94, srv140 for the value 5).
PS I've just looked at the recent changes and there haven't been any template edits since the one you mention. Do you transclude pages from other namespaces (userboxes in the user namespace, perhaps)?
Yes, Portuguese Wikisource has page transclusion involving practically all namespaces. Some userboxes (the vast majority are Babel boxes) are available, but all of them are in the user namespace ([[Usuário:Box/<something>]]). As in other small Wikimedia communities, social interaction happens outside the wiki (IRC, MSN, email, phone, etc.), so no significant edits can be found in the user namespace.
Another type of transclusion is from main namespace to main namespace, to generate the "print versions" (modelled on the "print versions" at English Wikibooks). I can't find any edit that would force a large or medium "print version" to be regenerated since the edit mentioned on {{hino}} (and no "print version" transcludes {{hino}}; it is an information box for single-page works, which need no "print version").
Normally the job queue is between ~3 and ~10 because of the {{NUMBEROFARTICLES}} magic word (and that value is absorbed into the current fluctuation of the job queue).
Luiz Augusto wrote:
Sorry for my complete ignorance, but because I am unsure if this is a bug on MediaWiki or simple the need to poke sysadmins to take a look at the s3 cluster more frequently, I am reporting this here.
A n00b at Portuguese Wikisource had decided to insert a brazilian salutation slang in the second most used template on that wiki for unknown reasons. A admin with poor English (me) reverted it. [1]. All have happened 9 days ago but since it the job queue is floating between ~1,080 / ~2,400. What is happening?
The job length on ptwikisource is 0 at the moment.
-- brion vibber (brion @ wikimedia.org)
On 8/1/07, Brion Vibber brion@wikimedia.org wrote:
The job length on ptwikisource is 0 at the moment.
weird... http://www.mediawiki.org/wiki/Image:Bug_on_job_queue_lenght_uselang%3Den.png
On 8/1/07, Luiz Augusto lugusto@gmail.com wrote:
On 8/1/07, Brion Vibber brion@wikimedia.org wrote:
The job length on ptwikisource is 0 at the moment.
weird...
http://www.mediawiki.org/wiki/Image:Bug_on_job_queue_lenght_uselang%3Den.png
It changes depending on which server sent you the page.
_______________________________________________
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
MinuteElectron.
Minute Electron wrote:
Luiz Augusto wrote:
Brion Vibber wrote:
The job length on ptwikisource is 0 at the moment.
weird...
http://www.mediawiki.org/wiki/Image:Bug_on_job_queue_lenght_uselang%3Den.png
It changes depending on which server sent you the page.
Yes, it changes... but between 1,080 and 2,427:
1,080 kluge srv100 srv101 srv102 srv103 srv104 srv105 srv106 srv107 srv108 srv109 srv11 srv110 srv111 srv112 srv113 srv114 srv115 srv116 srv117 srv118 srv119 srv12 srv120 srv121 srv122 srv123 srv124 srv125 srv126 srv127 srv128 srv129 srv130 srv132 srv136 srv137 srv138 srv139 srv14 srv140 srv141 srv143 srv144 srv145 srv146 srv147 srv148 srv149 srv15 srv16 srv19 srv20 srv32 srv33 srv34 srv35 srv36 srv37 srv38 srv39 srv4 srv40 srv41 srv43 srv44 srv45 srv46 srv47 srv48 srv49 srv50 srv52 srv53 srv55 srv60 srv61 srv62 srv63 srv64 srv65 srv67 srv68 srv69 srv70 srv71 srv72 srv73 srv74 srv75 srv76 srv76 srv81 srv82 srv83 srv84 srv85 srv86 srv87 srv88 srv89 srv90 srv91 srv93 srv94 srv95 srv96 srv97 srv98 srv99
2,427 humboldt kluge srv100 srv101 srv102 srv103 srv104 srv105 srv106 srv107 srv108 srv109 srv109 srv11 srv110 srv111 srv112 srv113 srv114 srv115 srv116 srv117 srv118 srv119 srv12 srv120 srv121 srv122 srv123 srv124 srv125 srv126 srv127 srv128 srv129 srv13 srv130 srv132 srv136 srv137 srv138 srv139 srv14 srv140 srv141 srv143 srv144 srv145 srv146 srv147 srv148 srv149 srv15 srv16 srv18 srv19 srv2 srv20 srv32 srv33 srv34 srv35 srv36 srv37 srv38 srv39 srv4 srv40 srv41 srv43 srv44 srv45 srv46 srv47 srv48 srv49 srv50 srv52 srv53 srv55 srv60 srv61 srv62 srv63 srv64 srv65 srv67 srv68 srv69 srv71 srv72 srv73 srv74 srv75 srv76 srv82 srv83 srv84 srv85 srv86 srv87 srv88 srv89 srv90 srv91 srv93 srv94 srv95 srv96 srv97 srv98 srv99
And as the repeated values show, what matters is not the server but the database slave it connected to, which is not visible to us.
Minute Electron wrote:
http://www.mediawiki.org/wiki/Image:Bug_on_job_queue_lenght_uselang%3Den.png
It changes depending on which server sent you the page.
No, it changes depending on which database server was used to make the *ESTIMATE* of the job queue length. Which is why we shouldn't be showing an estimate of job queue length anyway, since it's often HUGELY wrong and it freaks people out. :)
mysql> select count(*) from job;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.46 sec)

mysql> explain select * from job;
+-------+------+---------------+------+---------+------+------+-------+
| table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+-------+------+---------------+------+---------+------+------+-------+
| job   | ALL  | NULL          | NULL | NULL    | NULL | 1080 |       |
+-------+------+---------------+------+---------+------+------+-------+
1 row in set (0.37 sec)
Kinda sucks.
-- brion vibber (brion @ wikimedia.org)
Kinda sucks.
Kinda? That's a 100% error! MySQL's count() function is worse than I thought... how much less efficient is the accurate method? Could we cache the accurate value every 10 minutes or so and display that?
On 02/08/07, Thomas Dalton thomas.dalton@gmail.com wrote:
Kinda? That's a 100% error! MySQL's count() function is worse than I thought... how much less efficient is the accurate method? Could we cache the accurate value every 10 minutes or so and display that?
The issue here is that COUNT(*) is an horrendous operation on a large InnoDB table, owing to the fact that the value isn't stored, and has to be recalculated on demand. The estimate comes from running a quick EXPLAIN SELECT and represents the number of rows the engine believes might be involved, offhand.
Caching the accurate value would be of no benefit.
Rob Church
On 8/1/07, Rob Church robchur@gmail.com wrote:
Caching the accurate value would be of no benefit.
In terms of performance and accuracy it would be, surely.
On 8/1/07, Rob Church robchur@gmail.com wrote:
I've long advocated removing this from the Special:Statistics page; it's not information which has relevance to the site, and users seem to use it as a metric for all sorts of things, when it's not.
That I agree with.
On 02/08/07, Simetrical Simetrical+wikilist@gmail.com wrote:
In terms of performance and accuracy it would be, surely.
If caching something makes it outdated to the point of being useless, then there is no benefit. Even Domas will admit this. ;)
Rob Church
On 8/1/07, Rob Church robchur@gmail.com wrote:
If caching something makes it outdated to the point of being useless, then there is no benefit. Even Domas will admit this. ;)
The suggestion was ten minutes. Surely the job queue doesn't change that much over ten minutes.
On 02/08/07, Simetrical Simetrical+wikilist@gmail.com wrote:
The suggestion was ten minutes. Surely the job queue doesn't change that much over ten minutes.
It doesn't? So users don't edit all that much, and our job queue runner sits there for massive idle periods on the large wikis?
I call bull shit.
Rob Church
On 8/1/07, Rob Church robchur@gmail.com wrote:
On 02/08/07, Simetrical Simetrical+wikilist@gmail.com wrote:
The suggestion was ten minutes. Surely the job queue doesn't change that much over ten minutes.
It doesn't? So users don't edit all that much, and our job queue runner sits there for massive idle periods on the large wikis?
I call bull shit.
I did say "change *that much*". Sufficiently ambiguous for me to weasel my way out of any evidence you can come up with. ;) But seriously, the only thing that matters is whether the job queue is persistently high, and if it is, that will be . . . well, persistent. If it spikes for two minutes and then drops back (how quickly do those job queue runners work anyway?) then who cares if that's not reflected in the stat? Average all the ten-minute samplings over the last day if you like.
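As a rough illustration of the "average all the ten-minute samplings over the last day" idea, here is a minimal Python sketch. The class name, the sampling interval constant and the sample values are all made up for the example; nothing here reflects how MediaWiki actually stores or reports the queue length.

```python
from collections import deque

# Assumption: one sample every ten minutes, so a day holds 144 samples.
SAMPLES_PER_DAY = 24 * 6


class QueueLengthHistory:
    """Keeps the most recent samples and reports their average (hypothetical helper)."""

    def __init__(self, max_samples=SAMPLES_PER_DAY):
        # A bounded deque silently drops the oldest sample once it is full.
        self._samples = deque(maxlen=max_samples)

    def record(self, length):
        self._samples.append(length)

    def daily_average(self):
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)


history = QueueLengthHistory()
# Made-up samples: a single short spike among otherwise empty readings.
for length in [0, 0, 2400, 0, 0, 0]:
    history.record(length)
print(history.daily_average())
```

A short spike moves the average only a little, while a persistently high queue would dominate it, which is the distinction being argued for.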
On 02/08/07, Simetrical Simetrical+wikilist@gmail.com wrote:
I did say "change *that much*". Sufficiently ambiguous for me to weasel my way out of any evidence you can come up with. ;) But seriously, the only thing that matters is whether the job queue is persistently high, and if it is, that will be . . . well, persistent. If it spikes for two minutes and then drops back (how quickly do those job queue runners work anyway?) then who cares if that's not reflected in the stat? Average all the ten-minute samplings over the last day if you like.
Let's commission a report. Change the world, that will, job queue sizes.
Rob Church
Simetrical wrote:
I did say "change *that much*". Sufficiently ambiguous for me to weasel my way out of any evidence you can come up with. ;) But seriously, the only thing that matters is whether the job queue is persistently high, and if it is, that will be . . . well, persistent. If it spikes for two minutes and then drops back (how quickly do those job queue runners work anyway?) then who cares if that's not reflected in the stat? Average all the ten-minute samplings over the last day if you like.
Unless you state that the job queue was X as of 96 seconds ago, 10-minute sampling will confuse people even more. After changing a template used on hundreds of pages, I'll go to see how much work I added to the job queue; if the data is older than the 30-60 seconds I may need to get there, I'll have the dangerous thought "it wasn't so much load".
That brings up another issue - how do people interpret the number? Perhaps it would be better to give an estimate of how long the queue will take to process (I think it's processed at one job per request, so jobs divided by request rate should do it). While it wouldn't be any more accurate, it would be more useful. (And I expect the length of time on most wikis most of the time will be a matter of a few seconds tops).
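The arithmetic suggested here (jobs divided by request rate) can be sketched in a few lines of Python. The function name, the jobs_per_request parameter and the request rate of 50 per second are assumptions made for illustration; the "one job per request" figure is only the guess stated above, not a confirmed setting.

```python
def estimated_drain_seconds(queued_jobs, requests_per_second, jobs_per_request=1.0):
    """Rough time to empty the queue if nothing new is added (hypothetical helper)."""
    if requests_per_second <= 0 or jobs_per_request <= 0:
        raise ValueError("rates must be positive")
    return queued_jobs / (requests_per_second * jobs_per_request)


# Assumed numbers: 2,400 queued jobs and 50 page requests per second.
print(estimated_drain_seconds(2400, 50.0))
```

The result inherits whatever error is in the job count itself, so it would be no more accurate than the number it is derived from, only easier to interpret.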
Rob Church wrote:
On 02/08/07, Thomas Dalton thomas.dalton@gmail.com wrote:
Kinda? That's a 100% error! MySQL's count() function is worse than I thought... how much less efficient is the accurate method? Could we cache the accurate value every 10 minutes or so and display that?
The issue here is that COUNT(*) is an horrendous operation on a large InnoDB table, owing to the fact that the value isn't stored, and has to be recalculated on demand. The estimate comes from running a quick EXPLAIN SELECT and represents the number of rows the engine believes might be involved, offhand.
Unfortunately that result is wildly inaccurate, but is reported as an exact integer down to the ones place. :)
It might not hurt to do some basic approximations here: do the EXPLAIN estimate, then if it's a smallish value, go ahead and grab an exact count.
If it's a very large value, then take the estimated count *and pass back the fact that it's an estimate* and display accordingly.
I've long advocated removing this from the Special:Statistics page; it's not information which has relevance to the site, and users seem to use it as a metric for all sorts of things, when it's not.
What would be more useful would be a historical graph showing the rise and fall of the queue size. That would help people to avoid freaking out when it's briefly large due to activity.
-- brion vibber (brion @ wikimedia.org)
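A minimal sketch of the approach described above: take the cheap EXPLAIN estimate first, fall back to an exact count when the estimate is small, and otherwise pass the estimate along flagged as such. It is written in Python with the two database calls passed in as plain callables; the names, the threshold value and the display format are all hypothetical, not MediaWiki code.

```python
from typing import Callable, NamedTuple


class QueueLength(NamedTuple):
    value: int
    is_estimate: bool


# Assumption: below this many estimated rows, an exact COUNT(*) is cheap enough.
EXACT_COUNT_THRESHOLD = 10_000


def job_queue_length(
    estimate_rows: Callable[[], int],
    exact_count: Callable[[], int],
    threshold: int = EXACT_COUNT_THRESHOLD,
) -> QueueLength:
    """Return an exact count for small queues, a flagged estimate for large ones."""
    estimate = estimate_rows()  # stands in for the "rows" column of EXPLAIN SELECT
    if estimate <= threshold:
        return QueueLength(exact_count(), False)  # stands in for SELECT COUNT(*)
    return QueueLength(estimate, True)


def format_length(length: QueueLength) -> str:
    """Make it visible to the reader when the figure is only an estimate."""
    number = f"{length.value:,}"
    return f"about {number}" if length.is_estimate else number


# The case from this thread: the estimate says 1080, the exact count is 0.
print(format_length(job_queue_length(lambda: 1080, lambda: 0)))
```

With the numbers quoted earlier in the thread, the small estimate triggers the exact count, so the page would show 0 rather than 1,080.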
On 02/08/07, Brion Vibber brion@wikimedia.org wrote:
It might not hurt to do some basic approximations here: do the EXPLAIN estimate, then if it's a smallish value, go ahead and grab an exact count.
Proof that great minds think alike; Tim Starling proposed the exact same thing less than two months ago when I was pondering "out loud" removing the count altogether.
What would be more useful would be a historical graph showing the rise and fall of the queue size. That would help people to avoid freaking out when it's briefly large due to activity.
I'm going to assume you mean this in the manner of a casual Wikimedia project, perhaps something for the Toolserver...?
Rob Church
"Brion Vibber" brion@wikimedia.org wrote in message news:46B1773E.6060608@wikimedia.org...
What would be more useful would be a historical graph showing the rise and fall of the queue size. That would help people to avoid freaking out when it's briefly large due to activity.
Can't we use the site_stats table? Add a new column, 'ss_job_queue_length', which the rebuildall script populates with the current correct value, and then update this value whenever jobs are added or processed. Or are the figures in this table approximations as well?
- Mark Clements (HappyDog)
On 02/08/07, Brion Vibber brion@wikimedia.org wrote:
No, it changes depending on which database server was used to make the *ESTIMATE* of the job queue length. Which is why we shouldn't be showing an estimate of job queue length anyway, since it's often HUGELY wrong and it freaks people out. :)
I've long advocated removing this from the Special:Statistics page; it's not information which has relevance to the site, and users seem to use it as a metric for all sorts of things, when it's not.
Rob Church
On 8/2/07, Brion Vibber brion@wikimedia.org wrote:
Minute Electron wrote:
http://www.mediawiki.org/wiki/Image:Bug_on_job_queue_lenght_uselang%3Den.png
It changes depending on which server sent you the page.
No, it changes depending on which database server was used to make the *ESTIMATE* of the job queue length. Which is why we shouldn't be showing an estimate of job queue length anyway, since it's often HUGELY wrong and it freaks people out. :)
I was being intentionally ambiguous, as I wasn't sure whether it was due to caching (I could not be sure, since for all I know Special:Statistics could be uncacheable like Special:Watchlist) or due to the database server that processed your request. In this case the latter was correct; sorry for my useless contribution.
MinuteElectron.