Hi all,
The MediaWiki Core team is having the first of its fortnightly scoping meetings. The intent of these meetings is to take a piece of work that the MediaWiki Core team is considering undertaking and discuss it.
By the end of the meeting, the project should have:
* A description of the project and how it affects its end users
* A list of well-defined stakeholders
* A proposed solution (roughly speaking; this will not be binding)
* An assessment of the difficulty of the project
* An assessment of the priority of the project
Some of these (e.g. difficulty assessment) are relative to other projects, and only start to make sense once you've got a few projects on the list. That's okay, we'll get to that stage eventually.
In the first meeting, we will be discussing the recent problems with the job queue length. The length of the job queue has been increasing, and this has a lot of effects on the end users of our wikis. What can we do about it? Let's discuss.
The meeting will be on Tuesday 6th May at 4pm. I'll invite a few relevant parties, but ultimately the meeting is open to everyone so if you want to come then please do! If you're remote, ping me and I can invite you to the hangout.
Thanks, Dan
On Wed, Apr 30, 2014 at 3:44 PM, Dan Garry <dgarry@wikimedia.org> wrote:
> The meeting will be on Tuesday 6th May at 4pm.
I'm assuming that's SF time? ;)
Regarding the job queue, another item of note is that there's a general perception on enwiki that the job queue is either unreliable or too slow to the point of uselessness when it comes to updating the links tables (e.g. categorylinks) when time-related parser functions are used (e.g. using #if to test if the current date is past a threshold). So people run bots to do forcelinkupdate purges.
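For concreteness, here is a minimal sketch (Python, using the requests library) of the kind of purge those bots issue via action=purge with forcelinkupdate; the endpoint and page title below are just placeholders:

    import requests

    API = "https://en.wikipedia.org/w/api.php"   # placeholder endpoint

    # Ask MediaWiki to purge the page and also rebuild its links tables
    # (categorylinks, templatelinks, ...), which is roughly what the
    # forcelinkupdate purge bots do in bulk.
    resp = requests.post(API, data={
        "action": "purge",
        "titles": "Example page",   # placeholder title
        "forcelinkupdate": 1,
        "format": "json",
    })
    print(resp.json())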
There are also sometimes complaints that links tables aren't being updated in a timely manner after edits to templates, as in things aren't updated weeks later and people reply that the job queue is just really slow. Then usually someone does null edits on the pages transcluding the template.
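And a rough sketch of that null-edit workaround done through the API rather than the browser; the empty-appendtext trick and the title are illustrative, and this assumes a wiki recent enough to hand out CSRF tokens via meta=tokens:

    import requests

    API = "https://en.wikipedia.org/w/api.php"   # placeholder endpoint
    session = requests.Session()

    # Fetch a CSRF token, then save the page while appending nothing; a save
    # that changes nothing is treated as a null edit, which re-parses the page
    # and refreshes its links tables without creating a new revision.
    token = session.get(API, params={
        "action": "query", "meta": "tokens", "type": "csrf", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]

    resp = session.post(API, data={
        "action": "edit",
        "title": "Example page",   # placeholder title
        "appendtext": "",          # append nothing -> null edit
        "token": token,
        "format": "json",
    })
    print(resp.json())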
I don't know if either of those issues are related or otherwise in-scope, but I mention them Just In Case.
On 05/01/2014 06:36 AM, Brad Jorsch (Anomie) wrote:
> On Wed, Apr 30, 2014 at 3:44 PM, Dan Garry <dgarry@wikimedia.org> wrote:
>> The meeting will be on Tuesday 6th May at 4pm.
> I'm assuming that's SF time? ;)
Yeah. I won't be able to attend then as I'll be in Europe.
> Regarding the job queue, another item of note is that there's a general perception on enwiki that the job queue is either unreliable or too slow to the point of uselessness when it comes to updating the links tables (e.g. categorylinks) when time-related parser functions are used (e.g. using #if to test if the current date is past a threshold). So people run bots to do forcelinkupdate purges.
> There are also sometimes complaints that links tables aren't being updated in a timely manner after edits to templates, as in things aren't updated weeks later and people reply that the job queue is just really slow. Then usually someone does null edits on the pages transcluding the template.
> I don't know if either of those issues are related or otherwise in-scope, but I mention them Just In Case.
I think neither is directly the job queue's fault. There is some logic to skip large template updates (more than 200k uses IIRC), which might lead to the impression that the job queue is unreliable. This logic is only there because processing such large template updates (up to 8 million uses on enwiki) is very slow.
The reason for this is that template updates currently trigger a re-render of the entire page. With information about which templates were involved in the expansion of a given transclusion fragment, re-rendering those fragments & updating the relevant link tables would be much more efficient and thus faster.
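As a rough way to see whether a given template is anywhere near that skip threshold, one can count its transclusions by walking list=embeddedin; the endpoint, template name, and exact cut-off below are placeholders:

    import requests

    API = "https://en.wikipedia.org/w/api.php"   # placeholder endpoint
    TEMPLATE = "Template:Example"                # placeholder template
    THRESHOLD = 200000                           # rough cut-off mentioned above

    # Walk list=embeddedin with API continuation, counting transcluding pages
    # and stopping once we are clearly past the threshold.
    count = 0
    params = {
        "action": "query", "list": "embeddedin", "eititle": TEMPLATE,
        "eilimit": "max", "continue": "", "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        count += len(data["query"]["embeddedin"])
        if count > THRESHOLD or "continue" not in data:
            break
        params.update(data["continue"])
    print(TEMPLATE, "has at least", count, "transclusions")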
Re the missing time-based invalidation: Fragment TTLs are currently not tracked in MediaWiki. The only option for dynamic content is to disable (or drastically shorten) caching for the entire page. Afaik we don't do this for performance reasons, so time-based templates will go stale & need to be manually purged. We have plans to track TTLs per transclusion and extension in Parsoid, but have not implemented this yet. This will let us re-render only the timed-out transclusion/extension, which is more efficient than re-rendering the entire page from scratch.
So I'd propose to focus on tweaking the queue length monitoring for now. The main job processor issues are already on the roadmap of other teams (Parsoid for example), and replacing the job queue itself with something more reliable & scalable (Kafka?) is probably not the highest priority at this point.
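For what it's worth, here is a bare-bones sketch of the kind of queue-length signal such monitoring could poll: siprop=statistics exposes an approximate job count as "jobs" (the endpoint is just an example, and the figure is only an estimate):

    import requests

    API = "https://en.wikipedia.org/w/api.php"   # placeholder endpoint

    # Poll the approximate job queue length; a monitor would graph this
    # over time and alert when it keeps growing.
    stats = requests.get(API, params={
        "action": "query", "meta": "siteinfo",
        "siprop": "statistics", "format": "json",
    }).json()["query"]["statistics"]
    print("approximate job queue length:", stats["jobs"])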
Gabriel