On 05/01/2014 06:36 AM, Brad Jorsch (Anomie) wrote:
On Wed, Apr 30, 2014 at 3:44 PM, Dan Garry <dgarry@wikimedia.org mailto:dgarry@wikimedia.org> wrote:
The meeting will be on Tuesday 6th May at 4pm.
I'm assuming that's SF time? ;)
Yeah. I won't be able to attend then as I'll be in Europe.
Regarding the job queue, another item of note is that there's a general perception on enwiki that the job queue is either unreliable or so slow as to be useless when it comes to updating the links tables (e.g. categorylinks) when time-related parser functions are used (e.g. using #if to test whether the current date is past a threshold). So people run bots to do forcelinkupdate purges.
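For context, a forcelinkupdate purge goes through the MediaWiki action API (action=purge). A minimal sketch of what such a bot's request parameters might look like; the page titles here are illustrative, not from this thread:

```python
def build_purge_request(titles):
    """Build POST parameters for action=purge with forcelinkupdate,
    which re-renders the given pages and rebuilds their links tables
    (categorylinks etc.) instead of waiting for the job queue."""
    return {
        "action": "purge",
        "forcelinkupdate": "1",  # also refresh the links tables
        "titles": "|".join(titles),  # pipe-separated, per API convention
        "format": "json",
    }

params = build_purge_request(["Template:Example", "Some article"])
```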
There are also occasional complaints that links tables aren't being updated in a timely manner after edits to templates: pages still aren't updated weeks later, and people reply that the job queue is just really slow. Then usually someone does null edits on the pages transcluding the template.
I don't know if either of those issues are related or otherwise in-scope, but I mention them Just In Case.
I don't think either is directly the job queue's fault. There is some logic to skip large template updates (more than 200k uses, IIRC), which might create the impression that the job queue is unreliable. This logic is only there because processing such large template updates (up to 8 million uses on enwiki) is very slow.
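The guard described above amounts to a simple threshold check on the transclusion count. A hedged sketch, with the function name and exact semantics being assumptions on my part:

```python
SKIP_THRESHOLD = 200_000  # "more than 200k uses IIRC"

def should_enqueue_refresh(transclusion_count, threshold=SKIP_THRESHOLD):
    """Return True if a template edit should enqueue link-refresh jobs
    for all transcluding pages. False means the update is skipped and
    the links tables silently go stale, which is why the job queue
    looks 'unreliable' from the outside."""
    return transclusion_count <= threshold
```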
The reason is that a template update currently triggers a re-render of each entire page that uses it. With information about which templates were involved in the expansion of a given transclusion fragment, re-rendering just those fragments & updating the relevant link tables would be much more efficient and thus faster.
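The fragment-level dependency tracking described above can be sketched as an inverted index from template to the fragments that expanded it; all names here are illustrative, not MediaWiki internals:

```python
from collections import defaultdict

class FragmentIndex:
    """Records which templates each rendered fragment depends on, so a
    template edit re-renders only the affected fragments rather than
    every full page that transcludes the template."""

    def __init__(self):
        # template name -> set of (page, fragment_id) pairs
        self.deps = defaultdict(set)

    def record(self, page, fragment_id, templates):
        """Called after rendering a fragment, with the templates that
        were expanded inside it."""
        for t in templates:
            self.deps[t].add((page, fragment_id))

    def fragments_to_rerender(self, template):
        """On a template edit, return only the fragments that used it."""
        return sorted(self.deps[template])

idx = FragmentIndex()
idx.record("PageA", 1, ["Template:T1", "Template:T2"])
idx.record("PageB", 2, ["Template:T1"])
affected = idx.fragments_to_rerender("Template:T1")
```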
Re the missing time-based invalidation: fragment TTLs are currently not tracked in MediaWiki. The only option for dynamic content is to disable (or drastically shorten) caching for the entire page. AFAIK we don't do this for performance reasons, so time-based templates go stale & need to be manually purged. We have plans to track TTLs per transclusion and extension in Parsoid, but haven't implemented this yet. That will let us re-render only the timed-out transclusion/extension, which is much more efficient than re-rendering the entire page from scratch.
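Per-transclusion TTL tracking as described above boils down to finding the fragments whose TTL has elapsed. A minimal sketch, assuming each fragment stores its render time and an optional TTL (the data layout is my assumption, not Parsoid's design):

```python
import time

def expired_fragments(fragments, now=None):
    """fragments: dict of fragment_id -> (rendered_at, ttl_seconds or None).
    A TTL of None means 'cache indefinitely'. Returns the ids whose TTL
    has elapsed; only these need re-rendering, not the whole page."""
    now = time.time() if now is None else now
    return [
        fid for fid, (rendered_at, ttl) in fragments.items()
        if ttl is not None and now - rendered_at >= ttl
    ]

frags = {"intro": (0, 10), "infobox": (0, None), "footer": (5, 100)}
stale = expired_fragments(frags, now=20)
```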
So I'd propose to focus on tweaking the queue length monitoring for now. The main job processor issues are already on the roadmap of other teams (Parsoid for example), and replacing the job queue itself with something more reliable & scalable (Kafka?) is probably not the highest priority at this point.
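One way to tweak the queue length monitoring so it tolerates normal bursts but flags a genuine backlog is to alert only on a sustained breach rather than a single spike. A hedged sketch; the thresholds and window are illustrative, not an actual Wikimedia alerting rule:

```python
def backlog_alert(samples, threshold, window):
    """samples: recent queue-length readings, oldest first.
    Alert only if the last `window` samples are all above `threshold`,
    i.e. the backlog is sustained rather than a transient spike."""
    if len(samples) < window:
        return False
    return all(s > threshold for s in samples[-window:])
```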
Gabriel