Yes, using a different pool counter type and a bucketed key for certain files could work, as long as there is no nesting. A short prefix of the file name hash could serve as the bucket. Membership in the "expensive" group could be determined by the media handler, since only it really knows how efficient the rendering will be (you can't just use large vs. small files, for example). A config variable could set the prefix length.
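
A minimal sketch of that key selection, in Python for brevity rather than as actual MediaWiki code. The pool type names, the `is_expensive` flag (standing in for the media handler's decision) and the `prefix_len` parameter (standing in for the config variable) are all hypothetical:

```python
import hashlib
from typing import Tuple


def pool_key(file_name: str, is_expensive: bool, prefix_len: int = 1) -> Tuple[str, str]:
    """Return a (pool type, key) pair for one thumbnail render.

    All names here are illustrative assumptions, not real configuration.
    """
    if not is_expensive:
        # Regular files keep the existing per-file key.
        return ("thumb-regular", file_name)
    # Expensive files share a small set of buckets: the key is a short
    # prefix of the file name hash, so there are 16 ** prefix_len keys.
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    return ("thumb-expensive", digest[:prefix_len])
```

Each render takes exactly one lock, so nothing is nested. A hex prefix only allows bucket counts that are powers of 16; taking the hash modulo a configured slot count would be one way to get other sizes.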

Collisions do kind of suck, since they needlessly serialize operations that are known to be slow, but that's a lesser problem than what we have to deal with now.


On Tue, May 13, 2014 at 4:08 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Tue, May 13, 2014 at 3:29 PM, Aaron Schulz <aschulz@wikimedia.org> wrote:
We already pool counter thumbnails on a per-file level (e.g. no more than 2 processes at a time for any thumbnails of a given original file). Since pool counter calls cannot be nested, we can't add another layer of pool countering based on file type grouping or anything of the sort.

For files which belong to a priority group, instead of using the file name as a key, couldn't we just hash the filename into one of the available slots for that group? That would still ensure only one process per file, but it would also limit the number of files processed from the whole group. The pool would be underused due to hash conflicts when the number of files waiting is not significantly larger than the pool itself, but that doesn't seem like a huge problem for a small pool.
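
A rough way to gauge that underuse, assuming the hash spreads files uniformly and each slot runs one process at a time (both assumptions for illustration): with k slots and n waiting files, the expected number of busy slots is k * (1 - (1 - 1/k)^n).

```python
def expected_busy_slots(slots: int, waiting_files: int) -> float:
    """Expected number of distinct slots hit when `waiting_files` files
    are hashed uniformly and independently into `slots` slots."""
    # A given slot stays empty with probability (1 - 1/slots) ** waiting_files.
    return slots * (1.0 - (1.0 - 1.0 / slots) ** waiting_files)
```

For example, `expected_busy_slots(4, 4)` is about 2.7, so with four files waiting on a four-slot pool, roughly one slot sits idle on average. The gap closes quickly as the queue grows beyond the pool size.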



--
-Aaron S