Yes, if we use a different pool counter type and a bucketed key for certain
files, that could work, as long as there is no nesting. A short prefix of the
file name hash could serve as the bucket key. Membership in
the "expensive" group could be determined by the media handler, since only
it really knows how efficient the rendering will be (you can't just use
large vs small files for example). Some config variable could decide the
prefix length.
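
To make the bucketing idea concrete, here is a rough sketch in Python (not
actual MediaWiki code). The pool type names, the key format, the choice of
SHA-1 and the prefix_len parameter are all assumptions for illustration; the
"expensive" flag stands in for whatever the media handler would report.

```python
import hashlib


def bucket_count(prefix_len: int) -> int:
    """Number of distinct buckets for a hex prefix of the given length."""
    return 16 ** prefix_len


def pool_counter_key(file_name: str, expensive: bool, prefix_len: int = 1):
    """Return a (pool_type, key) pair for a thumbnail render.

    Hypothetical names throughout. Cheap files keep a per-file key;
    expensive files share a bucketed key built from a short prefix of the
    file name hash, so only one (non-nested) pool counter call is needed.
    """
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    if expensive:
        # All files whose hash starts with the same prefix share one slot.
        return ("FileRenderExpensive", "bucket:" + digest[:prefix_len])
    # Full hash: effectively one key per original file, as today.
    return ("FileRender", "file:" + digest)
```

With prefix_len = 1 there are 16 buckets for the expensive group, with
prefix_len = 2 there are 256, so the config variable directly caps how many
expensive files can be worked on at once.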
Collisions do kind of suck, since they needlessly serialize operations that
are already known to be slow, but that's a lesser problem than what we have
to deal with now.
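
For a rough feel of how much collisions cost, the expected number of busy
buckets can be estimated as below. This is only a back-of-the-envelope
sketch that assumes file name hashes are spread uniformly over the buckets;
the function name is made up for illustration.

```python
def expected_busy_buckets(num_buckets: int, num_files: int) -> float:
    """Expected number of distinct buckets hit by num_files waiting files.

    Assumes each file lands in a uniformly random bucket. A given bucket
    stays empty with probability (1 - 1/num_buckets) ** num_files.
    """
    p_empty = (1 - 1 / num_buckets) ** num_files
    return num_buckets * (1 - p_empty)
```

Under that assumption, 16 waiting files spread over 16 buckets keep only
about 10 of them busy, and the gap closes as the queue grows well past the
number of buckets.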
On Tue, May 13, 2014 at 4:08 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:
On Tue, May 13, 2014 at 3:29 PM, Aaron Schulz
<aschulz(a)wikimedia.org> wrote:
We already pool counter thumbnails on a per-file level (e.g. no more than
2 processes at a time for any thumbnails of a given original file). Since
pool counter calls cannot be nested, we can't add another layer of pool
countering based on file type grouping or anything of the sort.
For files which belong to a priority group, instead of using the file name
as a key, couldn't we just hash the filename into one of the available
slots for that group? That would still ensure only one process per file,
but it would also limit the number of files processed from the whole group.
The pool would be underused due to hash conflicts when the number of files
waiting is not significantly larger than the pool itself, but that doesn't
seem like a huge problem for a small pool.
--
-Aaron S