On 08/30/2011 01:17 AM, Platonides wrote:
Ilmari Karonen wrote:
While tweaking my MIMEStatBot[1] to handle random
Query Killer attacks
better, I got to thinking...
Would it be practical to set up a local index on the toolserver for
(img_major_mime, img_minor_mime, img_media_type) on the image table for
commonswiki (and maybe others too)?
Such an index would hopefully cut the runtime of my weekly queries from
hours to seconds or less. It might also enable other tools to make
better use of the MIME type info stored in the database.
Of course, populating and maintaining such an index would consume some
time and storage space. But I wonder if it might not be worth the
tradeoff. What do the TS admins think?
[1]
http://commons.wikimedia.org/wiki/User:MIMEStatBot
Your bot doesn't need to do a full query on the mime fields. You could
just keep track of uploads and deletions, and update the counters
accordingly.
That's actually a pretty good idea... hmm.
I do see a few problems, though. For example, uploads and deletions are
not the only things that can change the MIME type of a file; changes to
the MIME type detection code in MediaWiki can also cause that, possibly
with some delay since it may take a long time for something to trigger a
metadata update. This is not just a theoretical scenario -- I've seen
it happen.
Also, I'm not sure what happens if a file is oversighted without going
through normal deletion first. And I've also observed several instances
where MediaWiki bugs or server errors have led to missing log entries.
--
Ilmari Karonen