I do a lot of maintenance tasks on Commons, and many tasks require some sort of database
query to find the oddball cases. The queries can be done in one of several ways:
1) Using the CatScan [1] and CatScan2 [2] tools
2) Database query service [3]
3) Weekly Database reports [4]
Unfortunately, some of those ways have been breaking down lately. CatScan and CatScan2
rarely work, and they fail in many different ways: usually by exceeding
'max_user_connections' (30 for Magnus's CatScan2 and 15 for Daniel's CatScan), but
otherwise with timeout or no-connection errors, or they work on a query for hours (or
days if you let them) and never return anything. I developed some CatScan2-based queries
for Creator template maintenance that worked fine 2-3 years ago but have always timed out
since. That might be due to the growing number of images on Commons. Similarly, the
Database query service seems very inactive. There are many requests and few replies, like
my request from April 2 [5].
For example, lately I was searching for images on Commons that do not have any license
templates (sometimes since 2007 or earlier), see [5]. At some point Magnus was helping me
with that query; however, after it failed several times with a "server not found" error,
we gave up. It seems like less and less can be done with the current infrastructure.
So are there any non-toolserver-based alternatives for database queries? I was trying to
read about Wikimedia Labs, looking for tools based on it. Ideally there would be some
CatScan2-like tool that is based on a different database and allows a higher number of
users.
Jarek T.
User:jarekt [6]
[1] http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php
[2] http://toolserver.org/~magnus/catscan_rewrite.php
[3] https://jira.toolserver.org/browse/DBQ
[4] http://commons.wikimedia.org/wiki/Commons:Database_reports
[5] https://jira.toolserver.org/browse/DBQ-201
[6] http://commons.wikimedia.org/wiki/User:Jarekt