I do a lot of maintenance tasks on Commons, and many of them require some sort of database query to find the oddball cases. The queries can be done in one of several ways:
1) Using the CatScan [1] and CatScan2 [2] tools
2) Database query service [3]
3) Weekly Database reports [4]
Unfortunately, some of those ways have been breaking down lately. CatScan and CatScan2 rarely work, and they fail in many different ways: usually by exceeding 'max_user_connections' (30 for Magnus's CatScan2 and 15 for Daniel's CatScan), but otherwise with timeout or no-connection errors, or by working on a query for hours (or days, if you let it) without ever returning anything. I developed some CatScan2-based queries for Creator template maintenance that worked fine 2-3 years ago but have always timed out since. That might be due to the growing number of images on Commons. Similarly, the Database query service also seems very inactive: there are many requests and few replies, like my request from April 2 [5].
For example, I was recently searching for images on Commons that have no license template (some of them since 2007 or earlier); see [5]. At some point Magnus was helping me with that query, but after it failed several times with a "server not found" error, we gave up. It seems that less and less can be done with the current infrastructure.
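To make the kind of query concrete, here is a minimal sketch in Python. It is not the actual DBQ-201 query. It assumes heavily simplified stand-ins for MediaWiki's `page` and `templatelinks` tables, built in an in-memory SQLite database so that it runs anywhere, and the function name, the sample rows, and the short list of license templates are all hypothetical placeholders.

```python
import sqlite3

# Assumption: simplified stand-ins for MediaWiki's page and templatelinks
# tables. The real replica schema has many more columns and far more rows.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT);
"""

FILE_NS = 6       # File: namespace
TEMPLATE_NS = 10  # Template: namespace


def files_without_license(conn, license_templates):
    """Return titles of File: pages that transclude none of the given templates."""
    placeholders = ",".join("?" for _ in license_templates)
    sql = f"""
        SELECT p.page_title
        FROM page AS p
        WHERE p.page_namespace = ?
          AND NOT EXISTS (
              SELECT 1 FROM templatelinks AS tl
              WHERE tl.tl_from = p.page_id
                AND tl.tl_namespace = ?
                AND tl.tl_title IN ({placeholders})
          )
        ORDER BY p.page_title
    """
    rows = conn.execute(sql, [FILE_NS, TEMPLATE_NS, *license_templates])
    return [title for (title,) in rows]


def demo():
    """Build a tiny hypothetical data set and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, FILE_NS, "Licensed.jpg"),
        (2, FILE_NS, "Unlicensed.jpg"),
        (3, 0, "Some_gallery"),  # not a file page, so it is ignored
    ])
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", [
        (1, TEMPLATE_NS, "Cc-by-sa-3.0"),
        (2, TEMPLATE_NS, "Information"),  # transcluded, but not a license
    ])
    # Placeholder list; a real run would need every license template.
    return files_without_license(conn, ["Cc-by-sa-3.0", "PD-self"])
```

On the toy data, `demo()` lists only the file whose page transcludes no template from the given list. On the real database the hard parts are enumerating all license templates and finishing before the connection times out.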
So are there any non-Toolserver-based alternatives for database queries? I have been reading about Wikimedia Labs, looking for tools based on it. Ideally there would be some CatScan2-like tool that is backed by a different database and allows a higher number of users.
Jarek T. User:jarekt [6]
[1] http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php
[2] http://toolserver.org/~magnus/catscan_rewrite.php
[3] https://jira.toolserver.org/browse/DBQ
[4] http://commons.wikimedia.org/wiki/Commons:Database_reports
[5] https://jira.toolserver.org/browse/DBQ-201
[6] http://commons.wikimedia.org/wiki/User:Jarekt
This is probably not what you want to hear, but one way would be to get a Toolserver account. That way, you wouldn't need the query service, you could run those queries by yourself.
Petr Onderka [[en:User:Svick]]
On Wed, May 22, 2013 at 10:03 PM, Tuszynski, Jaroslaw W. <JAROSLAW.W.TUSZYNSKI@saic.com> wrote:
[...]
Feel free to drop me a mail off list. I've got a TS account and will gladly lend a hand with the reports.
On Wed, May 22, 2013 at 4:24 PM, Petr Onderka gsvick@gmail.com wrote:
[...]
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 05/22/2013 04:03 PM, Tuszynski, Jaroslaw W. wrote:
So are there any non-toolserver based alternatives for database queries?
Well, there is always Tool Labs [1]. Database replica access is still experimental (in trial), but it works.
-- Marc
[1] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access
There's also stat1.wikimedia.org, but I'm not sure who all has access to that. I think just WMF staff and WMF-approved researchers, but I could be wrong.
Ryan Kaldari
On 5/22/13 2:28 PM, Marc A. Pelletier wrote:
[...]
Thanks for all the advice, and thanks to Tim Landscheidt, who ran https://jira.toolserver.org/browse/DBQ-201 for me; that query was the catalyst for my email.
I wish there were clones or alternatives to toolserver based CatScan2 (and CatScan) that run on Tool Labs. Ideally they would be able to handle queries like mine.
Jarek T. User:jarekt
-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Ryan Kaldari
Sent: Wednesday, May 22, 2013 8:06 PM
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] Querying the database
[...]
"Tuszynski, Jaroslaw W." JAROSLAW.W.TUSZYNSKI@saic.com wrote:
I wish there were clones or alternatives to toolserver based CatScan2 (and CatScan) that run on Tool Labs. Ideally they would be able to handle queries like mine.
[...]
Replicated databases have only been (partially) activated on Tools this Tuesday, so there hasn't been a lot of time to port CatScan :-). Also, its code isn't that easy to understand, so it might take a while for someone to tackle it.
Tim