Re: [Mediawiki-api] Severe slowness with a particular English Wikipedia API query

7 Sep 2014

      I don't know the MySQL syntax super well but that query implied that it
will always use the time index and then iterate in time order looking for a
link to the category you want. Unless that category is an appreciable
fraction of the total links I don't imagine that is a good plan.
On Sep 6, 2014 3:58 PM, "Brad Jorsch (Anomie)" bjorsch@wikimedia.org
wrote:
...
The database query for that is simple enough:
SELECT /* ApiQueryCategoryMembers::run Anomie */
cl_from,cl_sortkey,cl_type,page_namespace,page_title,cl_timestamp  FROM
`page`,`categorylinks`  FORCE INDEX (cl_timestamp) WHERE cl_to =
'Copy_to_Wikimedia_Commons_(bot-assessed)' AND (cl_from=page_id)  ORDER BY
cl_timestamp,cl_from LIMIT 501;
And the PHP code doesn't do anything complicated either. Maybe Sean can
give us more insight if there's some subtle database thing going on here.
Note, though, that you're not getting anything at all random here; you're
always getting the files that have been in the category longest first.
On Fri, Sep 5, 2014 at 11:52 PM, This, that and the other <
at.light@live.com.au> wrote:
...
A tool I have written, For the Common Good [1], uses the following type
of query to fetch a list of "random" files that users may like to transfer
to Commons. The category name may differ but the structure is the same:
https://en.wikipedia.org/w/api.php?format=xml&cmnamespace=6&cmtitle=
Category%3ACopy%20to%20Wikimedia%20Commons%20(bot-
assessed)&action=query&list=categorymembers&cmsort=
timestamp&cmprop=title&cmlimit=500
In 2011 when I was first writing FtCG, this query ran at an acceptable
speed. Recently, though, it has become extremely slow, to the point where
timeouts are now a regular occurrence. It sometimes takes 4 or 5 tries (and
several minutes) before results are returned. From then on, however, it
works quickly. If you run this exact query now, there's a good chance it
will work quickly because others have been running the query before you.
The cause seems to be the "cmsort=timestamp" portion of the request. If
this is removed, it works essentially instantaneously. However, I don't
really want the files in alphabetical order, as it doesn't seem very
"random".
Four questions:

Why does this query take so long?
Can anything be done on the server side to make it faster?
Why does it take so much longer now than it did in 2011?
Is there a better way to fetch a random cross-section of files in a

particular category?
TTO
[1] https://en.wikipedia.org/wiki/User:This,_that_and_the_other/
For_the_Common_Good

Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

Re: [Mediawiki-api] Severe slowness with a particular English Wikipedia API query