According to http://noc.wikimedia.org/conf/CommonSettings.php.html , all Wikinews, all Wikiquote, all Wikibooks, English & German Wiktionary and Serbian Wikipedia have the extension DynamicPageLists enabled.
In Bugzilla there are 16 specific projects with open requests to use this extension. Probably more would request it if they knew about it.
What is the difference between WN, WQ, WB, en.WIKT, de.WIKT, sr.WP, and the projects below?
Which metric can be used to judge if a project can or can't use DPLs?
At the moment it seems whether or not a project has this enabled is arbitrary, and it's quite frustrating if users have no idea if this functionality will ever be enabled.
The bugs and some relevant comments are listed below.
thanks, Brianna user:pfctdayelise
Bug 3169: Enable DynamicPageList for Wikibooks [presumably en]
Bug 3533: Install the DynamicPageList2 extension on Latvian Wikipedia
Bug 4468: Install DynamicPageList2 on Japanese Wikinews
Bug 4847: Enable DynamicPageList on nlwiki
Bug 6163: Install DynamicPageList on Dutch Wikipedia
Bug 6758: Activation of DynamicPageList2 on de-Wikisource
Bug 7952: Install DynamicPageList extension on Icelandic and Spanish Wiktionaries
Bug 8240: Set up DynamicPageList on the Bosnian Wikipedia (bs)
Bug 8261: Install DynamicPageList for Commons
Bug 8563: Activate DynamicPageList2 or DynamicPageList on en-Wikisource
Bug 8672: Install DynamicPageList2 on all Wikinews projects (requests specifically from FR, IT, PL, Bosnian) closed WONTFIX
Bug 8886: Install DynamicPageList extension for Vietnamese Wiktionary
Kellen (2005-09-22): I was informed in #wikimedia-tech that DPLs are inefficient and experimental and therefore would not be enabled on WB as the site is too large. =( Resolving as WONTFIX.
Brion (2006-07-05) in reply to Kellen: It should be ok on Wikibooks; we can always turn it off later.
Rob (2006-12-18): We don't normally enable it all over the place since it's quite expensive to run and not necessarily appropriate for an encyclopaedia.
Brion (2006-12-18): We probably will never install DPL2. DynamicPageList itself is kind of unreliable. Most likely a replacement for both which is more targeted and works more cleanly would be better.
Brion (2007-04-24): DPL2 will not be installed anywhere. If specific things are required, small, clean, purpose-built extensions which have known performance characteristics are preferred.
On 05/05/07, Brianna Laugher brianna.laugher@gmail.com wrote:
At the moment it seems whether or not a project has this enabled is arbitrary, and it's quite frustrating if users have no idea if this functionality will ever be enabled.
"DPL2 will not be installed anywhere. If specific things are required, small, clean, purpose-built extensions which have known performance characteristics are preferred."
Please provide examples of uses of the Dynamic Page List extensions which would be needed so we have an idea of whether or not we can write said purpose-built extensions to fit those needs.
Rob Church
On 05/05/07, Rob Church robchur@gmail.com wrote:
Please provide examples of uses of the Dynamic Page List extensions which would be needed so we have an idea of whether or not we can write said purpose-built extensions to fit those needs.
The main use is much greater flexibility in working with categories. I believe Wikinews uses it on portals to show the newest stories added to a category that corresponds to that portal topic. Almost all projects could use DPLs like this, to display lists of new pages on particular topics. That saves a lot of fussing that is currently done manually. The other main use relates to maintenance.
DPLs also allow category combining (AND, OR, NOT). I believe some category intersection tool is in the works but I don't know how far that's progressed. (IMO having this functionality in a Special page would suffice - would that be preferred to DPL's current implementation?)
In my opinion, the great strength of DPLs is flexibility in the order of display, e.g.:
* name
* date of last change
* popularity
* user who changed them last
* size
* restrict the output to the first n articles or to a random sample
* use descending or ascending sequence
Whereas categories display the first 200 items ordered by name only [give or take a category sort key].
The sort keys I care about most, and that would be most useful for category maintenance, are:
* date of creation
* date added to a particular category
probably also
* size (for media).
Ideally I would like to be able to change how category pages are sorted - e.g. have a drop down box "Sort by name" "Sort by date of creation" "Sort by name of last editor" "Sort by size" etc etc.
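As a rough illustration of the "date added to a particular category" ordering asked for above, here is a minimal sketch using Python's sqlite3 module. It assumes the link table carries a timestamp recording when each page was added to the category; the column name cl_timestamp, the toy rows, and the helper newest_in_category are all assumptions made for this example, not something confirmed in this thread.

```python
import sqlite3


def newest_in_category(conn, category, limit=5):
    """Return page ids in `category`, most recently added first.

    Assumes a cl_timestamp column (hypothetical here) that records when
    the page was added to the category.
    """
    sql = """
        SELECT cl_from
        FROM categorylinks
        WHERE cl_to = ?
        ORDER BY cl_timestamp DESC, cl_from DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (category, limit))]


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [
        (10, "Politics", "2007-05-01 09:00:00"),
        (11, "Politics", "2007-05-03 12:30:00"),
        (12, "Sports", "2007-05-04 08:00:00"),
        (13, "Politics", "2007-05-02 18:45:00"),
    ],
)

newest = newest_in_category(conn, "Politics", limit=2)
print(newest)
```

A portal-style "latest stories" list would call something like this with a small limit; whether such a query is cheap enough on a large wiki depends on indexing, which this sketch does not address.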
I understand that DPL2 won't be installed anywhere. But the original DPL seems to have been sufficiently improved since those requests were made, so I think the bug requests are just about wanting DPL functionality rather than especially DPL2.
Until that extra functionality is written and accepted within MW core, couldn't DPL be turned on for the projects that request it?
regards, Brianna
On 05/05/07, Brianna Laugher brianna.laugher@gmail.com wrote:
The main use is much greater flexibility in working with categories. I believe Wikinews uses it on portals to show the newest stories added to a category that corresponds to that portal topic. Almost all projects could use DPLs like this, to display lists of new pages on particular topics. That saves a lot of fussing that is currently done manually. The other main use relates to maintenance.
So, we want something that does new pages in particular categories? Not the easiest of things to get right, perhaps, since categories aren't versioned as pages are, but I expect we might come up with something decent enough.
DPLs also allow category combining (AND, OR, NOT). I believe some category intersection tool is in the works but I don't know how far that's progressed. (IMO having this functionality in a Special page would suffice - would that be preferred to DPL's current implementation?)
The classic intersections use UNION queries which are too expensive; this is part of the reason we don't have a straightforward intersecting search, and hence are having to investigate other means.
Until that extra functionality is written and accepted within MW core, couldn't DPL be turned on for the projects that request it?
You'll need to keep petitioning a server administrator until someone caves in. If the extension is found to be causing unacceptable load on the database servers, I can guarantee it will be killed and deactivated.
Rob Church
On 05/05/07, Rob Church robchur@gmail.com wrote:
Until that extra functionality is written and accepted within MW core, couldn't DPL be turned on for the projects that request it?
You'll need to keep petitioning a server administrator until someone caves in.
IRC begging...my favourite. This makes me uncomfortable because
* it's not a formal channel
* I feel like I'm hassling them and interrupting probably more important work - no way to tell if the time is inappropriate or not
* I'm usually hassling Brion, Tim or Rob, because they are the names I know, but maybe there are other people I could hassle? I can never figure this out - there is little up to date information about this
* It feels like the success rests entirely on whether the developer in question, in the moment that I ask, judges me to be too annoying or not.
I just started this page: http://meta.wikimedia.org/wiki/Site_requests . Is it accurate for how such requests are currently fulfilled? Is there any thought about defining a more formal process?
I kind of wonder how users get on that (a) don't know as much about the technical background (e.g. IRC), and especially (b) don't speak English as a first language.
Meta also described a (now defunct) position of Developer Liaison, being "first point of contact between the developers and the Board". I wonder if anyone else thinks it would be worth having a "Community Developer Liaison", who was "first point of contact between the developers and the Wikimedia project communities"?
If the extension is found to be causing unacceptable load on the database servers, I can guarantee it will be killed and deactivated.
Yeah...I accept that.
thanks, Brianna
Rob Church wrote:
The classic intersections use UNION queries which are too expensive; this is part of the reason we don't have a straightforward intersecting search, and hence are having to investigate other means.
It's not a union actually, it's a straight join. An OR operation is a union, both in set theory and in SQL.
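To make the join/union distinction concrete, here is a small sketch in Python using an in-memory SQLite database. Only the cl_from and cl_to columns come from the queries in this thread; the toy rows and the helper names intersection and union are made up for illustration, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
# Toy data: page 1 is in both categories, pages 2 and 3 in one each.
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        (1, "GFDL"),
        (1, "Self-published_work"),
        (2, "GFDL"),
        (3, "Self-published_work"),
        (4, "PD-self"),
    ],
)


def intersection(conn, cat_a, cat_b):
    """AND of two categories: a self-join on the page id."""
    sql = """
        SELECT c1.cl_from
        FROM categorylinks AS c1
        JOIN categorylinks AS c2 ON c1.cl_from = c2.cl_from
        WHERE c1.cl_to = ? AND c2.cl_to = ?
        ORDER BY c1.cl_from
    """
    return [row[0] for row in conn.execute(sql, (cat_a, cat_b))]


def union(conn, cat_a, cat_b):
    """OR of two categories: a set union, no join needed."""
    sql = """
        SELECT DISTINCT cl_from
        FROM categorylinks
        WHERE cl_to = ? OR cl_to = ?
        ORDER BY cl_from
    """
    return [row[0] for row in conn.execute(sql, (cat_a, cat_b))]


both = intersection(conn, "GFDL", "Self-published_work")
either = union(conn, "GFDL", "Self-published_work")
print(both, either)
```

The intersection needs one alias of the table per category, which is why its cost grows with the size of the categories being joined.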
To get an idea of the potential performance problems, I made a list of the categories with the most members on commons.
563713 GFDL
405084 Self-published_work
218472 PD-self
205838 PD_Old
139486 CC-BY-SA-2.5,2.0,1.0
125657 CC-BY-SA-2.5
117172 CC-BY-2.5
38695 PD-user
29312 CC-BY-2.0
22821 Insignia
21902 CC-BY-SA-2.0
20906 User-created_GFDL_images
How long does it take to calculate the intersection between the two biggest categories? On lomaria:
mysql> select count(*) from categorylinks as c1, categorylinks as c2 where c1.cl_to='GFDL' and c2.cl_to='Self-published_work' and c1.cl_from=c2.cl_from;
+----------+
| count(*) |
+----------+
|   317870 |
+----------+
1 row in set (1.94 sec)
And again immediately afterwards, with a comment to override the query cache, but testing the index cache:
mysql> select /**/ count(*) from categorylinks as c1, categorylinks as c2 where c1.cl_to='GFDL' and c2.cl_to='Self-published_work' and c1.cl_from=c2.cl_from;
+----------+
| count(*) |
+----------+
|   317870 |
+----------+
1 row in set (1.91 sec)
Which is really not that bad, considering it is the worst case. The index size for commonswiki.categorylinks is 670MB, which our DB servers shouldn't have too much trouble with at present. Indeed, the query would be a lot slower if it were hitting the disk. Maybe it is practical, as long as we put it in a query group so that it doesn't crash the whole s2 cluster if someone decides to put such an intersection on the commons main page.
Intersecting GFDL with various other categories down the list, here is the timing after priming the cache:
PD-self: 1.17s
CC-BY-SA-2.5,2.0,1.0: 0.75s
PD-user: 0.22s
Insignia: 0.15s
User-created_GFDL_images: 0.10s
The figures show that the timing depends on the size of each category not on the size of the intersection, which is what you'd expect. The intersection for PD-self had 1442 rows, and the intersection for "CC-BY-SA-2.5,2.0,1.0" had 135107 rows, but the timing was similar.
I use cache-primed figures (i.e. the second attempt), because this better reflects the worst case scenario of high request rates. I'm assuming that the whole index would be held in memory under such circumstances.
A special page would be better, it would reduce the number of unnecessary requests. And there may be other performance problems with DPL that I'm not aware of. But on the basis of these results, it looks feasible.
-- Tim Starling
Tim Starling <tstarling@...> writes (edited for brevity):
It's not a union actually, it's a straight join. An OR operation is a union, both in set theory and in SQL.
To get an idea of the potential performance problems, I made a list of the categories with the most members on commons.
How long does it take to calculate the intersection between the two biggest categories? On lomaria:
1 row in set (1.94 sec)
1 row in set (1.91 sec)
Which is really not that bad, considering it is the worst case. The figures show that the timing depends on the size of each category not on the size of the intersection, which is what you'd expect.
I use cache-primed figures (i.e. the second attempt), because this better reflects the worst case scenario of high request rates. I'm assuming that the whole index would be held in memory under such circumstances.
A special page would be better, it would reduce the number of unnecessary requests. And there may be other performance problems with DPL that I'm not aware of. But on the basis of these results, it looks feasible.
Hi Tim, I did a lot of the same tests, using both the join SQL and a count/group by version (they performed comparably) on a copy of categorylinks from En. The worst case scenarios for those were not terribly attractive (I forget what they were, but I want to say something like 4 or 5 seconds for intersections with "Living People"). Then I had this idea that using MySQL's fulltext index should take a layer of complexity out - after all, what we're doing is *exactly* a boolean search on a phrase, so I made a table having all the categories (with underscores) for each article in a row, and indexed it (so it had to be a MyISAM table). This performed much better, typically returning worst case scenario results in just over a second.
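The exact count/group by query is not shown in this message, so the following is only a guess at its general shape, sketched in Python against an in-memory SQLite table. The helper intersect_group_by and the toy rows are hypothetical; the idea is to keep the pages whose number of matching category rows equals the number of categories asked for.

```python
import sqlite3


def intersect_group_by(conn, categories):
    """Pages that belong to every category in `categories`.

    Counts matching rows per page and keeps pages that matched all of
    the requested categories (an assumed form of the group-by approach).
    """
    cats = list(dict.fromkeys(categories))  # drop duplicates, keep order
    if not cats:
        return []
    placeholders = ", ".join("?" for _ in cats)
    sql = f"""
        SELECT cl_from
        FROM categorylinks
        WHERE cl_to IN ({placeholders})
        GROUP BY cl_from
        HAVING COUNT(DISTINCT cl_to) = ?
        ORDER BY cl_from
    """
    return [row[0] for row in conn.execute(sql, (*cats, len(cats)))]


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        (1, "Living_people"),
        (1, "Physicists"),
        (2, "Living_people"),
        (3, "Living_people"),
        (3, "Physicists"),
        (3, "Nobel_laureates"),
    ],
)

two_way = intersect_group_by(conn, ["Living_people", "Physicists"])
three_way = intersect_group_by(
    conn, ["Living_people", "Physicists", "Nobel_laureates"]
)
print(two_way, three_way)
```

Unlike the self-join, this form extends to any number of categories without adding another table alias per category, though it still has to read every row of each category involved.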
I've got some data at http://aerik.com/wikintslog.txt (query/rows/time/IP - note that my server didn't seem to allow me to suppress the cache - couldn't figure that out - got any ideas?). But it seems this is not likely to be good enough performance to run on En; sending lots of queries that might each run a second or longer is probably a bad idea.
So, for comparison, I was going to try to create a Lucene index (Brion's suggestion) of categorylinks. I don't have a Java setup and don't know Java, so I started using Zend_Search_Lucene, but ran into a bug that has stopped me cold for the time being. I've been in touch with Alexander Veremyev at Zend, but I don't know if there's any progress.
But I was thinking, Tim, aren't you the guy who set up the Java Lucene index? What do you think of this approach?
Best Regards, Aerik
On 05/05/07, Brianna Laugher brianna.laugher@gmail.com wrote:
The main use is much greater flexibility in working with categories.
DPLs also allow category combining (AND, OR, NOT). I believe some category intersection tool is in the works but I don't know how far that's progressed.
Ideally I would like to be able to change how category pages are sorted - e.g. have a drop down box "Sort by name" "Sort by date of creation" "Sort by name of last editor" "Sort by size" etc etc.
The basic reasons Commons, for example, needs this are:
1. The category system just isn't useful for Commons readers (and it's not all that hot for editors);
2. Commons search is still pretty much useless. Mayflower is a bit better but can't be said to have cracked the problem.
When added to the lack of visible interwiki image use information, the basic reason is that for what Commons is trying to do - both as a service project and as a media repository for the outside world - the present software is painfully lacking. So people shout at the Commons admins instead, when if they could code a solution they would.
- d.
wikitech-l@lists.wikimedia.org