(cross-posting wikitech-l and commons-l)
I created a new extension ("CategoryIntersection") that allows for quick lookup of pages (and images) in intersecting categories. That would enable wiki(m|p)edia sites to use categories as tags, eliminating the need for oh-so-specialized categories.
Intersecting two categories is very fast; intersecting more than two is also possible and already implemented, and the maximum number can be limited.
I tried it on my (mostly empty) MediaWiki test setup, and it works peachy. However, *I NEED HELP* with
- testing it on a large-scale installation
- integrating it with MediaWiki more tightly (database wrappers, caching, etc.)
- Brionizing the code, so it actually has a chance to be used on Wikipedia and/or Commons
Technical notes:
- This was recently discussed on wikitech-l
- More than two intersections are implemented by nesting subqueries
- Hash values are implemented as VARCHAR(32). Could easily switch to INTEGER if desirable (less storage, faster lookup, but more false positives)
- The hash values will only give good candidates (pages that *might* intersect in these categories). The candidates then have to be checked in a second run, which will have to be optimized; database people to the front!
- The table to store hash values has to be created manually; the SQL is in the main file
- I didn't implement code to fill the table for an existing installation; however, since hash table updates hang solely on the LinksUpdate hook, this should be easy
- There is no code covering page moves and deletions yet; do those hang on LinksUpdate as well?
- SQL queries are currently "plain text" and not constructed through the DB wrappers; I wasn't sure how to do that for the subqueries
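For readers who want to see the idea in the notes above in concrete form, here is a rough Python/SQLite sketch. It is not the extension's code. It assumes the hash is the MD5 hex digest of a sorted category pair (which happens to fit VARCHAR(32)), and the table and column names (cat_hash, ch_page, ch_hash) and all function names are made up for illustration; only categorylinks with cl_from/cl_to mirrors MediaWiki's own table. Candidates come from nested subqueries over pair hashes, and a second pass checks them against the real category links.

```python
import hashlib
import sqlite3
from itertools import combinations


def pair_hash(cat_a, cat_b):
    """Return a 32-character MD5 hex digest for an unordered category pair (assumed scheme)."""
    first, second = sorted((cat_a, cat_b))
    return hashlib.md5(f"{first}|{second}".encode("utf-8")).hexdigest()


def build_db(page_categories):
    """Create an in-memory database holding category links and one hash row per category pair."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    # Hypothetical hash table; the real extension ships its own SQL.
    db.execute("CREATE TABLE cat_hash (ch_page INTEGER, ch_hash VARCHAR(32))")
    db.execute("CREATE INDEX ch_hash_idx ON cat_hash (ch_hash)")
    for page_id, cats in page_categories.items():
        unique_cats = sorted(set(cats))
        db.executemany(
            "INSERT INTO categorylinks VALUES (?, ?)",
            [(page_id, cat) for cat in unique_cats],
        )
        db.executemany(
            "INSERT INTO cat_hash VALUES (?, ?)",
            [(page_id, pair_hash(a, b)) for a, b in combinations(unique_cats, 2)],
        )
    return db


def candidates(db, cats):
    """First pass: pages that *might* be in all categories, found via nested subqueries."""
    cats = sorted(set(cats))
    if len(cats) < 2:
        raise ValueError("need at least two distinct categories")
    # Chain consecutive pairs: (c1, c2), (c2, c3), ...
    hashes = [pair_hash(a, b) for a, b in zip(cats, cats[1:])]
    sql = "SELECT DISTINCT ch_page FROM cat_hash WHERE ch_hash = ?"
    for _ in hashes[1:]:
        sql = (
            "SELECT DISTINCT ch_page FROM cat_hash "
            f"WHERE ch_hash = ? AND ch_page IN ({sql})"
        )
    return [row[0] for row in db.execute(sql, hashes)]


def intersect(db, cats):
    """Second pass: keep only candidates whose real category links contain every category."""
    wanted = set(cats)
    result = []
    for page_id in candidates(db, cats):
        rows = db.execute(
            "SELECT cl_to FROM categorylinks WHERE cl_from = ?", (page_id,)
        )
        if wanted <= {row[0] for row in rows}:
            result.append(page_id)
    return sorted(result)
```

Calling intersect(db, ["Africa", "Musicians"]) on a database built with build_db would return the page ids that carry both categories. With full MD5 digests the second pass rarely rejects anything; it matters much more if the hash is shortened to an INTEGER.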
Cheers, Magnus
On Thu, Mar 6, 2008 at 10:16 PM, Magnus Manske magnusmanske@googlemail.com wrote:
- There is no code covering page moves and deletions yet; do those
hang on LinksUpdate as well?
No, since those do not update the links. Categorylinks are linked to the page including them via page_id, which does not change during a move. If I recall correctly, deletions are not covered by LinksUpdate either; their links are deleted manually by Article::doDeleteArticle.
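To make the consequence for the hash table concrete, here is a small Python sketch. It is purely illustrative: the class and its handler names are hypothetical and are not MediaWiki hook signatures. It only assumes that the pair index is keyed by page_id, so a links update rewrites a page's rows, a move needs no work, and a deletion needs its own handler.

```python
from itertools import combinations


class PairIndex:
    """Toy in-memory stand-in for a category-pair table keyed by page_id."""

    def __init__(self):
        self.rows = {}  # page_id -> set of sorted (category, category) pairs

    def on_links_update(self, page_id, categories):
        # Replace all rows for the page whenever its category links change.
        self.rows[page_id] = set(combinations(sorted(set(categories)), 2))

    def on_move(self, page_id, new_title):
        # Nothing to do: rows are keyed by page_id, which a move does not change.
        pass

    def on_delete(self, page_id):
        # Deletion does not pass through the links update path, so clean up here.
        self.rows.pop(page_id, None)

    def pages_with(self, cat_a, cat_b):
        pair = tuple(sorted((cat_a, cat_b)))
        return sorted(p for p, pairs in self.rows.items() if pair in pairs)
```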
Bryan
On Thu, Mar 6, 2008 at 9:43 PM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Thu, Mar 6, 2008 at 10:16 PM, Magnus Manske magnusmanske@googlemail.com wrote:
- There is no code covering page moves and deletions yet; do those
hang on LinksUpdate as well?
No, since those do not update the links. Categorylinks are linked to the page including them via page_id, which does not change during a move. If I recall correctly, deletions are not covered by LinksUpdate either; their links are deleted manually by Article::doDeleteArticle.
OK, one more hook to catch...
Thanks, Magnus
For what it's worth, the extension http://www.mediawiki.org/wiki/DynamicPageList has been in use on various Wikimedia sites for a while now with great success to allow for category intersections, and I think the latest versions support image galleries etc.
-ilya
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
On 07/03/2008, Ilya Haykinson haykinson@gmail.com wrote:
For what it's worth, the extension http://www.mediawiki.org/wiki/DynamicPageList has been in use on various Wikimedia sites for a while now with great success to allow for category intersections, and I think the latest versions support image galleries etc.
Cool - how would that do category intersections, in the manner that's been discussed?
- d.
You can have a DPL that includes:
- Category:Africa and Category:Musicians to show musicians from Africa -- we can produce a list of images, or a gallery
- Category:Ariel Sharon and Category:Political Cartoons but not Category:Controversy
- Category:CurrentEvents but not Category:Disputed
The intersections can be formatted as tables, or as lists, or as columns, or as galleries, etc.
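In plain set terms, the "in these categories but not those" selections listed above amount to something like the following Python sketch. This is not DPL's code or syntax; the function name and the sample data layout are made up for illustration.

```python
def select_pages(page_categories, include, exclude=()):
    """Return titles that are in every 'include' category and in no 'exclude' category."""
    include, exclude = set(include), set(exclude)
    return sorted(
        title
        for title, cats in page_categories.items()
        if include <= set(cats) and not (exclude & set(cats))
    )
```

For the second selection above, that would be select_pages(pages, ["Ariel Sharon", "Political Cartoons"], exclude=["Controversy"]), where pages maps each title to its list of categories.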
This way our landing pages are just listing images using categories as tags. Seems like it's the same idea as what Magnus has started developing, except for the fact that DPL has been in development for years now and is in some use on various WM sites :-) For example, Wikinews uses an older version of the DPL to allow articles to appear on the right pages without anyone linking to them explicitly (e.g. "last 10 articles about science and technology in Africa that are not disputed and have been marked as published")
-ilya
On 08/03/2008, Ilya Haykinson haykinson@gmail.com wrote:
You can have a DPL that includes:
- Category:Africa and Category:Musicians to show musicians from Africa
-- we can produce a list of images, or a gallery
- Category:Ariel Sharon and Category:Political Cartoons but not
Category:Controversy
- Category:CurrentEvents but not Category:Disputed
The intersections can be formatted as tables, or as lists, or as columns, or as galleries, etc.
This way our landing pages are just listing images using categories as tags. Seems like it's the same idea as what Magnus has started developing, except for the fact that DPL has been in development for years now and is in some use on various WM sites :-) For example, Wikinews uses an older version of the DPL to allow articles to appear on the right pages without anyone linking to them explicitly (e.g. "last 10 articles about science and technology in Africa that are not disputed and have been marked as published")
Yeah, but https://bugzilla.wikimedia.org/show_bug.cgi?id=8261 has been open for 15 months and I'm not holding my breath. Keep working...
cheers, Brianna