Flattening a wikimedia category

Rayson Ho

4 Feb 2010

3:10 a.m.

It seems there is no easy way to display all the media files under a Wikimedia category -- for example, if someone wants a picture of a library, he or she will need to go into each sub-category under "Libraries":

http://commons.wikimedia.org/wiki/Category:Libraries

While Wikimedia is not yet the most popular stock photo source, IMO having this flattening functionality would be useful to those who are looking for stock photos.

Rayson

Daniel Schwen

4 Feb

3:40 a.m.

...

While Wikimedia is not yet the most popular stock photo source, IMO having this flattening functionality would be useful to those who are looking for stock photos.

Just because I love this recurring debate sooo much, I'll drop in two more bits:

* atomic categorization would solve this
* category intersection would be useful (imagine a user searching for a picture of a library in Asia)

open fire!

Aryeh Gregor

2:27 p.m.

On Wed, Feb 3, 2010 at 10:10 PM, Rayson Ho raysonlogin@gmail.com wrote:

...

It seems there is no easy way to display all the media files under a Wikimedia category -- for example, if someone wants a picture of a library, he or she will need to go into each sub-category under "Libraries":

http://commons.wikimedia.org/wiki/Category:Libraries

While Wikimedia is not yet the most popular stock photo source, IMO having this flattening functionality would be useful to those who are looking for stock photos.

This is a regular request. There are two major problems:

1) Our database schema is not set up to handle this efficiently for large result sets. At least I don't think so, off the top of my head.

2) In practice, collapsing categories like this can often lead to crazy stuff being included, because subcategory relations aren't used strictly in a "everything in category A is also in category B" sense. It's easy to come up with examples. For instance: [[Category:Punishments in religion]] -> [[Category:Religion and capital punishment]] -> [[Category:People executed for heresy]] -> [[Category:Joan of Arc]] -> [[English claims to the French throne]]. Thus, if you try to get all articles in [[Category:Punishments in religion]] or subcategories, you'll get results like [[English claims to the French throne]].
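The second problem can be reproduced with a minimal sketch, assuming a toy in-memory graph that contains only the chain quoted above. The `CHILDREN` mapping and the `flatten` helper are hypothetical names for illustration, not MediaWiki code:

```python
from collections import deque

# Toy graph (assumption): parent category -> direct members.
# The edges mirror the chain quoted in the message; nothing else is real data.
CHILDREN = {
    "Category:Punishments in religion": ["Category:Religion and capital punishment"],
    "Category:Religion and capital punishment": ["Category:People executed for heresy"],
    "Category:People executed for heresy": ["Category:Joan of Arc"],
    "Category:Joan of Arc": ["English claims to the French throne"],
}


def flatten(root, children):
    """Return every non-category page reachable from root via subcategory links."""
    seen = {root}
    queue = deque([root])
    pages = set()
    while queue:
        node = queue.popleft()
        for member in children.get(node, []):
            if member in seen:
                continue  # guard against cycles in the category graph
            seen.add(member)
            if member.startswith("Category:"):
                queue.append(member)  # descend into the subcategory
            else:
                pages.add(member)  # an ordinary page: part of the flat result
    return pages


print(flatten("Category:Punishments in religion", CHILDREN))
```

The traversal is mechanically correct, and that is the point: nothing in the graph tells it that the last hop is only loosely related to the root.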

However, this is definitely on the long-term "it would be nice if someone did this someday" list.

Gregory Maxwell

3:02 p.m.

On Thu, Feb 4, 2010 at 9:27 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:

...

Our database schema is not set up to handle this efficiently for large result sets. At least I don't think so, off the top of my head.

I've never been able to come up with an acceptable data structure for flattening on the fly. (I think acceptable is something like O(1) or O(log something) on insert and delete, and no worse than something like O(results log something) on query.)

But if you do atomic categories explicitly enumerated on the pages then you get the right properties, and fast search with intersections is the same problem as full text search. I.e. solved.
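To illustrate why atomic categories reduce intersection to a search problem, here is a minimal sketch of an inverted index, the same structure a full-text engine uses for terms. The file names and the `build_index` / `intersect` helpers are made up for the example:

```python
from collections import defaultdict

# Hypothetical pages, each listing its atomic categories explicitly.
PAGES = {
    "File:Reading room A.jpg": {"Libraries", "Asia"},
    "File:Reading room B.jpg": {"Libraries", "Europe"},
    "File:Temple.jpg": {"Asia", "Temples"},
}


def build_index(page_categories):
    """Build category -> set of pages, like a posting list per search term."""
    index = defaultdict(set)
    for page, categories in page_categories.items():
        for category in categories:
            index[category].add(page)
    return index


def intersect(index, *categories):
    """Return the pages that carry every one of the given categories."""
    postings = sorted((index.get(c, set()) for c in categories), key=len)
    if not postings:
        return set()
    result = set(postings[0])  # start from the smallest posting list
    for posting in postings[1:]:
        result &= posting
    return result


index = build_index(PAGES)
print(intersect(index, "Libraries", "Asia"))
```

Adding or removing a category on a page touches only that page's entries, which is where the cheap insert and delete come from.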

...

In practice, collapsing categories like this can often lead to crazy stuff being included, because subcategory relations aren't used strictly in a "everything in category A is also in category B" sense.

Yea, automatic collapsing is mostly good for hilarious results... manual collapsing OTOH.

Aryeh Gregor

3:55 p.m.

On Thu, Feb 4, 2010 at 10:02 AM, Gregory Maxwell gmaxwell@gmail.com wrote:

...

But if you do atomic categories explicitly enumerated on the pages then you get the right properties, and fast search with intersections is the same problem as full text search. I.e. solved.

Right. Supporting category intersection and search in category with better UI (we already sort of support it if you know the right magic terms) is what we should be aiming for here.
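For readers wondering what such a query might look like, here is a hedged sketch that only builds a search URL and sends nothing. It assumes the "magic terms" are the `incategory:` search keyword and that the standard MediaWiki `list=search` API module is used; the thread names neither, and the helper name is hypothetical:

```python
from urllib.parse import urlencode

# Assumption: the public Commons API endpoint.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def intersection_search_url(categories, limit=20):
    """Build a search URL asking for files that are in all given categories."""
    # Assumption: one incategory:"..." keyword per category, combined with AND.
    terms = " ".join(f'incategory:"{name}"' for name in categories)
    params = {
        "action": "query",
        "list": "search",
        "srsearch": terms,
        "srnamespace": 6,  # namespace 6 is File:
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(intersection_search_url(["Libraries", "Asia"]))
```

As noted later in the thread, this kind of search only sees categories written literally in the wikitext, so it is no substitute for a proper UI.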

Robert Stojnic

4:03 p.m.

Aryeh Gregor wrote:

...

Right. Supporting category intersection and search in category with better UI (we already sort of support it if you know the right magic terms) is what we should be aiming for here.

Last year, right around this time, we came to exactly the same conclusion. And just like then, there is no shortage of good opinions on how to do it, only a shortage of people to actually do the programming.

Aryeh Gregor

4:10 p.m.

On Thu, Feb 4, 2010 at 11:03 AM, Robert Stojnic rainmansr@gmail.com wrote:

...

Last year, right around this time, we came to exactly the same conclusion. And just like then, there is no shortage of good opinions on how to do it, only a shortage of people to actually do the programming.

Yup. Any volunteers? My understanding is that right now, the backend supports category searches as long as the categories are spelled out literally in the wikitext (not via template). That's not a big restriction, so what we could really use right now is UI, which shouldn't require such specialized skills.

So, does anyone want to:

1) Mock up basic UI for category intersections/search in category?

2) Implement it?

After that we can talk about fancy things like automatically suggesting categories to intersect with or whatever . . . we don't even have the most basic UI right now.

Conrad Irwin

4:28 p.m.

On 02/04/2010 04:10 PM, Aryeh Gregor wrote:

...

Yup. Any volunteers? My understanding is that right now, the backend supports category searches as long as the categories are spelled out literally in the wikitext (not via template).

Presumably it would not be too hard to append the full category list to the blob that gets sent to the search engine, (perhaps as part of fixing: https://bugzilla.wikimedia.org/show_bug.cgi?id=18861 -nudge-nudge)

Whether this is a big restriction or not depends a lot on your wiki, I estimate that 90% or more of categories on en.wiktionary are added by templates (but then so's most of our output anyway).

Conrad

Aryeh Gregor

4:51 p.m.

On Thu, Feb 4, 2010 at 11:28 AM, Conrad Irwin conrad.irwin@googlemail.com wrote:

...

Presumably it would not be too hard to append the full category list to the blob that gets sent to the search engine

No, probably not, but it would be even easier to not worry about it yet (unless someone wants to!).

On Thu, Feb 4, 2010 at 11:37 AM, Daniel Schwen lists@schwen.de wrote:

...

Yet category flattening would be a prerequisite to intersections. The only way to get proper intersection is manual flattening i.e. atomic categorization.

Correct. Automatic flattening is not good enough -- manual flattening is necessary. Maybe if we had a better category intersect feature, more wikis would do manual flattening. If they don't, I guess they won't get the feature. Automatic flattening is not a substitute.

Neil Harris

4:23 p.m.

On 04/02/10 16:03, Robert Stojnic wrote:

...

Aryeh Gregor wrote:

...
Right. Supporting category intersection and search in category with better UI (we already sort of support it if you know the right magic terms) is what we should be aiming for here.

Last year, right around this time, we came to exactly the same conclusion. And just like then, there is no shortage of good opinions on how to do it, only a shortage of people to actually do the programming.

r.

I'm working on it.

-- Neil

Daniel Schwen

4:37 p.m.

This is putting the cart in front of the ox yet again. A few mails up Aryeh and Gregory both come to the conclusion that automatic flattening is useless. Yet category flattening would be a prerequisite to intersections. The only way to get proper intersection is manual flattening i.e. atomic categorization. As long as nobody is pushing commons _hard_ to change their categorization system _nothing_ will happen and we'll meet on this list again in about one year repeating the same discussion.

David Gerard

5:18 p.m.

On 4 February 2010 16:37, Daniel Schwen lists@schwen.de wrote:

...

The only way to get proper intersection is manual flattening i.e. atomic categorization. As long as nobody is pushing commons _hard_ to change their categorization system _nothing_ will happen and we'll meet on this list again in about one year repeating the same discussion.

Commons really wants this. LOTS AND LOTS.

But we need the functionality there first, so we can *then* flatten.

- d.

Daniel Schwen

5:38 p.m.

...

But we need the functionality there first, so we can *then* flatten.

Ahh, the good old chicken and egg ;-) I don't let that count. We have plenty of working category intersection tools already. Their usefulness is limited however because the category system is so screwed up. The ball is definitely in the categorization-court!

David Gerard

5:44 p.m.

On 4 February 2010 17:38, Daniel Schwen lists@schwen.de wrote:

...

...
But we need the functionality there first, so we can *then* flatten.

...

Ahh, the good old chicken and egg ;-) I don't let that count. We have plenty of working category intersection tools already.

Yes, but they're not part of the interface.

The technology needs to work with the data - the six million files and their categories, carefully added by hand by humans.

If category intersections worked, the existing categories could then be broken down gradually to work better with them.

Demanding that all six million files be de-categorised before you'll even allow a category intersection tool to *possibly* be deployed is backward.

People need to be able to go gradually.

- d.

Daniel Kinzler

5:02 p.m.

Robert Stojnic wrote:

...

Aryeh Gregor wrote:

...
Right. Supporting category intersection and search in category with better UI (we already sort of support it if you know the right magic terms) is what we should be aiming for here.

Last year, right around this time, we came to exactly the same conclusion. And just like then, there is no shortage of good opinions on how to do it, only a shortage of people to actually do the programming.

r.

Wikimedia Germany has contracted Neil Harris to work on implementing deep category intersection. The goal is basically a rewrite of my sucky CatScan tool. The result is hopefully fast & generic enough so it can be used as a service that integrates with the current search infrastructure.

The project has started; there is funding and a project plan. I expect to see usable results soon. In fact, I hope to present this at the developer meeting in April (Neil, contact me about attending) and discuss the integration into Lucene search.

I agree that full recursive flattening of the current category structure sometimes leads to bad results (especially on the English Wikipedia; Commons is quite bad too). A depth of 5, however, is generally useful. One common use case is intersecting a content category with a maintenance category, for organizing editorial work in a wiki project. In that case, at least one category comes from a template.
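A depth cap can be sketched as a breadth-first walk that stops descending at a given level. This is only an illustration on a toy graph holding the chain quoted earlier in the thread, not the design of the contracted rewrite; `members_within_depth` is a hypothetical name:

```python
from collections import deque

# Toy graph (assumption): the chain quoted earlier in the thread, nothing more.
CHILDREN = {
    "Category:Punishments in religion": ["Category:Religion and capital punishment"],
    "Category:Religion and capital punishment": ["Category:People executed for heresy"],
    "Category:People executed for heresy": ["Category:Joan of Arc"],
    "Category:Joan of Arc": ["English claims to the French throne"],
}


def members_within_depth(root, children, max_depth):
    """Map each member reachable from root to its depth, stopping at max_depth."""
    depth_of = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend any further from this node
        for member in children.get(node, []):
            if member not in seen:
                seen.add(member)
                depth_of[member] = depth + 1
                queue.append((member, depth + 1))
    return depth_of


root = "Category:Punishments in religion"
print(members_within_depth(root, CHILDREN, 3))
print(members_within_depth(root, CHILDREN, 5))
```

On this particular chain the odd page sits at depth 4, so a cap of 3 drops it while a cap of 5 still lets it through. A cap bounds the damage and the cost, but it does not by itself decide which links are meaningful.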

Atomic categorization, aka tagging, however also sucks: the tags are either too generic (so it's hard to find stuff) or too specific (you never know what to search for). Tags implying/including other tags is very useful, which is exactly what categories with deep intersection will provide.

-- daniel

Magnus Manske

8:38 p.m.

On Thu, Feb 4, 2010 at 5:02 PM, Daniel Kinzler daniel@brightbyte.de wrote:

...

Robert Stojnic wrote:

...
Aryeh Gregor wrote:

...
Right. Supporting category intersection and search in category with better UI (we already sort of support it if you know the right magic terms) is what we should be aiming for here.

Last year, right around this time, we came to exactly the same conclusion. And just like then, there is no shortage of good opinions on how to do it, only a shortage of people to actually do the programming.

r.

Wikimedia Germany has contracted Neil Harris to work on implementing deep category intersection. The goal is basically a rewrite of my sucky CatScan tool.

In the meantime: http://toolserver.org/~magnus/catscan_rewrite.php

(toolserver seems to have a problem ATM, though...)

Magnus

Daniel Kinzler

8:46 p.m.

Magnus Manske wrote:

...

In the meantime: http://toolserver.org/~magnus/catscan_rewrite.php

(toolserver seems to have a problem ATM, though...)

Yes, lots more options than my old thingy, thanks magnus :) but still bound to recursive calls to the database, which is what i really want to get rid of. the lookup needs to be snappy.

-- daniel

Tim Landscheidt

11:40 p.m.

Daniel Kinzler daniel@brightbyte.de wrote:

...

...
In the meantime: http://toolserver.org/~magnus/catscan_rewrite.php

...

...
(toolserver seems to have a problem ATM, though...)

...

Yes, lots more options than my old thingy, thanks magnus :) but still bound to recursive calls to the database, which is what i really want to get rid of. the lookup needs to be snappy.

Is there any reason not to have a flattened structure somewhere on the toolserver (or, in the long run, in MediaWiki)? A quick look at recentchanges for dewp shows about 22000 changes per month, about one every two minutes. With about 80000 categories in all, it should be feasible to update the structure incrementally, with daily/weekly/monthly clean new full "dumps" (or even dispense with up-to-the-second data and just dump the flat structure hourly).

Tim

Gregory Maxwell

5 Feb

12:19 a.m.

On Thu, Feb 4, 2010 at 6:40 PM, Tim Landscheidt tim@tim-landscheidt.de wrote:

...

Is there any reason not to have a flattened structure somewhere on the toolserver (or, in the long run, in MediaWiki)? A quick look at recentchanges for dewp shows about 22000 changes per month, about one every two minutes. With about 80000 categories in all, it should be feasible to update the structure incrementally, with daily/weekly/monthly clean new full "dumps" (or even dispense with up-to-the-second data and just dump the flat structure hourly).

Incremental updates for a 'flattened copy' aren't especially realistic... as one user operation can produce millions of operations on the server.
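The fan-out can be made concrete with a small sketch. It assumes a hypothetical materialised table that stores one (category, page) row for every page reachable from a category, and counts the rows a single "attach this subcategory" edit would add. All names and numbers are toy values:

```python
def new_rows(flat, ancestors, parent, child_pages):
    """Rows to insert when a subcategory holding child_pages is put under parent.

    flat:      category -> pages already stored for it in the flat table
    ancestors: category -> all categories above it in the hierarchy
    """
    # The parent and every one of its ancestors must now list the child's pages.
    targets = {parent} | ancestors.get(parent, set())
    return {
        (category, page)
        for category in targets
        for page in child_pages
        if page not in flat.get(category, set())
    }


# Toy data: C sits under B, which sits under A; A already lists page p0.
ANCESTORS = {"C": {"A", "B"}}
FLAT = {"A": {"p0"}, "B": set(), "C": set()}

rows = new_rows(FLAT, ANCESTORS, "C", {"p0", "p1", "p2"})
print(len(rows))  # 8 rows for one edit on a three-level toy hierarchy
```

The cost is roughly (number of ancestors + 1) times (pages under the moved subcategory), so one edit near the top of a large tree multiplies out quickly.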

I won't bother saying much more, Daniel Schwen pretty much speaks for my view.

Daniel Kinzler

8:57 a.m.

Tim Landscheidt wrote:

...

Daniel Kinzler daniel@brightbyte.de wrote:

...
...
In the meantime: http://toolserver.org/~magnus/catscan_rewrite.php

...
...
(toolserver seems to have a problem ATM, though...)

...
Yes, lots more options than my old thingy, thanks magnus :) but still bound to recursive calls to the database, which is what i really want to get rid of. the lookup needs to be snappy.

Is there any reason not to have a flattened structure somewhere on the toolserver (or, in the long run, in MediaWiki)? A quick look at recentchanges for dewp shows about 22000 changes per month, about one every two minutes. With about 80000 categories in all, it should be feasible to update the structure incrementally, with daily/weekly/monthly clean new full "dumps" (or even dispense with up-to-the-second data and just dump the flat structure hourly).

Basically: yes, this is the idea, but detecting categorization changes isn't trivial. Also, really keeping a copy of the flat content of each category would be redundant in the extreme. It would result in hundreds of millions of entries, and would be hard to handle. A data structure for fast recursive lookup makes more sense. Neil is working on this.

As to the general approach: I hope that by providing a way to intersect categories, we can get rid of most of the "Foo in Bar" cross-section categories. I still believe hierarchical structuring/inclusion of categories is useful. Or, to put it differently: let people use "flat tagging", but let's keep the notion of one tag implying another, i.e. math implying science and texas implying america.

-- daniel
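The "one tag implying another" idea can be sketched as expanding a page's flat tags through an implication table before indexing. The table below holds only the two examples from the message, and the `expand` helper is a hypothetical name:

```python
# Implication table (assumption): only the two examples given in the message.
IMPLIES = {
    "math": {"science"},
    "texas": {"america"},
}


def expand(tags, implies):
    """Return the given tags plus everything they imply, directly or indirectly."""
    result = set(tags)
    stack = list(tags)
    while stack:
        tag = stack.pop()
        for implied in implies.get(tag, ()):
            if implied not in result:  # also protects against implication cycles
                result.add(implied)
                stack.append(implied)
    return result


print(sorted(expand({"math", "texas"}, IMPLIES)))
```

A page tagged only "math" would then also match a search for "science", without anyone having to add the broader tag by hand.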

Aryeh Gregor

7:17 p.m.

On Fri, Feb 5, 2010 at 3:57 AM, Daniel Kinzler daniel@brightbyte.de wrote:

...

Or, to put it differently: let people use "flat tagging", but let's keep the notion of one tag implying another, i.e. math implying science and texas implying america.

And as for [[Category:People executed for heresy]] -> [[Category:Joan of Arc]] -> [[English claims to the French throne]]? That's only two steps, and it already doesn't make sense. You could argue that [[Category:Joan of Arc]] really means [[Category:Stuff related to Joan of Arc]] and shouldn't be in [[Category:People executed for heresy]], but that sounds like it would take as much recategorization work as just using atomic categories -- and much subtler.

Tei

7:44 p.m.

On 5 February 2010 20:17, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:

...

On Fri, Feb 5, 2010 at 3:57 AM, Daniel Kinzler daniel@brightbyte.de wrote:

...
Or, to put it differently: let people use "flat tagging", but let's keep the notion of one tag implying another, i.e. math implying science and texas implying america.

And as for [[Category:People executed for heresy]] -> [[Category:Joan of Arc]] -> [[English claims to the French throne]]? That's only two steps, and it already doesn't make sense. You could argue that [[Category:Joan of Arc]] really means [[Category:Stuff related to Joan of Arc]] and shouldn't be in [[Category:People executed for heresy]], but that sounds like it would take as much recategorization work as just using atomic categories -- and much subtler.

off-topic

all these "->" make me salivate for a good plot graph (http://www.graphviz.org/?)

-- -- ℱin del ℳensaje.
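For anyone who wants that plot, here is a small sketch that turns the quoted chain into Graphviz DOT text. The `to_dot` helper is hypothetical, and it assumes titles contain no double quotes, since it does no escaping:

```python
CHAIN = [
    "Category:Punishments in religion",
    "Category:Religion and capital punishment",
    "Category:People executed for heresy",
    "Category:Joan of Arc",
    "English claims to the French throne",
]


def to_dot(chain, name="categories"):
    """Render consecutive items of chain as directed edges in DOT syntax."""
    lines = [f"digraph {name} {{"]
    for parent, child in zip(chain, chain[1:]):
        # Assumption: titles contain no double quotes, so no escaping is needed.
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(CHAIN))
```

Saving the output to a file and running Graphviz's `dot -Tpng` on it should draw the chain.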

Andrew Garrett

7 Feb

8:45 a.m.

On 6/02/10 6:44 AM, Tei wrote:

...

On 5 February 2010 20:17, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:

...
On Fri, Feb 5, 2010 at 3:57 AM, Daniel Kinzler daniel@brightbyte.de wrote:

...
Or, to put it differently: let people use "flat tagging", but let's keep the notion of one tag implying another, i.e. math implying science and texas implying america.

And as for [[Category:People executed for heresy]] -> [[Category:Joan of Arc]] -> [[English claims to the French throne]]? That's only two steps, and it already doesn't make sense. You could argue that [[Category:Joan of Arc]] really means [[Category:Stuff related to Joan of Arc]] and shouldn't be in [[Category:People executed for heresy]], but that sounds like it would take as much recategorization work as just using atomic categories -- and much subtler.

off-topic

Not at all, it's entirely reasonable to discuss the problems associated with the current categorisation system, and what methods we'd like to use to improve it.

-- Andrew Garrett agarrett@wikimedia.org http://werdn.us

David Gerard

12:01 p.m.

On 7 February 2010 08:45, Andrew Garrett agarrett@wikimedia.org wrote:

...

Not at all, it's entirely reasonable to discuss the problems associated with the current categorisation system, and what methods we'd like to use to improve it.

The current categorization system is per-wiki-specific. It's done differently in different places. So it's not clear that you won't require 750 different discussions.

To get back to the topic of category intersections on Commons:

Could the developers please outline, point by point, the precise hoops we need to jump through to get category intersections on Commons? New hoops seem to have been introduced during the current discussion.

Please make an unambiguous list of the hoops Commons will be required to jump through before this feature can happen, so it's actually clear to all and we're all working from the same page, rather than trying to guess what shrubbery you'll be demanding next.

Thanks!

- d.

Aryeh Gregor

12:14 p.m.

On Sun, Feb 7, 2010 at 7:01 AM, David Gerard dgerard@gmail.com wrote:

...

Could the developers please outline, point by point, the precise hoops we need to jump through to get category intersections on Commons? New hoops seem to have been introduced during the current discussion.

Right now, I'd try just waiting. As Daniel pointed out in this thread, Neil Harris is already being paid to work on it.

Roan Kattouw

1:01 p.m.

2010/2/7 David Gerard dgerard@gmail.com:

...

Could the developers please outline, point by point, the precise hoops we need to jump through to get category intersections on Commons? New hoops seem to have been introduced during the current discussion.

Please make an unambiguous list of the hoops Commons will be required to jump through before this feature can happen, so it's actually clear to all and we're all working from the same page, rather than trying to guess what shrubbery you'll be demanding next.

Different implementations have different hoops associated with them. As long as there's no concrete implementation, there's no definitive list of these hoops, only vague generic hoops that apply to any kind of category intersection and hypothetical hoops based on hypothetical implementations.

Like Aryeh said, Neil is currently working on a concrete implementation of category intersections. Only when that implementation is complete (or at least close) will it be possible to provide the definitive, specific and unambiguous list of requirements you asked for.

Roan Kattouw (Catrope)

Daniel Schwen

1:09 p.m.

...

...
Please make an unambiguous list of the hoops Commons will be required to jump through before this feature can happen, so it's actually clear

Way to be acerbic...

...

list of these hoops, only vague generic hoops that apply to any kind

[..]

...

Like Aryeh said, Neil is currently working on a concrete implementation of category intersections. Only when that implementation is complete (or at least close) will it be possible to provide the definitive, specific and unambiguous list of requirements you asked for.

Not really. There are two main points:

1) deep category tree traversal (flattening / deep indexing) at runtime is technically infeasible.
2) automatic flattening produces nonsense results.

Ok, let's say Neil found a way to deal with 1). I give you that this is implementation specific. Number 2), however, is independent of any implementation. Here you have your "hoop" (to stick with your pejorative lingo): Get rid of the crazy category system and go atomic. What is vague about this, what part of this is unclear to you?

David Gerard

1:18 p.m.

On 7 February 2010 13:09, Daniel Schwen lists@schwen.de wrote:

...

Ok, let's say Neil found a way to deal with 1). I give you that this is implementation specific. Number 2), however, is independent of any implementation. Here you have your "hoop" (to stick with your pejorative lingo): Get rid of the crazy category system and go atomic. What is vague about this, what part of this is unclear to you?

The problem is that doing this before the feature that uses it is in place renders categorisation on Commons even more useless. What this will mean is that you will be requiring a direct reduction in the usability of the wiki content before *possibly* implementing a feature.

In practice, the difference between this and saying "No, never" is telling people to do work that you know can't happen.

Please leave commons-l in the cc: this time, thanks.

- d.

Roan Kattouw

1:27 p.m.

2010/2/7 David Gerard dgerard@gmail.com:

...

The problem is that doing this before the feature that uses it is in place renders categorisation on Commons even more useless. What this will mean is that you will be requiring a direct reduction in the usability of the wiki content before *possibly* implementing a feature.

In practice, the difference between this and saying "No, never" is telling people to do work that you know can't happen.

There's no reason why it couldn't be the other way around: an intersection feature could be written and deployed *first*, *then* the category trees on Commons would be gradually migrated to the new system. Issues like nonsense results for automatic flattening could be mitigated by disabling features or making them less visible.

...

Please leave commons-l in the cc: this time, thanks.

I did on my earlier reply, and I got a bounce from commons-l-owner saying my message was rejected because I'm not subscribed to commons-l. I'm not going to subscribe to that list, so I left the Cc: commons-l out this time.

Roan Kattouw (Catrope)

David Gerard

1:42 p.m.

On 7 February 2010 13:27, Roan Kattouw roan.kattouw@gmail.com wrote:

...

There's no reason why it couldn't be the other way around: an intersection feature could be written and deployed *first*, *then* the category trees on Commons would be gradually migrated to the new system. Issues like nonsense results for automatic flattening could be mitigated by disabling features or making them less visible.

*Precisely*. This is why the new (and it is new) demand to trash the present category tree before *possibly* implementing a category intersection feature is, in practical terms, indistinguishable from sheer contemptuous obstructionism. Daniel may be terribly offended that I dare to be acerbic about his expression of contempt, but I find his expression of contempt rather more offensive.

- d.

Aryeh Gregor

4:31 p.m.

On Sun, Feb 7, 2010 at 8:42 AM, David Gerard dgerard@gmail.com wrote:

...

*Precisely*. This is why the new (and it is new) demand to trash the present category tree before *possibly* implementing a category intersection feature is, in practical terms, indistinguishable from sheer contemptuous obstructionism.

Nobody "demanded" this except possibly Daniel Schwen, who has never even committed anything to SVN outside of WikiMiniAtlas, let alone representing the opinion of The Developers. If you are incapable of distinguishing the personal opinion of a random person on this list from "demands" by "the developers", maybe you should unsubscribe and save everyone the trouble. That way you'll avoid annoying the developers by posting uninformed and obnoxious things like this, and avoid confusing non-developers by forwarding them irrelevant or incomprehensible wikitech-l posts out of context (and this is not the first time you've done that).

On Sun, Feb 7, 2010 at 9:14 AM, Daniel Schwen lists@schwen.de wrote:

...

I never demanded that. Geez. What I want is for the Commons community to pledge support for a change of the categorization system. Putting intersection in the interface before they do is a _waste of time_. I'm asking for them to show the _tiniest_ sign of support. The programmers have already bent over backwards (including me with my own intersection tool).

Since when do we write features only for Commons? Some wikis already have atomic categories -- e.g., http://de.wikipedia.org/wiki/Kategorie:Frau. It would be a useful feature to any number of users regardless of what Commons does or does not do. In fact, it would be useful to Commons too even without atomic categorization, just not as useful as it could be.

On the other hand, it's thoroughly unreasonable to expect any wiki to change how they do things based on technologies that have been talked about for years and may or may not materialize in the foreseeable future. No, stuff on the toolserver that isn't integrated into the interface (and doesn't have a very nice interface itself) doesn't count.

David Gerard

4:50 p.m.

On 7 February 2010 16:31, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:

...

On Sun, Feb 7, 2010 at 8:42 AM, David Gerard dgerard@gmail.com wrote:

...

...
*Precisely*. This is why the new (and it is new) demand to trash the present category tree before *possibly* implementing a category intersection feature is, in practical terms, indistinguishable from sheer contemptuous obstructionism.

...

Nobody "demanded" this except possibly Daniel Schwen, who has never even committed anything to SVN outside of WikiMiniAtlas, let alone representing the opinion of The Developers.

Thank you for clarifying that.

...

If you are incapable of distinguishing the personal opinion of a random person on this list from "demands" by "the developers", maybe you should unsubscribe and save everyone the trouble. That way you'll avoid annoying the developers by posting uninformed and obnoxious things like this,

It would be nice if you'd bothered to make it clear that Daniel's opinion didn't count. Else your tacit acceptance is the only message being sent. Thank you for finally doing so.

...

and avoid confusing non-developers by forwarding them irrelevant or incomprehensible wikitech-l posts out of context (and this is not the first time you've done that).

Of course I'm going to cc the post to the list for the project the matter concerns. It's ridiculous not to when a demand is being made of said project.

Since you're speaking with more authority - what are the actual next steps to make this a happener?

- d.

Aryeh Gregor

5:17 p.m.

On Sun, Feb 7, 2010 at 11:50 AM, David Gerard dgerard@gmail.com wrote:

...

It would be nice if you'd bothered to make it clear that Daniel's opinion didn't count. Else your tacit acceptance is the only message being sent.

*Nothing* said here is a "message being sent" to anyone other than developers and sysadmins. This is not a medium for official dev announcements to the larger world. If something actually needs to be announced to the projects, it will be announced to the projects, such as by global site notices or such. Taking it upon yourself to broadcast what's said here to the larger world is not useful, because what's said here is not targeted to the larger world.

If you were actually a developer or followed MediaWiki development closely, you would understand perfectly well that Daniel Schwen's opinion doesn't count for anything in this particular case. He is not in a position to say "we won't do X", because he can't stop anyone else from doing it (and nor can most developers). CCing it to third parties as though it were an authoritative statement by The Developers is confusing and irresponsible.

...

Of course I'm going to cc the post to the list for the project the matter concerns. It's ridiculous not to when a demand is being made of said project.

No demand *was* being made of the project, not by anyone who had any say. (No offense to Daniel -- if I said the same thing, my opinion wouldn't have any weight either.) If you don't understand that, then you don't know what you're reading here, and you should stop acting like you do.

...

Since you're speaking with more authority - what are the actual next steps to make this a happener?

The same steps as for every other possible feature that anyone would want. Someone needs to implement the feature, commit it or get it committed, and address any objections that arise to avoid having it reverted or disabled. This includes jumping through any hoops that come up along the way -- and *nobody* knows what those are in advance. It looks like Neil Harris will be doing this over the next few months, with any luck, for this particular feature.

Daniel Schwen

5:33 p.m.

...

say. (No offense to Daniel -- if I said the same thing, my opinion

None taken, especially since these "demands" are purely a product of David's fantasy in the first place.

My intent was to point out the convoluted situation in the commons case. Multiple requests have been made for intersection on commons. But the existing category system is unsuitable for any technical solution. It is far from my mind to dictate what the developers should do with their time, and what not. I was offering an opinion. If that is not welcome, then make this a closed/moderated list or just kick me out.

Andrew Garrett

8 Feb

3:16 a.m.

I think this thread is more heated than it needs to be.

Perhaps we could keep the on-list discussion on-topic, and perhaps appropriate disclaimers and more judicious phrasing could be used to prevent people from getting the wrong idea. I also think that clarifying misunderstandings can be done without calling people ignorant (or implying it in the way a response is phrased).

-- Andrew Garrett agarrett@wikimedia.org http://werdn.us

Daniel Schwen

7 Feb

2:14 p.m.

...

In practice, the difference between this and saying "No, never" is telling people to do work that you know can't happen.

Wow, this is rich. We already had this conversation. A reminder:

...

Demanding that all six million files be de-categorised before you'll even allow a category intersection tool to *possibly* be deployed is backward.

I never demanded that. Geez. What I want is for the Commons community to pledge support for a change of the categorization system. Putting intersection in the interface before they do is a _waste of time_. I'm asking for them to show the _tiniest_ sign of support. The programmers have already bent over backwards (including me with my own intersection tool).

Lars Aronsson

10 Feb

12:20 a.m.

Daniel Kinzler wrote:

...

Basically: yes, this is the idea, but detecting categorization changes isn't trivial. also, really keeping a copy of the flat content of each category would be redundant to the extreme. it would result in hundreds of millions of entries,

When something starts to look overly complicated, perhaps we are addressing the wrong problem?

If people really want flat tagging of photos in Wikimedia Commons, what stops anybody from inventing a simple template: {{tags|Sweden|sunset|swans}} ?

The template could display this as a little box with some Javascript edit functionality (like HotCats).

What happens when you click one of the tags? Maybe jump off to some toolserver application that has indexed all the tags in the previous database dump? Or go to a Google search? Googling for "swans sunset site:..." should bring up pictures that were tagged like that.
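The indexing half of that idea might look roughly like the sketch below. It assumes the hypothetical `{{tags|...}}` template proposed above, takes page text as plain strings rather than reading a real dump, and uses made-up file names:

```python
import re
from collections import defaultdict

# Matches the proposed {{tags|a|b|c}} template; nested templates are not handled.
TAGS_RE = re.compile(r"\{\{\s*tags\s*\|([^{}]*)\}\}", re.IGNORECASE)


def extract_tags(wikitext):
    """Return the lower-cased tags from every {{tags|...}} call in wikitext."""
    tags = []
    for match in TAGS_RE.finditer(wikitext):
        tags.extend(t.strip().lower() for t in match.group(1).split("|") if t.strip())
    return tags


def index_pages(pages):
    """Build tag -> set of page titles from a title -> wikitext mapping."""
    index = defaultdict(set)
    for title, wikitext in pages.items():
        for tag in extract_tags(wikitext):
            index[tag].add(title)
    return index


# Hypothetical pages standing in for entries of a database dump.
PAGES = {
    "File:Lake at dusk.jpg": "== Summary ==\n{{tags|Sweden|sunset|swans}}",
    "File:City park.jpg": "{{tags|swans|park}}",
}

index = index_pages(PAGES)
print(sorted(index["swans"] & index["sunset"]))
```

The lookup at the end is a plain set intersection, so a click on a tag would only need one dictionary access per tag.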

Does it need to be more complicated than that? If so, exactly what is missing in my overly simplistic approach?

-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se

Andrew Garrett

7:44 a.m.

On 10/02/10 11:20 AM, Lars Aronsson wrote:

...

Does it need to be more complicated than that? If so, exactly what is missing in my overly simplistic approach?

It's fundamentally an ad-hoc "temporary" solution. The toolserver is great for prototyping, but anything integrated into the actual wiki should really be implemented as software. Toolserver tools also have a reputation for poor visual and functional integration with the rest of MediaWiki, poor quality standards, and general bugginess and ugliness.

To be fair, it's easier to write a toolserver script than a MediaWiki extension (because 99% of developers are lazy, and nobody has to check your code for XSS vectors). It also goes up immediately, as compared to writing a MediaWiki extension.

However, if you want to do it properly, I think it's worth the effort.

-- Andrew Garrett agarrett@wikimedia.org http://werdn.us

wikitech-l@lists.wikimedia.org

37 comments

participants (15)

Andrew Garrett
Aryeh Gregor
Conrad Irwin
Daniel Kinzler
Daniel Schwen
David Gerard
Gregory Maxwell
Lars Aronsson
Magnus Manske
Neil Harris
Rayson Ho
Roan Kattouw
Robert Stojnic
Tei
Tim Landscheidt