Our categories are broken, very broken


Delphine Ménard

18 Sep 2008

6:07 p.m.

I need a road sign. I figure there are a bunch of road signs on Commons, so since I am in Wikipedia, I go to the French page for a sign, hoping to swiftly follow the categories to get to the road sign I need.

That's my starting point: http://fr.wikipedia.org/wiki/Circulation_en_sens_unique

I dare you to give me a sensible way to get here: http://commons.wikimedia.org/wiki/Image:C6.svg

...if you don't speak English.

(in the end, it so happens that the sign I was looking for seems not to exist in SVG, I'll make it, using... a google image I found in 2 seconds).

How did we get into this mess?

Is there a way to start changing this and making something that makes sense?

Or is it simply hopeless?

It's actually the second time in a few days that I have that problem. Sad. :(

Delphine

-- ~notafish NB. This gmail address is used for mailing lists. Your emails will get lost. Ceci n'est pas une endive - http://blog.notanendive.org


Daniel Kinzler

18 Sep

6:19 p.m.

Delphine Ménard wrote:

...

I dare you to give me a sensible way to get here: http://commons.wikimedia.org/wiki/Image:C6.svg

...if you don't speak English.

[...]

...

How did we get into this mess?

Is there a way to start changing this and making something that makes sense?

Or is it simply hopeless?

The category structure is English-only. There's no way around this without changing the software. There have been some experiments with that recently: http://omegawiki.blogspot.com/2008/09/commons-but-now-multi-lingual.html.

Now, even if you *do* speak English, our category structure is far from perfect. That *is* something we can work on right now.

-- daniel

Alexandre NOUVEL

6:20 p.m.

Hi list, bonjour Delphine,

--- According to Delphine Ménard notafishz@gmail.com:

...

I need a road sign. I figure there are a bunch of road signs on Commons, so since I am in Wikipedia, I go to the French page for a sign, hoping to swiftly follow the categories to get to the road sign I need.

That's my starting point: http://fr.wikipedia.org/wiki/Circulation_en_sens_unique

I dare you to give me a sensible way to get here: http://commons.wikimedia.org/wiki/Image:C6.svg

...if you don't speak English.

I did this:
- went to http://commons.wikimedia.org (I use the Commons French interface), or clicked on any image in the article and then followed the link to the image description page on Commons
- typed in "sens unique" in the Commons search box
- this led me to http://commons.wikimedia.org/wiki/Special:Search?search=sens+unique&go=C...

The first item of the list seems to be the good one :)

Best regards from France,

-- Alexandre.NOUVEL@alnoprods.net |-> http://alnoprods.net |-> L'encyclopédie libre et gratuite : http://fr.wikipedia.org \ I hate spam. I kill spammers. Non mais.

Daniel Kinzler

6:51 p.m.

...

I did this:

went to http://commons.wikimedia.org (I use Commons French interface)

or either clicked on any image in the article and then followed the link to the image description page on Commons

typed in "sens unique" in the Commons search box

this led me to

http://commons.wikimedia.org/wiki/Special:Search?search=sens+unique&go=C...

The first item of the list seems to be the good one :)

Hm... that would be http://commons.wikimedia.org/wiki/Image:Sens_unique.JPG - nice find, but the sad fact is: that's an orphan! No categorization, no gallery :( We have way too many of those.

Also, hoping for an image description in French to be picked up by the full text search is quite a gamble. Compare http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=%22one+way%22+&ns0=1&ns4=1&ns6=1&ns14=1&fulltext=Suchen.

I would like so much to have a language-neutral search on commons. And I know how to build it, too. Also, the OmegaWiki people are working on something like this. Let's hope for the best.

-- daniel

David Gerard

7:33 p.m.

2008/9/18 Daniel Kinzler daniel@brightbyte.de:

...

I would like so much to have a language-neutral search on commons. And I know how to build it, too. Also, the OmegaWiki people are working on something like this. Let's hope for the best.

Categories that work like tags are in progress, don't know about expected time of arrival.

Also, being able to make tags/cats in different languages refer to the same thing. So [[Category:Dog]] and [[Category:Chien]] do the same thing. Redirects to cats don't quite work as we'd want them to.
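To make the idea of several names referring to the same category concrete, here is a minimal sketch. It assumes a hypothetical alias table (`ALIASES`) that maps each localized name to one canonical category; nothing in this thread says how the developers actually planned to implement it, and the file names are made up.

```python
# Hypothetical alias table: every localized name maps to one canonical category.
ALIASES = {
    "Dog": "Dog",
    "Chien": "Dog",
    "Hund": "Dog",
}


def canonical_category(name: str) -> str:
    """Return the canonical category for a (possibly localized) name."""
    # Unknown names are returned unchanged rather than guessed at.
    return ALIASES.get(name, name)


def members(category: str, assignments: dict[str, set[str]]) -> set[str]:
    """Collect files filed under any alias of the same canonical category."""
    target = canonical_category(category)
    result: set[str] = set()
    for cat, files in assignments.items():
        if canonical_category(cat) == target:
            result |= files
    return result


# Illustrative data: one file categorized in English, one in French.
assignments = {"Dog": {"Rex.jpg"}, "Chien": {"Medor.jpg"}}
dog_files = members("Chien", assignments)
```

With such a table, asking for [[Category:Chien]] and [[Category:Dog]] yields the same set of files, which is the behaviour plain redirects do not give.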

Commons regulars have begged for this for a fair while now, and there are developers working on this :-)

- d.

Joe Szilagyi

19 Sep

1:47 a.m.

On Thu, Sep 18, 2008 at 4:33 AM, David Gerard dgerard@gmail.com wrote:

...

Also, being able to make tags/cats in different languages refer to the same thing. So [[Category:Dog]] and [[Category:Chien]] do the same thing. Redirects to cats don't quite work as we'd want them to.

Commons regulars have begged for this for a fair while now, and there are developers working on this :-)

This will be killer. Which bug was it?

Easy/trivial renaming of cats would be wonderful as well.

- Joe

Joe Szilagyi

1:49 a.m.

Rereading all of this, would the "best case" scenario for fixing Commons categories be...

1. Easy redirection/multiple names for a given category
2. Easy/trivial renaming of categories
3. Sit us all down on a new project to rebuild the trees from the top down, doing it right?

- Joe

Delphine Ménard

11:25 p.m.

On Thu, Sep 18, 2008 at 19:49, Joe Szilagyi szilagyi@gmail.com wrote:

...

Sit us all down on a new project to rebuild the trees from the top down,

doing it right?

Count me in for that one.

Delphine

-- ~notafish NB. This gmail address is used for mailing lists. Your emails will get lost. Ceci n'est pas une endive - http://blog.notanendive.org

David Gerard

20 Sep

7:19 a.m.

2008/9/19 Delphine Ménard notafishz@gmail.com:

...

On Thu, Sep 18, 2008 at 19:49, Joe Szilagyi szilagyi@gmail.com wrote:

...

...

Sit us all down on a new project to rebuild the trees from the top down,

doing it right?

...

Count me in for that one.

The answer is not to. Tags and boolean intersection of tags.

I think en: and commons prove you can't maintain a tree in wiki format.

- d.

Daniel Schwen

10:58 a.m.

...

...
...

Sit us all down on a new project to rebuild the trees from the top

down, doing it right?

Count me in for that one.

The answer is not to. Tags and boolean intersection of tags.

I think en: and commons prove you can't maintain a tree in wiki format.

Amen to that!

-- [[en:User:Dschwen]] [[de:Benutzer:Dschwen]] [[commons:User:Dschwen]]

Daniel Kinzler

3:02 p.m.

...

I think en: and commons prove you can't maintain a tree in wiki format.

Well, enwiki is by *far* the worst. When I analysed category structures of wikipedias, I usually found an average depth of about 10 and a couple of cycles with a circumference of about 20. enwp has hundreds of cycles, many of which are huge. The largest one I found has over 300 entries. That's insane.

So, basically: the fact that enwiki got it wrong proves nothing. Most other 'pedias do way better.

But categories as we have them are far from perfect, and it would of course be nice to be able to do "deep intersection" of categories. But "simple" tags would have two big disadvantages: ambiguity, and lack of a navigational structure. If we can get intersection, but retain the ability to categorize categories, that would be great. However, this means we need "deep intersection", which is something relational databases are notoriously bad at. I don't know how or if this can be implemented efficiently.

-- daniel

David Gerard

19 Sep

3:17 a.m.

2008/9/18 Joe Szilagyi szilagyi@gmail.com:

...

On Thu, Sep 18, 2008 at 4:33 AM, David Gerard dgerard@gmail.com wrote:

...

...
Also, being able to make tags/cats in different languages refer to the same thing. So [[Category:Dog]] and [[Category:Chien]] do the same thing. Redirects to cats don't quite work as we'd want them to. Commons regulars have begged for this for a fair while now, and there are developers working on this :-)

...

This will be killer. Which bug was it?

No idea! %-D

There's been some discussion of it on wikitech-l. I know there are people working on it.

The most likely thing we'll see first is categories working like tags, and the impossibly minute categories being turned into Boolean queries on broader cats. I believe that because MySQL sucks, this is actually being implemented with Lucene search. o_0

...

Easy/trivial renaming of cats would be wonderful as well.

This, like the above, is a matter of someone writing code robust enough not to cripple the Wikimedia live servers and get it past Brion and Tim :-)

If I were trying to produce a media repository that did all this stuff, I doubt I'd start with MediaWiki ...

- d.

Brianna Laugher

8:22 a.m.

2008/9/19 David Gerard dgerard@gmail.com:

...

If I were trying to produce a media repository that did all this stuff, I doubt I'd start with MediaWiki ...

What would you use? It is worth remembering that most tagging applications handle the problems we face very poorly. e.g. aggregating all synonymous tags - we basically do this by social coercion :) LibraryThing is the only application I know that allows this with tags. Flickr and delicious don't do anything. e.g. hierarchical organisation/tag linking - AFAIK no tag-based thing does anything like this.

Which is definitely not to say that MediaWiki is great -- it's not, it's pretty poor, but almost everything else is even poorer. The only difference is, with everything else it's *easier*, and often that seems to be enough.

cheers Brianna

-- They've just been waiting in a mountain for the right moment: http://modernthings.org/

Daniel Schwen

8:44 a.m.

...

coercion :) LibraryThing is the only application I know that allows this with tags. Flickr and delicious don't do anything. e.g. hierarchical organisation/tag linking - AFAIK no tag-based thing does anything like this.

Ok, let me make a provocative statement:

Commons image categories are useless the way they are applied today. The hierarchical organisation does not work. Full stop.

It is currently almost impossible to harvest information from the category tree. Database requests to find common super-categories and perform boolean operations are not feasible, since they run on the order of minutes.

Life would be easy for application developers if categories were used as simple tags. We'd get faster db queries (no more category crawling), making real category intersection possible (or let's say plotting of image subsets on GoogleMaps, i.e.: all pictures except food items).

Life would be simpler for end users. Boolean category operations would provide a natural and powerful way of finding content (think AND, OR, NOT).

Life would be easy for uploaders. No worries if you found the right subcategory. No accidental 'overcategorization' because some category happens to be up the tree of another category.

For hierarchical organisation and browsing we could still use galleries. Just stop the subcategory sprawl and use multiple tags.
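As a rough illustration of what AND, OR and NOT on flat tags would look like, here is a minimal sketch using plain Python sets. The tag names and file names are invented for the example, and a real system would of course not hold everything in in-memory sets like this.

```python
# Illustrative data: each tag maps directly to the files carrying it.
tags = {
    "Bridges": {"a.jpg", "b.jpg", "c.jpg"},
    "Scotland": {"b.jpg", "c.jpg", "d.jpg"},
    "Food": {"d.jpg", "e.jpg"},
}

# The universe of files is needed to express NOT.
all_files = set().union(*tags.values())

bridges_in_scotland = tags["Bridges"] & tags["Scotland"]  # AND
bridges_or_food = tags["Bridges"] | tags["Food"]          # OR
everything_but_food = all_files - tags["Food"]            # NOT
```

No tree has to be crawled for any of these: each query touches only the sets named in it.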

...

difference is, with everything else it's *easier*, and often that seems to be enough.

Yes. If it is easier on the user, easier on the application developer, and easier on the uploader, then it is more than just "enough".

-- [[en:User:Dschwen]] [[de:Benutzer:Dschwen]] [[commons:User:Dschwen]]

Maarten Dammers

6 p.m.

Daniel Schwen wrote:

...

It is currently almost impossible to harvest information from the category tree. Database requests to find common super-categories and perform boolean operations are not feasible, since they run on the order of minutes.

Almost impossible? Take a look at http://toolserver.org/~multichill/filtercats.php

Maarten

Brianna Laugher

8:34 p.m.

2008/9/19 Maarten Dammers maarten@mdammers.nl:

...

Daniel Schwen wrote:

...
It is currently almost impossible to harvest information from the category tree. Database requests to find common super-categories and perform boolean operations are not feasible, since they run on the order of minutes.

Almost impossible? Take a look at http://toolserver.org/~multichill/filtercats.php

Geez, could that tool have any *less* context?

What are you supposed to input??

Brianna

-- They've just been waiting in a mountain for the right moment: http://modernthings.org/

Daniel Schwen

8:34 p.m.

On Friday 19 September 2008 05:00:19 am Maarten Dammers wrote:

...

Daniel Schwen wrote:

...
It is currently almost impossible to harvest information from the category tree. Database requests to find common super-categories and perform boolean operations are not feasible, since they run on the order of minutes.

Almost impossible? Take a look at http://toolserver.org/~multichill/filtercats.php

Unfortunately you are comparing Apples to Apple Orchards. And maybe I wasn't making myself clear enough. Finding common super-categories is actually not the biggest problem. That direction branches less than going _down_ in the tree.

Anyhow, I have been working with categories before. Check out this tool for the english Wikipedia: http://toolserver.org/~dschwen/intersection/

Let's say you want an intersection of all "Actors" in "Germany". Just intersecting those two categories won't cut it, thanks to the myriads of sub categories. The tool has to deep index several levels of subcategories and include their content in the intersection. Try it! The default depth of 2 levels won't be enough in most cases, and the processing time goes up exponentially with level.

In short, multichill's super-category lookup works for one image. To perform a proper category intersection you'd have to perform a super-category lookup for _every_ image on commons for each intersection!
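The depth-limited "deep intersection" described above can be sketched roughly as follows. This is not the code of the toolserver tool; the function names and the tiny category graph are assumptions made up for the illustration.

```python
from collections import deque


def collect_pages(root, subcats, pages, max_depth):
    """Gather pages in `root` and in its subcategories down to `max_depth` levels."""
    seen = {root}
    queue = deque([(root, 0)])
    found = set()
    while queue:
        cat, depth = queue.popleft()
        found |= pages.get(cat, set())
        if depth == max_depth:
            continue  # do not descend below the requested depth
        for child in subcats.get(cat, ()):
            if child not in seen:  # guards against cycles and repeated visits
                seen.add(child)
                queue.append((child, depth + 1))
    return found


def deep_intersection(a, b, subcats, pages, max_depth=2):
    """Intersect the expanded contents of two categories."""
    return (collect_pages(a, subcats, pages, max_depth)
            & collect_pages(b, subcats, pages, max_depth))


# Illustrative graph: the shared subcategory sits two levels below both roots.
subcats = {
    "Actors": ["Actors by nationality"],
    "Actors by nationality": ["German actors"],
    "Germany": ["People of Germany"],
    "People of Germany": ["German actors"],
}
pages = {"German actors": {"Actor X"}, "Germany": {"Berlin"}}

shallow = deep_intersection("Actors", "Germany", subcats, pages, max_depth=1)
deep = deep_intersection("Actors", "Germany", subcats, pages, max_depth=2)
```

In this toy graph a depth of 1 finds nothing and a depth of 2 finds the page. On a real tree every extra level multiplies the number of categories visited by the branching factor, which is the cost growth described above.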

Gregory Maxwell

20 Sep

11:35 a.m.

On Fri, Sep 19, 2008 at 8:34 AM, Daniel Schwen lists@schwen.de wrote:

...

Unfortunately you are comparing Apples to Apple Orchards. And maybe I wasn't making myself clear enough. Finding common super-categories is actually not the biggest problem. That direction branches less than going _down_ in the tree.

[snip]

I just wanted to step up and point out that Daniel Schwen is correct.

I created an example tool sometime back (perhaps some of you remember it?) which provided instant (i.e. like google search, <10ms computation times) results for arbitrary commons category intersections, differences, and unions.

The system worked by taking a dump of the commons categories and treating them like tags and indexing them with a special inverted index which is very good at those kinds of set operations. Because it was using the existing categories the results were not super-useful.
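For readers unfamiliar with the technique, a generic inverted index over flat categories might look like the sketch below. It is only an illustration of the general idea, not a reconstruction of the tool described here (whose "special" index is not detailed in this thread); the data and function names are assumptions.

```python
from collections import defaultdict


def build_index(file_categories):
    """Map each category (treated as a flat tag) to the set of files carrying it."""
    index = defaultdict(set)
    for file_id, cats in file_categories.items():
        for cat in cats:
            index[cat].add(file_id)
    return index


def intersect(index, *cats):
    """Files present in every named category; smallest posting list first."""
    postings = sorted((index.get(c, set()) for c in cats), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Illustrative dump of direct category memberships (no subcategory expansion).
file_categories = {
    "f1.jpg": ["Bridges", "Scotland"],
    "f2.jpg": ["Bridges"],
    "f3.jpg": ["Scotland", "Castles"],
}
index = build_index(file_categories)
scottish_bridges = intersect(index, "Bridges", "Scotland")
castles_only = index["Castles"] - index["Bridges"]  # difference
```

Note that only direct memberships are indexed, so a file filed in a subcategory of "Bridges" would not appear under "Bridges" here.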

I had hoped that it would spur interest in adopting or changing to some tag system but instead I mostly got people complaining that it didn't pick up subcategories. I pointed out that (1) picking up subcategories is technically infeasible in this kind of fast indexing scheme (imagine you move a huge second level category into another category, now the database must make millions of expensive updates) and that (2) our current system produces utter nonsense if you flatten the categories (a subject I've posted on separately several times).

Mostly (2) got answers suggesting various heuristics like "only traverse N deep", which also fails but in more subtle ways (punishing deep categorization by 'losing' that content, and still producing nonsense results too just less often and less offensively) but most importantly does not solve (1).

I think we've reached a point where many technical people here have thought about this problem for a long time (years in some cases) and over and over again have concluded that we can not have fast lookup/union/intersection/differencing tools for categories *AND* have the tree auto-expanded in the results. If commons were 1/20th its size and not growing at a good rate, then yes, we could. But since the full expanded category tree will not fit in ram even on my 32gbyte system, it just can not work. Categories as a useful lookup tool for users is highly limited for purely technical reasons if nothing else.

If there is any technical person here who thinks otherwise, please feel free to engage me in a sidebar conversation and I'll either convince you that you're wrong, or I'll gladly eat crow. Otherwise, we should consider further significant technical improvements to the existing category system to be technically non-viable.

Regardless of the non-technical arguments against categories the above should be sufficient reason for us to adopt a tagging system. Just so we can build really good search tools. It could run in parallel with the category system, as the category system isn't completely worthless and it will take a long time to get tags applied to everything.

Daniel Kinzler

3:20 p.m.

...

I had hoped that it would spur interest in adopting or changing to some tag system but instead I mostly got people complaining that it didn't pick up subcategories. I pointed out that (1) picking up subcategories is technically infeasible in this kind of fast indexing scheme (imagine you move a huge second level category into another category, now the database must make millions of expensive updates) and that

Just to complete the list of tools that do this, let me point to http://toolserver.org/~daniel/WikiSense/CatScan.php.

...

(2) our current system produces utter nonsense if you flatten the categories (a subject I've posted on separately several times).

Mostly (2) got answers suggesting various heuristics like "only traverse N deep", which also fails but in more subtle ways (punishing deep categorization by 'losing' that content, and still producing nonsense results too just less often and less offensively) but most importantly does not solve (1).

Indeed, as is evident there. I think the only good way to get around this would be to use a faceted categorization scheme.

...

I think we've reached a point where many technical people here have thought about this problem for a long time (years in some cases) and over and over again have concluded that we can not have fast lookup/union/intersection/differencing tools for categories *AND* have the tree auto-expanded in the results.

True, and I have not come up with a good way either, but perhaps we should look some more into what SMW does - it does support this kind of thing, right? How does it scale? And wasn't Magnus working on some intersection thingy?

...

If commons were 1/20th its size and not growing at a good rate, then yes, we could. But since the full expanded category tree will not fit in ram even on my 32gbyte system, it just can not work. Categories as a useful lookup tool for users is highly limited for purely technical reasons if nothing else.

Actually, keeping in RAM the structure of, say, a million categories, where each is itself in three categories, means three million pairs of IDs, each four bytes wide, that's 24MB. Not so terrible. I have actually done that to run a cycle detection on the category graphs, it's quite fast (with Java, anyway). But it's not fast enough to handle deep intersection of categories for dozens if not hundreds of requests per second, as would have to be expected for Wikipedia.
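The 24MB figure and the cycle detection can both be checked with a short sketch. The original analysis was done in Java; the Python below is only an assumed, simplified equivalent (an iterative depth-first search on a child-to-parents map), not the actual code.

```python
# Memory estimate from the paragraph above.
categories = 1_000_000
parents_per_category = 3
bytes_per_id = 4
edges = categories * parents_per_category  # 3,000,000 (child, parent) pairs
total_bytes = edges * 2 * bytes_per_id     # two IDs per pair


def find_cycle(parents):
    """Return one cycle as a list of category names, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {}
    for start in parents:
        if color.get(start, WHITE) != WHITE:
            continue
        path = [start]
        stack = [iter(parents.get(start, ()))]
        color[start] = GREY
        while stack:
            for nxt in stack[-1]:
                state = color.get(nxt, WHITE)
                if state == GREY:
                    # nxt is already on the current path: that closes a cycle.
                    return path[path.index(nxt):]
                if state == WHITE:
                    color[nxt] = GREY
                    path.append(nxt)
                    stack.append(iter(parents.get(nxt, ())))
                    break
            else:
                # All parents of the top node explored: backtrack.
                color[path.pop()] = BLACK
                stack.pop()
    return None


cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"A": ["B"], "B": []}
```

The search visits each category and each edge once, so a single pass over the in-memory graph is cheap; the expensive part discussed in this thread is repeating expansions for every intersection request.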

...

Otherwise, we should consider further significant technical improvements to the existing category system to be technically non-viable.

Hm... how fast is the growth of wikipedia in relation to the rate at which computing power is increasing? My impression is that the latter is coming along faster, so more complex operations become feasible with time.

...

Regardless of the non-technical arguments against categories the above should be sufficient reason for us to adopt a tagging system. Just so we can build really good search tools. It could run in parallel with the category system, as the category system isn't completely worthless and it will take a long time to get tags applied to everything.

I think that would be terrible. We would have two messes instead of one. Two systems that do kind of the same, but don't interact in a meaningful way. Endless arguments, more of the gallery vs. category stuff. Ugh. No. Let's find *one* good way.

-- daniel

Gregory Maxwell

3:46 p.m.

On Sat, Sep 20, 2008 at 3:20 AM, Daniel Kinzler daniel@brightbyte.de wrote:

...

True, and I have not come up with a good way either, but perhaps we should look some more into what SMW does - it does support this kind of thing, right? How does it scale? And wasn't Magnus working on some intersection thingy?

People keep waving hands on this. It's time to "put up or shut up", it's counterproductive that our less technical community thinks some technical bullet is forth-coming. At least suggest a data-structure which can provide for reasonably fast worst case updates (i.e. nothing worse than we have with changing templates today, O(N) on direct users) and lightning fast intersections, if you can't or no one else can, we need to just consider it impossible until someone does.

We can get roughly constant time lookup for intersections (well really, linear with returned results, but that's fine) but I am not aware of any way to accommodate this for tree data without making the worst case *update* hideously expensive like O(N*Avg_width^avg_depth) random seeks.

...

Actually, keeping in RAM the structure of, say, a million categories, where each is itself in three categories, means three million pairs of IDs, each four bytes wide, that's 24MB. Not so terrible. I have actually done that to run a cycle detection on the category graphs, it's quite fast (with Java, anyway). But it's not fast enough to handle deep intersection of categories for dozens if not hundreds of requests per second, as would have to be expected for Wikipedia.

That's cycle detection. Yes, you can do single point graph expansion cheaply enough, but you can't do graph expansion AND intersection cheaply because that means expanding the entire tree, not just some subset. Imagine fully expanding the entire hierarchy breadth first (stopping at previously visited nodes to cut cycles), then storing it all in an inverted index posting list. The result is *enormous*. It's fast enough for lookups, but single update actions would potentially require visiting every node many times, it might work purely in RAM, but it won't work on disk.
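The blow-up from full expansion can be seen even on a toy graph. The sketch below is an assumed illustration: it expands each file's categories upward (breadth first, stopping at visited nodes) and stores the file under every ancestor, which gives the same posting lists as expanding the whole hierarchy. The names and data are invented.

```python
from collections import defaultdict, deque


def ancestors(cat, parents):
    """`cat` plus every category reachable upward from it, each visited once."""
    seen = {cat}
    queue = deque([cat])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:  # stopping at visited nodes cuts cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def expanded_index(file_cats, parents):
    """Posting lists where each file also appears under all ancestor categories."""
    index = defaultdict(set)
    for file_id, cats in file_cats.items():
        for cat in cats:
            for anc in ancestors(cat, parents):
                index[anc].add(file_id)
    return index


# Illustrative graph with a cycle between "Architecture" and "Culture".
parents = {
    "Stone bridges": ["Bridges"],
    "Bridges": ["Architecture"],
    "Architecture": ["Culture"],
    "Culture": ["Architecture"],
}
file_cats = {"f1.jpg": ["Stone bridges"], "f2.jpg": ["Bridges"]}

index = expanded_index(file_cats, parents)
direct_postings = sum(len(cats) for cats in file_cats.values())
expanded_postings = sum(len(files) for files in index.values())
```

Two direct memberships become seven postings here. Moving "Bridges" under a different parent would mean rewriting the entries of every file beneath it, which is the update cost being objected to.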

[snip]

...

Hm... how fast is the growth of wikipedia in relation to the rate at which computing power is increasing? My impression is that the latter is coming along faster, so more complex operations become feasible with time.

Doesn't matter if Wikipedia (thought we were talking about commons) has small constant growth if tree expansion makes updates to the index exponential.

...

...
Regardless of the non-technical arguments against categories the above should be sufficient reason for us to adopt a tagging system. Just so we can build really good search tools. It could run in parallel with the category system, as the category system isn't completely worthless and it will take a long time to get tags applied to everything.

I think that would be terrible. We would have two messes instead of one. Two systems that do kind of the same, but don't interact in a meaningful way. Endless arguments, more of the gallery vs. category stuff. Ugh. No. Let's find *one* good way.

"One good way" is a common trap for simple minds. Sometimes there simply isn't one way to rule them all. ;)

The obvious way to wed the systems is to simply place the tags into the appropriate categories. The category system then becomes a manual navigation tool to help people find related tags. It wouldn't stop sucking, but the suckage would be less impacting since it would be abstracted a layer back, and people could work directly with tags.

This would be possible already if we simply treated leaf categories as tags, but people obsessively convert an image with [[Category:Bridges]], [[Category:Stone architecture]], [[Category:Transportation in Scottland]], [[Category:Built in 1702]] into [[Category:Stone bridges in scottland build in the 1700s and one used by a transsexual bungee jumper on a Tuesday]] which no one would ever find or think to check. :)

Daniel Kinzler

21 Sep

4:59 a.m.

Gregory Maxwell wrote:

...

People keep waving hands on this. It's time to "put up or shut up", it's counterproductive that our less technical community thinks some technical bullet is forth-coming. At least suggest a data-structure which can provide for reasonably fast worst case updates (i.e. nothing worse than we have with changing templates today, O(N) on direct users) and lightning fast intersections, if you can't or no one else can, we need to just consider it impossible until someone does.

We can get roughly constant time lookup for intersections (well really, linear with returned results, but that's fine) but I am not aware of any way to accommodate this for tree data without making the worst case *update* hideously expensive like O(N*Avg_width^avg_depth) random seeks.

I don't think there's a magic bullet, but nonetheless, when thinking about what is possible and what isn't, we should look at the techniques that have been tried so far, and get to know their strengths and limitations. And while I have thought about the problem quite a bit, I have not yet looked at what the two systems in question actually do.

...

...
Actually, keeping in RAM the structure of, say, a million categories, where each is itself in three categories, means three million pairs of IDs, each four bytes wide, that's 24MB. Not so terrible. I have actually done that to run a cycle detection on the category graphs, it's quite fast (with Java, anyway). But it's not fast enough to handle deep intersection of categories for dozens if not hundreds of requests per second, as would have to be expected for Wikipedia.

That's cycle detection. Yes, you can do single point graph expansion cheaply enough, but you can't do graph expansion AND intersection cheaply because that means expanding the entire tree, not just some subset. Imagine fully expanding the entire hierarchy breadth first (stopping at previously visited nodes to cut cycles), then storing it all in an inverted index posting list. The result is *enormous*. It's fast enough for lookups, but single update actions would potentially require visiting every node many times, it might work purely in RAM, but it won't work on disk.

Well, that's what I meant by "it's not fast enough to handle deep intersection of categories". You could just build two sets of pages recursively, and then intersect them. That doesn't use much space, but it's of course exponential, even though relatively fast when done in RAM. Pre-computed sets would indeed be huge.

...

...
Hm... how fast is the growth of wikipedia in relation to the rate at which computing power is increasing? My impression is that the latter is coming along faster, so more complex operations become feasible with time.

Doesn't matter if Wikipedia (thought we were talking about commons) has small constant growth if tree expansion makes updates to the index exponential.

I was thinking of mediawiki in general, as applied to wikimedia projects in general. Wikipedia as the largest and fastest growing being the main concern to performance. Anyway, all three -- wikipedia, the expanded tree, and computing power -- grow exponentially, so it's a matter of factors. But you are probably right in that full tree expansion will not be compensated by more cycles.

...

"One good way" is a common trap for simple minds. Sometimes there simply isn't one way to rule them all. ;)

True. On the other hand, a patchwork of multiple half-working solutions generally makes things worse. Different tools for different things. But not too many (incompatible) ways to do the same thing.

...

The obvious way to wed the systems is to simply place the tags into the appropriate categories. The category system then becomes a manual navigation tool to help people find related tags. It wouldn't stop sucking, but the suckage would be less impacting since it would be abstracted a layer back, and people could work directly with tags.

This would be possible already if we simply treated leaf categories as tags, but people obsessively convert an image with [[Category:Bridges]], [[Category:Stone architecture]], [[Category:Transportation in Scottland]], [[Category:Built in 1702]] into [[Category:Stone bridges in scottland build in the 1700s and one used by a transsexual bungee jumper on a Tuesday]] which no one would ever find or think to check. :)

This is something that could be mended at least to some extent by good policy and education. It works OK on the German Wikipedia (not great, but better than on commons). This goes hand in hand with user interface features. Currently, huge categories are useless except for special tools, so they get split up, often into cross-section categories. If we had a nice UI for category intersections (and a clean way to link to them, etc), big categories would be OK and could be used like tags.

Changing and enhancing the way categories work or are used is what we need. Adding another system to the mix would be fatal, I think.

-- daniel

Platonides

23 Sep

5:27 a.m.

Daniel Kinzler wrote:

...

I don't think there's a magic bullet, but nonetheless, when thinking about what is possible and what isn't, we should look at the techniques that have been tried so far, and get to know their strengths and limitations. And while I have thought about the problem quite a bit, I have not yet looked at what the two systems in question actually do.

Good point. There should be some page at mediawiki.org about this explaining the different approaches, their problems, the different experiments and their performance... In a word, summarizing these endless mailing list threads. Even if there is a magic bullet, all of us will be trying to invent circular wheels, when perhaps a myriagon performs better. Having a list would be still hard, but at least there would be a chance of finding something new.

...and it would make a fine manual to be read :)

Florian Straub

5 Oct

1 a.m.

Platonides Platonides@gmail.com wrote on Mon, 22 Sep 2008 23:27:07 +0200:

...

Daniel Kinzler wrote:

...
I don't think there's a magic bullet, but nonetheless, when thinking about what is possible and what isn't, we should look at the techniques that have been tried so far, and get to know their strengths and limitations. And while I have thought about the problem quite a bit, I have not yet looked at what the two systems in question actually do.

Good point. There should be some page at mediawiki.org about this explaining the different approaches, its problems, the different experiments and its performance... In a word, summarizing these endless mailing list threads. Even if there is a magic bullet, all of us will be trying to invent circular wheels, when perhaps a myriagon performs better. Having a list would be still hard, but at least there would be a chance of finding something new.

...and it would make a fine manual to be read :)

Which name would such an article have?

Regards,

Flo

Delphine Ménard

19 Sep

11:16 p.m.

On Thu, Sep 18, 2008 at 13:33, David Gerard dgerard@gmail.com wrote:

...

2008/9/18 Daniel Kinzler daniel@brightbyte.de:

...
I would like so much to have a language-neutral search on commons. And I know how to build it, too. Also, the OmegaWiki people are working on something like this. Let's hope for the best.

Categories that work like tags are in progress, don't know about expected time of arrival.

Also, being able to make tags/cats in different languages refer to the same thing. So [[Category:Dog]] and [[Category:Chien]] do the same thing. Redirects to cats don't quite work as we'd want them to.

Thing is (I'm not sure that you're talking about that, but let's see), I uploaded my road sign, and entered "panneau de signalisation" as a category. The software changed the category to "Speed limit signs", which is totally wrong. I know this, but how many French speakers will notice the difference?

redirects can be wrong :(

We need a tool to check the accuracy of language redirects.

Delphine

-- ~notafish NB. This gmail address is used for mailing lists. Your emails will get lost. Ceci n'est pas une endive - http://blog.notanendive.org
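
A minimal sketch of the two ideas in this message: labels in several languages resolving to one category, and a tool that flags wrong language redirects. Everything here is an assumption made for illustration: the `ALIASES` and `REVIEWED` tables, their entries (including "Road signs" as the intended target), and the function names are hypothetical, and this is not how MediaWiki handles category redirects.

```python
# Hypothetical alias table: localized label -> category it currently redirects to.
# Entries are illustrative only, not real Commons data.
ALIASES = {
    "dog": "Dog",
    "chien": "Dog",
    "hund": "Dog",
    # The kind of wrong redirect described in the message above.
    "panneau de signalisation": "Speed limit signs",
}

# Hypothetical table of targets reviewed by people who speak the language.
REVIEWED = {
    "chien": "Dog",
    "panneau de signalisation": "Road signs",
}


def resolve(label):
    """Return the category a label redirects to, or None if the label is unknown."""
    return ALIASES.get(label.strip().lower())


def wrong_redirects(aliases, reviewed):
    """Return (label, current_target, expected_target) wherever the tables disagree."""
    return [
        (label, aliases[label], expected)
        for label, expected in sorted(reviewed.items())
        if label in aliases and aliases[label] != expected
    ]


if __name__ == "__main__":
    print(resolve("Chien"))
    for label, current, expected in wrong_redirects(ALIASES, REVIEWED):
        print(f"{label!r} points to {current!r}, reviewers expect {expected!r}")
```

The check is only as good as the reviewed table, so the hard part remains finding speakers of each language to fill it in.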

Delphine Ménard

18 Sep

7:54 p.m.

On Thu, Sep 18, 2008 at 12:51, Daniel Kinzler daniel@brightbyte.de wrote:

...

...
I did this:

went to http://commons.wikimedia.org (I use Commons French interface)

or clicked on any image in the article and then followed the link to the image description page on Commons

typed in "sens unique" in the Commons search box

this led me to

http://commons.wikimedia.org/wiki/Special:Search?search=sens+unique&go=C...

The first item of the list seems to be the good one :)

Hm... that would be http://commons.wikimedia.org/wiki/Image:Sens_unique.JPG - nice find, but the sad fact is: that's an orphan! no categorization, no gallery :( we have way too many of those.

also, hoping for an image description in French to be picked up by the full text search is quite a gamble. compare http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=%22one+way%22+&ns0=1&ns4=1&ns6=1&ns14=1&fulltext=Suchen.

I would like so much to have a language-neutral search on commons. And I know how to build it, too. Also, the OmegaWiki people are working on something like this. Let's hope for the best.

Although language is a bummer, it was not my main concern, as you pointed out in an earlier mail.

The fact that I can't go straight to "Road signs" from "Road signs in Germany that tell a story" and then down again to "Road signs in France that tell a story" rather than have to go through the whole arborescence up and down again makes no sense to me.

So yes, there is much to do. I think that usability studies would be a good first step. I hope we get some feedback from the general survey in basic usability. We'll see.

Delphine

-- ~notafish NB. This gmail address is used for mailing lists. Your emails will get lost. Ceci n'est pas une endive - http://blog.notanendive.org

Brianna Laugher

8:45 p.m.

2008/9/18 Delphine Ménard notafishz@gmail.com:

...

Although language is a bummer, it was not my main concern, as you pointed out in an earlier mail.

The fact that I can't go straight to "Road signs" from "Road signs in Germany that tell a story" and then down again to "Road signs in France that tell a story" rather than have to go through the whole arborescence up and down again makes no sense to me.

Thanks to CategoryTree (the little "+" before category names, on category pages, that lets you expand the tree without reloading the page), navigating down from a good tree-top is much easier than it used to be.

The trick is figuring out the good tree-top to go to as a starting point, which is not easy because it depends on your goal and you also need to be fairly familiar with the category tree and naming conventions to begin with.

So there are two problems: developers designing an intuitive-yet-all-purpose interface, which is a big ask, and then us plebs actually putting it to use, in a way that is generic enough and precise enough for the unknown user's desires. Another big ask. :)

cheers Brianna

-- They've just been waiting in a mountain for the right moment: http://modernthings.org/

David Gerard

8:51 p.m.

2008/9/18 Brianna Laugher brianna.laugher@gmail.com:

...

The trick is figuring out the good tree-top to go to as a starting point, which is not easy because it depends on your goal and you also need to be fairly familiar with the category tree and naming conventions to begin with.

I've been using search on commons a lot lately (source pics for http://notnews.today.com/ ), and I can assure you that finding stuff sucks even just in English.

"Wikimedia Commons - it's like Getty Images but the search really sucks ..."

- d.

geni

8:56 p.m.

2008/9/18 David Gerard dgerard@gmail.com:

...

2008/9/18 Brianna Laugher brianna.laugher@gmail.com:

...
The trick is figuring out the good tree-top to go to as a starting point, which is not easy because it depends on your goal and you also need to be fairly familiar with the category tree and naming conventions to begin with.

I've been using search on commons a lot lately (source pics for http://notnews.today.com/ ), and I can assure you that finding stuff sucks even just in English.

"Wikimedia Commons - it's like Getty Images but the search really sucks ..."

d.

To a large extent the English Wikipedia remains the best Commons search engine.

If you were actually looking to build a commons search engine at this time you would certainly use the way the images are used in the various wikis as one of the key driving factors for finding and sorting results.

-- geni
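
A rough sketch of the ranking idea above: order search candidates by how many wiki pages embed each file. The `USAGE` index, the file names, and the page titles are hypothetical sample data, and the sort key is an assumption. A real engine would combine usage with text relevance and other signals.

```python
# Hypothetical usage index: file name -> wiki pages that embed the file.
USAGE = {
    "Example_one_way_sign.svg": [
        "fr:Circulation en sens unique",
        "en:One-way traffic",
        "de:Einbahnstraße",
    ],
    "Example_one_way_photo.jpg": ["fr:Circulation en sens unique"],
    "Example_unused_scan.png": [],
}


def rank_by_usage(candidates, usage):
    """Sort file names by number of embedding pages, most used first.

    Files missing from the index count as unused; ties are broken by name
    so the order is deterministic.
    """
    return sorted(candidates, key=lambda name: (-len(usage.get(name, ())), name))


if __name__ == "__main__":
    for name in rank_by_usage(list(USAGE), USAGE):
        print(len(USAGE[name]), name)
```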

Delphine Ménard

19 Sep

11:18 p.m.

On Thu, Sep 18, 2008 at 14:45, Brianna Laugher brianna.laugher@gmail.com wrote:

...

Thanks to CategoryTree (the little "+" before category names, on category pages, that lets you expand the tree without reloading the page), navigating down from a good tree-top is much easier than it used to be.

The trick is figuring out the good tree-top to go to as a starting point, which is not easy because it depends on your goal and you also need to be fairly familiar with the category tree and naming conventions to begin with.

Yeah, navigating down isn't so much the problem. It's the navigating up that's tricky.

Delphine

-- ~notafish NB. This gmail address is used for mailing lists. Your emails will get lost. Ceci n'est pas une endive - http://blog.notanendive.org

Daniel Kinzler

20 Sep

2:56 p.m.

...

...
The trick is figuring out the good tree-top to go to as a starting point, which is not easy because it depends on your goal and you also need to be fairly familiar with the category tree and naming conventions to begin with.

Yeah, navigating down isn't so much the problem. It's the navigating up that's tricky.

I think having a [+] beside the categories listed at the bottom of the page might help - it would work like the dynamic tree on category pages, but "up", not "down". The code is actually there; the problem is one of layout: I have not found a way to implement a flowing line of things that can expand to complex blocks, reliably, for all browsers.

I may look into this some more in the future. So much to do - I have to check priorities with WM Germany :)

-- daniel
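
To make "navigating up" concrete, here is a small sketch that walks from a category to its ancestors and finds the ancestors two categories share. It is not the CategoryTree code: the `PARENTS` map is a hypothetical excerpt built from the example names used earlier in this thread, and the function names are made up for illustration.

```python
from collections import deque

# Hypothetical excerpt of a category graph: category -> parent categories.
PARENTS = {
    "Road signs in Germany that tell a story": ["Road signs in Germany"],
    "Road signs in Germany": ["Road signs"],
    "Road signs in France that tell a story": ["Road signs in France"],
    "Road signs in France": ["Road signs"],
    "Road signs": [],
}


def ancestors(category, parents):
    """Walk upward breadth-first and return the ancestors, nearest first."""
    found = []
    queue = deque(parents.get(category, []))
    while queue:
        current = queue.popleft()
        if current in found:
            continue  # already visited; also guards against category loops
        found.append(current)
        queue.extend(parents.get(current, []))
    return found


def shared_ancestors(a, b, parents):
    """Return the ancestors of a that are also ancestors of b, nearest to a first."""
    other = set(ancestors(b, parents))
    return [c for c in ancestors(a, parents) if c in other]


if __name__ == "__main__":
    print(shared_ancestors(
        "Road signs in Germany that tell a story",
        "Road signs in France that tell a story",
        PARENTS,
    ))
```

With a parent map like this, an upward [+] would only need to render `ancestors(...)` for the page's categories, which leaves the layout question raised above as the open problem.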

Maarten Dammers

18 Sep

9:02 p.m.

Hi People,

Daniel Kinzler wrote:

...

The category structure is english only. There's no way around this without changing the software.

We've been tagging Wikipedia categories with the Commonscat template. It's similar to interlanguage links. This makes navigating categories easier; you can use your local Wikipedia's category structure and hop over to Commons. Most bigger wikis have been checked. Statistics are available at http://commons.wikimedia.org/wiki/User:Multichill/Commonscat_stats

Daniel Kinzler wrote:

...

Hm... that would be http://commons.wikimedia.org/wiki/Image:Sens_unique.JPG - nice find, but the sad fact is: that's an orphan! no categorization, no gallery :( we have way too many of those.

At the moment we have about 280,000 uncategorized images (see http://commons.wikimedia.org/wiki/Category:Media_needing_categories ); 95,000 are in galleries, so we have about 185,000 orphans. I'm working on a project (http://commons.wikimedia.org/wiki/User:Multichill/Categories) to get more images into categories. First I tagged most uncategorized images, and now we're trying to find categories for these images. Currently people are working on a list at http://commons.wikimedia.org/wiki/User:Multichill/Category_suggestions . This is a list of category suggestions based on uncategorized images which are in galleries. We could use some help. You only have to create/find the right category to put the images in from the gallery and tag it with a template (http://commons.wikimedia.org/wiki/Template:Populate_category). A bot will come along and fill the category.

Maarten
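
The numbers and the suggestion list above can be sketched in a few lines. The counts come from this message. The grouping function is only a guess at how gallery-based suggestions could be built, with hypothetical names and inputs; it is not the bot's actual code.

```python
from collections import defaultdict

# Approximate figures quoted in the message above.
UNCATEGORIZED = 280_000
IN_GALLERIES = 95_000
ORPHANS = UNCATEGORIZED - IN_GALLERIES  # neither in a category nor in a gallery


def suggest_categories(image_galleries, categorized):
    """Group uncategorized images by the galleries that contain them.

    image_galleries: dict mapping image name -> list of gallery names.
    categorized: set of image names that already have a category.
    Each gallery in the result is a candidate for a category of the same name.
    """
    suggestions = defaultdict(list)
    for image, galleries in sorted(image_galleries.items()):
        if image in categorized:
            continue
        for gallery in galleries:
            suggestions[gallery].append(image)
    return dict(suggestions)


if __name__ == "__main__":
    print(ORPHANS)
    sample = {"A.jpg": ["Gallery X"], "B.jpg": ["Gallery X", "Gallery Y"]}
    print(suggest_categories(sample, categorized=set()))
```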

Delphine Ménard

19 Sep

11:13 p.m.

On Thu, Sep 18, 2008 at 12:51, Daniel Kinzler daniel@brightbyte.de wrote:

...

I would like so much to have a language-neutral search on commons. And I know how to build it, too.

Wait, I missed that part.

Should the commons folks get together and ask Wikimedia Deutschland to give this to you as a project? ;-)

Delphine

-- ~notafish NB. This gmail address is used for mailing lists. Your emails will get lost. Ceci n'est pas une endive - http://blog.notanendive.org

commons-l@lists.wikimedia.org

31 comments

12 participants

participants (12)

Alexandre NOUVEL
Brianna Laugher
Daniel Kinzler
Daniel Schwen
David Gerard
Delphine Ménard
Florian Straub
geni
Gregory Maxwell
Joe Szilagyi
Maarten Dammers
Platonides