Today I attended yet another workshop where a brave volunteer tried to explain the usefulness of Wikimedia Commons to teachers/scientists. During the following exercise, a very computer literate senior geneticist tried to access wikimedia.org and landed on the Wikimedia Foundation website. After this mistake was corrected, she soon found category:genetics, but from there couldn't find anything about DNA. Category:DNA exists, but is hidden some levels down. The immediate subcategories to Genetics are not the obvious ones to a geneticist.
People who are very excited and want to learn constantly run into these stupid mistakes.
Can we please get rid of the name Wikimedia? The M-and-P confusion is among the very worst. Call it "Wikipedia Foundation". Rename Commons to be "Wikipedia Pictures". These two simple changes would save sooooooooooo much time.
Yes, I know Wikimedia is more than just Wikipedia, that it also covers Wikisource, Wikibooks and all the other side projects. I also know that Commons is more than just pictures. I've been with Wikipedia since May 2001. But the everyday struggle of having to explain M-and-P is taking all the fun out of it. Is it really worth that?
Now, the second part. Finding pictures in Commons is really hard. It seems that categories and textual descriptions are added by the uploader, and rarely modified or enhanced by others. Finding a map of bird migration paths across Europe might be easy, but finding a plain and simple map of Europe is hard. Images that appear directly in top categories (such as Category:Maps of Europe) are a very random mix, and not the most useful generic maps of Europe.
The "next 200" navigation is a total disaster that not a single newcomer understands. Anything beyond the first 200 (e.g. subcategories that start with M-Z) is not found.
Is there any topic category on Commons that is actively maintained for easy searching, i.e. where subcategories are well defined and where new images are systematically monitored and recategorized with enhanced descriptions? If I could find such an example, perhaps it could provide inspiration for other topics where a specialist with some extra time (or a grant application) could improve the actual usefulness of Commons.
On 28 October 2010 21:21, Lars Aronsson lars@aronsson.se wrote:
Now, the second part. Finding pictures in Commons is really hard. It seems that categories and textual descriptions are added by the uploader, and rarely modified or enhanced by others. Finding a map of bird migration paths across Europe might be easy, but finding a plain and simple map of Europe is hard. Images that appear directly in top categories (such as Category:Maps of Europe) are a very random mix, and not the most useful generic maps of Europe.
The relevant search term is "blank map of europe".
In practice if you are thinking of commons as wikipedia pictures by far the best attack line is
http://en.wikipedia.org/wiki/Europe
http://en.wikipedia.org/wiki/File:Blank_map_of_Europe_%28polar_stereographic...
http://commons.wikimedia.org/wiki/File:Blank_map_of_Europe_%28polar_stereogr...
http://commons.wikimedia.org/wiki/Category:Blank_SVG_maps_of_Europe
The "next 200" navigation is a total disaster, that not a single newcomer understands. Anything that is beyond the first 200 (e.g. subcategories that start with M-Z) are not found.
That can I think be addressed through {{category tree}}
Is there any topic category on Commons that is actively maintained for easy searching, i.e. where subcategories are well defined and where new images are systematically monitored and recategorized with enhanced descriptions?
Not categories. Any attempt to do that gets flooded by bots.
Closest would be http://commons.wikimedia.org/wiki/Flowers http://commons.wikimedia.org/wiki/Felis_silvestris_catus
On Fri, Oct 29, 2010 at 7:21 AM, Lars Aronsson lars@aronsson.se wrote:
Today I attended yet another workshop where a brave volunteer tried to explain the usefulness of Wikimedia Commons to teachers/scientists. During the following exercise, a very computer literate senior geneticist tried to access wikimedia.org and landed on the Wikimedia Foundation website.
commons.wikipedia.org works.
.. Can we please get rid of the name Wikimedia? The M-and-P confusion is among the very worst. Call it "Wikipedia Foundation". Rename Commons to be "Wikipedia Pictures". These two simple changes would save sooooooooooo much time.
Has anyone talked to commons.org ?
we could buy wikicommons.org
-- John Vandenberg
As for the Wiki(p|m)edia thing, I have to say I agree 100%, although I don't know what other complications there might be.
On 10/28/10 1:21 PM, Lars Aronsson wrote:
Now, the second part. Finding pictures in Commons is really hard. It seems that categories and textual descriptions are added by the uploader, and rarely modified or enhanced by others. Finding a map of bird migration paths across Europe might be easy, but finding a plain and simple map of Europe is hard.
I was just talking about this with some other people at the WMF... I don't fully understand the ramifications of the debate, but it seems obvious to me that categories as implemented are not useful.
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
I've been mumbling about creating a design doc or mockups for my ideas to a few people at the WMF... is anyone else interested in working on this?
What I'm thinking: I believe that we have a unique opportunity to make an amazing media curation system, far better than Flickr or Facebook can even dream of doing, and possibly of interest to scientists and other people that need to manage collections of media.
Our main advantage is the wiki model -- anyone can curate, and changes are easily revertible. At the other big image hosting sites, media are private by default and it's *hard* to build community tools, since everyone's concerned about unauthorized changes.
(Of course, our main *disadvantage* is also MediaWiki, since random templates do not add up to a real indexing system. That's the main thing we need to build).
On Fri, Oct 29, 2010 at 12:11 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
As for the Wiki(p|m)edia thing, I have to say I agree 100%, although I don't know what other complications there might be.
On 10/28/10 1:21 PM, Lars Aronsson wrote:
Now, the second part. Finding pictures in Commons is really hard. It seems that categories and textual descriptions are added by the uploader, and rarely modified or enhanced by others. Finding a map of bird migration paths across Europe might be easy, but finding a plain and simple map of Europe is hard.
I was just talking about this with some other people at the WMF... I don't fully understand the ramifications of the debate, but it seems obvious to me that categories as implemented are not useful.
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
I agree.
I've been mumbling about creating a design doc or mockups for my ideas to a few people at the WMF... is anyone else interested in working on this?
IMO, the problem is not how it looks, but the utility of the information. If the metadata was more accessible, more people would fill it in. see e.g.
http://strategy.wikimedia.org/wiki/Proposal:Dublin_Core
-- John Vandenberg
Galleries need to be encouraged to a higher degree. They work well as a guide, but also as a way for potential contributors to identify areas where there's a shortfall in available media.
http://commons.wikimedia.org/wiki/Banksia --- which then links to the species categories directly
http://commons.wikimedia.org/wiki/Western_Australia then steps down to; http://commons.wikimedia.org/wiki/Perth,_Western_Australia
What about a bot that reads descriptions, identifies keywords, and then adds the file to the matching categories, as a way to reduce the reliance on editors to select the categories?
The other thing is the search results. Try:
commons search for canoe: http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=c...
google images search for canoe: http://www.google.com.au/images?hl=en&source=imghp&biw=1680&bih=...
Commons search returns a list; most people will see the category link and only 4-6 images initially, compared to Google's 32 images and 5 alternative search options. The layout of the results on Commons could be a lot better. Even simple things: the first result is a link to Category:Canoe, and the information it gives on the category is "*28 B (1 word) - 04:37, 27 September 2009*". That's not really enticing people to even look there. Then you click on the link and hit a soft redirect to "Canoes"; another click then returns 196 images, plus 8 subcats with a further 10 subcats and 290+ photos. That initial search should have returned the category Canoes (because that's what is used), and the description should be something like "*196 images, +27 subcats containing 472 images*". Now that would entice people to explore the category.
If you search for the initial example of Genetics http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=Genetics you get, first on the list, a link to the category. Its description is "2 KB (19 words) - 11:05, 10 November 2009", which does nothing to indicate the depth of information available. Click on the category and it lists just 5 subcats, but there are 14 more if you page to the next 200, each with multiple subcats, including one cat that has 25256 files.
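To make the suggested description concrete, here is a minimal sketch of how such a summary line could be computed. It assumes a hypothetical in-memory category tree: the `Category` class, the category names and the counts are all invented for illustration and are not Commons data or a MediaWiki API.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical in-memory category node; not a MediaWiki structure."""
    name: str
    file_count: int = 0
    subcats: list["Category"] = field(default_factory=list)


def descendant_totals(cat: Category) -> tuple[int, int]:
    """Return (number of descendant subcategories, files inside them)."""
    n_subcats = 0
    n_files = 0
    for sub in cat.subcats:
        deeper_subcats, deeper_files = descendant_totals(sub)
        n_subcats += 1 + deeper_subcats
        n_files += sub.file_count + deeper_files
    return n_subcats, n_files


def summary_line(cat: Category) -> str:
    """Build a search-result description that shows depth, not page size."""
    n_subcats, n_files = descendant_totals(cat)
    return f"{cat.file_count} images, +{n_subcats} subcats containing {n_files} images"


# Invented example data.
tree = Category("Canoes", 3, [
    Category("Canoes by country", 0, [Category("Canoes in Canada", 5)]),
    Category("Dugout canoes", 2),
])
print(summary_line(tree))
```

Real category graphs can contain loops and categories reachable by several paths, so a production version would need to track visited categories; this sketch assumes a plain tree.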
We need a search result screen that conveys the availability information when a search occurs, so that people are able to understand what's actually available. While the M/P issue is annoying, most people, once they are past the differences, won't encounter it again. Commons could benefit from an address that is uniquely Commons, but Commons' function is as a repository for all projects; maybe call it "Wikilibrary" to
On 29 October 2010 10:11, John Vandenberg jayvdb@gmail.com wrote:
On Fri, Oct 29, 2010 at 12:11 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
As for the Wiki(p|m)edia thing, I have to say I agree 100%, although I don't know what other complications there might be.
On 10/28/10 1:21 PM, Lars Aronsson wrote:
Now, the second part. Finding pictures in Commons is really hard. It seems that categories and textual descriptions are added by the uploader, and rarely modified or enhanced by others. Finding a map of bird migration paths across Europe might be easy, but finding a plain and simple map of Europe is hard.
I was just talking about this with some other people at the WMF... I don't fully understand the ramifications of the debate, but it seems obvious to me that categories as implemented are not useful.
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
I agree.
I've been mumbling about creating a design doc or mockups for my ideas to a few people at the WMF... is anyone else interested in working on
this?
IMO, the problem is not how it looks, but the utility of the information. If the metadata was more accessible, more people would fill it in. see e.g.
http://strategy.wikimedia.org/wiki/Proposal:Dublin_Core
-- John Vandenberg
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
oops wrong button hit when new message appeared [?]
On 29 October 2010 13:18, Gnangarra gnangarra@gmail.com wrote:
We need a search result screen that conveys the availability information when a search occurs, so that people are able to understand what's actually available. While the M/P issue is annoying, most people, once they are past the differences, won't encounter it again. Commons could benefit from an address that is uniquely Commons, but Commons' function is as a repository for all projects; maybe call it "Wikilibrary"
or "Wikirepository", which would set it apart from the Foundation and help stop the M/P confusion.
-- GN. Photo Gallery: http://gnangarra.redbubble.com Gn. Blogg: http://gnangarra.wordpress.com
If the metadata was more accessible, more people would fill it in. see e.g.
http://strategy.wikimedia.org/wiki/Proposal:Dublin_Core
I definitely agree.
A good metadata system could really improve the situation: moreover, a Metadata/Dublin Core extension could be implemented also in Wikisource, solving the same issue in two different projects.
Aubrey Wikimedia Italy Board
On 10/28/2010 6:11 PM, Neil Kandalgaonkar wrote:
I was just talking about this with some other people at the WMF... I don't fully understand the ramifications of the debate, but it seems obvious to me that categories as implemented are not useful.
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
At any rate, I don't know that they can be fixed without at least being forked. Right now, categories are being used to serve two purposes, tagging and hierarchical organization. By being pulled in two directions, it's impossible to do either all that well, and sometimes the tensions have other undesirable consequences. For example, categorization awkwardness gets in the way of good management of potentially offensive images.
--Michael Snow
Michael Snow wrote: On 10/28/2010 6:11 PM, Neil Kandalgaonkar wrote:
I was just talking about this with some other people at the WMF... I don't fully understand the ramifications of the debate, but it seems obvious to me that categories as implemented are not useful.
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
At any rate, I don't know that they can be fixed without at least being forked. Right now, categories are being used to serve two purposes, tagging and hierarchical organization. By being pulled in two directions, it's impossible to do either all that well, and sometimes the tensions have other undesirable consequences. For example, categorization awkwardness gets in the way of good management of potentially offensive images.
--Michael Snow
I assume Commons would be better if categories were localised. We should be able to set English as the default language for categories, and add translations on top of it as if they were a template.
That could be really powerful; multilingual search on Commons would be improved this way. Currently a search in French, Italian, Portuguese or English doesn't give the same results, because of false friends or because descriptions don't exist in any language other than English. Since Commons is a multilingual website, "a way of good management of potentially offensive images" is to improve the search tool.
Florian Farge aka Otourly. On the Wikimedia projects and the French association, on OxyRadio, OSM, and on MOVIM. Member of Wikimedia Italia
On 29 October 2010 02:11, Neil Kandalgaonkar neilk@wikimedia.org wrote:
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
Let me precis several years' wishlisting on the topic:
* Tagging would be vastly helpful.
* Tags would need to be able to be queried with possibly quite complex Boolean queries.
* Categories as tags would make a transition feasible from the content side.
* A mockup in Postgres worked great, but MySQL was crippled by it.
* Thus, mooted workarounds using Lucene.
* There's something on the toolserver that does this, but it's not integrated into how Commons works at all unless you know it's there.
(Anything I've missed?)
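As an illustration of what a "complex Boolean query" over tags means, here is a minimal sketch using plain Python sets. The tag index and file titles are invented, and this says nothing about how such queries would perform at Commons scale (which is the Postgres/MySQL/Lucene point above).

```python
# Hypothetical tag index: tag name -> set of file titles (invented data).
tag_index = {
    "flower": {"File:A.jpg", "File:B.jpg", "File:C.jpg"},
    "pink": {"File:B.jpg", "File:C.jpg", "File:D.jpg"},
    "drawing": {"File:C.jpg"},
}


def files_with(tag: str) -> set[str]:
    """Files carrying a tag; an unknown tag matches nothing."""
    return tag_index.get(tag, set())


# Query: flower AND pink AND NOT drawing
result = (files_with("flower") & files_with("pink")) - files_with("drawing")
print(sorted(result))
```

Intersection, union and difference map directly onto AND, OR and NOT, which is why flat tags are easier to query than a category hierarchy.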
- d.
Hoi, Yes, tags can be translated too. Thanks, GerardM
On Fri, Oct 29, 2010 at 8:32 AM, David Gerard dgerard@gmail.com wrote:
On 29 October 2010 02:11, Neil Kandalgaonkar neilk@wikimedia.org wrote:
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
:Let me precis several years' wishlisting on the topic:
- Tagging would be vastly helpful.
- Tags would need to be able to be queried with possibly quite complex
Boolean queries.
- Categories as tags would make a transition feasible from the content side.
Done: http://commons.wikimedia.org/w/index.php?title=File:Vernomia_altissima_(Comp...
Works with categories prefixed with "TAG:". Check "flower" and "pink", click on "subset". This leads to the toolserver, and there's only one result, but that could be "prettyfied" (dialog on Commons, previews, etc.) Click on the "+" to add a tag without reloading the page, and on the "x" after the tag to remove it, also without reloading. Everything's still a "normal" edit, so the usual revert options apply.
Note that it's still experimental, and showing twice for me for some strange reason when I'm logged in, but so what...
Cheers, Magnus
On 29 Oct 2010, at 10:13, Magnus Manske wrote:
On Fri, Oct 29, 2010 at 8:32 AM, David Gerard dgerard@gmail.com wrote:
On 29 October 2010 02:11, Neil Kandalgaonkar neilk@wikimedia.org wrote:
The debate I see on Commons and elsewhere focuses on trying to fix Categories, but frankly IMO it would be better to migrate them to some other systems entirely.
:Let me precis several years' wishlisting on the topic:
- Tagging would be vastly helpful.
- Tags would need to be able to be queried with possibly quite
complex Boolean queries.
- Categories as tags would make a transition feasible from the
content side.
Done: http://commons.wikimedia.org/w/index.php?title=File:Vernomia_altissima_(Comp...
Works with categories prefixed with "TAG:". Check "flower" and "pink", click on "subset". This leads to the toolserver, and there's only one result, but that could be "prettyfied" (dialog on Commons, previews, etc.) Click on the "+" to add a tag without reloading the page, and on the "x" after the tag to remove it, also without reloading. Everything's still a "normal" edit, so the usual revert options apply.
Note that it's still experimental, and showing twice for me for some strange reason when I'm logged in, but so what...
Cheers, Magnus
I've seen the tagging suggestion come by many times and always found it a great idea. The important thing, though, in my opinion, is that it should not use the category system (not even temporarily). I assume it was done this way now as a proof of concept.
I think an extension could be written for MediaWiki (if one doesn't exist already, I might be able to do it myself) that adds tags to MediaWiki, similar to categories but with a few differences. For one, tags would not be hierarchical and not stored under a name, but rather a number (an id if you will). Inside the tag there would be a syntax representing the tag name in different languages.
Example (don't pay attention to the exact syntax, it's just an idea):
# Tag:100
<tagname>
{{en:flower}}
{{nl:bloem}}
{{de:Blume}}
</tagname>
It would generally be saved as a page with wikitext and a history etc. When saving the tagname translations could either be saved to page_props or to a newly created tag_i18n table containing tag_id, tag_lang and tag_name
When on a file page there'd be a sidebar gadget listing the current tags. (WHERE tag_from=pageid, just like categories) And instead of fetching the pagetitle from the page table entry it would either fetch it from page_props or from tag_i18n with fallback to English.
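A minimal sketch of the tag_i18n idea with the English fallback, using an in-memory SQLite database. Only the table name and the columns tag_id, tag_lang and tag_name come from the proposal above; the primary key, the `tag_label` helper and the sample rows are assumptions made for illustration, and this is not an existing MediaWiki schema.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag_i18n ("
    " tag_id INTEGER NOT NULL,"
    " tag_lang TEXT NOT NULL,"
    " tag_name TEXT NOT NULL,"
    " PRIMARY KEY (tag_id, tag_lang))"  # assumed: one name per tag and language
)
conn.executemany(
    "INSERT INTO tag_i18n VALUES (?, ?, ?)",
    [(100, "en", "flower"), (100, "nl", "bloem"), (100, "de", "Blume")],
)


def tag_label(tag_id: int, user_lang: str) -> Optional[str]:
    """Return the tag name in user_lang, else in English, else None."""
    row = conn.execute(
        "SELECT tag_name FROM tag_i18n"
        " WHERE tag_id = ? AND tag_lang IN (?, 'en')"
        " ORDER BY (tag_lang = ?) DESC LIMIT 1",  # prefer the user's language
        (tag_id, user_lang, user_lang),
    ).fetchone()
    return row[0] if row else None


print(tag_label(100, "nl"))
print(tag_label(100, "fr"))
```

A Dutch user would see the Dutch name, while a French user falls back to the English one because no French row exists in the sample data.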
When adding tags, autosuggest would be used, like the search function has, searching through the tag names where tag_lang is 'en' or the user's language.
Just an idea, let me know what you think
-- Krinkle
On Fri, Oct 29, 2010 at 2:25 PM, Krinkle krinklemail@gmail.com wrote:
For one, tags would not be hierarchical and not stored under a name, rather a number (an id if you will).
I would store the tag-i18n definitions in a separate Tag: namespace. Then you don't need to create the history tracking etc. all by yourself. You will need a unique identifier though, but I don't see a problem making the unique identifier equal to the content language.
Bryan
On 29 Oct 2010, at 14:36, Bryan Tong Minh wrote:
On Fri, Oct 29, 2010 at 2:25 PM, Krinkle krinklemail@gmail.com wrote:
For one, tags would not be hierarchical and not stored under a name, rather a number (an id if you will).
I would store the tag-i18n definitions in a separate Tag: namespace. Then you don't need to create the history tracking etc. all by yourself. You will need a unique identifier though, but I don't see a problem making the unique identifier equal to the content language.
Bryan
Now that I re-read my message, I see I didn't literally use the word "namespace", but that's what I meant, yes: a separate Tag: namespace that contains 'pages' that are titled either by numbers or by a name in the main content language. (Unlike the interwiki-transclusion thing, the titles of the Tag pages are pretty much hidden from everybody.)
I just read another discussion about separating license/author information from page text. This gave me another idea.
If tags are stored separately, there's no need to have any wikitext at all. At least not visible to the end user.
A Tag: page could simply contain a form with the current translations (editable) and a [+] button to add a translation, roughly shaped like this:
Pagename: [[Tag:Flower]]
<form>
  <table>
    <tr> <td>English</td> <td><input value="flower" /></td> </tr>
    <tr> <td>Nederlands</td> <td><input value="bloem" /></td> </tr>
    <tr> <td>Deutsch</td> <td><input value="Blume" /></td> </tr>
  </table>
  <select> ...languages not translated yet... </select>
  <input type="submit" value="Add translation for this language" />
  <input type="submit" value="Save" />
</form>
Saving would extract the info and store it in the tag_i18n table (or page_props if that's more appropriate - anyway)
Since we like keeping history of everything it may be wise to also store this as wikitext for the Tag:Flower page as something like: {{en:flower}} {{nl:bloem}} {{de:Blume}} but that would only be visible in diff-views and history.
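A minimal sketch of the "extract on save" step, assuming the illustrative {{lang:name}} form shown above. The regular expression and both helper names are hypothetical; this is not MediaWiki parser code.

```python
import re

# Matches the illustrative {{lang:name}} form; the syntax itself is only
# an idea from the discussion, not an existing wikitext construct.
TRANSLATION = re.compile(r"\{\{\s*([a-z-]+)\s*:\s*([^{}]+?)\s*\}\}")


def parse_translations(wikitext: str) -> dict[str, str]:
    """Extract {language code: tag name} pairs from the stored wikitext."""
    return {lang: name for lang, name in TRANSLATION.findall(wikitext)}


def render_translations(translations: dict[str, str]) -> str:
    """Serialize the pairs back to wikitext, sorted by language code."""
    parts = ["{{" + lang + ":" + name + "}}" for lang, name in sorted(translations.items())]
    return " ".join(parts)


stored = "{{en:flower}} {{nl:bloem}} {{de:Blume}}"
print(parse_translations(stored))
print(render_translations(parse_translations(stored)))
```

Sorting on output keeps diffs in the page history stable regardless of the order in which translations were added through the form.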
-- Krinkle
On Fri, Oct 29, 2010 at 1:36 PM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Fri, Oct 29, 2010 at 2:25 PM, Krinkle krinklemail@gmail.com wrote:
For one, tags would not be hierarchical and not stored under a name, rather a number (an id if you will).
I would store the tag-i18n definitions in a separate Tag: namespace. Then you don't need to create the history tracking etc. all by yourself. You will need a unique identifier though, but I don't see a problem making the unique identifier equal to the content language.
First, my using the category system is a "hack" that works right now, and could be converted into a more mature solution at a later date.
As to "tracking", IMHO it would suffice to just add [[Tag:XYZ]] to the pages; that's rather inelegant from a database POV, but as Bryan points out, it would take care of version tracking etc. All one would need to do is to remove them from the rendering and display them separately, like categories.
Multilanguage tags, as well as synonyms, could simply be implemented as #REDIRECTs, maybe. [[Tag:Blume]] would redirect to [[Tag:Flower]], so a search for "Blume" would know about "Flower" easily. However, the system would then have to search for both Blume and Flower internally. Also, that could be a problem with "false friends" (en:Gift=present vs. de:Gift=poison).
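A minimal sketch of the redirect idea and of why false friends get in its way. The redirect table, the tag names "Present" and "Poison", and the helper names are all invented for illustration.

```python
# Hypothetical redirect table: alias tag title -> target tag title.
redirects = {"Blume": "Flower", "Bloem": "Flower", "Blossom": "Flower"}


def canonical(tag: str) -> str:
    """Follow redirects to the canonical tag, guarding against loops."""
    seen = set()
    while tag in redirects and tag not in seen:
        seen.add(tag)
        tag = redirects[tag]
    return tag


def search_terms(tag: str) -> set[str]:
    """All tag titles the system would have to search for internally."""
    target = canonical(tag)
    return {target} | {alias for alias in redirects if canonical(alias) == target}


print(sorted(search_terms("Blume")))

# False friends: a page title can hold only one redirect, so the same
# title wanted by two languages for different targets is a conflict.
wanted = {("en", "Gift"): "Present", ("de", "Gift"): "Poison"}
targets_by_title: dict[str, set[str]] = {}
for (_lang, title), target in wanted.items():
    targets_by_title.setdefault(title, set()).add(target)
conflicts = {title for title, targets in targets_by_title.items() if len(targets) > 1}
print(conflicts)
```

The conflict set shows that plain redirects cannot express both senses of "Gift" without some language qualifier on the title.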
Another approach might be to duplicate the "hidden category" mechanism and impose a special meaning there, e.g. no tags are shown on such a category page (no tagging of tags), and these categories would then be shown as tags on e.g. image pages.
Magnus
On Fri, Oct 29, 2010 at 2:51 PM, Magnus Manske magnusmanske@googlemail.com wrote:
Multilanguage tags, as well as synonyms, could simply be implemented as #REDIRECTs, maybe. [[Tag:Blume]] would redirect to [[Tag:Flower]], so a search for "Blume" would know about "Flower" easily. However, the system would then have to search for both Blume and Flower internally. Also, that could be a problem with "false friends" (en:Gift=present vs. de:Gift=poison).
Or force the user to select a language when entering the tag. This means, though, that we either need a pre-save transform to turn [[Tag:en/flower]] and [[Tag:de/Blume]] into their canonical [[Tag:flower]] and reject tags without a language specified, or we need to have tags entered separately from the page text. But then we run into the history problem again.
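A minimal sketch of such a pre-save transform, written as a standalone Python function rather than as a MediaWiki hook. The `CANONICAL` lookup table, the regular expression and the function name are assumptions for illustration only.

```python
import re

# Hypothetical lookup: (language, localized name) -> canonical tag name.
CANONICAL = {("en", "flower"): "flower", ("de", "Blume"): "flower"}

TAG_LINK = re.compile(r"\[\[Tag:([^\]\[]+)\]\]")


def pre_save_transform(wikitext: str) -> str:
    """Rewrite [[Tag:lang/name]] links to canonical form, rejecting the rest."""
    def replace(match: re.Match) -> str:
        lang, sep, name = match.group(1).partition("/")
        if not sep:
            raise ValueError(f"tag without language: {match.group(0)}")
        if (lang, name) not in CANONICAL:
            raise ValueError(f"unknown tag: {match.group(0)}")
        return f"[[Tag:{CANONICAL[(lang, name)]}]]"

    return TAG_LINK.sub(replace, wikitext)


print(pre_save_transform("A [[Tag:en/flower]] and a [[Tag:de/Blume]]."))
```

Note that the output contains [[Tag:flower]] without a language, which this same function would reject on the next save. A real implementation would need some way to tell already-canonical tags from unqualified user input.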
Bryan
On 29 Oct 2010, at 15:01, Bryan Tong Minh wrote:
On Fri, Oct 29, 2010 at 2:51 PM, Magnus Manske magnusmanske@googlemail.com wrote:
Multilanguage tags, as well as synonyms, could simply be implemented as #REDIRECTs, maybe. [[Tag:Blume]] would redirect to [[Tag:Flower]], so a search for "Blume" would know about "Flower" easily. However, the system would then have to search for both Blume and Flower internally. Also, that could be a problem with "false friends" (en:Gift=present vs. de:Gift=poison).
Or force the user to select a language when entering the tag. This means though that we either need a pre-save transform to transform [[Tag:en/flower]] and [[Tag:de/Blume]] to their canonical [[Tag:flower]] and reject tags without language specified, or we need to have tags entered separately from the page text. But then run into the history problem again.
Ah, now I remember again. The reason I used "100" as an example in the first mail was basically to avoid this problem. Since tags will be primarily changed from the sidebar (or wherever they will appear), where the i18n applies, the page text doesn't matter that much. In a diff view they could be rendered in the sidebar as well, so that there isn't the issue of "I'm in a diff view and wonder what this means" (just as categories, page title and wikitext are rendered below the diff view as well).
So [[Tag:Blume]] would never appear in wikitext. Instead, [[Tag:Flower]] or [[Tag:100]] would (depending on the system we decide to use). And in the sidebar (and wherever else tag_i18n is consulted) the correct language appears.
- Krinkle
On Fri, Oct 29, 2010 at 2:09 PM, Krinkle krinklemail@gmail.com wrote:
So [[Tag:Blume]] would never appear in wikitext. Instead, [[Tag:Flower]] or [[Tag:100]] would (depending on the system we decide to use). And in the sidebar (and wherever else tag_i18n is consulted) the correct language appears.
[[Tag:Flower]] would have the advantage of being human-readable, for the English-speaking world at least. A dozen [[Tag:100]] things would be ... off-putting to many, IMHO. We could just (in)formally agree that the main tag name should always be English.
Magnus
On 29 Oct 2010, at 15:13, Magnus Manske wrote:
On Fri, Oct 29, 2010 at 2:09 PM, Krinkle krinklemail@gmail.com wrote:
So [[Tag:Blume]] would never appear in wikitext. Instead, [[Tag:Flower]] or [[Tag:100]] would (depending on the system we decide to use). And in the sidebar (and wherever else tag_i18n is consulted) the correct language appears.
[[Tag:Flower]] would have the advantage of being human-readable, for the English-speaking world at least. A dozen [[Tag:100]] things would be ... off-putting to many, IMHO. We could just (in)formally agree that the main tag name should always be English.
Magnus
It does bring up the issue of renaming. Although tags are (hopefully) less likely to be renamed than categories or pages, it could happen (*), in which case numbers are easier.
*On the subject, perhaps we don't want that at all. Move could be hidden or disabled on Tag: pages. That makes sense since, well, they are tags. If you rename one, it's a different tag.
-- Krinkle
On 29 Oct 2010, at 14:51, Magnus Manske wrote:
On Fri, Oct 29, 2010 at 1:36 PM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Fri, Oct 29, 2010 at 2:25 PM, Krinkle krinklemail@gmail.com wrote:
For one, tags would not be hierarchical and not stored under a name, rather a number (an id if you will).
I would store the tag-i18n definitions in a separate Tag: namespace. Then you don't need to create the history tracking etc. all by yourself. You will need a unique identifier though, but I don't see a problem making the unique identifier equal to the content language.
As to "tracking", IMHO it would suffice to just add [[Tag:XYZ]] to the pages; that's rather inelegant from a database POV, but as Bryan points out, it would take care of version tracking etc. All one would need to do is to remove them from the rendering and display them separately, like categories.
@Bryan: Ah, excuse me. You were referring to the history of the File page. I was under the impression it was about the version tracking of the tags. Yeah, adding them to the page would be similar to categories, and they would be extracted from the wikitext into a taglinks table.
Multilanguage tags, as well as synonyms, could simply be implemented as #REDIRECTs, maybe. [[Tag:Blume]] would redirect to [[Tag:Flower]], so a search for "Blume" would know about "Flower" easily. However, the system would then have to search for both Blume and Flower internally. Also, that could be a problem with "false friends" (en:Gift=present vs. de:Gift=poison).
Though that's certainly not a bad option. I was thinking of not using page-title search when searching through tags, but instead the tag table itself, which contains all the i18n versions. An option on the tag-search page could specify whether the search should cover English (and/or the user's language), or all languages.
The reason is that using redirects would in practice mean keeping all redirects in sync with the tag translations, which seems like double work.
-- Krinkle
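As a rough illustration of searching the tag table rather than page titles, here is a small Python sketch. The `TagLabel` rows, the numeric ids and the `search_tags` function are hypothetical; they only show how a language filter lets "Gift" resolve differently in English and German.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagLabel:
    tag_id: int   # language-neutral tag identifier
    lang: str     # language code of this label
    label: str    # localized tag name


# Hypothetical rows of a tag i18n table.
TAG_I18N = [
    TagLabel(100, "en", "Flower"),
    TagLabel(100, "de", "Blume"),
    TagLabel(200, "en", "Gift"),
    TagLabel(200, "de", "Geschenk"),
    TagLabel(201, "en", "Poison"),
    TagLabel(201, "de", "Gift"),
]


def search_tags(term, langs=None):
    """Return sorted tag ids whose label equals term, ignoring case.

    langs is an optional set of language codes; None searches all languages.
    """
    needle = term.casefold()
    return sorted({
        row.tag_id
        for row in TAG_I18N
        if row.label.casefold() == needle
        and (langs is None or row.lang in langs)
    })
```

With these sample rows, searching "Gift" restricted to `{"en"}` and to `{"de"}` yields different tag ids, while an all-language search returns both, which is where a disambiguation step would be needed.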
On Fri, Oct 29, 2010 at 3:03 PM, Krinkle krinklemail@gmail.com wrote:
@Bryan: Ah, excuse me. You were referring to the history of the File page. I was under the impression it was about the version tracking of the tags. Yeah, adding them to the page would be similar to categories, and they would be extracted from the wikitext into a taglinks table.
No, you were right, I was referring to the history in the Tag namespace, but history tracking applies equally to the use of tags on file pages.
Bryan
On Fri, Oct 29, 2010 at 2:03 PM, Krinkle krinklemail@gmail.com wrote:
On 29 Oct 2010, at 14:51, Magnus Manske wrote:
Multilanguage tags, as well as synonyms, could simply be implemented as #REDIRECTs, maybe. [[Tag:Blume]] would redirect to [[Tag:Flower]], so a search for "Blume" would know about "Flower" easily. However, the system would then have to search for both Blume and Flower internally. Also, that could be a problem with "false friends" (en:Gift=present vs. de:Gift=poison).
Though that's certainly not a bad option. I was thinking of not using page-title search when searching through tags, but instead the tag table itself, which contains all the i18n versions. An option on the tag-search page could specify whether the search should cover English (and/or the user's language), or all languages.
The reason is that using redirects would in practice mean keeping all redirects in sync with the tag translations, which seems like double work.
A separate database table derived from the wikitext on [[Tag:Flower]] would certainly work as well.
Another idea: Instead of inventing new syntax or pseudo-HTML tags, why not use language links? On [[Tag:Flower]] [[en:Flower]] [[de:Blume]]
That is well-known syntax, and could be nicely parsed for some other applications (On [[de:Blume]] "Search for images on commons relating to Blume" generated automatically, or something)
However, it would preclude synonyms within the same language, as there is only one language:title pair stored AFAIK.
Magnus
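A small Python sketch of how language links on a [[Tag:Flower]] page could be parsed into a label table. The regular expression and the `parse_language_links` function are assumptions for illustration; keeping one title per language code mirrors the limitation mentioned above.

```python
import re

# Matches [[xx:Title]] where xx is a lowercase 2-3 letter language code.
# This is a simplification: real language codes can be longer.
LANG_LINK = re.compile(r"\[\[([a-z]{2,3}):([^\]\|]+)\]\]")


def parse_language_links(wikitext):
    """Return {language code: title}; a later link for the same code wins."""
    labels = {}
    for lang, title in LANG_LINK.findall(wikitext):
        labels[lang] = title.strip()
    return labels
```

Because the result holds a single title per language, a second English link such as [[en:Blossom]] would overwrite [[en:Flower]] rather than being stored as a synonym.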
Sure, that works too (I was aiming at the #tag syntax, which I believe is also fairly easily parsed).
Whether it is [[xx:Words here]] or {{#en:Words here}}: the former might confuse the few users who check the diff (instead of editing the page with the form), as I assume the Tag: page would not actually have langlinks.
-- Krinkle
On Fri, Oct 29, 2010 at 3:10 PM, Magnus Manske magnusmanske@googlemail.com wrote:
Another idea: Instead of inventing new syntax or pseudo-HTML tags, why not use language links? On [[Tag:Flower]] [[en:Flower]] [[de:Blume]]
Because you want to avoid using a single syntax for multiple purposes in different contexts. We should not tell our users that "[[en:*" is an interlanguage link, except when it occurs on a page that starts with [[Tag:. It is already too damn confusing that [[en: is an interlanguage link and [[w:en: an interwiki link. (I should add that "interlanguage link" and "interwiki link" are generally used in wikispeak as synonyms for the same concept, even though that concept is ambiguous.)
Hoi, Typically I am happy with hacks, but this is an exception. The terminology used in the categories is not optimal: the use of plurals, the use of Latin names for organisms... It is better for people to add new tags, because otherwise they think "it has to be like that". Thanks, GerardM
On 29 October 2010 14:51, Magnus Manske magnusmanske@googlemail.com wrote:
On Fri, Oct 29, 2010 at 1:36 PM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Fri, Oct 29, 2010 at 2:25 PM, Krinkle krinklemail@gmail.com wrote:
For one, tags would not be hierarchical and not stored under a name, rather a number (an id if you will).
I would store the tag-i18n definitions in a separate Tag: namespace. Then you don't need to create the history tracking etc. all by yourself. You will need a unique identifier though, but I don't see a problem making the unique identifier equal to the content language.
First, my using the category system is a "hack" that works right now, and could be converted into a more mature solution at a later date.
As to "tracking", IMHO it would suffice to just add [[Tag:XYZ]] to the pages; that's rather inelegant from a database POV, but as Bryan points out, it would take care of version tracking etc. All one would need to do is to remove them from the rendering and display them separately, like categories.
Multilanguage tags, as well as synonyms, could simply be implemented as #REDIRECTs, maybe. [[Tag:Blume]] would redirect to [[Tag:Flower]], so a search for "Blume" would know about "Flower" easily. However, the system would then have to search for both Blume and Flower internally. Also, that could be a problem with "false friends" (en:Gift=present vs. de:Gift=poison).
Another approach might be to duplicate the "hidden category" mechanism and impose a special meaning there, e.g. no tags are shown on such a category page (no tagging of tags), and these categories would then be shown as tags on e.g. image pages.
Magnus
Commons-l mailing list Commons-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/commons-l
On Fri, Oct 29, 2010 at 2:21 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Typically I am happy with hacks, but this is an exception. The terminology used in the categories is not optimal: the use of plurals, the use of Latin names for organisms... It is better for people to add new tags, because otherwise they think "it has to be like that".
Not sure I understand. My hack is using the category /system/, not the existing categories. Each tag is a special new category like [[Category:TAG:Flower]].
It's not pretty, and a dedicated extension like Krinkle's is much better, but it works right now, and has the pleasant side effect that it can be queried for both tags and categories simultaneously, if desired (e.g. tag X in category tree Y).
Magnus
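To make the "tag X in category tree Y" query concrete, here is a Python sketch over in-memory data. The category names, file names and both functions are hypothetical; only the [[Category:TAG:Flower]] naming follows the hack described above.

```python
# Hypothetical category graph: parent category -> direct subcategories.
SUBCATEGORIES = {
    "Plants": ["Flowers", "Trees"],
    "Flowers": ["Roses"],
}

# Hypothetical membership: category -> files placed directly in it.
# Tags are stored as ordinary categories with a "TAG:" prefix.
CATEGORY_MEMBERS = {
    "Plants": {"Fern.jpg"},
    "Flowers": {"Tulip.jpg"},
    "Roses": {"Rose.jpg", "Rose garden map.png"},
    "Trees": {"Oak.jpg"},
    "TAG:Flower": {"Tulip.jpg", "Rose.jpg", "Flower stamp.jpg"},
}


def category_tree(root):
    """Return root plus all categories reachable below it (cycle-safe)."""
    seen = set()
    stack = [root]
    while stack:
        category = stack.pop()
        if category in seen:
            continue
        seen.add(category)
        stack.extend(SUBCATEGORIES.get(category, []))
    return seen


def files_with_tag_in_tree(tag, root):
    """Return files that carry the tag and sit anywhere in root's tree."""
    in_tree = set()
    for category in category_tree(root):
        in_tree |= CATEGORY_MEMBERS.get(category, set())
    return in_tree & CATEGORY_MEMBERS.get(f"TAG:{tag}", set())
```

The visited set matters because category graphs are not guaranteed to be trees; without it a loop between two categories would never terminate.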
On 29 Oct 2010, at 15:29, Magnus Manske wrote:
It's not pretty, and a dedicated extension like Krinkle's is much better, but it works right now, and has the pleasant side effect that it can be queried for both tags and categories simultaneously, if desired (e.g. tag X in category tree Y).
A very basic UI concept: http://toolserver.org/~krinkle/MWTags/ (not functional)
On Fri, Oct 29, 2010 at 3:31 PM, Krinkle krinklemail@gmail.com wrote:
A very basic UI concept: http://toolserver.org/~krinkle/MWTags/ (not functional)
Nice!
Random thoughts:
* Since the changes will be very limited (add a line or two at most, or change one), summary could be generated automatically
* Likewise, a "minor" distinction wouldn't apply, clearing the interface even more
* IMHO tags should be forced case-insensitive, at least during search, maybe even force lowercase in the entry form?
* flickr has "machine tags" like "group:key=value". Should we plan for that as well?
* Allow for multiple values per language to catch synonyms?
* Same for common typos, like on en.wp?
* Upon search, show other tags commonly used with pages in the results, for easy refinement
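Two of the points above, case-insensitive tags and "group:key=value" machine tags, could be handled roughly as in this Python sketch. The `normalize_tag` and `parse_machine_tag` helpers are hypothetical, and the "geo:lat" example is only sample input.

```python
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    group: str
    key: str
    value: str


def normalize_tag(raw):
    """Collapse whitespace and fold case so 'Red  Rose' equals 'red rose'."""
    return " ".join(raw.split()).casefold()


def parse_machine_tag(raw) -> Optional[MachineTag]:
    """Parse 'group:key=value'; return None for ordinary free-text tags."""
    head, equals, value = raw.partition("=")
    group, colon, key = head.partition(":")
    if not equals or not colon or not group.strip() or not key.strip():
        return None
    # Group and key are folded to lowercase; the value is kept as entered.
    return MachineTag(group.strip().casefold(), key.strip().casefold(), value.strip())
```

Normalizing at entry time and again at search time keeps the two consistent even if older tags were stored before any lowercase rule existed.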
Also, should we somehow encode tag "subgroupings"? I know that sounds like it's getting us back to the category system, but tagging an image only as "mare" will not show it when searching for "horse". We'd have to add lots of tags, which is basically no problem, but hard to do consistently, meaning we'll miss some. Associating "Mare" with "Horse" (no tree structure, more like a "see also") could then
* for a search, give a list of "see also" (tag cloud?)
* auto-branch into associated tags if the original search fails (a smarter variant of "did you mean...")
Magnus
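The "see also" association and the fallback search could look something like the following Python sketch. The association pairs, file names and function names are made up for illustration; the associations are flat and symmetric, with no tree structure.

```python
# Hypothetical flat associations between tags (unordered pairs, no hierarchy).
ASSOCIATIONS = [
    ("mare", "horse"),
    ("stallion", "horse"),
    ("foal", "horse"),
]

# Hypothetical index: tag -> files carrying that tag.
FILES_BY_TAG = {
    "mare": {"Mare in field.jpg"},
    "foal": {"Foal.jpg"},
}


def see_also(tag):
    """Return the tags associated with tag, in either direction."""
    related = set()
    for first, second in ASSOCIATIONS:
        if first == tag:
            related.add(second)
        elif second == tag:
            related.add(first)
    return related


def search(tag):
    """Return (files, suggestions).

    If the tag itself has no files, fall back to files of associated tags.
    """
    suggestions = sorted(see_also(tag))
    files = set(FILES_BY_TAG.get(tag, set()))
    if not files:
        for other in suggestions:
            files |= FILES_BY_TAG.get(other, set())
    return files, suggestions
```

Here a search for "horse" finds nothing directly and branches into "mare" and "foal", while a search for "mare" returns its own file and merely suggests "horse".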
Very nice indeed!
I'd add to Magnus's thoughts the following:
* By combining tags with categories in search we can achieve "subgrouping" without actually going through tags and doing categories all over again. Then we can use parent and sub-categories for "see also", "did you mean.." or "narrow your search" etc.
* We should certainly allow for multiple values per language. Besides the reasons mentioned by Magnus, there's the fact that names of people and places in non-Latin-script languages are sometimes transliterated in several ways, and the opposite is also true. We could make only one tag visible (like the article title on WP) and hide the other variants (like redirects on WP).
* For searches that need disambiguation, we can use something like "clarify your search" used on iStockphoto.
My hopes rest with our valued coders to turn this into reality. It will make life on Commons a whole lot easier.
On Fri, Oct 29, 2010 at 3:11 AM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
(Of course, our main *disadvantage* is also MediaWiki, since random templates do not add up to a real indexing system. That's the main thing we need to build).
Related to that: https://bugzilla.wikimedia.org/show_bug.cgi?id=25624 Store author and license information in the database. If anybody has comments about how to implement this, please comment on the bug.
Bryan