Hi Nick,
Nick Jenkins wrote:
Hi Julien,
I was wondering if there are projects to help *detect* categories and then to help editors by *suggesting* categories?
I think it's a good idea.
The closest thing I can see is at: http://en.wikipedia.org/wiki/Wikipedia:Auto-categorization (although I'm not 100% clear on what that project was doing with categories, so maybe it's not as related as it sounds, but I thought it best to mention it).
Thank you for the link, it is very interesting. I will try to contact the author of this page.
Once you have the clusters, you need to try labelling each of them with a category:
- let the user choose the category name
- use the word space to find the words that best describe this set
of articles
- ...
Then you can run this algorithm on a category to try to split it into subcategories.
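To make the word-based labelling option a bit more concrete, here is a rough sketch. It is only an illustration, not something I have implemented: the function name label_cluster, the input format (each article as a list of words) and the TF-IDF-like score are all assumptions, and it expects the cluster to be a subset of the full set of articles.

```python
import math
from collections import Counter


def label_cluster(cluster_docs, all_docs, top_n=3):
    """Return the top_n words that best distinguish a cluster from the corpus.

    Hypothetical helper: each document is a list of words, and
    cluster_docs is assumed to be a subset of all_docs.
    """
    # Number of documents in the whole corpus that contain each word.
    corpus_df = Counter()
    for doc in all_docs:
        corpus_df.update(set(doc))

    # Number of documents in the cluster that contain each word.
    cluster_df = Counter()
    for doc in cluster_docs:
        cluster_df.update(set(doc))

    # Words common in the cluster but rare in the corpus score highest.
    n_all = len(all_docs)
    scores = {
        word: count * math.log((1 + n_all) / (1 + corpus_df[word]))
        for word, count in cluster_df.items()
    }

    # Sort by descending score, then alphabetically so ties are stable.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

The words returned would only be candidate labels; a user would still have to pick or correct the actual category name.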
How are you going to apply the categories? E.g. leave a message on the talk page / automatically apply them with a bot / an external web page where people can see possible categories / or something integrated into MediaWiki?
I think a proof of concept on an external page is better, to see whether the results are good enough to be integrated into MediaWiki.
If it's something interactive, you can maybe produce a checkbox list of the top 10 possible categories, have the user tick all the categories that apply. Then for the categories that apply, you could maybe expand those and show the subcategories, and have the user tick the ones that apply, and keep on expanding the subcategories that apply until the user has gotten as specific as they can.
Clustering is really an interactive process, but it needs quite a lot of CPU time. I don't think I will produce an interactive result, at least at the beginning.
The second idea is much simpler and easier to implement: when you edit an article, the tool can suggest the categories of the articles it links to (this could be replaced by another graph-exploration algorithm).
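A minimal sketch of this second idea, under some assumptions: the link graph and the existing categories are available as plain dictionaries, and suggest_categories is a hypothetical name. Each linked article "votes" for its own categories, and categories the article already has are skipped.

```python
from collections import Counter


def suggest_categories(article, links, categories, top_n=10):
    """Suggest categories for an article from the articles it links to.

    Hypothetical inputs:
      links      -- dict mapping an article title to the titles it links to
      categories -- dict mapping an article title to its current categories
    """
    already_applied = set(categories.get(article, ()))
    votes = Counter()

    # Each linked article votes once for each of its categories.
    for neighbour in links.get(article, ()):
        for category in set(categories.get(neighbour, ())):
            if category not in already_applied:
                votes[category] += 1

    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:top_n]]
```

Following links one step further, or weighting the votes, would be the place to plug in another graph-exploration algorithm.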
Can it maybe also suggest which stubs to use? I can never remember what the right stub to use is (so I just use the standard "{{stub}}"), but there has got to be a better way than having to remember the full list of stub types.
Yes, you can extend it to stubs with the same kind of processing.
Best Regards. Julien Lemoine