My bot at
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot
has been struggling with both under- and overcategorization, as it
attempts to categorize its uploads based on the metadata provided by
the journals. There is little we can do against undercategorization at
upload, but I would like to use CategorizationBot's approach of
inferring categories from pages using the file, and if there were a
usable filter against overcategorization that took into account
existing Commons categories, I would certainly give it a try.
Another challenge with the bot was that thousands of the categories it
had proposed did not exist on Commons yet, and separating the ones
that were compatible with Commons category structures from those that
were not was not trivial - in the end, I created (and categorized)
most of the necessary categories manually. You can see some remnants
of that in the subcategories of
https://commons.wikimedia.org/wiki/Category:Uploaded_with_Open_Access_Media…
.
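As a rough illustration of the first step (finding out which proposed categories do not exist on Commons yet), here is a minimal sketch. It assumes the standard MediaWiki action=query API, which marks nonexistent pages with a "missing" key in its default JSON output and, as far as I know, accepts up to 50 titles per request for ordinary accounts. The helper names existence_query_url and split_by_existence are made up for this example, and the sketch says nothing about whether a missing category would fit the Commons category structure, which was the hard part.

```python
from urllib.parse import urlencode

# Commons API endpoint; the query below is a plain action=query title lookup.
API_URL = "https://commons.wikimedia.org/w/api.php"


def existence_query_url(names):
    """Build a query URL for a batch of category names (hypothetical helper).

    Callers are assumed to pass batches small enough for the API's
    per-request title limit.
    """
    titles = "|".join("Category:" + name for name in names)
    params = {"action": "query", "format": "json", "titles": titles}
    return API_URL + "?" + urlencode(params)


def split_by_existence(response):
    """Partition the page titles of a decoded JSON response.

    Returns (existing, missing), each sorted. Pages carrying a "missing"
    key are treated as not existing. Titles are the normalized ones the
    API returns, so they may differ slightly from the names sent.
    """
    existing, missing = [], []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        if "invalid" in page:
            continue  # malformed title, neither existing nor creatable as-is
        if "missing" in page:
            missing.append(page["title"])
        else:
            existing.append(page["title"])
    return sorted(existing), sorted(missing)


# Hand-written sample in the shape of the default JSON output (not real data).
sample = {
    "query": {
        "pages": {
            "-1": {"ns": 14, "title": "Category:Hypothetical topic", "missing": ""},
            "42": {"pageid": 42, "ns": 14, "title": "Category:Biology"},
        }
    }
}
existing, missing = split_by_existence(sample)
```

Fetching the URL and decoding the JSON is left out on purpose, so that the parsing part can be tried without network access.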
Any thoughts about that would be appreciated.
Daniel
--
http://www.naturkundemuseum-berlin.de/en/institution/mitarbeiter/mietchen-d…
https://en.wikipedia.org/wiki/User:Daniel_Mietchen/Publications
http://okfn.org
http://wikimedia.org
On Sun, Dec 15, 2013 at 12:30 PM, Maarten Dammers <maarten(a)mdammers.nl> wrote:
Hi everyone,
I've been playing around with the structure of the category graph for quite some
time for
https://commons.wikimedia.org/wiki/User:CategorizationBot . This
bot takes an uncategorized image, looks where it is used and tries to find
relevant categories at Commons. It applies some filters, and one of them is
the filter against overcategorization
(https://commons.wikimedia.org/wiki/Commons:OVERCAT). For this I created a
simple child->parent table that used to live on the Toolserver, but now
appears to have vanished.
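To make the idea concrete, here is a minimal sketch of such a filter. It is not CategorizationBot's actual code: it assumes the child->parent table has been loaded into an in-memory dict, and the names ancestors and filter_overcat as well as the sample categories are made up for illustration. A candidate category is dropped whenever it is an ancestor of another candidate.

```python
from collections import deque


def ancestors(category, parents):
    """Return all categories reachable from `category` via child->parent links.

    `parents` maps a category name to an iterable of its direct parents.
    The `seen` set keeps the walk finite even if the graph has cycles.
    """
    seen = set()
    queue = deque(parents.get(category, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


def filter_overcat(candidates, parents):
    """Keep only candidates that are not an ancestor of another candidate."""
    candidates = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    redundant = set()
    for category in candidates:
        # A category never makes itself redundant, even inside a cycle.
        # Two candidates in a mutual cycle would still drop each other.
        redundant |= ancestors(category, parents) - {category}
    return [c for c in candidates if c not in redundant]


# Illustrative child->parent table (invented sample data).
parents = {
    "Eiffel Tower": ["Paris"],
    "Paris": ["Cities in France"],
    "Cities in France": ["France"],
    "France": ["Europe"],
}
result = filter_overcat(["France", "Eiffel Tower", "Birds"], parents)
```

Here "France" is dropped because it is an ancestor of "Eiffel Tower", while "Birds" stays because nothing in the table relates it to the other candidates.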
I would love to have some sort of dump or (even better) a central service I
can query. It should contain for all Wikimedia projects:
* Page links (page A links to page B)
* Category links (page A is in category C)
* Image links (page A uses image I)
* Interlanguage links (page A in language en links to page A' in language
nl)
* Interproject links (page A in the English Wikipedia links to page A' on
Wikimedia Commons)
And to make it really complete:
* Wikidata claims (item A has a claim pointing to item B)
Maarten
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics