It's not so much "topics", as I wrote in the email subject, but more like "types", as I wrote in the email body. Sorry about the confusion.

We are getting serious about analyzing how people translate articles. Basically, all articles are worth translating, but we may find, for example, that Wikipedia has 60% biographies, 30% articles about places and 10% articles about math, while among the translated articles 80% are about places and biographies and math are 10% each. If this turns out to be the case, we may want to understand why people don't translate biographies and math articles more. Are they simply less interesting? Is it harder for some social reason? For some technical reason? If there is something we can do to make translation easier, we may want to do it.

This example is, of course, highly simplified and the numbers are completely made up, but I hope that it explains the intention.
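To make this kind of comparison concrete, here is a minimal Python sketch that divides each type's share among translated articles by its share among all articles, using the made-up numbers from the example. A ratio below 1 means the type is translated less than its size would suggest. The function name `representation_ratios` is hypothetical, chosen only for illustration. With these particular numbers, math comes out at exactly 1.0 (10% of all articles, 10% of translations), so only biographies are under-represented.

```python
# Made-up shares from the example above (fractions of all articles and of
# translated articles); real numbers would come from classified article counts.
overall = {"biographies": 0.60, "places": 0.30, "math": 0.10}
translated = {"biographies": 0.10, "places": 0.80, "math": 0.10}


def representation_ratios(overall, translated):
    """Share among translated articles divided by share among all articles.

    A ratio below 1 means the type is translated less often than its share
    of Wikipedia would suggest; above 1 means more often. Types with a zero
    overall share are skipped to avoid division by zero.
    """
    return {
        article_type: translated.get(article_type, 0.0) / share
        for article_type, share in overall.items()
        if share > 0
    }


for article_type, ratio in sorted(representation_ratios(overall, translated).items()):
    print(f"{article_type}: {ratio:.2f}")
# biographies: 0.17
# math: 1.00
# places: 2.67
```

The hard part is producing the shares in the first place, which requires the classification of articles into types.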

Now when I say "biographies", "articles about places" and "articles about math", it's immediately clear and intuitive to a person what I'm talking about. I am asking whether there is a known easy way for software to understand such things.
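One possible approach, in the spirit of the category-based analysis Maik describes in the quoted reply below, is to walk up the category graph from an article and pick the "main topic" category that is reachable in the fewest steps. This also softens the Eurovision problem from the original question: an article can sit somewhere under [[Category:Humans]] and still be closer to a more fitting topic. This is a minimal sketch that assumes a small, hypothetical category graph stored as a Python dict and a hypothetical set of main topics; real data would have to be fetched from Wikipedia (for example via the MediaWiki API or the category links database dump), and it is not necessarily the exact method used in the paper.

```python
from collections import deque

# Hypothetical toy category graph: each page or category maps to its parent
# categories. Names are illustrative and do not mirror Wikipedia's real tree.
PARENTS = {
    "Eurovision 2007": ["Eurovision Song Contest", "2007 in music"],
    "Eurovision Song Contest": ["Music competitions"],
    "2007 in music": ["Music"],
    "Music competitions": ["Competitions", "Music"],
    "Music": ["Arts"],
    "Competitions": ["Human activities"],
    "Human activities": ["Humans"],
    "Humans": ["People"],
}

# Hypothetical set of "main topic" categories (the small number of types).
MAIN_TOPICS = {"Arts", "People", "History", "Geography", "Science"}


def ancestor_distances(page, parents):
    """Breadth-first search up the category graph.

    Returns a dict mapping the page and every ancestor category to the
    minimal number of parent links needed to reach it. The `dist` check
    also protects against cycles, which real category graphs contain.
    """
    dist = {page: 0}
    queue = deque([page])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def classify(page, parents, main_topics):
    """Return (nearest main topics, {main topic: distance} for all reachable ones)."""
    dist = ancestor_distances(page, parents)
    reachable = {t: d for t, d in dist.items() if t in main_topics}
    if not reachable:
        return [], reachable
    best = min(reachable.values())
    nearest = sorted(t for t, d in reachable.items() if d == best)
    return nearest, reachable


nearest, reachable = classify("Eurovision 2007", PARENTS, MAIN_TOPICS)
print(nearest)    # ['Arts']
print(reachable)  # {'Arts': 3, 'People': 6}
```

Returning all topics tied at the minimal distance, rather than an arbitrary one, makes it easy to spot articles that fit several types, which matters because, as Ziko notes, the right classification depends on what it is used for.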


--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore


2014-03-18 14:34 GMT+02:00 Ziko van Dijk <zvandijk@gmail.com>:
Hello Amir,
The question for me would be: what do you want to use the
classification for? Depending on that, you can get very different
answers. The biography of Otto von Bismarck may be in the category
"history", the biography of Justin Bieber in "entertainment".
Kind regards
Ziko

2014-03-18 8:31 GMT+01:00 Maik Anderka <maik.anderka@uni-paderborn.de>:
> Dear Amir,
>
> Two years ago, we used Wikipedia categories to analyze the
> distribution of articles over a set of main topics. As main topics, we took
> the 24 direct subcategories of "Category:Main topic classifications". For
> further information, see Section 4.2 in this paper:
> http://www.uni-weimar.de/medien/webis/publications/papers/stein_2012d.pdf
>
> Best regards,
> Maik
>
> --
> Maik Anderka
> Research Group "Knowledge-Based Systems"
> Department of Computer Science
> University of Paderborn, Germany
> http://www.uni-paderborn.de/cs/ag-klbue
>
>
> On 17.03.2014 16:21, Amir E. Aharoni wrote:
>
> Hello,
>
> Is there any known easy way to classify Wikipedia articles into a relatively
> small number of types?
>
> By "relatively small" I mean no more than twenty, and by "types" I mean
> things that are intuitively clear to readers, for example:
> * Biographies
> * Articles about scientific phenomena (can be sub-grouped into math,
> astronomy, physics, geology, medicine)
> * Articles about works of art (paintings, movies, books, records, statues)
> * Articles about places
> * Articles about historical events
> * Articles about biological species
> * Articles that mostly present data, such as demography or results of
> competitions (sports, elections, game shows)
>
> There are a few more, but not many. I hope that you get the idea.
>
> We have categories, but I'm not sure that it's easy to use categories for
> such things because of the very loose category structure. For example,
> [[Eurovision 2007]] is somewhere under [[Category:Humans]], even though it's
> not an article about a human.
>
> Such information can be useful for studying the types of articles that
> different people write. In particular, I thought about it in the context of
> analyzing the types of articles that people are translating now (manually)
> and will translate in the future using ContentTranslation, which is in
> its early stages of development.
>
> Thanks,
>
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>